EN FR
EN FR


Section: New Results

Static Analysis and Abstract Interpretation

Participants : Alain Girault, Bertrand Jeannet [contact person] , Lies Lakhdar-Chaouch, Peter Schrammel, Pascal Sotin.

Numerical and logico-numerical abstract acceleration

Acceleration methods are used for computing precisely the effects of loops in the reachability analysis of counter machine models. Applying these methods to synchronous data-flow programs with Boolean and numerical variables, e.g., Lustre programs, firstly requires the enumeration of the Boolean states in order to obtain a control graph with numerical variables only. Secondly, acceleration methods have to deal with the non-determinism introduced by numerical input variables.

Concerning the latter problem, we pushed further the work presented in [90] that extended the concept of abstract acceleration of Gonnord et al. [69] , [68] to numerical input variables, and we wrote a journal version [13] . The original contributions of [13] compared to [91] is abstract backward acceleration (for backward analysis) and a detailed comparison of the abstract acceleration approach with the derivative closure approach of [39] , which is related to methods based on transitive closures of relations.

We then worked more on the first point, which is to apply acceleration techniques to data-flow programs without resorting to an exhaustive enumeration of Boolean states. To this end, we introduced (1) logico-numerical abstract acceleration methods for CFGs with Boolean and numerical variables and (2) partitioning techniques that make logical-numerical abstract acceleration effective. Experimental results showed that incorporating these methods in a verification tool based on abstract interpretation provides not only significant advantage in terms of accuracy, but also a gain in performance in comparison to standard techniques. This work was published in [28] .

This line of work is part of the PhD thesis of Peter Schrammel.

Improving dynamic approximations in static analysis

Abstract interpretation [51] formalizes two kind of approximations that can be done in the static analysis of programs:

  • Static approximations, defined by the choice of an abstract domain of abstract properties (for instance, intervals or convex polyhedra that approximates set of points in numerical spaces), and the definition of sound approximations in this domain of concrete operations (variable assignments, tests, ...). These abstract properties and operations are substitutes to the concrete properties and operations defined by the semantics of the analyzed program. This stage results into a abstract fixpoint equation Y=G(Y),YA, where A is the abstract domain. The best (least) solution of this equation can be obtained by Kleene iteration, which consists in computing the sequence Y 0 = A ,Y n+1 =G(Y n ), where A is the least element of the lattice A.

  • Dynamic approximations, that makes the Kleene iteration sequence converge in finite time by applying an extrapolation operator called widening and denoted with . This results in a sequence Z 0 = A ,Z n+1 =Z n G(Z n ) that converges to a post-fixpoint Z G(Z ). For instance, for many numerical abstract domains (like octagons [86] or convex polyhedra [75] ) the “standard” widening :A×AA consists in keeping in the result R=PQ the numerical constraints of P that are still satisfied by Q.

The problem addressed here is that the extrapolation performed by widening often loses crucial information for the analysis goal.

Widening with thresholds.

A classical technique for improving the precision is “widening with thresholds”, which bounds the extrapolation. The idea is to parameterize with a finite set 𝒞 of threshold constraints, and to keep in the result R=P 𝒞 Q those constraints c𝒞 that are still satisfied by Q: P 𝒞 Q=(PQ){c𝒞|Qc}. In practice, one extrapolates up to some threshold; in the next iteration, either the threshold is still satisfied and the result is better than with the standard widening, or it is violated and one extrapolates up to the remaining thresholds.

The benefit of this refinement strongly depends on the choice of relevant thresholds. In [33] , [26] we proposed a semantic-based technique for automatically inferring such thresholds, which applies to any control graph, be it intraprocedural, interprocedural or concurrent, without specific assumptions on the abstract domain. Despite its technical simplicity, we showed that our technique is able to infer the relevant thresholds in many practical cases.

Policy Iteration.

Another direction we investigated for solving the fix-point equation Y=G(Y),YA is the use of Policy Iteration, which is a method for the exact solving of optimization and game theory problems, formulated as equations on min max affine expressions. In this context, a policy π is a strategy for the min-player, which gives rise to a simplified equation X=F π (X),F π F,X n which is easier to solve that the initial equation X=F(X),X n . Policy iteration iterates on policies rather than iterating the application of F (as in Kleene iteration), using the property that the least fixpoint of F corresponds to the least fixpoint of F π for some π.

[50] showed that the problem of finding the least fixpoint of semantic equations on some abstract domains can be reduced to such equations on min max affine expressions, that can then be solved using Policy Iteration instead of the traditional Kleene iteration with widening described above.

We first investigated the integration of the concept of Policy Iteration in a generic way into existing numerical abstract domains. We implemented it in the Apron library (see module  5.4 ). This allows the applicability of Policy Iteration in static analysis to be considerably extended.

In particular we considered the verification of programs manipulating Boolean and numerical variables, and we provided an efficient method to integrate the concept of policy in the logico-numerical abstract domain BddApron that mixes Boolean and numerical properties (see module  5.4 ). This enabled the application of the policy iteration solving method to much more complex programs, that are not purely numerical any more. This work was published in [30] .

Analysis of imperative programs

We also studied the analysis of imperative programs. Even if it is preferable to analyze embedded systems described in higher-level languages such as synchronous languages, it is also useful to be able to analyze C programs. Moreover, it enables a wider diffusion of the analysis techniques developed in the team.

Inferring Effective Types for Static Analysis of C Programs

This work is a step in the project of connecting the C language to our analysis tool Interproc /ConcurInterproc (see section  5.5.4 ). The starting point is the connection made by the industrial partner EADS-IW in the context of the ANR project ASOPT (§ 8.1.2 ) from a subset of the C language to Interproc . This translation uses the Newspeak intermediate language promoted by EADS [77] .

Figure 3. Inferring finite types in C programs

 

 

 

typedef struct {

  int n;

} t;

int main()

{

  t x; t* y;

  int *p,*q;

  y = alloc(t); p = &(y->n);

  y = &x; q = &(y->n);

  *p = 1; *q = 2; *p = *p < 1;

  return *p;

}

typedef enum {

  l0=0,l1=1,l2=2

} e;

typedef struct {

  e n;

} t;

int main()

{

  t x; t* y;

  e *p,*q;

  y = alloc(t); p = &(y->n);

  y = &x; q = &(y->n);

  *p = l1; *q = l2; *p = (*p==l0)?l1:l0;

  return *p;

}

Initial program.Transformed program.

The problem addressed here is that the C language does not have a specific Boolean type: Boolean values are encoded with integers. This is also true for enumerated types, that may be freely and silently cast to and from integers. On the other hand, our verification tool Interproc that infers the possible values of variables at each program point may benefit from the information that some integer variables are used solely as Boolean or as enumerated type variables, or more generally as finite type variables with a small domain. Indeed, specialized and efficient symbolic representations such as BDDs are used for representing properties on such variables, whereas approximated representations like intervals and octagons are used for larger domain integers and floating-points variables.

Driven by this motivation, we proposed in [25] a static analysis for inferring more precise types for the variables of a C program, corresponding to their effective use. The analysis addresses a subset of the C99 language, including pointers, structures and dynamic allocation. The principle of the method is very different from type inference techniques used in functional programming languages such as ML, where the types are inferred from the context of use. Instead, our analysis can be seen as a simple points-to analysis, followed by a disjunction version of a constant propagation analysis, and terminated by a program transformation that generates a strongly typed program. Fig. 3 illustrates this process. On this example, we discover that the program is a finite-state one, to which exact analysis technique can be applied.

Interprocedural analysis with pointers to the stack

This work addressed the problem of interprocedural analysis when side-effect are performed on the stack containing local variables. Indeed, in any language with procedures calls and pointers as parameters (C, Ada) an instruction can modify memory locations anywhere in the call-stack. The presence of such side effects breaks most generic interprocedural analysis methods, which assume that only the top of the stack may be modified. In [29] we presented a method that addresses this issue, based on the definition of an equivalent local semantics in which writing through pointers has a local effect on the stack. Our second contribution in this context is an adequate representation of summary functions that models- the effect of a procedure, not only on the values of its scalar and pointer variables, but also on the values contained in pointed memory locations. Our implementation in the interprocedural analyzer PInterproc (see § 5.5.4 ) results in a verification tool that infers relational properties on the value of Boolean, numerical, and pointer variables.