The goal of the Celtique project is to improve the security and reliability of software through software certificates that attest to the well-behavedness of the software. Unlike certification techniques based on cryptographic signing, we provide certificates derived from semantic software analysis. The semantic analyses extract approximate but sound descriptions of software behaviour from which a proof of security can be constructed. The analyses of relevance include numerical data flow analysis, control flow analysis for higher-order languages, alias and points-to analysis for heap structure manipulation, and data race freedom of multi-threaded code.

Existing software certification procedures make extensive use of systematic test case generation. Semantic analysis can serve to improve these testing techniques by providing precise software models from which test suites for given test coverage criteria can be manufactured. Moreover, an emerging trend in mobile code security is to equip mobile code with proofs of well-behavedness that can then be checked by the code receiver before installation and execution. A prominent example of such proof-carrying code is given by the stack maps used in Java byte code verification. We propose to push this technique much further by designing certifying analyses for Java byte code that can produce compact certificates for a variety of properties. Furthermore, we will develop efficient and verifiable checkers for these certificates, relying on proof assistants like Coq to develop provably correct checkers. We target two application domains: Java software for mobile devices (in particular mobile telephones) and embedded C programs.

Celtique is a joint project with the CNRS, the University of Rennes 1 and ENS Cachan.

Celtique has developed a new type system for Java to track dangerous uses of partially initialised objects. This contribution is part of the Javasec project, commissioned by the national information security agency (ANSSI) in order to propose extensions that enhance the security of a Java virtual machine. The work provides a practical solution, develops a supporting type system, and formally proves its soundness in Coq (for a simplified formal subset of Java). It was presented at this year's European Symposium on Research in Computer Security, and is part of the PhD work of Laurent Hubert, who will defend his thesis at the end of the year.

Static program analysis is concerned with obtaining information about the run-time behaviour of a program without actually running it. This information may concern the values of variables, the relations among them, dependencies between program values, the memory structure being built and manipulated, the flow of control, and, for concurrent programs, synchronisation among processes executing in parallel. Fully automated analyses usually produce approximate information about the actual program behaviour. An analysis is correct if its information covers all possible behaviours of the program; its precision is improved by reducing the amount of information describing spurious behaviour that will never occur.
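The interplay of soundness and approximation can be illustrated by a miniature sign analysis over a toy expression language (the types and names below are purely illustrative, not part of any Celtique tool):

```ocaml
(* A minimal sign analysis: sound but approximate information about
   run-time values, over a toy expression language. *)

type sign = Bot | Neg | Zero | Pos | Top   (* abstract values *)

type expr =
  | Const of int
  | Add of expr * expr
  | Mul of expr * expr

let alpha n = if n < 0 then Neg else if n = 0 then Zero else Pos

(* Abstract addition: loses precision when signs conflict. *)
let add_s a b = match a, b with
  | Bot, _ | _, Bot -> Bot
  | Zero, x | x, Zero -> x
  | Pos, Pos -> Pos
  | Neg, Neg -> Neg
  | _ -> Top

let mul_s a b = match a, b with
  | Bot, _ | _, Bot -> Bot
  | Zero, _ | _, Zero -> Zero
  | Pos, Pos | Neg, Neg -> Pos
  | Pos, Neg | Neg, Pos -> Neg
  | _ -> Top

let rec analyse = function
  | Const n -> alpha n
  | Add (e1, e2) -> add_s (analyse e1) (analyse e2)
  | Mul (e1, e2) -> mul_s (analyse e1) (analyse e2)

let () =
  (* (3 + 4) * 2 is certainly positive... *)
  assert (analyse (Mul (Add (Const 3, Const 4), Const 2)) = Pos);
  (* ...but 3 + (-4) is approximated to Top: sound, not exact. *)
  assert (analyse (Add (Const 3, Const (-4))) = Top)
```

The second assertion shows the approximation at work: the answer `Top` includes the actual behaviour (a negative result), so the analysis is correct, merely imprecise.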

Static analysis has traditionally found most of its applications in the area of program optimisation where information about the run-time behaviour can be used to transform a program so that it performs a calculation faster and/or makes better use of the available memory resources. The last decade has witnessed an increasing use of static analysis in software verification for proving invariants about programs. The Celtique project is mainly concerned with this latter use. Examples of static analysis include:

Data-flow analysis as it is used in optimising compilers for imperative languages. The properties can either be approximations of the values of an expression (e.g., “the value of this expression is greater than 0 at this point in the program”) or more intensional information about program behaviour, such as the property “this variable is not used before being re-defined” computed by the classical “dead-variable” analysis.
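The dead-variable case can be sketched as a backward liveness pass over straight-line code (the instruction format is illustrative):

```ocaml
(* Sketch of the classical "dead variable" analysis: a backward pass
   computes live variables, and an assignment is dead if the variable
   it defines is not live just after it. *)
module S = Set.Make (String)

type instr = Assign of string * string list   (* x := f(uses) *)

(* Returns the defined variables that are dead, in program order.
   [live_out] is the set of variables live at the end of the block. *)
let dead_assignments prog live_out =
  let rec go = function
    | [] -> (live_out, [])
    | Assign (def, uses) :: rest ->
        let live_after, dead = go rest in
        (* live before = uses ∪ (live after \ {def}) *)
        let live_before =
          S.union (S.of_list uses) (S.remove def live_after) in
        let dead' = if S.mem def live_after then dead else def :: dead in
        (live_before, dead')
  in
  snd (go prog)

let () =
  (* x := 1; y := x; z := 2; return y  --  z is dead. *)
  let prog = [ Assign ("x", []); Assign ("y", ["x"]); Assign ("z", []) ] in
  assert (dead_assignments prog (S.singleton "y") = ["z"])
```

The transfer function is the textbook one: a variable is live before an instruction if it is used there, or live after and not redefined.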

Analyses of the memory structure include shape analysis, which aims at approximating the data structures created by a program. Alias analysis is another data-flow analysis, which finds out which variables in a program address the same memory location. Alias analysis is fundamental for all kinds of programs (imperative, object-oriented) that manipulate state, because alias information is necessary for modelling assignments precisely.

Control flow analysis finds a safe approximation to the order in which the instructions of a program are executed. This is particularly relevant in languages where functions can be passed as arguments to other functions, making it impossible to determine the flow of control from the program syntax alone. The same phenomenon occurs in object-oriented languages, where it is the class of an object (rather than the static type of the variable containing the object) that determines which method a given method invocation will call. Control flow analysis is an example of an analysis whose information in itself does not lead to dramatic optimisations (although it might enable in-lining of code) but is necessary for subsequent analyses to give precise results.
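The object-oriented case can be made concrete with a miniature class-hierarchy-based call resolution (class and method names are illustrative): the targets of a virtual call depend on the set of classes the receiver may hold, not on the call's syntax.

```ocaml
(* class -> (superclass option, methods defined in the class) *)
let hierarchy = [
  "Shape",  (None,         ["area"]);
  "Circle", (Some "Shape", ["area"]);
  "Square", (Some "Shape", []);        (* inherits Shape.area *)
]

(* Walk up the hierarchy to find which class provides method m
   (assumes cls is declared in the hierarchy). *)
let rec resolve cls m =
  let super, methods = List.assoc cls hierarchy in
  if List.mem m methods then Some (cls ^ "." ^ m)
  else match super with None -> None | Some s -> resolve s m

(* A call site is approximated by the set of classes the receiver may
   have; each class may contribute a distinct target. *)
let call_targets classes m =
  List.sort_uniq compare (List.filter_map (fun c -> resolve c m) classes)

let () =
  assert (call_targets ["Circle"; "Square"] "area"
          = ["Circle.area"; "Shape.area"])
```

A more precise value analysis that narrows the receiver's possible classes directly narrows the call graph, which is why control flow analysis and subsequent analyses feed each other.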

Static analysis possesses strong **semantic foundations**, notably abstract interpretation, that allow its correctness to be proved. The implementation of static analyses is usually based on well-understood constraint-solving techniques and iterative fixpoint algorithms. In spite of the nice mathematical theory of program analysis and the solid algorithmic techniques available, one problematic issue persists, *viz.*, the *gap* between the analysis that is proved correct on paper and the analyser that actually runs on the machine. While this gap might be small for toy languages, it becomes significant for real-life languages, for which the implementation and maintenance of program analysis tools become a software engineering task. A *certified static analysis* is an analysis that has been formally proved correct using a proof assistant.

In previous work we studied the benefit of using abstract interpretation for developing **certified static analyses**. The development of certified static analysers is an ongoing activity that will be part of the Celtique project. We use the Coq proof assistant, which allows for extracting the computational content of a constructive proof. A Caml implementation can hence be extracted from a proof of the existence, for any program, of a correct approximation of the concrete program semantics. We have isolated a theoretical framework based on abstract interpretation allowing for the formal development of a broad range of static analyses. Several case studies for the analysis of Java byte code have been presented, notably a memory usage analysis. This work has recently found application in the context of proof-carrying code and has also been successfully applied to a particular form of static analysis based on term rewriting and tree automata.

Precise context-sensitive control-flow analysis is a fundamental prerequisite for precisely analysing Java programs. Bacon and Sweeney's Rapid Type Analysis (RTA) is a scalable algorithm for constructing an initial call graph of the program. Tip and Palsberg have proposed a variety of more precise but still scalable call graph construction algorithms, *e.g.*, MTA, FTA and XTA, whose accuracy lies between RTA and 0-CFA. None of these analyses is context-sensitive. As early as 1991, Palsberg and Schwartzbach proposed a theoretical parametric framework for typing object-oriented programs in a context-sensitive way. In their setting, context-sensitivity is obtained by explicit code duplication, and typing amounts to analysing the expanded code in a context-insensitive manner. The framework accommodates both call-contexts and allocation-contexts.

To assess the respective merits of different instantiations, scalable implementations are needed. For Cecil and Java programs, Grove *et al.* have explored the algorithmic design space of contexts for benchmarks of significant size. Later on, Milanova *et al.* evaluated, for Java programs, a notion of context called *object-sensitivity*, which abstracts the call-context by the abstraction of the `this` pointer. More recently, Lhoták and Hendren have extended the empirical evaluation of object-sensitivity using a BDD implementation that can cope with benchmarks otherwise out of scope. Besson and Jensen proposed to use Datalog in order to specify context-sensitive analyses. Whaley and Lam have implemented a context-sensitive analysis using a BDD-based Datalog implementation.

Control-flow analyses are a prerequisite for other analyses. For instance, the security analyses of Livshits and Lam and the race analysis of Naik, Aiken and Whaley both heavily rely on the precision of a control-flow analysis.

Control-flow analysis makes it possible to statically prove the absence of certain run-time errors such as "message not understood" or cast exceptions, yet it does not tackle the problem of null pointers. Fähndrich and Leino propose a type system for checking that fields are non-null after object creation. Hubert, Jensen and Pichardie have formalised the type system and derived a type-inference algorithm computing the most precise typing. The proposed technique has been implemented in a tool called NIT. Null pointer detection is also done by bug-detection tools such as FindBugs. The main difference is that the FindBugs approach is neither sound nor complete, but it is effective in practice.
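A nullness type check can be sketched as follows. This is a greatly simplified illustration, loosely in the spirit of non-null type systems; it is not NIT's algorithm, and the expression language is invented for the example:

```ocaml
(* Each expression gets a nullness type; dereferencing an expression
   that may be null is flagged as a potential error. *)

type nullness = NotNull | MaybeNull

type expr =
  | Null
  | New                     (* freshly allocated object *)
  | Var of string
  | Deref of expr           (* field access: e.f *)

(* [env] maps variables to their declared nullness. *)
let rec check env = function
  | Null -> Ok MaybeNull
  | New -> Ok NotNull
  | Var x -> Ok (List.assoc x env)
  | Deref e ->
      (match check env e with
       | Ok NotNull -> Ok MaybeNull   (* fields conservatively nullable *)
       | Ok MaybeNull -> Error "possible null dereference"
       | Error _ as err -> err)

let () =
  assert (check [] (Deref New) = Ok MaybeNull);
  assert (check [ ("x", MaybeNull) ] (Deref (Var "x"))
          = Error "possible null dereference")
```

The sound-but-incomplete flavour is visible here too: every rejected program might dereference null, but some rejected programs are actually safe.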

Static analyses yield qualitative results, in the sense that they compute a safe over-approximation of the concrete semantics of a program, w.r.t. an order provided by the abstract domain structure. Quantitative aspects of static analysis are two-sided: on one hand, one may want to express and verify (compute) quantitative properties of programs that are not captured by usual semantics, such as time, memory, or energy consumption; on the other hand, there is a deep interest in quantifying the precision of an analysis, in order to tune the balance between complexity of the analysis and accuracy of its result.

The term quantitative analysis is often related to probabilistic models for abstract computation devices such as timed automata or process algebras. In the field of programming languages, which is more specifically addressed by the Celtique project, several approaches have been proposed for quantifying resource usage: a non-exhaustive list includes memory usage analysis based on dedicated type systems, linear logic approaches to implicit computational complexity, cost models for Java byte code based on size relation inference, and WCET computation by abstract interpretation based on loop bound interval analysis techniques.

We have proposed an original approach for designing static analyses that compute program costs: inspired by a probabilistic approach, a quantitative operational semantics expressing the cost of executing a program has been defined. The semantics is seen as a linear operator over a dioid structure similar to a vector space. The notion of long-run cost is particularly interesting in the context of embedded software, since it provides an approximation of the asymptotic behaviour of a program in terms of computation cost. As in classical static analysis, an abstraction mechanism makes it possible to effectively compute an over-approximation of the semantics, both in terms of costs and of accessible states. An example of cache miss analysis has been developed within this framework.
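As a sketch of the long-run cost idea (the notation here is illustrative, not taken from the published formulation): if the one-step cost semantics of a program is a linear operator $M$ over, for instance, the $(\max,+)$ dioid, then the asymptotic cost per execution step is the spectral radius of $M$, which coincides with the maximal mean weight of the cycles of the underlying transition graph:

```latex
\[
  \rho(M) \;=\; \sup_{k \ge 1} \ \max_{i} \ \frac{1}{k}\,\big(M^{k}\big)_{ii}
\]
```

where, in $(\max,+)$ arithmetic, $(M^{k})_{ii}$ is the weight of the heaviest cycle of length $k$ through state $i$.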

The semantic analysis of programs can be combined with efficient constraint solving techniques in order to extract specific information about the program, *e.g.*, concerning the accessibility of program points and the feasibility of execution paths. As such, it has an important use in the automatic generation of test data. Automatic test data generation has received considerable attention in recent years with the development of efficient, dedicated constraint solving procedures and compositional techniques.

We have made major contributions to the development of **constraint-based testing**, a two-stage process that consists of first generating a constraint-based model of the program's data flow and then, from the selection of a testing objective such as a statement to reach or a property to invalidate, extracting a constraint system to be solved. Using efficient constraint solving techniques makes it possible to generate test data that satisfy the testing objective, although this generation might not always terminate. In a certain way, these constraint techniques can be seen as efficient decision procedures, and as such they are competitive with the best software model checkers employed to generate test data.
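The two stages can be shown in miniature. Below, the testing objective "reach the then-branch of `f`" yields a path constraint, and a deliberately naive solver enumerates a finite domain to find a satisfying test datum; real tools use constraint propagation rather than enumeration, and the program `f` is invented for the example:

```ocaml
(* Program under test: the objective is to reach "target". *)
let f x = if x * x - 10 * x + 21 = 0 then "target" else "other"

(* Stage 2: find a test datum satisfying the path constraint p,
   by exhaustive search over [lo; hi]. *)
let solve lo hi p =
  let rec go x =
    if x > hi then None
    else if p x then Some x
    else go (x + 1)
  in
  go lo

let () =
  (* Stage 1 produced the constraint x^2 - 10x + 21 = 0, x in [0;100]. *)
  match solve 0 100 (fun x -> x * x - 10 * x + 21 = 0) with
  | Some x -> assert (f x = "target")   (* e.g. x = 3 or x = 7 *)
  | None -> assert false
```

The possible non-termination mentioned above corresponds, in real tools, to searching an unbounded domain for an unsatisfiable constraint.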

The term "software certification" has a number of meanings ranging from the formal proof of program correctness via industrial certification criteria to the certification of software developers themselves! We are interested in two aspects of software certification:

industrial, mainly process-oriented certification procedures

software certificates that convey semantic information about a program

Semantic analysis plays a role in both varieties.

Criteria for software certification such as the Common Criteria or the DOA aviation industry norms describe procedures to be followed when developing and validating a piece of software. The higher levels of the Common Criteria require a semi-formal model of the software that can be refined into executable code through traceable refinement steps. The validation of the final product is done through testing, respecting coverage criteria that must be justified with respect to the model. The use of static analysis and proofs has so far been restricted to the top level (level 7) of the Common Criteria and has not been integrated into the aviation norms.

The testing requirements present in existing certification procedures pose a challenge in terms of the automation of the test data generation process for satisfying functional and structural testing requirements. For example, the standard document which currently governs the development and verification process of software in airborne systems (DO-178B) requires coverage of all the statements and all the decisions of the program at its higher levels of criticality, and it is well known that DO-178B structural coverage is a primary cost driver on avionics projects. Although they are widely used, existing marketed testing tools are currently restricted to test coverage monitoring and measurement.

Static analysis tools are so far not a part of the approved certification procedures. For this to change, the analysers themselves must be accepted by the certification bodies in a process called “qualification of the tools”, in which the tools are shown to be as robust as the software they will help certify. We believe that proof assistants have a role to play in building such certified static analyses, as we have already shown by extracting provably correct analysers for Java byte code.

The particular branch of information security called "language-based security" is concerned with the study of programming language features for ensuring the security of software. Programming languages such as Java offer a variety of language constructs for securing an application. Verifying that these constructs have been used properly to ensure a given security property is a challenge for program analysis. One such problem is the confidentiality of the private data manipulated by a program, and a large group of researchers have addressed the problem of tracking information flow in a program, in order to ensure that, *e.g.*, a credit card number does not end up being accessible to all applications running on a computer. Another kind of problem concerns the way computational resources are accessed and used, in order to ensure that a given access policy is implemented correctly and that a given application does not consume more resources than it has been allocated. Members of the Celtique team have proposed a verification technique that can check the proper use of resources of Java applications running on mobile telephones. **Semantic software certificates** have been proposed as a means of dealing with the security problems caused by mobile code that is downloaded from foreign sites of varying trustworthiness and which can cause damage to the receiving host, either deliberately or inadvertently. These certificates should contain enough information about the behaviour of the downloaded code to allow the code consumer to decide whether it adheres to a given security policy.

In the basic PCC architecture, the only components that have to be trusted are the program logic, the proof checker of the logic, and the formalization of the security property in this logic. Neither the mobile code nor the proposed proof—and even less the tool that generated the proof—need be trusted.

In practice, the *proof checker* is a complex tool which relies on a complex Verification Condition Generator (VCG). VCGs for real programming languages and security policies are large and non-trivial programs. For example, the VCG of the Touchstone verifier represents several thousand lines of C code, and the authors observed that "there were errors in that code that escaped the thorough testing of the infrastructure". Many solutions have been proposed to reduce the size of the trusted computing base. In the *foundational proof carrying code* of Appel and Felty, the code producer gives a direct proof that, in some "foundational" higher-order logic, the code respects a given security policy. Wildmoser and Nipkow prove the soundness of a *weakest precondition* calculus for a reasonable subset of the Java bytecode. Necula and Schneck extend a small trusted core VCG and describe the protocol that the untrusted verifier must follow when interacting with the trusted infrastructure.

One of the most prominent examples of software certificates and proof-carrying code is given by the Java byte code verifier based on *stack maps*. Originally proposed under the term “lightweight byte code verification” by Rose, the technique consists in providing enough typing information (the stack maps) to enable the byte code verifier to check a byte code in one linear scan, as opposed to inferring the type information by an iterative data flow analysis. Java Specification Request 202 provides a formalisation of how such a verification can be carried out.
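The one-pass idea can be sketched on a drastically simplified instruction set (two instructions and two types, invented for the example): the checker replays each instruction over an abstract stack of types, starting from a declared entry stack map, with no fixpoint iteration.

```ocaml
type ty = Int | Ref
type instr = Push of ty | Add        (* Add pops two Int, pushes Int *)

(* Transfer function: one instruction over an abstract stack of types;
   None signals a type error. *)
let step stack = function
  | Push t -> Some (t :: stack)
  | Add ->
      (match stack with
       | Int :: Int :: rest -> Some (Int :: rest)
       | _ -> None)

(* Linear scan: start from the declared entry stack map and compare
   the result with the declared exit stack map. *)
let check ~entry ~exit code =
  let final =
    List.fold_left
      (fun st i -> match st with None -> None | Some s -> step s i)
      (Some entry) code
  in
  final = Some exit

let () =
  assert (check ~entry:[] ~exit:[Int] [Push Int; Push Int; Add]);
  assert (not (check ~entry:[] ~exit:[Int] [Push Ref; Push Int; Add]))
```

Real stack maps are declared at branch targets so that jumps, too, can be checked locally in a single scan; the straight-line case above shows only the core mechanism.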


A *certified static analysis* is an analysis whose implementation has been formally proved correct using a proof assistant. Such an analysis can be developed in a proof assistant like Coq by programming the analyser inside the assistant and formally proving its correctness. The Coq extraction mechanism then allows for extracting a Caml implementation of the analyser. The feasibility of this approach has been demonstrated in previous work.

We also develop this technique through certified reachability analysis over term rewriting systems. Term rewriting systems are a very general, simple and convenient formal model for a large variety of computing systems. For instance, they are a very simple way to describe deduction systems, functions, parallel processes or state transition systems, where rewriting models respectively deduction, evaluation, progression or transitions. Furthermore, rewriting can model any combination of these (for instance two parallel processes running functional programs).

Depending on the computing system modelled using rewriting, reachability (and unreachability) makes it possible to carry out certain verifications on the system: respectively, prove that a deduction is feasible, prove that a function call evaluates to a particular value, show that a process configuration may occur, or show that a state is reachable from the initial state. As a consequence, reachability analysis has several applications in equational proofs, as used in theorem provers and proof assistants, as well as in verification, where term rewriting systems can be used to model programs.
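Reachability over a term rewriting system can be shown in miniature. The sketch below is restricted to ground rules (no variables) and a terminating system, so the reachable set is finite and can be enumerated; real systems handle rules with variables and use tree automata to represent infinite reachable sets. The rules in the test are invented for the example.

```ocaml
type term = T of string * term list

(* One-step rewriting with ground rules, at the root or in a subterm. *)
let rec rewrites rules t =
  let at_root =
    List.filter_map (fun (l, r) -> if l = t then Some r else None) rules in
  let (T (f, args)) = t in
  let in_args =
    List.concat
      (List.mapi
         (fun i _ ->
            List.map
              (fun a' -> T (f, List.mapi (fun j a -> if i = j then a' else a) args))
              (rewrites rules (List.nth args i)))
         args)
  in
  at_root @ in_args

(* All terms reachable from the given terms (assumes termination). *)
let rec reachable rules seen = function
  | [] -> seen
  | t :: todo ->
      if List.mem t seen then reachable rules seen todo
      else reachable rules (t :: seen) (rewrites rules t @ todo)

let () =
  (* f(a) -> f(b), b -> c : from f(a) we reach f(a), f(b) and f(c). *)
  let a = T ("a", []) and b = T ("b", []) and c = T ("c", []) in
  let fa = T ("f", [a]) and fb = T ("f", [b]) and fc = T ("f", [c]) in
  let rules = [ (fa, fb); (b, c) ] in
  let r = reachable rules [] [fa] in
  assert (List.length r = 3 && List.mem fc r)
```

Unreachability is then read off negatively: any term outside the computed (or over-approximated) set is provably unreachable, which is exactly the safety argument used below.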

For proving unreachability, i.e. safety properties, we already have some results based on the over-approximation of the set of reachable terms. We defined a simple and efficient algorithm for computing exactly the set of reachable terms when it is regular, and for constructing an over-approximation otherwise. This algorithm consists of a *completion* of a *tree automaton*, taking advantage of the ability of tree automata to finitely represent infinite sets of reachable terms.

To certify the corresponding analysis, we have defined a checker guaranteeing that a tree automaton is a valid fixpoint of the completion algorithm. This consists in showing that, for every term recognised by the automaton, all of its rewrites are also recognised by the same automaton. The checker has been formally defined in Coq, and an efficient OCaml implementation has been automatically extracted. This checker is now used to certify all analysis results produced by the regular completion tool, as well as by its optimised version.
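The fixpoint-checking idea, stripped to its core: instead of trusting the completion that produced a set, an independent (and much simpler) checker verifies that the set is closed under one rewriting step. Here a finite set of strings and ground, root-only rules stand in for a tree automaton and a general rewriting system:

```ocaml
(* [closed rules set] holds iff rewriting any element of [set] by any
   rule stays inside [set] -- i.e. [set] is a valid fixpoint. *)
let closed rules set =
  List.for_all
    (fun t ->
       List.for_all (fun (l, r) -> l <> t || List.mem r set) rules)
    set

let () =
  let rules = [ ("a", "b"); ("b", "c") ] in
  assert (closed rules ["a"; "b"; "c"]);      (* valid fixpoint *)
  assert (not (closed rules ["a"; "b"]))      (* "c" is missing *)
```

The point of the architecture is that only this closure check needs to be proved correct in Coq; the completion tool that produced the candidate set stays outside the trusted base.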

Javalib/Sawja is an OCaml platform for the development of static analyses of Java bytecode programs.

Javalib is a library that parses Java .class files into OCaml data structures, thus enabling the OCaml programmer to extract information from class files, and to manipulate and generate valid class files. The library is maintained by the CELTIQUE team. It is distributed under the GNU General Public License.

On top of this library, we have developed the Sawja library, which provides a high-level representation of Java bytecode programs. Whereas Javalib is dedicated to isolated classes, Sawja handles bytecode programs with their class hierarchy and with control flow algorithms. Sawja provides several stackless intermediate representations of code. The transformation algorithm, common to these representations, has been formalised and proved to be semantics-preserving (see paragraph ). This software is distributed under the GNU General Public License.

Timbuk is a tree automata library that provides:

exact computation of reachable terms for most of the known decidable classes of term rewriting systems,

approximation of reachable terms and normal forms for any term rewriting system,

matching in tree automata,

the checker for approximations of reachable terms extracted from the Coq specification.

This software is distributed under the GNU Library General Public License and is freely available at http://www.irisa.fr/lande/genet/timbuk/. Timbuk has been registered at the APP with number IDDN.FR.001.20005.00.S.P.2001.000.10600.

Timbuk is now in version 3.0 and provides tree automata completion with equational abstractions.

Timbuk is used by other research groups to achieve cryptographic protocol verification. Frédéric Oehl and David Sinclair of Dublin University use it in an approach combining a proof assistant (Isabelle/HOL) and approximations (done with Timbuk). Pierre-Cyrille Héam, Yohan Boichut and Olga Kouchnarenko of the Cassis INRIA project use Timbuk as a verification back-end for AVISPA, a tool for verifying cryptographic protocols defined in a high-level protocol specification format. More recently, Timbuk was also used at LIAFA by Gael Patin, Mihaela Sighireanu and Tayssir Touili to design the SPADE tool, whose purpose is to model-check multi-threaded and recursive programs.

Euclide is an open source prototype tool that can help testing and verifying critical C programs. The prototype takes a C program as input, optionally annotated with assertions or post-conditions, and generates input test data that can reach specified locations within the code. Additionally, it can either prove that the assertions or post-conditions are verified, or propose counter-examples to these properties. The core of the tool includes a powerful constraint solver, based on constraint propagation, integer linear relaxations and labelling, that was built specifically for this purpose. Euclide is mainly developed in Prolog and is accessible online through a web interface. Euclide has been registered at the APP with number IDDN.FR.001.250011.000.S.P.2009.000.10600. This software is distributed under the CECILL-C licence. A. Gotlieb received the best poster award at the Annual National Days of the GDR-GPL for a presentation of the Euclide tool. In the context of the CAVERN project, we recently developed a constraint solver dedicated to modular integer computations; this solver should soon be integrated into the Euclide platform.

The Celtique group continues its investigation in various techniques for the static analysis of Object-Oriented Languages like Java.

The initialization of an information system is usually a critical phase where essential defense mechanisms are being installed and a coherent state is being set up. In object-oriented software, granting access to partially initialized objects is a delicate operation that should be avoided. We propose a modular type system to formally specify the initialization policy of libraries or programs, and a type checker to statically check at load time that all loaded classes respect the policy. This makes it possible to prove the absence of the kind of bug that has allowed some famous privilege escalations in Java. Our experimental results show that our safe default policy proves 91% of the classes of `java.lang`, `java.security` and `javax.security` safe without any annotation; by adding 57 simple annotations, we proved all classes but four safe. The type system and its soundness theorem have been formalized and machine-checked using Coq.

The Java virtual machine executes stack-based bytecode. The intensive use of an operand stack has been identified as a major obstacle for static analysis, and it is now common for static analysis tools to manipulate a stackless intermediate representation (IR) of bytecode programs. Several algorithms have been proposed to achieve such a transformation, but only little attention has been paid to their formal semantic properties. In , we provide such a bytecode transformation, describe its semantic correctness, and evaluate its performance with respect to transformation time, the compactness of the obtained code and the impact on static analysis precision.
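The core of such a transformation can be sketched by executing the operand stack symbolically, so that stack slots become explicit expressions bound by assignments. The four-instruction bytecode below is invented for the example and is much simpler than Sawja's real IR:

```ocaml
type bytecode = Iconst of int | Iload of string | Iadd | Istore of string

type expr = Const of int | Var of string | Plus of expr * expr
type assign = string * expr                    (* x := e *)

(* Symbolic execution of the operand stack: each Istore flushes the
   expression on top of the (symbolic) stack into an assignment. *)
let transform (code : bytecode list) : assign list =
  List.rev @@ snd @@
  List.fold_left
    (fun (stack, acc) i ->
       match i, stack with
       | Iconst n, _ -> (Const n :: stack, acc)
       | Iload x, _ -> (Var x :: stack, acc)
       | Iadd, e2 :: e1 :: rest -> (Plus (e1, e2) :: rest, acc)
       | Istore x, e :: rest -> (rest, (x, e) :: acc)
       | _ -> failwith "stack underflow")
    ([], []) code

let () =
  (* iload x; iconst 1; iadd; istore y   ~~>   y := x + 1 *)
  assert (transform [Iload "x"; Iconst 1; Iadd; Istore "y"]
          = [ ("y", Plus (Var "x", Const 1)) ])
```

The semantic-correctness question studied in the cited work is precisely that the stackless program obtained this way computes the same states as the original stack code, including across branches, where symbolic stacks must be reconciled.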

The Java programming language has been put forward as a language with strong security, and several aspects of the language are definite improvements over languages such as C and C++. However, the security architecture is complex, and it is not straightforward for a Java developer to identify the security risks that a particular piece of code may imply. In , we provide an in-depth analysis of Java, its security architecture, its language features relevant to security, and the pertinence of formal methods for enhancing the security of Java applications.

Proving the correctness of an analyser is based on semantic properties, and becomes difficult to ensure when complex analysis techniques are involved. In , we propose to adapt the general theory of static analysis by abstract interpretation to the framework of constructive logic. Implementing this formalism in the Coq proof assistant then allows for the automatic extraction of certified analysers. In this work we focus on a simple imperative language and present the computation of fixpoints by widening/narrowing and syntax-directed iteration techniques.
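The widening step can be illustrated on intervals (the domain and loop are illustrative): iterating the abstract effect of a loop counter naively would take as many steps as the loop bound, whereas widening pushes unstable bounds to infinity and reaches a post-fixpoint in a few iterations.

```ocaml
type itv = Bot | I of float * float       (* [lo; hi], infinities allowed *)

let join a b = match a, b with
  | Bot, x | x, Bot -> x
  | I (l1, h1), I (l2, h2) -> I (min l1 l2, max h1 h2)

(* Widening: any bound that moved is pushed to infinity. *)
let widen a b = match a, b with
  | Bot, x | x, Bot -> x
  | I (l1, h1), I (l2, h2) ->
      I ((if l2 < l1 then neg_infinity else l1),
         (if h2 > h1 then infinity else h1))

let leq a b = match a, b with
  | Bot, _ -> true
  | _, Bot -> false
  | I (l1, h1), I (l2, h2) -> l2 <= l1 && h1 <= h2

(* Iterate x := x ∇ f(x) until f(x) ⊑ x, i.e. a post-fixpoint. *)
let rec lfp_widen f x =
  let x' = widen x (f x) in
  if leq (f x') x' then x' else lfp_widen f x'

(* Abstract effect of a loop "i := 0; loop { i := i + 1 }". *)
let f = function
  | Bot -> I (0., 0.)
  | I (l, h) -> join (I (0., 0.)) (I (l +. 1., h +. 1.))

let () = assert (lfp_widen f Bot = I (0., infinity))
```

A narrowing pass could then recover precision (e.g. re-introducing the loop bound); the point here is only that termination is enforced by the widening operator, not by the domain.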

Iterated Register Coalescing (IRC) is a widely used heuristic for performing register allocation via graph coloring. In , we present a formal verification of the whole IRC algorithm, which can serve as a reference for IRC. The automatic extraction of our IRC algorithm yields a program with competitive performance. This work has been integrated into the CompCert verified compiler.

In 2010, Airbus evaluated the CompCert compiler and tested it on critical flight control software. A WCET (Worst-Case Execution Time) analysis was performed by Airbus to estimate the performance of the generated code. The results were very encouraging. A promising way to improve these results is to give extra information to the WCET analysis.

Since the recent start of André Oliveira Maronèze's Ph.D. thesis, and in cooperation with Isabelle Puaut (ALF project-team), we are designing an annotation language dedicated to WCET properties of C programs that will be integrated into the CompCert compiler. We are also studying how to generate some of these properties from within the CompCert compiler, and how to compile them.

We have developed our linear model of cost computations based on dioid theory, in order to show the deep connections between this model and the classical abstract interpretation approach. The main difficulties come from the fact that the abstraction has to take two distinct notions of order into account: the order on costs and the order on states. A detailed paper collecting our results on this approach, including a new case study on power consumption estimation, has been published in .

Recent code-based test input generators based on *dynamic symbolic execution* increase path coverage by solving path conditions with a constraint or SMT solver. When the solver considers a path condition produced from an infeasible path, it tries to show unsatisfiability, which is a useless, time-consuming process. In , we propose a new method that takes advantage of the detection of a single infeasible path to generalise it to a (possibly infinite) family of infeasible paths, which then no longer need to be considered when solving further path conditions. The method exploits non-intrusive constraint-based explanations, a technique developed in constraint programming to explain unsatisfiability. Experimental results obtained with our prototype tool IPEG show that, whatever the underlying constraint solving procedure (IC, Colibri or the SMT solver Z3), this approach can save considerable computational time. This is joint work with Bernard Botella from CEA.

The success of several constraint-based modeling languages such as OPL, ZINC, or COMET calls for better software engineering practices, particularly in the testing phase. In , we introduce a testing framework enabling automated test case generation for constraint programming. We propose a general framework of constraint program development which supposes that a first declarative and simple constraint model is available from the problem specifications analysis. This model is then refined using classical techniques such as constraint reformulation, surrogate and global constraint addition, or symmetry breaking, to form an improved constraint model that must be thoroughly tested before being used to address real-sized problems. We believe that most faults are introduced in this refinement step, and we propose a process which takes the first declarative model as an oracle for detecting non-conformities. We derive practical test purposes from this process to automatically generate test data that exhibit non-conformities. We implemented this approach in a new tool called CPTEST, which was used to automatically detect non-conformities in two classical benchmark programs, namely the Golomb rulers and the car-sequencing problem. This is joint work with Yahia Lebbah from the University of Oran.

We have proposed a new language for defining regular approximations of sets of reachable terms. Approximations are defined using equations which define equivalence classes of terms that are “similar” w.r.t. the approximation. The idea is close to the one developed with Valérie Viet Triem Tong and, more recently, by José Meseguer, Miguel Palomino and Narciso Martí-Oliet. With regard to this last work, the interest of our approach is that it imposes fewer restrictions on the equations used to define approximations: our only syntactic constraint is that equations have to be linear, whereas the latter work imposes that the term rewriting system and the set of equations be coherent, which is a more drastic restriction. Our proposal, published in , consists in using the equations to detect equivalent terms recognised by the tree automata and in merging the recognising states so as to mimic the construction of equivalence classes. We have also proved a precision result showing that, under some restrictions on the initial language, our algorithm builds no more than the terms reachable by rewriting modulo the set of equations.

We extended this static analysis framework based on term rewriting systems and tree automata with Counterexample-Guided Abstraction Refinement (CEGAR). The refinement of approximations on tree automata has already been investigated in earlier work where the semantics of programs is encoded using tree transducers. With Axel Legay (S4 team) and Yohan Boichut (LIFO), we defined a CEGAR approach to completion with automatic approximation refinement, where the semantics is encoded using term rewriting systems. We chose to stick to term rewriting systems because they permit a more straightforward encoding of program semantics than tree transducers. Furthermore, our completion-based CEGAR avoids many of the forward and backward computations that the transducer-based approach requires. This approach is currently being implemented in Timbuk.
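The general shape of a CEGAR loop can be sketched on a deliberately tiny toy, unrelated to the completion algorithm itself: the abstraction is the value of a counter modulo k, refinement doubles k, and an abstract counterexample is checked against the concrete system before refining.

```python
# Generic CEGAR loop on a toy transition system: a counter starts at 0,
# is incremented by 2, and the "bad" state is 9. All choices (modulo-k
# abstraction, doubling refinement) are illustrative.

def abstract_reachable(k, step=2, start=0):
    # Over-approximate reachability in the quotient Z/kZ.
    seen, todo = set(), [start % k]
    while todo:
        s = todo.pop()
        if s not in seen:
            seen.add(s)
            todo.append((s + step) % k)
    return seen

def concrete_hits_bad(bad, step=2, start=0):
    # Exact check on the finite prefix up to the bad value.
    return any(start + i * step == bad for i in range(bad + 1))

def cegar(bad=9):
    k = 1
    while True:
        if bad % k not in abstract_reachable(k):
            return ('safe', k)        # the abstraction proves the property
        if concrete_hits_bad(bad):
            return ('unsafe', k)      # the counterexample is real
        k *= 2                        # spurious counterexample: refine

print(cegar())  # → ('safe', 2): parity suffices to prove 9 unreachable
```

At k = 1 everything is abstracted to one class, producing a spurious counterexample; one refinement (k = 2, i.e. parity) is enough to prove the property, mirroring the refine-until-conclusive structure of completion-based CEGAR.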

The ASCERT project (2009–2012) is funded by the
*Fondation de Recherche pour l'Aéronautique et
l'Espace*. It aims at studying the formal certification
of static analysis, using and comparing various approaches
such as certified programming of static analysers, checking of
static analysis results and deductive verification of
analysis results. It is a joint project with the INRIA
teams Abstraction, Gallium and POP-ART.

The CERTLOGS project (2009–2012) is funded by the
CREATE action of the
*Région Bretagne*. The objective of this project is to
develop new kinds of program certificates and innovative
certificate verification techniques, using static analysis
as the fundamental tool and combining it with techniques
from probabilistic algorithms and cryptography.

The DECERT project (2009–2011) is funded by the call Domaines Emergents 2008, a program of the Agence Nationale de la Recherche.

The objective of the DECERT project is to design an architecture for cooperating decision procedures, with a particular emphasis on fragments of arithmetic, including bounded and unbounded arithmetic over the integers and the reals, and on their combination with other theories for data structures such as lists, arrays or sets. To ensure trust in the architecture, the decision procedures will either be proved correct inside a proof assistant or produce proof witnesses allowing external checkers to verify the validity of their answers.
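The proof-witness approach can be illustrated on linear arithmetic (a hypothetical sketch, not one of the project's actual checkers): an unsatisfiable system of inequalities a·x ≤ b can be accompanied by Farkas coefficients λ ≥ 0 such that λ·A = 0 and λ·b < 0, and checking such a witness is far simpler than finding it.

```python
# Toy checker for an unsatisfiability witness (Farkas coefficients) of a
# system of linear inequalities sum(c_i * x_i) <= bound. Illustrative only.

def check_farkas(rows, witness):
    # rows: list of (coeffs, bound); witness: one coefficient per row.
    if any(l < 0 for l in witness):
        return False
    n = len(rows[0][0])
    # The witnessed non-negative combination must cancel all variables...
    combo = [sum(l * c[i] for l, (c, _) in zip(witness, rows)) for i in range(n)]
    # ...and combine the bounds into a strictly negative constant.
    bound = sum(l * b for l, (_, b) in zip(witness, rows))
    return all(v == 0 for v in combo) and bound < 0

# x <= 1 and -x <= -2 is unsatisfiable; witness (1, 1) sums them to 0 <= -1.
print(check_farkas([([1], 1), ([-1], -2)], (1, 1)))  # → True
```

This separation of concerns is exactly the point: the (complex, unverified) decision procedure emits the witness, and only the small checker needs to be trusted or proved correct in a proof assistant.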

This is a joint project with Systerel, CEA List and the INRIA teams Mosel, Cassis, Marelle, Proval and Celtique (coordinator).

The RAVAJ ANR project ( http://www.irisa.fr/lande/genet/RAVAJ/) started in January 2007, for 3 years. RAVAJ stands for “Rewriting and Approximation for the Verification of Java Applications”. Thomas Genet is the coordinator of this project, which involves partners from LORIA (Nancy), LIFC (Besançon) and IRISA (Rennes). The goal of this project is to propose a general-purpose verification technique based on approximations and reachability analysis over term rewriting systems. To reach this goal, the tree automata completion method has to be refined in two different ways. First, although the Timbuk tool is efficient enough to verify cryptographic protocols, this is not the case for more complex software systems. In that direction, we aim at using some results obtained in rewriting to bring the efficiency of our tool closer to what has been obtained in the model-checking domain. Second, the automation of approximation has to be enhanced. At present, the approximation automaton construction is guided by a set of approximation rules, very close to the tree automata formalism, given by the user of the tool. On the one hand, we plan to replace approximation rules, which are difficult for a human to define, by approximation equations, which are more natural. Approximation equations define equivalence classes of terms equal modulo the approximation. On the other hand, we will automatically generate approximation equations from the property to be proved, and also provide an automatic approximation refinement methodology adapted to the equational approximation framework.

The ParSec project (2007–2010) studies concurrent programming techniques for new computing architectures such as multicore processors and multiprocessor machines, focusing on the security issues that arise in multi-threaded systems. In this project the Celtique team focuses on static analysis of multi-threaded Java programs and especially on data race checkers. The other members of the project are INRIA Sophia-Antipolis, INRIA Rocquencourt and PPS (Université Paris 7).

The Java programming language has been put forward as a
language with strong security, and several aspects of the
language are definite improvements over languages such as C
and C++. However, the security architecture is complex and
it is not straightforward for a Java developer to identify
the security risks that a particular piece of code may
imply. The French National Information Security Agency (
*Agence Nationale de la Sécurité des Systèmes
d'Information (ANSSI)*) commissioned the JAVASEC project
with the double aim of providing secure programming
guidelines to Java developers and of building a
security-enhanced Java virtual machine whose security can
be evaluated and certified according to industrial
standards and that can serve as a secure platform for
executing Java applications. The results have been an
in-depth analysis of Java, its security architecture, its
language features relevant to security and the pertinence
of formal methods for enhancing the security of Java
applications. This analysis has led to a “Secure Java
development guide” that provides a series of guidelines
on what to do and not to do when developing
security-critical applications in Java. As a complement to
the guidelines, we have identified a series of program
properties that can be verified by static analysis of Java
byte code in order to further improve the security checks
offered by the Java byte code verifier.

The project is conducted in collaboration with two Rennes-based SMEs: Silicom and Amossys.

The ANR U3CAT project (2009–2012) builds upon the results of the RNTL CAT project, which delivered the Frama-C platform for the analysis of C programs and the ACSL assertion language. The U3CAT project focuses on providing a unified interface that makes it possible to perform several analyses on the same code, and on studying how these analyses can cooperate in order to prove properties that could not have been established by any single technique. The other members of the project are the CEA LIST laboratory (project leader), Proval (Inria Futurs), Gallium (Inria Paris-Rocquencourt), Cedric (CNAM), Atos Origin, CS, Dassault-Aviation, Sagem Defense and Airbus Industries.

The CAVERN project (Constraints and Abstractions for program VERificatioN) aims to enhance the potential of Constraint Programming for the automated verification of imperative programs. The classic approach consists in building a constraint system representing the objective to be met.

Constraint solving is currently delegated to "generic" constraint-propagation-based solvers developed for other applications (combinatorial optimization, planning, etc.). The originality of the project lies in the design of abstraction-based constraint solvers dedicated to the automated testing of imperative programs. In static analysis, the last few years have seen the development of powerful techniques over various abstract domains (polyhedra, congruences, octagons, etc.), and this project aims to exploit results obtained in this area to develop constraint solvers with improved deductive capabilities. The main scientific outcome of the project will be a profound understanding of the benefit of using abstraction techniques in constraint solvers for the automated testing of imperative programs.
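The flavour of abstract-domain reasoning inside a solver can be sketched with interval propagation, the simplest of the domains mentioned above (an illustrative toy, not one of the partners' constraint libraries):

```python
# Minimal sketch of interval propagation for the constraint z = x + y:
# each variable's interval is narrowed using the intervals of the others.

def propagate_add(x, y, z):
    # Intervals are (low, high) pairs; refine all three under z = x + y.
    zl, zu = max(z[0], x[0] + y[0]), min(z[1], x[1] + y[1])
    xl, xu = max(x[0], zl - y[1]), min(x[1], zu - y[0])
    yl, yu = max(y[0], zl - x[1]), min(y[1], zu - x[0])
    return (xl, xu), (yl, yu), (zl, zu)

# z = x + y with x in [0,10], y in [5,10], z in [0,7]
x, y, z = propagate_add((0, 10), (5, 10), (0, 7))
print(x, y, z)  # → (0, 2) (5, 7) (5, 7): all three intervals shrink
```

Richer domains such as octagons or polyhedra strengthen exactly this kind of deduction, pruning the search space before enumeration, which is the improvement in deductive capability the project targets.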

The CAVERN project includes four partners involved in the development of constraint-based testing tools:

the Celtique team of INRIA Rennes - coordinator

the "Constraints and Proofs" team from the CNRS I3S laboratory in Sophia-Antipolis (CeP)

the CEA-LIST laboratory in Saclay (CEA)

the ILOG Company in Gentilly (ILOG)

In addition, the project will include a foreign associate partner: Andy King from the University of Kent.

Concretely, the CAVERN project partners will study the integration of selected abstractions in their own constraint libraries, as currently used in their testing tools, in order to improve the treatment of loops, memory accesses (references and dynamic structures) and floating-point computations. Dealing efficiently with these constructs will allow us to scale-up constraint-based testing techniques for imperative programs. This should open the way to more automated testing processes which will facilitate software dependability assessment.

The CAVERN project will last until December 2011.

COST Action IC0701 is a European scientific cooperation. The Action aims at developing verification technology with the power to ensure the dependability of object-oriented programs on an industrial scale. The Action involves 15 countries. It has been a forum for presenting our results concerning data race analysis and our proposal for an intermediate language into which Java byte code can be transformed in order to facilitate the static analysis of byte code programs.

Thomas Jensen is a member of the executive bureau of the French network GDR GPL on software engineering and formal methods in programming. Arnaud Gotlieb is co-president of the MTV2 project of the GDR GPL.

Arnaud Gotlieb served on the program committees of several international conferences, including IEEE ICST'10, TAP'10 and QSIC'10. He also co-organized the CSTVA'10 workshop, a satellite event of ICST'10. For the organization of future editions of CSTVA, he received support from Microsoft Research under the banner of the Verified Software Initiative.

David Pichardie served as program chair of the BYTECODE workshop (an ETAPS 2010 satellite event) and on the program committees of ITP 2010, VERIFY 2010 and IFM 2010. He also gave an invited talk at the NSAD 2010 workshop (a SAS 2010 satellite event).

Thomas Jensen was on the program committees of the FOPARA (Foundational and Practical Aspects of Resource Analysis) workshop and the TGC (Trustworthy Global Computing) conference.

Florence Charreteur defended her PhD thesis, entitled “Modélisation par contraintes de programmes en bytecode Java pour la génération automatique de tests”, on March 9.

David Cachera defended his Habilitation thesis, “Analyses statiques : certifier et quantifier”, on August 30.

Laurent Hubert defended his PhD thesis, entitled “Foundations and Implementation of a Tool Bench for Static Analysis of Java Bytecode Programs”, on December 17.

Benoît Boyer defended his PhD thesis, entitled “Réécriture d'automates certifiée pour la vérification de modèle”, on December 13.

Thomas Jensen and David Pichardie taught semantics, type systems and abstract interpretation at Master 2 level.

David Pichardie also taught algorithmics at École normale supérieure de Cachan and formal methods for software engineering (the B method) in the 4th year at Insa Rennes, in collaboration with Mireille Ducassé.

David Cachera teaches logic, computability, algorithmics and formal languages at École normale supérieure de Cachan, and the semantics of programming languages at Master 1 level at the University of Rennes.

Arnaud Gotlieb is responsible for two master-level teaching modules at Insa Rennes: “Compilation” at the 4th-year level and “Validation, Verification and Test” at the 5th-year level. He also taught software testing at the École des Mines de Nantes at the 5th-year level, and was invited to give a lecture in the Master 2 Alma programme of the University of Nantes.

Thomas Genet teaches cryptographic protocols and their verification at M2 level (5th university year). He also teaches formal methods for software verification and model-driven design at M1 level (4th university year).

Thomas Genet gave a lecture on “Cryptographic protocols: principles, attacks and verification tools” at the summer school “École Jeunes Chercheurs en Programmation” (Rennes, May 2010).

Sandrine Blazy taught two modules at M2 level (Software Vulnerabilities, Software Testing). She also taught formal methods for program proof at M1 level.

Sandrine Blazy gave a lecture on certified compilation at the 2nd Asian-Pacific Summer School (Beijing, China, August 2010).

Arnaud Gotlieb participated as an examiner in the PhD committees of Nicolas Berger at the University of Nantes and of Sergio Segura from the University of Seville.

Thomas Jensen participated as rapporteur in the Habilitation committee of Frederic Prost at the University of Grenoble, and as rapporteur in the PhD thesis committee of Manuel Garnacho, also at the University of Grenoble.

Thomas Jensen was president of the PhD thesis committee of Brice Morin and the Habilitation committee of Benoit Baudry, both at the University of Rennes.

Sandrine Blazy was a member of the PhD committee of Benoît Robillard at CNAM Paris.

Thomas Jensen is
*délégué scientifique* for the INRIA centre in Rennes and
president of the joint scientific committee (
*comité des projets*) of Irisa and Inria Rennes
Bretagne Atlantique. Through this duty he is a member of
the INRIA evaluation board.

Sandrine Blazy is a member of the IRISA council as well as a member of the ISTIC council of the University of Rennes 1. She also participated in a recruitment committee of the University of Rennes 1.

David Cachera is a member of the INRIA Rennes council.

Sandrine Blazy is in charge of a graduate curriculum (M2 level) at Université de Rennes 1 dedicated to information system security.

Thomas Genet is in charge of the first year of the Master in Computer Science at Université de Rennes 1.

David Pichardie is co-responsible for the Component-Based Embedded Software track of the second year of the research Master in Computer Science at Université de Rennes 1.

Arnaud Gotlieb participated in the recruitment committees of Insa Rennes and IUT Orsay.