The aim of the CARTE research team is to take adversity into account in computation: adversity arises from actors whose behavior is unknown or unclear. We call this notion adversary computation.
The project combines two approaches that we expect to be mutually fruitful. The first is the analysis of the behavior of systems, using tools from Continuous Computation Theory. The second is the construction of defenses with tools from logic, rewriting and, more generally, Programming Theory.
The activities of the CARTE team are organized around two research actions:
Computation over Continuous Structures
Computer Virology.
We solved a problem that had been open for 15 years, relating three notions of complexity and information: Shannon information and entropy, Kolmogorov algorithmic information, and Martin-Löf randomness.
We developed a tool that retrieves implementations of cryptographic primitives from an execution trace of a binary. This result was published at CCS.
We presented our work on behavioural malware detection using rewriting and model checking at ESORICS 2012.
For the Alan Turing year, we published an invited paper in the journal Phil. Trans. R. Soc.
From a historical point of view, the first official virus appeared in 1983 on a VAX 11/750. At the very same time, a series of papers was published which remains a reference in computer virology: those of Thompson, Cohen and Adleman. The literature explaining and discussing practical issues is quite extensive. However, there are only a few theoretical/scientific studies attempting to give a model of computer viruses.
A virus is essentially a self-replicating program inside an adversary environment. Self-replication has a solid background based on work on fixed points in computability theory. In short:
a virus infects programs by modifying them,
a virus copies itself and can mutate,
it spreads throughout a system.
The above scientific foundation justifies our position to use the word virus as a generic term for self-replicating malware. There is yet a difference: a malware has a payload, whereas a virus may not have one. For example, worms are autonomous self-replicating malware and so fall under our definition. In fact, the current malware taxonomy (viruses, worms, trojans, ...) is unclear and subject to debate.
Classical recursion theory deals with computability over discrete structures (natural numbers, finite symbolic words). There is a growing community of researchers working on the extension of this theory to continuous structures arising in mathematics. One goal is to give foundations to numerical analysis by studying the limitations of machines, in terms of computability or complexity, when computing with real numbers. Classical questions are: if a function
While the notion of a computable function over discrete data is captured by the model of Turing machines, the situation is more delicate when the data are continuous, and several non-equivalent models exist. Let us mention computable analysis, which relates computability to topology; the Blum-Shub-Smale model (BSS), where real numbers are treated as elementary entities; and the General Purpose Analog Computer (GPAC), introduced by Shannon, with continuous time.
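To make the computable-analysis viewpoint concrete, here is a minimal sketch (ours, not from the cited works) of how a real number becomes an object of computation: a real is computable when a machine can produce a rational approximation to any requested precision.

```python
from fractions import Fraction

def sqrt2_approx(n):
    """Return a rational q with |q - sqrt(2)| <= 2**-n, by bisection.

    In computable analysis, a real number is computable exactly when such
    a function exists: given n, it outputs a 2**-n rational approximation.
    """
    lo, hi = Fraction(1), Fraction(2)          # sqrt(2) lies in [1, 2]
    while hi - lo > Fraction(1, 2 ** n):
        mid = (lo + hi) / 2
        if mid * mid <= 2:
            lo = mid
        else:
            hi = mid
    return lo

# The Cauchy sequence sqrt2_approx(0), sqrt2_approx(1), ... "is" sqrt(2):
q = sqrt2_approx(20)
assert abs(q * q - 2) < Fraction(1, 2 ** 18)
```

A Turing machine computing a real function works the same way: it transforms arbitrarily precise approximations of the input into arbitrarily precise approximations of the output.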
The rewriting paradigm is now widely used for specifying, modeling, programming and proving. It makes it easy to express deduction systems in a declarative way, and to express complex relations on infinite sets of states in a finite way, provided they are countable. Programming languages and environments with a rewriting-based semantics have been developed; see ASF+SDF, Maude, and Tom.
For basic rewriting, many techniques have been developed to prove properties of rewrite systems such as confluence, completeness, consistency or various notions of termination. Proof methods have also been proposed for extensions of rewriting: equational extensions, consisting of rewriting modulo a set of axioms; conditional extensions, where rules are applied under certain conditions only; typed extensions, where rules are applied only if there is a type correspondence between the rule and the term to be rewritten; and constrained extensions, where rules are enriched with formulas to be satisfied.
An interesting aspect of the rewriting paradigm is that it allows automated or semi-automated correctness proofs for systems or programs: the properties of rewrite systems cited above translate to the deduction systems or programs they formalize, and the proof techniques may apply to them directly.
Another interesting aspect is that it allows characteristics or properties of the modeled systems to be expressed as equational theorems, often automatically provable using the rewriting mechanism itself or induction techniques based on completion. Note that the rewriting and completion mechanisms also enable transformation and simplification of formal systems or programs.
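As a toy illustration of the paradigm, the following sketch (illustrative only; real systems such as Maude or Tom are far richer) implements first-order rewriting to normal form, applied to Peano addition:

```python
# Terms are nested tuples, e.g. ('add', ('s', ('0',)), ('0',)).
# Variables in rule patterns are plain strings.

def match(pattern, term, subst):
    """Try to extend `subst` so that pattern instantiated by subst == term."""
    if isinstance(pattern, str):                      # a variable
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(term, tuple) and len(pattern) == len(term) \
            and pattern[0] == term[0]:
        for p, t in zip(pattern[1:], term[1:]):
            subst = match(p, t, subst)
            if subst is None:
                return None
        return subst
    return None

def substitute(term, subst):
    if isinstance(term, str):
        return subst[term]
    return (term[0],) + tuple(substitute(t, subst) for t in term[1:])

def normalize(term, rules):
    """Rewrite innermost-first until no rule applies (normal form)."""
    if isinstance(term, tuple):
        term = (term[0],) + tuple(normalize(t, rules) for t in term[1:])
    for lhs, rhs in rules:
        s = match(lhs, term, {})
        if s is not None:
            return normalize(substitute(rhs, s), rules)
    return term

# Peano addition: add(0, y) -> y ; add(s(x), y) -> s(add(x, y))
RULES = [
    (('add', ('0',), 'y'), 'y'),
    (('add', ('s', 'x'), 'y'), ('s', ('add', 'x', 'y'))),
]

one = ('s', ('0',))
two = ('s', ('s', ('0',)))
assert normalize(('add', two, one), RULES) == ('s', ('s', ('s', ('0',))))
```

This rule set is confluent and terminating, so every term has a unique normal form; proving such properties automatically is exactly what the techniques above address.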
Applications of rewriting-based proofs to computer security are various. Approaches using rule-based specifications have recently been proposed for the detection of computer viruses. For several years, our team has also been working in this direction. We have already proposed an approach using rewriting techniques to abstract program behaviors in order to detect suspicious or malicious programs.
It is legitimate to wonder why there are only a few fundamental studies on computer viruses, although they are one of the important flaws in software engineering. The lack of theoretical studies perhaps explains the weakness in anticipating computer diseases and the difficulty of improving defenses. For these reasons, we think it is worth exploring fundamental aspects, and in particular self-reproducing behaviors.
The crucial question is how to detect viruses or self-replicating malware. Cohen demonstrated that this question is undecidable. Anti-virus heuristics are based on two methods. The first consists in searching for virus signatures. A signature is a regular expression which identifies a family of viruses. There are obvious defects: for example, an unknown virus, such as one exploiting a 0-day vulnerability, will not be detected. We strongly suggest having a look at the independent audit in order to understand the limits of this method. The second consists in analysing the behavior of a program by monitoring it. According to , this kind of method is not yet really implemented; moreover, the large number of false positives makes it barely usable. To end this short survey, intrusion detection encompasses virus detection. However, unlike computer virology, which has a solid scientific foundation as we have seen, the IDS notion of “malware” with respect to some security policy is not well defined. The interested reader may consult .
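The signature-based method can be sketched in a few lines; the signature database below is entirely made up for illustration:

```python
import re

# Toy signature database (patterns are invented for illustration): each
# entry maps a malware family name to a regular expression over the
# raw bytes of a file.
SIGNATURES = {
    "Fam.Demo.A": re.compile(rb"\xde\xad\xbe\xef.{0,16}\xca\xfe"),
    "Fam.Demo.B": re.compile(rb"EVIL(32|64)\.EXE"),
}

def scan(data: bytes):
    """Return the names of all signature families matching `data`."""
    return [name for name, sig in SIGNATURES.items() if sig.search(data)]

sample = b"\x00\x01\xde\xad\xbe\xef\x90\x90\xca\xfe\x00"
assert scan(sample) == ["Fam.Demo.A"]
# The defect noted above: a genuinely new (0-day) sample matches nothing.
assert scan(b"previously unseen payload") == []
```

The second assertion is the whole problem: whatever the database, a sample outside it is invisible, which is why behavioral approaches are studied despite their false positives.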
The aim is to define security policies in order to prevent malware propagation. For this, we need (i) to define what a computer virus is in different programming languages and settings, and (ii) to take into consideration resources such as time and space. We think that formal methods like rewriting, type theory, logic, or formal languages should help to define the notion of a formal immune system, which provides a certified protection.
This study of computer virology led us to propose and construct a “high security lab” in which experiments can be done in compliance with French law. This “high security lab” is one of the main projects of the CPER 2007-2013.
Understanding computation theories for continuous systems leads to studying the hardness of verification and control of these systems. This has been used to discuss problems in fields as diverse as verification, control theory, and neural networks.
We are interested in the formal decidability of properties of dynamical systems, such as reachability, the Skolem-Pisot problem, the computability of the
Contrary to computability theory, complexity theory over continuous spaces is underdeveloped and not well understood. A central issue is the choice of the representation of objects by discrete data and its effects on the induced complexity notions. As for computability, it is well known that a representation is gauged by the topology it induces. However, more structure is needed to capture complexity notions: topologically equivalent representations may induce different classes of polynomial-time computable objects. Developing a sound complexity theory over continuous structures would enable us to make abstract computability results more applicable by analysing the corresponding complexity issues. We think that the preliminary step towards such a theory is the development of higher-order complexity, which we are currently carrying out.
In contrast with the discrete setting, it is of utmost importance to compare the various models of computation over the reals, as well as their associated complexity theories. In particular, we focus on the General Purpose Analog Computer of Claude Shannon, on recursive analysis, on the algebraic approach, and on computability in a probabilistic context.
A crucial point for future investigations is to fill the gap between continuous and discrete computational models. This is one deep motivation of our work on computation theories for continuous systems.
The other research direction on dynamical systems we are interested in is the study of properties of adversary systems or programs, i.e. systems whose behavior is unknown or indistinct, or which do not have the classical expected properties. We would like to offer proof and verification tools to guarantee the correctness of such systems.
On the one hand, we are interested in continuous and hybrid systems. In a mathematical sense, a hybrid system can be seen as a dynamical system whose transition function does not satisfy the classical regularity hypotheses, like continuity or continuity of its derivative. The properties to be verified are often expressed as reachability properties. For example, a safety property is often equivalent to the (non-)reachability of a subset of unsafe states from an initial configuration, or to stability (with its numerous variants like asymptotic stability, local stability, mortality, etc.). Thus we will essentially focus on the verification of these properties in various classes of dynamical systems.
We are also interested in rewriting techniques as a means to describe dynamical systems, in particular in the adversary context. As they were initially developed in the context of automated deduction, rewriting proof techniques, although now numerous, are not yet adapted to the complex framework of modelling and programming. An important challenge in the domain is to enrich them to provide realistic validation tools, both by providing finer rewriting formalisms with their associated proof techniques, and by developing new validation concepts for the adversary case, i.e. when usual properties of the systems, like termination, are not verified.
For several years, we have been developing specific procedures for proving properties of rewriting, for the sake of programming, in particular with an inductive technique already applied with success to termination under strategies, to weak termination, to sufficient completeness and to probabilistic termination.
The last three results are in the context of adversary computations, since they allow proving that a program can give the expected results even when it diverges, i.e. even when it does not have the usual termination property. A common mechanism has been extracted from the above works; it provides a generic inductive proof framework for properties of reduction relations, which can be parametrized by the property to be proved. Provided that program code can be translated into rule-based specifications, this approach can be applied to correctness proofs of software in a larger context.
A crucial element of the safety and security of software systems is the problem of resources. We are working in the field of Implicit Computational Complexity, an approach to the analysis of the resources used by a program whose tools come essentially from proof theory. Interpretation-based methods, like quasi-interpretations (QI) or sup-interpretations, are the approach we have been developing over the last five years. The aim is to compile a program while certifying its complexity.
MMDEX is a virus detector based on morphological analysis. It is composed of our own disassembler, a graph transformer and a specific tree-automaton implementation. The tool is used in the EU FI-WARE project and by some other partners (e.g. the DAVFI project).
Written in C, 20k lines.
APP License, IDDN.FR.001.300033.000.R.P.2009.000.10000, 2009.
TraceSurfer is a self-modifying-code analyzer coming with an IDA add-on. It works as a wave builder: in the analysis of self-modifying programs, one basic task is to separate the self-modifying parts of the code into successive layers, called waves. TraceSurfer extracts waves from traces of program executions, which drastically simplifies program verification.
Written in C, 5k lines.
Private licence.
CROCUS is a program interpretation synthesizer. Given a first-order program (possibly written in OCaml), it outputs a quasi-interpretation based on max, addition and product. It is based on a randomized algorithm. The interpretation is a certificate for the program's complexity. Its users include non-academics (some artists).
Written in Java, 5k lines.
Private licence.
Birkhoff theorem is a central result in ergodic theory. Consider a dynamical system
For several years we have been working on identifying the exact computational content of several ergodic theorems: can the speed of convergence of limit frequencies be computed? Can one distinguish between points with different limit frequencies? Can we construct (compute) points whose trajectory follows a prescribed distribution? How random (i.e. incompressible) does a point have to be for the distribution of its trajectory to converge?
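For intuition, the time averages these questions are about can be sampled numerically. The sketch below (an illustration, not part of the cited results) computes a Birkhoff average along an orbit of an irrational rotation of the circle, whose ergodicity forces the average to converge to the measure of the observed set:

```python
import math

def birkhoff_average(x0, f, obs, n):
    """Average of `obs` over the first n points of the orbit of x0 under f.

    Birkhoff's theorem: for an ergodic system, this time average converges
    to the space average of `obs` for almost every starting point x0.
    """
    total, x = 0.0, x0
    for _ in range(n):
        total += obs(x)
        x = f(x)
    return total / n

# Irrational rotation x -> x + alpha (mod 1), ergodic since alpha is irrational.
alpha = math.sqrt(2) - 1
rotate = lambda x: (x + alpha) % 1.0
in_left_half = lambda x: 1.0 if x < 0.5 else 0.0

# The limit frequency equals the Lebesgue measure of [0, 1/2), i.e. 1/2.
avg = birkhoff_average(0.123, rotate, in_left_half, 100_000)
assert abs(avg - 0.5) < 0.01
```

The questions above concern exactly this convergence: how fast it happens, and for which (how random) starting points.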
We have obtained new insight into the above questions by proving that random elements eventually reach effective closed sets of positive measure (while this was previously known only for a more restricted class of sets). The paper appeared in Information and Computation. This result is a key tool in the proof of the result published in .
A chaotic system is unpredictable because it has many more trajectories than observable initial conditions: hence many indistinguishable initial points lead to radically different trajectories. As there are many trajectories, most of them are complex in the sense that they can hardly be compressed, i.e. described in a shorter way than simply listing them. The Shannon-McMillan-Breiman theorem states that the compression rate of most trajectories coincides with the entropy of the system.
We have been interested in the computational content of this theorem: how random does a point have to be to generate a trajectory whose compression rate is the entropy? This question was raised in and had been left open for 14 years. We solved the problem by showing that the Martin-Löf notion of randomness is sufficient. Our recent result presented in is a key ingredient of the proof. We presented the result at STACS.
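The phenomenon the theorem describes can be glimpsed with an off-the-shelf compressor standing in (crudely) for the optimal one: typical trajectories of a low-entropy symbolic source compress further than those of a high-entropy one.

```python
import random
import zlib

def compression_rate(bits):
    """Crude compression-rate estimate, in bits per symbol, via zlib."""
    packed = bytes(
        int("".join(map(str, bits[i:i + 8])), 2)
        for i in range(0, len(bits) - 7, 8)
    )
    return 8 * len(zlib.compress(packed, 9)) / len(bits)

rng = random.Random(0)
n = 80_000
fair = [1 if rng.random() < 0.5 else 0 for _ in range(n)]    # entropy 1 bit
biased = [1 if rng.random() < 0.9 else 0 for _ in range(n)]  # entropy ~0.47 bits

# The Shannon-McMillan-Breiman theorem predicts that typical trajectories
# compress down to the entropy; zlib is far from the optimal compressor,
# but the ordering of the rates is already visible:
assert compression_rate(biased) < compression_rate(fair)
```

The theorem (and our result) is about the optimal, uncomputable compressor given by Kolmogorov complexity; zlib here only illustrates the qualitative statement.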
The ergodic decomposition theorem states that a dynamical system can always be uniquely decomposed into indecomposable subsystems, technically ergodic subsystems. We have been interested in the computability of the decomposition operation. It is known from that this operation is not computable in general. Whether this operation remains non-computable when the system can be decomposed into a finite number of subsystems was open. We raised the question and answered it negatively in . More precisely, we prove the existence of ergodic measures
We strengthen the preceding result by making
We studied the constructive content of the Radon-Nikodym theorem, showed that it is not computable in general, and precisely located its non-computability in the Weihrauch lattice. The paper appeared in the first issue of the new journal Computability.
Our study of behavioural malware detection has continued. We have been developing an approach that detects suspicious schemes on an abstract representation of the behavior of a program, obtained by abstracting program traces: given subtraces are rewritten into abstract symbols representing their functionality. Considering abstract behaviors allows us to be implementation-independent and robust to variants and mutations of malware. Suspicious behaviors are then detected by comparing trace abstractions to reference malicious behaviors.
We had previously proposed to abstract trace automata by rewriting them with respect to a set of predefined behavior patterns, defined as a regular language described by a string rewriting system. We have since increased the power of our approach in two respects. First, we modified the abstraction mechanism, keeping the abstracted patterns in the rewritten traces, which allows us to handle interleaved patterns. Second, we extended the rewriting framework to express data constraints on action parameters by using term rewriting systems. An important consequence is that, unlike in , by using the data flow we can now detect information leaks in order to prevent unauthorized disclosure or modification of information.
We have also introduced model checking into our approach: the predefined behavior patterns used to abstract program traces are defined by first-order temporal logic formulas, as are the reference suspicious behaviors, given in a signature. The infection problem can then be seen as the satisfaction problem of the formula of the signature by an abstracted trace of the program, which can be checked using existing model-checking techniques. This work was published at the ESORICS conference.
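The abstraction step can be sketched with string rewriting over call traces; the behavior patterns and names below are purely illustrative, and the real framework uses term rewriting with data constraints rather than plain regular expressions:

```python
import re

# Hypothetical behavior patterns: each rewrites a concrete subtrace of
# system calls into an abstract symbol naming its functionality.
PATTERNS = [
    (re.compile(r"RegOpenKey;RegSetValue(;RegCloseKey)?"), "PERSIST"),
    (re.compile(r"socket;connect;send"), "LEAK"),
    (re.compile(r"FindFirstFile;(FindNextFile;)*CreateFile"), "SCAN_FS"),
]

def abstract(trace: str) -> str:
    """Rewrite a ';'-separated call trace into its abstraction."""
    for pat, symbol in PATTERNS:
        trace = pat.sub(symbol, trace)
    return trace

# A reference malicious behavior, expressed over abstract symbols:
SUSPICIOUS = re.compile(r"SCAN_FS.*LEAK")

trace = "FindFirstFile;FindNextFile;CreateFile;ReadFile;socket;connect;send"
assert abstract(trace) == "SCAN_FS;ReadFile;LEAK"
assert SUSPICIOUS.search(abstract(trace))
```

Because detection operates on the abstract symbols, two implementations of the same functionality (different call sequences matching the same pattern) are flagged identically, which is the point of the approach.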
Analyzing cryptographic implementations has important applications, especially for malware analysis, where they are an integral part both of the malware payload and of the unpacking code that decrypts this payload. These implementations are often based on well-known cryptographic functions whose description is publicly available. While potentially very useful for malware analysis, the identification of such cryptographic primitives is made difficult by the fact that they are usually obfuscated. Current state-of-the-art identification tools are ineffective due to the absence of easily identifiable static features in obfuscated code. However, these implementations still maintain the input-output (I/O) relationship of the original function. In a joint work with José M. Fernandez published in , we present a tool that leverages this fact to identify cryptographic functions in obfuscated programs, by retrieving their I/O parameters in an implementation-independent fashion and comparing them with those of known cryptographic functions. In an experimental evaluation, we successfully identified the cryptographic functions TEA, RC4, AES and MD5 in obfuscated programs. In addition, our tool was able to recognize basic operations done in asymmetric ciphers such as RSA.
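The core identification idea can be sketched as follows: replay captured input parameters through reference implementations and keep the primitives whose outputs match. (Recovering the I/O parameters from an obfuscated trace, the hard part, is assumed already done here.)

```python
import hashlib

# Reference implementations of known primitives. Obfuscation changes the
# code but not the I/O relation, so replaying captured inputs through a
# reference and comparing outputs identifies the primitive.
REFERENCES = {
    "MD5":   lambda data: hashlib.md5(data).digest(),
    "SHA-1": lambda data: hashlib.sha1(data).digest(),
}

def identify(io_pairs):
    """Names of primitives consistent with every observed (input, output) pair."""
    return [name for name, f in REFERENCES.items()
            if all(f(x) == y for x, y in io_pairs)]

# I/O pairs captured (hypothetically) from a trace of an obfuscated binary:
observed = [(b"abc", hashlib.md5(b"abc").digest())]
assert identify(observed) == ["MD5"]
```

The actual tool handles primitives with richer parameter structure (keys, IVs, block boundaries), which is why parameter retrieval from the trace is the central technical contribution.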
Self-replication is one of the fundamental aspects of computing, where a program or a system may duplicate, evolve and mutate. Our point of view is that Kleene's (second) recursion theorem is essential to understanding self-replication mechanisms. An interesting example of self-replicating code is given by computer viruses, as initially explained in the seminal works of Cohen and of Adleman in the eighties. In fact, the different variants of recursion theorems provide and explain constructions of self-replicating code and, as a result, of various classes of malware. None of the results are new from the point of view of computability theory. We propose a self-modifying register machine as a model of computation in which we can effectively deal with self-reproduction and in which new offspring can be activated as independent organisms. This work was published by Jean-Yves Marion in a special issue in honor of Alan Turing.
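The classic concrete instance of the recursion theorem is a quine, a program that outputs its own source code; the following minimal Python version illustrates the self-reference mechanism underlying self-replication:

```python
# Kleene's second recursion theorem lets a program obtain its own code.
# The simplest instance is a quine: `code % code` reproduces the two
# lines below exactly, so the program prints its own source.
code = 'code = %r\nprint(code %% code)'
print(code % code)
```

A virus construction follows the same pattern: instead of merely printing its code, the fixed-point program writes it into a host, possibly after mutation.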
Suppose we are given some malware and we want to know what it is doing. One may run it, or one may analyze it more or less statically. Typically, an expert tries to guess the behavior of a malware through the analysis of its binary code (in tools such as IDA). The task is much simpler if the expert already knows some part of the code. We have shown that morphological analysis can be used in such a context: we rediscovered the parts of the malware Duqu within Stuxnet, and we rediscovered the compilation options used to include OpenSSL's functions within Waledac.
We are currently working with the consortium “malware.lu”.
Emmanuel Jeandel is a member of ANR Blanche ANR-09-BLAN-0164 (EMC: Emerging Phenomena in Computation Models).
We obtained an ANR project called BINSEC, which will start in 2013. The aim of the BINSEC project is to fill part of the gap between formal methods over executable code on one side and, on the other, the binary-level security analyses currently used in the security industry. We target two main application domains: vulnerability analysis and virus detection. Two other closely related applications will also be investigated: crash analysis and program deobfuscation.
Title: Morphus
Type: COOPERATION (ICT)
Defi: PPP FI: Technology Foundation: Future Internet Core Platform
Instrument: Integrated Project (IP)
Duration: September 2011 - May 2014
Coordinator: Telefonica (Spain)
Other partners: Thales, SAP, Inria
See also: http://
Abstract: FI-WARE will deliver a novel service infrastructure, building upon elements (called Generic Enablers) which offer reusable and commonly shared functions making it easier to develop Future Internet Applications in multiple sectors. This infrastructure will bring significant and quantifiable improvements in the performance, reliability and production costs linked to Internet Applications for building a true foundation for the Future Internet.
Title: Resource Control by Semantic Interpretations and Linear Proof Theory
Inria principal investigator: Romain Péchoux
International Partner (Institution - Laboratory - Researcher):
Universita degli Studi di Torino (Italy) - Dipartimento di informatica
Duration: 2010 - 2012
See also: http://
Topic: resource control using semantic interpretations and linear proof theory.
Mathieu Hoyrup is the principal investigator of a Partenariat Hubert Curien Imhotep 2011-2012 together with Walid Gooma, University of Alexandria, Egypt.
Daniel Leivant: October 25th to November 5th, 2012, Indiana University, USA.
Walid Gooma: December, 2012, University of Alexandria, Egypt.
Guillaume Bonfante: July 7th to 15th, 2012, invited by Stanislas Leibler from the 'Institute of Advanced Studies', Princeton, USA.
He gave a course on computer virology at the summer school "PiTP", Prospects in Theoretical Physics http://
Jean-Yves Marion: October 25th to November 5th, 2012, Indiana University, USA, work with Daniel Leivant.
Romain Péchoux: February and August 2012, University of Pennsylvania, USA, invited talk at the PLclub seminar.
Guillaume Bonfante:
was a member of the program committee of the workshop DICE 2012, Tallinn, Estonia,
refereed some projects for the Netherlands Organisation for Scientific Research (NWO).
Isabelle Gnaedig is:
co-leader of the EPI Carte,
member of the scientific mediation committee of Inria Nancy Grand-Est,
member of the engineer recruitment committee of Inria Nancy-Grand Est,
social referee at Inria Nancy-Grand Est.
Mathieu Hoyrup is:
the organizer of the Seminar of the LORIA Department “Formal Methods”.
Jean-Yves Marion is:
head of the EPI Carte,
head of the High Security Lab,
member of the executive committee of Inria Nancy-Grand Est,
member of the AERES visiting committee,
member of the CNU, section 27,
member of the steering committee of DICE and LCC,
chair of Malware 2012 and PC member of PREW 2012,
co-organizer of a winter school at CIRM (one week on complexity).
Romain Pechoux is:
principal investigator of the Inria associate team Cristal.
Guillaume Bonfante:
talk at the "Rewriting and Logic Seminar" at the JAIST (Japan Advanced Institute of Science and Technology), Kanazawa, March 2012.
Emmanuel Hainry:
presentation at Logic and Interactions 2012, complexity week in Marseille, January 30th to February 3rd, 2012.
Mathieu Hoyrup:
invited to give the talk "On the inversion of computable functions" at the 7th International Conference on Computability, Complexity and Randomness, in Cambridge, July 2012,
gave the talk "Computability in Ergodic Theory” at the Journées Calculabilités, Paris, March 5 and 6, 2012.
Emmanuel Jeandel
invited to give a four hours minicourse: "Computational aspects of multidimensional symbolic dynamics", at the first DySyCo school (Dynamical Systems and Computation) in Santiago, Chile, December 2012.
Jean-Yves Marion
talk on computer viruses at Corte, September, 2012 and at Grenoble, December, 2012,
two talks on computer viruses at Bloomington, Indiana University, October 29th and October 31st, 2012.
Licence:
Guillaume Bonfante:
Java, L3, Ecole des Mines, Université de Lorraine, France.
Emmanuel Hainry:
Operating Systems, 60 h, L1, IUT Nancy Brabois, Université de Lorraine, Nancy, France,
Algorithms and Programs, 60 h, L1, IUT Nancy Brabois, Université de Lorraine, Nancy, France,
Object Oriented Programming, 24 h, L1, IUT Nancy Brabois, Université de Lorraine, Nancy, France,
Databases, 24 h, L2, IUT Nancy Brabois, Université de Lorraine, Nancy, France,
Advanced Databases, 20 h, L2, IUT Nancy Brabois, Université de Lorraine, Nancy, France,
Complexity, 28 h, L2, IUT Nancy Brabois, Université de Lorraine, Nancy, France,
Algorithmics, 12 h, DU PFST (equiv. L1), IUT Nancy Brabois, Université de Lorraine, Nancy, France,
Operating Systems, 9 h, DU PFST (equiv. L1), IUT Nancy Brabois, Université de Lorraine, Nancy, France.
Emmanuel Jeandel
Probability for Computer Science, 46 hours, L3 Informatique, FST, Université de Lorraine, Nancy, France.
Romain Péchoux:
C2i, 20 hours, L1, Université de Lorraine, Nancy, France,
Propositional logic, 25 hours, L1 ISC, UFR MI, Université de Lorraine, Nancy, France,
Introduction to Java, 45 hours, L3 ISC MIAGE, UFR MI, Université de Lorraine, Nancy, France,
Databases, 42 hours, L3 LSG, ISAM-IAE, Université de Lorraine, Nancy, France,
Algorithmic complexity, 30 hours, L3 ISC MIAGE, IGA, Casablanca, Morocco.
Master:
Guillaume Bonfante:
Modelling and UML, M1, Ecole des Mines, Université de Lorraine, Nancy, France,
Video games, M1, Ecole des Mines, Université de Lorraine, Nancy, France,
Semantics, M1, Ecole des Mines, Université de Lorraine, Nancy, France,
Safety of software, M2, Ecole des Mines, Université de Lorraine, Nancy, France.
Isabelle Gnaedig:
“Design of Safe Software”, coordination of the module, M2, ESIAL-Telecom (Université de Lorraine), Nancy, France,
“Rule-based Programming”, 20 hours, M2, ESIAL-Telecom (Université de Lorraine), Nancy, France.
Emmanuel Jeandel:
Algorithms and Complexity, 35 hours, M1 Informatique, FST, Université de Lorraine, Nancy, France.
Jean-Yves Marion:
Automata Theory, M1, Ecole des Mines, Université de Lorraine, Nancy, France,
Logic, M1, Ecole des Mines, Université de Lorraine, Nancy, France,
Databases, M1, Ecole des Mines, Université de Lorraine, Nancy, France,
Algorithmics, M1, Ecole des Mines, Université de Lorraine, Nancy, France,
Security, M2, Ecole des Mines, Université de Lorraine, Nancy, France,
Head of the M2 internships, M2, Ecole des Mines, Université de Lorraine, Nancy, France.
Romain Péchoux:
Advanced Java, 52 hours, M1 MIAGE, UFR MI, Université de Lorraine, Nancy, France,
Mathematics for computer science, 20 hours, M1 SCA, UFR MI, Université de Lorraine, Nancy, France.
PhD: Pascal Vanier, "Pavages : périodicité et complexité calculatoire", Aix-Marseille Université, defended November 22nd, 2012, Emmanuel Jeandel.
PhD in progress : Thanh Dinh Ta, Malware Algebraic Modeling and Detection, started Sept. 2010, Jean-Yves Marion (director) and Romain Péchoux (co-advisor).
PhD in progress: Hugo Férée, Computational Complexity in Analysis, started Sept. 2011, Jean-Yves Marion (director) and Mathieu Hoyrup (co-advisor).
PhD in progress: Aurélien Thierry, Morphological Analysis of Malware, started October 2011, Jean-Yves Marion.
PhD in progress: Joan Calvet, Viruses: how they interact with their environment, started October 2009, Jean-Yves Marion and José M. Fernandez, joint PhD with Ecole Polytechnique de Montréal.
Isabelle Gnaedig:
participation in the ESIAL-Telecom admission committee.
Emmanuel Hainry:
in charge of the admission committee for dept. Réseaux et Télécoms at IUT Nancy Brabois.
Mathieu Hoyrup:
participation in the Master 2 "Logique Mathématique et Fondements de l'Informatique" jury of Benoît Monin.
Jean-Yves Marion:
November 30th, 2012, HDR of Virgile Mogbil, U. Paris 13 (reviewer),
December 3rd, 2012, PhD of Vincent Cheval, ENS Cachan (president),
December 6th, 2012, PhD of Antoine Madet (reviewer),
December 7th, 2012, PhD of Vincent Demange,
December 13th, 2012, HDR of Emmanuel Thomé,
December 17th, 2012, PhD of Hedi Benzina, ENS Cachan (reviewer).
Romain Péchoux:
participation in the UFR MI, licence MIAGE, admission committee.
Guillaume Bonfante:
talk at the "Rendez-Vous du Numérique et de l'Intelligence Economique", about ”Computer viruses”, Strasbourg, October 2012,
talk at the "Journées francophones de l'investigation numérique (JFIN)", about “Les virus, amis ou ennemis de l'investigation numérique ? ”, Levallois-Perret, November 2012.
Emmanuel Hainry:
presentation at Inria Jam Session 2012, June 19th, 2012 http://
Jean-Yves Marion:
wrote an article in “Pour la Science” http://