The aim of the CARTE research team is to take adversity into account in computation: adversity introduced by actors whose behavior is unknown or unclear. We call this notion adversary computation.
The project combines two approaches, and we think that their combination will be fruitful. The first is the analysis of the behavior of wide-scale systems, using tools from continuous computation theory. The second is to build defenses using tools coming rather from logic, rewriting and, more generally, programming theory.
The activities of the CARTE team are organized around two research actions:
Computer Virology.
Computation over Continuous Structures.
From a historical point of view, the first official virus appeared in 1983, on a VAX 11/750. At about the same time, a series of papers was published which remains a reference in computer virology to this day: Thompson , Cohen and Adleman .
The literature explaining and discussing practical issues is quite extensive; see for example Ludwig's book , Szor's , and numerous web sites. We think, however, that the best references are the two books of Filiol (English translation ) and . There are nevertheless only a few theoretical/scientific studies attempting to give a model of computer viruses.
A virus is essentially a self-replicating program inside an adversary environment. Self-replication has a solid theoretical background, based on work on fixed points in . Three features are characteristic:
A virus infects programs by modifying them.
A virus copies itself, and can mutate in doing so.
Viruses spread throughout a system.
The above scientific foundation justifies our position of using the word virus as a generic term for self-replicating malware. (There is still a difference: malware carries a payload, whereas a virus may not.) For example, worms are autonomous self-replicating malware and so fall under our definition. In fact, the current malware taxonomy (viruses, worms, trojans, ...) is unclear and subject to debate.
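The fixed-point constructions underlying self-replication can be made concrete in a few lines. The following quine (a Python illustration of ours, not taken from the works cited above) prints its own source code, which is precisely the replication mechanism a virus bootstraps from.

```python
# A quine: a program whose output is its own source code.
# The existence of such programs is guaranteed by Kleene's second
# recursion theorem, the fixed-point result behind self-replication.
src = 'src = {!r}\nprint(src.format(src))'
print(src.format(src))
```

The trick is that the program carries a description of itself (`src`) and applies that description to itself, exactly the diagonal argument used in the fixed-point theorems.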
Classical recursion theory deals with computability over discrete structures (natural numbers, finite symbolic words). There is a growing community of researchers working on the extension of this theory to continuous structures arising in mathematics. One goal is to provide foundations for numerical analysis, by studying the limitations, in terms of computability or complexity, of machines computing with real numbers. Classical questions concern, for instance, which operations on computable real functions, such as differentiation or integration, preserve computability.
While the notion of a computable function over discrete data is captured, according to the Church-Turing thesis, by the model of Turing machines, the situation is more delicate when the data are continuous, and several non-equivalent models exist. We mention computable analysis, which relates computability to topology , ; the Blum-Shub-Smale model (BSS), in which real numbers are treated as elementary entities ; and the General Purpose Analog Computer (GPAC) introduced by Shannon , in which time is continuous.
Rewriting has reached some maturity, and the rewriting paradigm is now widely used for specifying, modeling, programming and proving. It allows deduction systems to be expressed easily and declaratively, and complex relations on infinite sets of states to be expressed finitely, provided those sets are countable. Programming languages and environments with a rewriting-based semantics have been developed; let us cite ASF+SDF , Maude , and Tom .
For basic rewriting, many techniques have been developed to prove properties of rewrite systems such as confluence, completeness, consistency, or various notions of termination. To a lesser extent, proof methods have also been proposed for extensions of rewriting: equational extensions, consisting of rewriting modulo a set of axioms; conditional extensions, where rules are applied only under certain conditions; typed extensions, where rules are applied only if the types of the rule and of the term to be rewritten correspond; and constrained extensions, where rules are enriched with formulas to be satisfied , , .
An interesting aspect of the rewriting paradigm is that it allows automatable or semi-automatable correctness proofs for systems or programs. Indeed, properties of rewriting systems such as those cited above transfer to the deduction systems or programs they formalize, and the proof techniques may apply to them directly.
Another interesting aspect is that it allows characteristics or properties of the modeled systems to be expressed as equational theorems, often provable automatically using the rewriting mechanism itself or induction techniques based on completion . Note that the rewriting and completion mechanisms also enable transformation and simplification of formal systems or programs. Applications of rewriting-based proofs to computer security are various; let us mention recent work using rule-based specifications for the detection of computer viruses , .
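To make the rewriting paradigm concrete, here is a minimal sketch of a rewrite engine (in Python, with invented helper names; it is not drawn from ASF+SDF, Maude or Tom) that normalizes Peano-arithmetic terms with the two rules add(0, y) -> y and add(s(x), y) -> s(add(x, y)).

```python
# Terms are nested tuples: ('0',), ('s', t), ('add', t1, t2).
# Rules: add(0, y) -> y   and   add(s(x), y) -> s(add(x, y)).

def rewrite_step(t):
    """Try one rule at the root; otherwise recurse into subterms.
    Returns (new_term, changed_flag)."""
    if t[0] == 'add':
        x, y = t[1], t[2]
        if x == ('0',):
            return y, True
        if x[0] == 's':
            return ('s', ('add', x[1], y)), True
    for i in range(1, len(t)):            # no rule at root: try subterms
        sub, changed = rewrite_step(t[i])
        if changed:
            return t[:i] + (sub,) + t[i + 1:], True
    return t, False

def normalize(t):
    """Rewrite until no rule applies (the term is in normal form)."""
    changed = True
    while changed:
        t, changed = rewrite_step(t)
    return t

one = ('s', ('0',))
two = ('s', one)
# normalize(('add', two, one)) yields s(s(s(0))), i.e. 2 + 1 = 3.
```

Since this rewrite system is terminating and confluent, the order in which `rewrite_step` picks redexes does not affect the normal form, which is exactly the kind of property (termination, confluence) the proof techniques above establish.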
Our current thinking leads us to define four research tracks, which we describe below.
It is legitimate to wonder why there are so few fundamental studies on computer viruses, while they exploit some of the most important flaws in software engineering. This lack of theoretical study perhaps explains the weakness of our anticipation of computer diseases and the difficulty of improving defenses. For these reasons, we think it is worth exploring fundamental aspects, and in particular self-reproducing behaviors.
The crucial question is how to detect viruses and other self-replicating malware. Cohen demonstrated that this question is undecidable in general. Anti-virus heuristics are based on two methods. The first consists in searching for virus signatures, a signature being a regular expression that identifies a family of viruses. This method has obvious defects: an unknown virus, for example one related to a 0-day exploit, will not be detected. We strongly suggest having a look at the independent audit in order to understand the limits of this method. The second method consists in analyzing the behavior of a program by monitoring it. Following , methods of this kind are not yet really implemented; moreover, the large number of false positives makes them barely usable. To end this short survey: intrusion detection encompasses virus detection, but unlike computer virology, which has a solid scientific foundation as we have seen, the IDS notion of “malware” with respect to a security policy is not well defined. The interested reader may consult .
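Signature matching can be sketched in a few lines. The family name, pattern and sample bytes below are invented for illustration; real signature databases are of course far larger and more involved.

```python
import re

# A "signature" here is a regular expression over raw bytes that matches
# a whole family of variants (e.g. a short jump, a small NOP sled of
# variable length, then a characteristic string). All values are made up.
SIGNATURES = {
    "DemoFamily.A": re.compile(rb"\xeb.\x90{2,8}payload", re.DOTALL),
}

def scan(data: bytes):
    """Return the names of all signatures found in the given bytes."""
    return [name for name, sig in SIGNATURES.items() if sig.search(data)]

sample = b"\x00\x01\xeb\x05\x90\x90\x90payload\xff"
# scan(sample) matches "DemoFamily.A"; a fresh, unknown virus with no
# entry in SIGNATURES is invisible to this scheme, which is exactly
# the defect mentioned above.
```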
The aim is to define security policies that prevent malware propagation. For this, we need (i) to define what a computer virus is in various programming languages and settings, and (ii) to take into consideration resources such as time and space. We think that formal methods such as rewriting, type theory, logic, or formal languages should help define the notion of a formal immune system, providing a certified protection.
This study of computer virology has led us to propose and build a “high security lab” in which experiments can be carried out in compliance with French law. This “high security lab” is one of the main projects of the CPER 2007-2013.
Understanding computation theories for continuous systems leads to studying the hardness of verification and control of these systems. This has been used to discuss problems in fields as diverse as verification (see e.g. ), control theory (see e.g. ), neural networks (see e.g. ), and so on.
We are interested in the formal decidability of properties of dynamical systems, such as reachability , the Skolem-Pisot problem , and the computability of dynamical characteristics such as attractors.
Due to the difficulty of their analysis, the study of dynamical systems is often impossible without computer simulations. Those simulations are nevertheless heuristic: because of round-off errors, what is observed on the screen is not guaranteed to reflect the actual behavior of the original mathematical system. Computable analysis has the advantage of getting rid of these truncation problems, by integrating the management of errors into the computation. We use this theory to investigate the possibility of computing characteristics of dynamical systems that are fundamental objects of the mathematical theory, such as attractors or invariant measures. Being asymptotic objects, they may not always be computable: for instance, it has been proved in that some Julia sets (see Figure ) cannot be computed at all, i.e. there is no program that would plot such sets up to an arbitrary resolution.
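The round-off problem is easy to reproduce. In this small illustrative sketch (ours, not taken from the cited works), the chaotic logistic map is iterated once in full double precision and once with each intermediate result rounded to 12 decimal places; the sensitivity of the map amplifies the tiny rounding discrepancy until the two "simulations" of the same system disagree completely.

```python
# Iterate the logistic map x -> 4x(1-x) twice in parallel: once in full
# double precision, once rounding each intermediate value to 12 decimal
# places. Chaos roughly doubles the discrepancy at every step, so the
# two orbits, which claim to track the same system, soon decorrelate.

def logistic(x):
    return 4 * x * (1 - x)

x_full, x_rounded = 0.1, 0.1
max_gap = 0.0
for step in range(60):
    x_full = logistic(x_full)
    x_rounded = round(logistic(x_rounded), 12)
    max_gap = max(max_gap, abs(x_full - x_rounded))

# max_gap grows far beyond the 1e-12 rounding granularity: neither
# orbit can be trusted as a picture of the true trajectory.
```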
In , we prove that there exist computable systems for which the statistical long-term behavior (technically, the invariant measures) is not computable.
In contrast with the discrete setting, it is of utmost importance to compare the various models of computation over the reals, as well as their associated complexity theories. In particular, we focus on the General Purpose Analog Computer of Claude Shannon , on recursive analysis , on the algebraic approach , and on computability in a probabilistic context .
A crucial point for future investigations is to fill the gap between continuous and discrete computational models. This is one deep motivation of our work on computation theories for continuous systems.
The other research direction on dynamical systems we are interested in is the study of properties of adversary systems or programs, i.e. of systems whose behavior is unknown or indistinct, or which do not have classical expected properties. We would like to offer proof and verification tools, to guarantee the correctness of such systems.
On the one hand, we are interested in continuous and hybrid systems. In a mathematical sense, a hybrid system can be seen as a dynamical system whose transition function does not satisfy the classical regularity hypotheses, such as continuity or continuity of its derivative. The properties to be verified are often expressed as reachability properties. For example, a safety property is often equivalent to the (non-)reachability of a subset of unsafe states from an initial configuration, or to stability (with its numerous variants: asymptotic stability, local stability, mortality, etc.). We will thus essentially focus on verifying these properties in various classes of dynamical systems.
We are also interested in rewriting techniques, used to describe dynamical systems, in particular in the adversary context. As they were initially developed in the context of automated deduction, rewriting proof techniques, although now numerous, are not yet adapted to the complex framework of modeling and programming. An important stake in the domain is thus to enrich them into realistic validation tools, both by designing finer rewriting formalisms with their associated proof techniques, and by developing new validation concepts for the adversary case, i.e. when usual properties of systems, such as termination, do not hold.
For several years, we have been developing specific procedures for proving properties of rewriting, with programming applications in mind, in particular an inductive technique already applied with success to termination under strategies , , , to weak termination , to sufficient completeness and to probabilistic termination .
The last three results take place in the context of adversary computations, since they allow proving that even a divergent program, in the sense that it does not terminate, can give the expected results.
A common mechanism has been extracted from the above works, providing a generic inductive proof framework for properties of reduction relations, parametrized by the property to be proved , . Provided program code can be translated into rule-based specifications, this approach can be applied to correctness proofs of software in a larger context.
A crucial element of the safety and security of software systems is the problem of resources. We are working in the field of implicit computational complexity, an approach to analyzing the resources used by a program whose tools come essentially from proof theory. Interpretation-based methods, such as quasi-interpretations (QI) and sup-interpretations, are the approach we have been developing over the last five years; see , , . The aim is to compile a program while certifying its complexity.
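The interpretation idea can be sketched numerically. In this toy Python check (ours; a real quasi-interpretation check proves the inequality symbolically, whereas sampling only gives evidence), each function symbol of a rewrite system is assigned a monotone polynomial, and we verify on sample points that each rule's left-hand side bounds its right-hand side, which is what yields resource bounds.

```python
# Assign a monotone polynomial to each symbol of the Peano-addition
# system, then check the QI-style inequality [lhs] >= [rhs] on a grid
# of sample points. (Evidence only: a real proof is symbolic.)

interp = {
    '0':   lambda: 0,
    's':   lambda x: x + 1,
    'add': lambda x, y: x + y,
}

def value(term, env):
    """Evaluate the interpretation of a term; variables are strings."""
    if isinstance(term, str):
        return env[term]
    head, *args = term
    return interp[head](*(value(a, env) for a in args))

# Rule: add(s(x), y) -> s(add(x, y))
lhs = ('add', ('s', 'x'), 'y')
rhs = ('s', ('add', 'x', 'y'))

ok = all(value(lhs, {'x': x, 'y': y}) >= value(rhs, {'x': x, 'y': y})
         for x in range(10) for y in range(10))
# Here [lhs] = (x + 1) + y and [rhs] = (x + y) + 1, so the inequality
# holds everywhere, and the interpretation bounds the size of any value
# computed by the rule.
```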
MMDEX, an anti-virus software based on morphological analysis. APP registration, 2009, IDDN.FR.001.300033.000.R.P.2009.000.10000.
Online disassembler.
http://
A self-modifying code analyzer coming with an IDA add-on.
http://
Crême Brûlée is an experimental Javascript dynamic instrumentation engine.
http://
In , Guillaume Bonfante, Jean-Yves Marion and Jean-Yves Moyen show how quasi-interpretations can be used for the resource analysis of first-order functional programs. This work has been at the root of several further developments in implicit computational complexity.
So far, in the implicit complexity characterizations based on the ordering MPO developed in the team, we were using the subterm relation to compare values. In , we have shown that these results generalize to the embedding relation.
We have continued our work on proofs of rewriting properties in the adversary context. Our inductive proof technique, initially developed for proving termination of rewriting for systems that do not enjoy the strong termination property, was first proposed to establish termination proofs under particular strategies: the innermost, outermost and local strategies .
We have then tackled the problem of proving weak properties, i.e., properties that hold only on certain derivation branches. Weak property proofs are still marginal in the domain of rewriting, probably because classical proof techniques, especially for termination, work on the rules, so that the phenomena arising in the induced rewriting relation remain hidden. Our technique, which develops proof trees simulating rewriting trees by abstraction and narrowing, explicitly describes the behavior of the studied property on derivation branches, allowing us to establish it on the good branches. In addition, it is constructive, which is very useful in the programming context: the good branches are identified at compile time, when the proof is established. At run time, derivations are computed only along a good derivation branch, which avoids resorting to the costly breadth-first strategy.
We have then proposed a procedure, based on our inductive principle, for proving weak termination.
Our study of behavioral malware detection has been continued. We have been developing an approach that detects suspicious schemes on an abstract representation of the behavior of a program, obtained by abstracting program traces: given subtraces are rewritten into abstract symbols representing their functionality. Considering abstract behaviors allows us to be implementation-independent and robust to variants and mutations of malware. Suspicious behaviors are then detected by comparing trace abstractions to reference malicious behaviors.
Last year, we proposed to abstract trace automata by rewriting them with respect to a set of predefined behavior patterns, defined as a regular language described by a string rewriting system . We have increased the power of our approach in two respects. First, we modified the abstraction mechanism so that abstracted patterns are kept in the rewritten traces, simply marked; this now allows us to handle interleaved patterns. Second, we extended the rewriting framework to term rewriting systems, in order to express data constraints on action parameters. An important consequence is that, unlike in , by using the data flow we can now detect information leaks, in order to prevent unauthorized disclosure or modification of information .
The previous approach has also been extended to a probabilistic model of rewriting, in order to express uncertainty in behavior pattern recognition. All these results on the detection of malware by behavior abstraction are given in the PhD thesis of Philippe Beaucamps, supervised by Isabelle Gnaedig and Jean-Yves Marion, and defended on 14 November 2011 .
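The abstraction step described above can be sketched with string rewriting over traces. All pattern, action and behavior names below are invented for illustration; the actual framework operates on trace automata and term rewriting systems with data constraints.

```python
import re

# A concrete trace is a ';'-separated sequence of system actions.
# Subtraces matching a behavior pattern are rewritten into an abstract
# functionality symbol; detection then compares the abstracted trace
# against a reference malicious abstract behavior.

PATTERNS = [
    # "open a socket, optionally connect, then send" abstracts to NET_SEND
    (re.compile(r"socket;(connect;)?send;"), "NET_SEND;"),
    # "open a file, then read it" abstracts to FILE_READ
    (re.compile(r"open_file;read;"), "FILE_READ;"),
]
MALICIOUS = "FILE_READ;NET_SEND;"   # reference behavior: a data leak

def abstract(trace):
    """Rewrite every occurrence of each pattern into its abstract symbol."""
    for pat, symbol in PATTERNS:
        trace = pat.sub(symbol, trace)
    return trace

trace = "open_file;read;socket;connect;send;close;"
# abstract(trace) == "FILE_READ;NET_SEND;close;", which contains the
# reference leak behavior; the same abstraction also matches variants
# that use "socket;send;" without an explicit connect.
```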
In , we solve a problem that had been open for 15 years. It relates three notions of complexity and information: Shannon information and entropy, Kolmogorov algorithmic information, and Martin-Löf randomness. We obtain that the limit rate of compressibility of a random sequence equals the entropy of the underlying ergodic measure. This result is the culmination of several years of development.
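The entropy/compressibility link can be illustrated numerically (this rough Python sketch is ours, not from the cited work): a Bernoulli(p) bit sequence should compress to roughly H(p) = -p log2 p - (1-p) log2 (1-p) bits per bit, and zlib, while far from an optimal compressor, clearly separates the two regimes.

```python
import random
import zlib

# Compress n Bernoulli(p) bits (stored one per byte) and report the
# achieved rate in compressed bits per source bit. For a fair coin the
# entropy is 1 bit/bit, so no compressor can do much better than 1;
# a biased coin has lower entropy and compresses much further.

def compressed_bits_per_bit(p, n=200_000, seed=0):
    rng = random.Random(seed)
    bits = bytes(int(rng.random() < p) for _ in range(n))
    return 8 * len(zlib.compress(bits, 9)) / n

rate_fair = compressed_bits_per_bit(0.5)     # entropy H(0.5) = 1 bit
rate_biased = compressed_bits_per_bit(0.05)  # entropy H(0.05) ~ 0.29 bit
# rate_fair stays near 1 while rate_biased drops well below it.
```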
We have obtained results on forecasting long-term statistics in dynamical systems. In previous works, we studied the computability of limit frequencies and proved in particular that in general they cannot be computed. We have since turned to the following question: can they be computed if observation of the system is allowed as an oracle? In , we obtain several positive results, leaving the general problem open.
In , we study the constructive content of the Radon-Nikodym theorem, show that it is not computable in general, and precisely locate its non-computability in the Weihrauch lattice.
We have given a new constructive proof of Birkhoff's ergodic theorem, with, as an application, a strengthening of former results on random elements: in ergodic systems, random elements eventually reach effective closed sets of positive measure (previously known only for a more restricted class of sets). The paper is in press and will appear in Information and Computation.
New results about randomness for a class of measures (and not only for one particular measure) are presented in .
We have studied the link between undecidability and robustness in dynamical systems. Undecidability occurs very easily in dynamical systems; however, good decision algorithms exist that work for most systems that are not pathological. We argue that this decidability trait may be related to robustness to infinitesimal noise. We have proved that in smooth dynamical systems, robustness is equivalent to decidability of the reachability problem. This result comes with various hypotheses depending on the compactness of the domain and on whether time is discrete or continuous.
In , we present a characterization of the polynomial-time computable functions in the setting of recursive analysis. This paper in fact presents a generic framework for lifting characterizations of complexity or computability classes from the classical setting to analog characterizations in recursive analysis.
Jean-Yves Marion has worked on light (soft) linear logics with Marco Gaboardi and Simona Ronchi Della Rocca in . This work extends a soft linear lambda calculus with a conditional construct, and provides a correspondence with the well-known result APTIME = PSPACE.
Jean-Yves Marion proposed, in , a type system for an imperative programming language that certifies time bounds. It is based on secure information-flow analysis, as proposed for instance by Bell and La Padula. A link is thus established between computational complexity and security-typed languages.
We have no contracts with industry. However, we have several relationships with industrial partners such as Thales and Netasq, and have established many other contacts. See the FI-WARE project.
Project CyS of GIS 3SGS on smartphone forensics.
We have active collaborations with:
Alexander Shen (LIF),
Laurent Bienvenu (LIAFA),
Florian Deloup came in our group for six months as a CNRS researcher.
Title: FI-WARE
Type: COOPERATION (ICT)
Defi: PPP FI: Technology Foundation: Future Internet Core Platform
Instrument: Integrated Project (IP)
Duration: May 2011 - April 2014
Coordinator: Telefonica (Spain)
Other partners: Thales, SAP, INRIA
See also:
http://
Abstract: FI-WARE will deliver a novel service infrastructure, building upon elements (called Generic Enablers) which offer reusable and commonly shared functions, making it easier to develop Future Internet applications in multiple sectors. This infrastructure will bring significant and quantifiable improvements in the performance, reliability and production costs linked to Internet applications, building a true foundation for the Future Internet.
Stefano Galatolo (Universitá di Pisa),
Daniel Graça (University of Faro),
Georg Moser (University of Innsbruck),
Klaus Weihrauch (FernUniversität Hagen).
ARC CaCO3 (France-Egypt),
http://
Title: COntinuous tiMe comPUTations, computation on the Reals
INRIA principal investigator: Emmanuel Hainry
International Partner:
Institution: Instituto de Telecomunicaçoes (Portugal)
Laboratory: Security and Quantum Information Group
Duration: 2009 - 2011
See also:
http://
Title: Resource Control by Semantic Interpretations and Linear Proof Theory
INRIA principal investigator: Romain Péchoux
International Partner:
Institution: Università degli Studi di Torino (Italy)
Laboratory: Dipartimento di informatica
Duration: 2010 - 2012
See also:
http://
We have active collaboration with:
Peter Gács (Boston University),
Cristóbal Rojas (Toronto),
José Fernandez (Montreal),
We have also started a collaboration with Dawn Song at Berkeley.
Daniel Leivant (Indiana University, invited for six months)
John Case (University of Delaware),
http://
Walid Gomaa (University of Cairo),
http://
Guillaume Bonfante:
has been invited to give a talk at the Workshop on Logic and Computation,
http://
was a referee for the PhD thesis of Jean-Marie Borello, dealing with virology, entitled "Étude du métamorphisme viral : modélisation, conception et détection" (A study of viral metamorphism: modeling, design and detection),
was on the jury of the PhD of Matthieu Morey, whose thesis deals with natural language processing,
contributed to the papers , whose aim is to compute the semantics of a sentence (in a natural language) from its syntactic analysis,
is a member of the program committee of the workshop LCC 2011,
http://
Isabelle Gnaedig:
is co-leader of the Carte research team,
was co-supervisor of the PhD thesis of Philippe Beaucamps, defended on 14 November 2011,
is a member of the scientific mediation committee of INRIA Nancy Grand-Est,
participated in the ESIAL admission committee.
Mathieu Hoyrup:
is the organizer of the Seminar of the Department Formal Methods (
http://
gave a course (2 hours) at the École Jeunes Chercheurs en Informatique et Mathématiques in Amiens, April 2011, on computability over the real numbers.
Jean-Yves Marion:
has been a member of the program committees of FOPARA 2011 and CSL 2011,
has been the chair of DICE 2011 and of Malware 2011,
is co-chair of the Complexity and Logic week, 30/01-03/02, at the CIRM winter school
Romain Péchoux:
is the French coordinator of the Cristal associated team.
Guillaume Bonfante is teaching at the Ecole des Mines:
“Java”, L3,
“Modelling and UML”, M1,
“Video-games”, M1,
“Semantics”, M1,
“Safety of Software”, M2.
Isabelle Gnaedig is teaching at ESIAL (Université Henri Poincaré):
“Design of Safe Software”, M2 (including coordination of the module),
“Rule-based Programming”, 20 hours, M2.
Emmanuel Hainry is teaching courses on operating systems, algorithmics, object-oriented programming, functional programming and databases at IUT Nancy-Brabois, Université Henri Poincaré (levels L1, L2).
Romain Péchoux teaches at Université Nancy 2 the following courses:
Preparation for the C2i certificate, L1, Université Nancy 2, France
Object-oriented programming languages, L3 MIAGE, UFR MI, Université Nancy 2, France
Advanced Java, M1 MIAGE, UFR MI, Université Nancy 2, France
Databases, L3 LSG, IAE, Université Nancy 2, France
Algorithmic complexity, L3 MIAGE, IGA Casablanca, Morocco
Romain Péchoux was responsible for the “Préparation au c2i” course at Université Nancy 2 until June 2011 (approximately 3500 students), and has been director of the licence MIAGE, UFR MI, Université Nancy 2, since July 2011.
PhD: Philippe Beaucamps, Analyse de Programmes Malveillants par Abstraction de Comportements (Malware Analysis by Behavior Abstraction), Université Henri Poincaré, Nancy, defended on 14 November 2011. Supervisors: Jean-Yves Marion and Isabelle Gnaedig.