The aim of the CARTE research team is to take into account adversity in computation, which arises from actors whose behaviors are unknown or unclear. We call this notion adversary computation.

The project combines two approaches, and we think that their combination will be fruitful. The first is the analysis of the behavior of large-scale systems, using tools from continuous computation theory. The second is to build defenses with tools from logic, rewriting and, more generally, programming theory.

The activities of the CARTE team are organized around two research actions:

Computer Virology

Computation over Continuous Structures

**High security Laboratory (LHS).** The LHS was officially inaugurated on July 1st, 2010. It is the first academic scientific platform in France for large-scale experiments on computer security. Jean-Yves Marion is the head of the LHS and all CARTE members participate. This is the major achievement of this year.

Jean-Yves Marion received an “Outstanding Contribution” award at Malware 2010.

Matthieu Kaczmarek received the Région Lorraine first prize for his PhD thesis, in February 2010 at Metz.

From a historical point of view, the first official virus appeared in 1983 on a VAX PDP-11. At the very same time, a series of papers was published which remains a reference in computer virology to this day: Thompson, Cohen and Adleman.

The literature explaining and discussing practical issues is quite extensive; see for example Ludwig's book or Szor's, and many web sites. We think, however, that the best references are the two books of Filiol (the first is available in English translation). Yet there are only a few theoretical/scientific studies which attempt to give a model of computer viruses.

A virus is essentially a self-replicating program inside an adversary environment. Self-replication has a solid background based on work on fixed points in the λ-calculus and on the studies of von Neumann. More precisely, we established that Kleene's second recursion theorem is the cornerstone from which viruses and infection scenarios can be defined and classified. The bottom line of a virus's behavior is:

A virus infects programs by modifying them

A virus copies itself and can mutate

A virus spreads throughout a system
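The self-replication at the heart of this definition is precisely the fixed-point phenomenon of Kleene's second recursion theorem. A minimal, harmless illustration (our own sketch, not team code) is a quine, a program whose output is its own source:

```python
# A classical quine: running this program prints exactly its own source.
# Kleene's second recursion theorem guarantees that such self-referential
# programs exist for any computable transformation of program text,
# which is why it underlies formal definitions of viral replication.
s = 's = %r\nprint(s %% s)'
print(s % s)
```

A virus writer replaces the bare `print` with code that copies this text into other programs; the self-reference mechanism itself is unchanged.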

The above scientific foundation justifies our position of using the word virus as a generic term for self-replicating malware. (There is a difference, however: a malware carries a payload, while a virus may not.) For example, worms are autonomous self-replicating malware and so fall under our definition. In fact, the current malware taxonomy (viruses, worms, trojans, ...) is unclear and subject to debate.

Classical recursion theory deals with computability over discrete structures (natural numbers, finite symbolic words). There is a growing community of researchers working on the extension of this theory to continuous structures arising in mathematics. One goal is to give foundations to numerical analysis by studying the limitations, in terms of computability or complexity, of machines computing with real numbers. Classical questions are: if a function is computable in some sense, are its roots computable? In what time? Another goal is to investigate the possibility of designing new computation paradigms, transcending the usual discrete-time, discrete-space computer model initiated by the Turing machine and underlying modern computers.

While the notion of a computable function over discrete data is captured, according to the Church-Turing thesis, by the model of Turing machines, the situation is more delicate when the data are continuous, and several non-equivalent models exist. We mention computable analysis, which relates computability to topology; the Blum-Shub-Smale (BSS) model, where real numbers are treated as elementary entities; and the General Purpose Analog Computer (GPAC) introduced by Shannon, where time is continuous.

Rewriting has reached some maturity and the rewriting paradigm is now widely used for specifying, modeling, programming and proving. It allows deduction systems to be expressed easily in a declarative way, and complex relations on infinite sets of states to be expressed in a finite way, provided they are countable. Programming languages and environments with a rewriting-based semantics have been developed; let us cite ASF+SDF, Maude, and Tom.

For basic rewriting, many techniques have been developed to prove properties of rewrite systems such as confluence, completeness, consistency or various notions of termination. To a lesser extent, proof methods have also been proposed for extensions of rewriting: equational extensions, consisting of rewriting modulo a set of axioms; conditional extensions, where rules are applied only under certain conditions; typed extensions, where rules are applied only if there is a type correspondence between the rule and the term to be rewritten; and constrained extensions, where rules are enriched with formulas to be satisfied.
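As a toy illustration of these notions, consider the one-rule string rewriting system ba → ab: it terminates and is confluent, so every word over {a, b} has a unique normal form, namely its sorted version. A small sketch (our own, for illustration only):

```python
# One-rule string rewriting system: "ba" -> "ab".
# It is terminating (each step strictly decreases the number of
# inversions) and confluent, so normal forms are unique: the sorted word.
RULES = [("ba", "ab")]

def normalize(word, rules, max_steps=10_000):
    """Rewrite leftmost-first until no rule applies (a normal form)."""
    for _ in range(max_steps):
        for lhs, rhs in rules:
            if lhs in word:
                word = word.replace(lhs, rhs, 1)  # one rewriting step
                break
        else:
            return word  # no left-hand side occurs: normal form reached
    raise RuntimeError("step budget exhausted (system may diverge)")

print(normalize("babab", RULES))  # -> "aabbb"
```

Termination and confluence are exactly what make `normalize` a well-defined function here; for a diverging or non-confluent system the step budget or the rule-selection strategy would matter.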

An interesting aspect of the rewriting paradigm is that it allows automatable or semi-automatable correctness proofs for systems or programs. Indeed, properties of rewriting systems such as those cited above translate to the deduction systems or programs they formalize, and the proof techniques may apply to them directly.

Another interesting aspect is that it allows characteristics or properties of the modeled systems to be expressed as equational theorems, often automatically provable using the rewriting mechanism itself or induction techniques based on completion. Note that the rewriting and completion mechanisms also enable transformation and simplification of formal systems or programs. Applications of rewriting-based proofs to computer security are varied; let us mention recent work using rule-based specifications for the detection of computer viruses.

Our current thinking leads us to define four research tracks, which we describe below.

It is legitimate to wonder why there are so few fundamental studies on computer viruses, while they constitute one of the important flaws in software engineering. The lack of theoretical studies may explain the weakness in anticipating computer diseases and the difficulty of improving defenses. For these reasons, we do think that it is worth exploring fundamental aspects, and in particular self-reproducing behaviors.

The crucial question is how to detect viruses or self-replicating malware. Cohen demonstrated that this question is undecidable. Anti-virus heuristics are based on two methods. The first one consists in searching for virus signatures. A signature is a regular expression which identifies a family of viruses. There are obvious defects: for example, an unknown virus will not be detected, such as one related to a 0-day exploit. We strongly suggest having a look at the independent audit in order to understand the limits of this method. The second method consists in analysing the behavior of a program by monitoring it. This kind of method is not yet really implemented; moreover, the large number of false positives makes it barely usable. To end this short survey, intrusion detection encompasses virus detection. However, unlike computer virology, which has a solid scientific foundation as we have seen, the IDS notion of “malware” with respect to some security policy is not well defined. The interested reader may consult .
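To make the first method concrete, here is a hedged sketch (the byte patterns and family name are invented) of signature scanning, together with the one-byte mutation that defeats it:

```python
import re

# Hypothetical signature database: regular expressions over raw bytes.
# Real engines use far richer pattern languages, but the principle is this.
SIGNATURES = {
    "Evil.A": re.compile(rb"\xde\xad[\x00-\xff]{2}\xbe\xef"),
}

def scan(binary: bytes):
    """Return the names of all signatures matching somewhere in the binary."""
    return [name for name, sig in SIGNATURES.items() if sig.search(binary)]

sample  = b"\x90\x90\xde\xad\x01\x02\xbe\xef\x90"
mutated = b"\x90\x90\xde\xac\x01\x02\xbe\xef\x90"  # one byte changed

print(scan(sample))   # -> ['Evil.A']
print(scan(mutated))  # -> [] : the unknown variant slips through
```

The second call illustrates the defect discussed above: any mutation outside the wildcarded positions produces an "unknown" virus invisible to the scanner.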

The aim is to define security policies in order to prevent malware propagation. For this, we need (i) to define what a computer virus is in different programming languages and settings, and (ii) to take into consideration resources such as time and space. We think that formal methods such as rewriting, type theory, logic, or formal languages should help to define the notion of a *formal immune system*, which provides a certified protection.

This study of computer virology led us to propose and construct a “high security lab” in which experiments can be done in compliance with French law. This “high security lab” project is one of the main projects of the CPER 2007-2013.

Understanding computation theories for continuous systems leads to studying hardness of verification and control of these systems. This has been used to discuss problems in fields as diverse as verification (see e.g. ), control theory (see e.g. ), neural networks (see e.g. ), and so on.

We are interested in the formal decidability of properties of dynamical systems, such as reachability, the Skolem-Pisot problem, and the computability of the ω-limit set. These problems are analogous to the verification of safety properties.

Due to the difficulty of their analysis, the study of dynamical systems is often impossible without computer simulations. Nevertheless, these simulations are often heuristic and, due to round-off errors, what is observed on the screen is not guaranteed to reflect the actual behavior of the original mathematical system. Computable analysis has the advantage of getting rid of truncation problems by integrating the management of errors into the computation. We use this theory to investigate the possibility of computing characteristics of dynamical systems that are fundamental objects of the mathematical theory, such as attractors or invariant measures. Being asymptotic objects, they may not always be computable: for instance, it has been proved that some Julia sets (see Figure ) cannot be computed at all, i.e. there is no program that can plot such sets up to an arbitrary resolution.
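A minimal way to see how the "management of errors" can be integrated into the computation is interval arithmetic: every value is an enclosure guaranteed to contain the true result, so nothing is silently lost to round-off. The sketch below (our own illustration, using exact rationals) iterates a chaotic map and watches the enclosure widen:

```python
from fractions import Fraction

# Minimal interval arithmetic: each value is an enclosure [lo, hi] that
# provably contains the exact mathematical value, so errors are tracked
# explicitly instead of being hidden as in floating point.
class Interval:
    def __init__(self, lo, hi=None):
        self.lo = Fraction(lo)
        self.hi = Fraction(hi if hi is not None else lo)

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        ps = [a * b for a in (self.lo, self.hi) for b in (other.lo, other.hi)]
        return Interval(min(ps), max(ps))

    def width(self):
        return self.hi - self.lo

# Iterate the logistic map x -> 3.75 * x * (1 - x) on a tiny initial
# enclosure: the growing width makes explicit how fast certainty about
# the orbit is lost, something a plain float simulation never reveals.
x = Interval(Fraction(1, 3) - Fraction(1, 10**6),
             Fraction(1, 3) + Fraction(1, 10**6))
r = Interval(Fraction(15, 4))
for _ in range(5):
    one_minus_x = Interval(1 - x.hi, 1 - x.lo)
    x = r * x * one_minus_x
print(float(x.width()))  # enclosure width after 5 iterations
```

Real computable-analysis frameworks refine the enclosure on demand to reach any requested output precision; this sketch only shows the bookkeeping direction.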

In contrast with the discrete setting, it is of utmost importance to compare the various models of computation over the reals, as well as their associated complexity theories. In particular, we focus on the General Purpose Analog Computer of Claude Shannon, on recursive analysis, on the algebraic approach, and on computability in a probabilistic context.

A crucial point for future investigations is to fill the gap between continuous and discrete computational models. This is one deep motivation of our work on computation theories for continuous systems.

The other research direction on dynamical systems we are interested in is the study of properties of adversary systems or programs, i.e. of systems whose behavior is unknown or indistinct, or which do not have classical expected properties. We would like to offer proof and verification tools, to guarantee the correctness of such systems.

On the one hand, we are interested in continuous and hybrid systems. In a mathematical sense, a hybrid system can be seen as a dynamical system whose transition function does not satisfy the classical regularity hypotheses, such as continuity or continuity of its derivative. The properties to be verified are often expressed as reachability properties. For example, a safety property is often equivalent to the (non-)reachability of a subset of unsafe states from an initial configuration, or to stability (with its numerous variants such as asymptotic stability, local stability, mortality, etc.). We will thus essentially focus on the verification of these properties in various classes of dynamical systems.

We are also interested in rewriting techniques used to describe dynamical systems, in particular in the adversary context. As they were initially developed in the context of automated deduction, the rewriting proof techniques, although now numerous, are not yet adapted to the complex framework of modeling and programming. An important stake in the domain is then to enrich them to provide realistic validation tools, both by providing finer rewriting formalisms with their associated proof techniques, and by developing new validation concepts for the adversary case, i.e. when usual properties of the systems, such as termination, are not satisfied.

For several years, we have been developing specific procedures for proving properties of rewriting, for the sake of programming, in particular with an inductive technique already applied with success to termination under strategies, weak termination, sufficient completeness and probabilistic termination.

The last three results take place in the context of adversary computations, since they allow one to prove that even a divergent program, in the sense that it does not terminate, can give the expected results.

A common mechanism has been extracted from the above works, providing a generic inductive proof framework for properties of reduction relations, which can be parametrized by the property to be proved. Provided program code can be translated into rule-based specifications, this approach can be applied to the correctness proof of software in a larger context.

A crucial element of the safety and security of software systems is the problem of resources. We are working in the field of Implicit Computational Complexity, an approach to the analysis of the resources used by a program whose tools come essentially from proof theory. Interpretation-based methods, such as quasi-interpretations (QI) or sup-interpretations, are the approach we have been developing over the last five years. The aim is to compile a program while certifying its complexity.
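The flavor of a quasi-interpretation can be conveyed on unary addition (a textbook example, not the team's tool): assign each function symbol a monotone polynomial weight and check that no rewrite rule increases the weight, which bounds the size of every result by a polynomial in the input size. The numeric check below is a heuristic stand-in for the symbolic proof done by real QI analyses:

```python
# Sketch of the quasi-interpretation idea on unary addition:
#   add(0, y)    -> y
#   add(s(x), y) -> s(add(x, y))
# Assign weights: [0] = 0, [s](u) = u + 1, [add](u, v) = u + v.
# A quasi-interpretation requires [lhs] >= [rhs] for every rule.
ZERO = lambda: 0
S    = lambda u: u + 1
ADD  = lambda u, v: u + v

# Check the inequalities on sample values (a real tool proves them
# symbolically for all values).
for x in range(50):
    for y in range(50):
        assert ADD(ZERO(), y) >= y              # rule 1: [lhs] >= [rhs]
        assert ADD(S(x), y) >= S(ADD(x, y))     # rule 2: [lhs] >= [rhs]
print("both rules satisfy the quasi-interpretation inequalities on samples")
```

Because the chosen weights are polynomials and every rule is weight-non-increasing, any value computed by this program has size bounded polynomially in the input, which is the kind of resource certificate the compilation aims to produce.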

An anti-virus software based on morphological analysis; APP deposit of the MMDEX software, 2009, IDDN.FR.001.300033.000.R.P.2009.000.10000

Online disassembler. http

A self-modifying code analyzer coming with an IDA add-on. http

Crême Brûlée is an experimental Javascript dynamic instrumentation engine. http

Guillaume Bonfante and Florian Deloup have shown that the characterizations of complexity classes based on interpretations over the integers still hold when using interpretations over the real numbers.

The KBO ordering was introduced in the 1970s by Knuth and Bendix. Generally speaking, any multiply recursive function can be computed by a program with a KBO proof of termination. Guillaume Bonfante and Georg Moser have shown how to restrain KBO to obtain LINSPACE, PSPACE and ESPACE.

Comparing characterizations of complexity classes is undecidable in general. Guillaume Bonfante has shown how program transformations can be used to characterize algorithms, and not just functions.

Characterization of polytime computable functions over streams using interpretations. A notion of polynomial interpretation of streams (which are second-order objects) is proposed, and the polytime computable functions over streams are characterized as those functions which can be computed by a stream program admitting a polynomial interpretation.

A new constructive proof of Birkhoff's ergodic theorem, with, as an application, a strengthening of former results on random elements: in ergodic systems, random elements eventually reach *effective closed sets* of positive measure (while this was previously only known for a more restricted class of sets). The paper has been submitted.

The statistical properties of a dynamical system are
captured by its
*invariant measures*. Given a computable system (a
system whose evolution law can be implemented on a computer),
we study the problem of computing its invariant measures. We
first show that large classes of systems have computable
“interesting” invariant measures. Secondly, we construct a
computable system having no computable invariant measure. The
paper
will appear in 2011.

Botnets, that is, networks of infected computers, constitute a serious security problem. A lot of effort has been invested in understanding them better and in developing and learning how to deploy effective counter-measures against them. Their study via various analysis, modeling and experimental methods is an integral part of the development cycle of any such botnet mitigation scheme. It also constitutes a vital part of the process of understanding present threats and predicting future ones. Currently, the most popular of these techniques are “in-the-wild” botnet studies, where researchers interact directly with real-world botnets. This approach brings a lot of problems, including scientific validity, ethical and legal issues. Consequently, we developed an alternative approach employing “in the lab” experiments involving at-scale emulated botnets. We have thus implemented an experiment in which we emulated a close to 3000-node, fully-featured version of the Waledac botnet, complete with a reproduced command and control (C&C) infrastructure. By observing the load characteristics and yield (rate of spamming) of such a botnet, we have drawn interesting conclusions about its real-world operations and the design decisions made by its creators. In particular, we found that the botmasters' cryptographic choices were not as bad as we first thought during our in-the-wild study of the botnet: when we were setting up our own version of the botnet, it became clear that their primary purpose was performance.

Furthermore, we conducted experiments in which we launched sybil attacks against the botnet. We were able to verify that such an attack is, in the case of Waledac, viable. However, we determined that mounting such an attack is not so simple: high resource consumption can cause havoc and partially neutralize the attack. Finally, we repeated the attack with varying parameters in an attempt to optimize it, and thus found the optimal parameters for the attack. The merits of this experimental approach are underlined by the fact that it would be very difficult to obtain these results by employing other methods.

Olivier Bournez, Daniel Graça and Emmanuel Hainry have studied the link between undecidability and robustness in dynamical systems. Indeed, undecidability occurs very easily in dynamical systems. However, good decision algorithms exist that work for most systems that are not pathological. They argue that this decidability trait may be related to their robustness to infinitesimal noise. They have proved that in smooth dynamical systems, robustness implies decidability of the reachability problem, and that the perturbed reachability problem is Π⁰₁-complete.

We have pursued our study of behavioral malware detection, developing an approach that detects suspicious schemes on an abstract representation of the behavior of a program. Our technique works by abstracting program traces, rewriting given subtraces into abstract symbols representing their functionality. Suspicious behaviors are then detected by comparing trace abstractions to reference malicious behaviors.

Sets of execution traces are represented by trace automata. Abstraction is then computed by rewriting trace automata, with respect to a set of predefined behavior patterns defined as a regular language described by a string rewriting system. Finally, the abstracted trace automaton is intersected with a malware database, composed of abstract signatures representing known malicious behaviors. We have shown that detection works in quadratic time with respect to the size of both trace automaton and signature database.
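The pipeline can be illustrated on a single linear trace (the real technique operates on trace automata and rewriting systems; the event names, patterns and signatures below are invented for this sketch):

```python
# Hedged sketch of behavior abstraction on one trace: a rewriting-like
# pass maps concrete subtraces to abstract behavior symbols, then the
# abstraction is compared with reference malicious behaviors.
PATTERNS = {
    ("open_file", "write_file"): "DROP",     # drops a file to disk
    ("reg_set_run_key",): "PERSIST",         # installs persistence
    ("connect", "send"): "EXFILTRATE",       # sends data out
}

# Reference malicious behaviors: suspicious pairs of abstract symbols.
MALICIOUS = {("DROP", "PERSIST"), ("PERSIST", "EXFILTRATE")}

def abstract(trace):
    out, i = [], 0
    while i < len(trace):
        for pat, symbol in PATTERNS.items():
            if tuple(trace[i:i + len(pat)]) == pat:
                out.append(symbol)      # rewrite the subtrace into a symbol
                i += len(pat)
                break
        else:
            i += 1                      # event irrelevant to any pattern
    return out

def is_suspicious(trace):
    a = abstract(trace)
    return any(tuple(a[i:i + 2]) in MALICIOUS for i in range(len(a) - 1))

trace = ["open_file", "write_file", "sleep", "reg_set_run_key",
         "connect", "send"]
print(abstract(trace))       # -> ['DROP', 'PERSIST', 'EXFILTRATE']
print(is_suspicious(trace))  # -> True
```

Working at the level of abstract symbols rather than concrete events is what makes the real approach robust to mutations: any subtrace with the same functionality rewrites to the same symbol.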

Traces are captured dynamically by code instrumentation, which allows us to handle packed or self-modifying malware. The expressive power of abstraction allows us to handle general suspicious behaviors rather than specific malware code, and thus to detect malware mutations.

An implementation has been developed, validating our approach on known malware such as Allaple, Virut and Agent.

We are currently studying self-modifying programs. This programming method is used in different contexts such as JIT compilation and packers, but also to hide malware. We have developed a representation of self-modifying programs by waves: each wave corresponds to a maximal non-modifying part of a program. We built a tool, Tracesurfer, to automatically extract waves by using code instrumentation. We presented this tool at SSTIC (Symposium sur la Sécurité des Technologies de l'Information et de la Communication).

We have proposed a new approach for studying self-modifying code in packed programs. We have given an original semantics-based technique, flattening, to simplify dynamic code analysis: from the execution trace of a packed program, it reconstructs a view of the program freed from self-modification. It works by data-flow analysis of execution traces, based on x86 instruction semantics. Flattening distinguishes traced instructions related to packing features from the others and portrays a static view of the program, given by a time evolution of its assembly code: the flat view. Flattening has been formalized in a model based on Hybrid-GOTO, a small imperative language having the main usual instructions. It introduces a functional view of flattening using the notions of self-modification kernel and of instruction evolution.
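A much-simplified sketch of the wave idea underlying this line of work (our own illustration, not the formal Hybrid-GOTO model): scan an execution trace and start a new wave whenever control reaches an address that was written since the current wave began, i.e. code generated at run time:

```python
# Hedged sketch of wave extraction from an execution trace.
# Each entry is (kind, address): 'exec' for an executed instruction,
# 'write' for a memory write. Executing a freshly written address means
# we have crossed into self-modified code: close the wave, open a new one.
def split_into_waves(trace):
    waves, current, written = [], [], set()
    for kind, addr in trace:
        if kind == "write":
            written.add(addr)
        elif kind == "exec":
            if addr in written:          # executing runtime-generated code
                waves.append(current)    # close the current wave
                current, written = [], set()
            current.append(addr)
    waves.append(current)
    return waves

# A packer-like trace: a stub at 0x100-0x102 writes 0x200-0x201, then
# jumps into the unpacked code.
trace = [("exec", 0x100), ("write", 0x200), ("exec", 0x101),
         ("write", 0x201), ("exec", 0x102), ("exec", 0x200),
         ("exec", 0x201)]
print([[hex(a) for a in w] for w in split_into_waves(trace)])
# -> [['0x100', '0x101', '0x102'], ['0x200', '0x201']]
```

The second wave is exactly the code a static disassembler never sees in the packed binary, which is why wave extraction and flattening are prerequisites for applying classical analysis tools.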

The approach can be very useful because, once the flattened view is computed, classical malware analysis tools can be applied to the program, which is not possible directly on packed programs.

Jean-Yves Marion investigates intrinsic characterizations of well-known complexity classes. He has developed a type system for an imperative programming language to bound program runtimes. The type system is based on secure information flow typing, which has been widely studied. Intuitively, program variables are classified as either low or high, and the system prevents information from flowing from low variables to high variables. He proves non-interference results, which basically say that the values of high variables are independent of the values of low variables. He thereby establishes a close relation with ramified recursion, which is used in implicit computational complexity.
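A toy version of such a typing discipline (the syntax and tier names are our own simplification, not the actual type system): each variable carries a tier, and an assignment is accepted only when no low operand flows into a high target:

```python
# Toy information-flow check in the spirit described above: an
# assignment x := e is rejected when a low value would flow into a
# high variable, which is the flow direction the type system forbids.
LEVELS = {"high": 1, "low": 0}

def check_assign(target_level, operand_levels):
    """Accept x := e only if every operand's tier is at least x's tier."""
    return all(LEVELS[l] >= LEVELS[target_level] for l in operand_levels)

# h := h + h is well-typed; h := h + l leaks a low value into a high
# tier and is rejected; assigning anything to a low variable is fine.
print(check_assign("high", ["high", "high"]))  # -> True
print(check_assign("high", ["high", "low"]))   # -> False
print(check_assign("low",  ["high", "low"]))   # -> True
```

In the complexity reading, the rejected flow is what would let a growing (low-tier) value drive the control of a high-tier loop; forbidding it is what yields the runtime bound, in the spirit of ramified recursion.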

He also deals with declassification in order to delineate a broader class of programs. Finally, he provides a characterization of the class of polynomial-time functions.

This result was presented at the DICE 2010 workshop and at the PICS Concerto meeting as invited talks.

We have no contracts with industry. However, we have several relationships with industrial partners such as Thales and Netasq, and have established many other contacts.

Virus (see last year)

Project CyS of GIS 3SGS on smartphone forensics.

EA Cristal(France-Italy)

EA ComputR(France-Portugal)

ARC CaCO3(France-Egypt)

A chapter on “Informatique théorique: décidabilité et complexité” (theoretical computer science: decidability and complexity) in a Handbook on AI. Jean-Yves Marion co-coordinated this chapter. The contributors are Olivier Bournez, Gilles Dowek, Rémi Gilleron, Serge Grigorieff (co-coordinator), Simon Perdrix, and Sophie Tison. This 50-page chapter covers selected parts of theoretical computer science, mainly from the computability and complexity point of view. The handbook on AI is edited by Pierre Marquis, Odile Papini and Henri Prade.

Guillaume Bonfante:

talk at the ANR-Complice meeting (in May) about course-of-values recursion and complexity classes.

contribution to work whose aim is to compute the semantics of a sentence (in a natural language) from its syntactic analysis.

Joan Calvet was involved in the following events:

“Modern Malware Reverse Engineering”, EkoParty Conference, Buenos Aires (Argentina), September 2010 ( http). The training was targeted at security researchers, exploit writers and reverse engineers looking to learn about the common techniques, tips and tools for analyzing current complex malware.

“The Waledac botnet - its original characteristics, its weaknesses and its end in February 2010!”, Association de Sécurité de l'Information du Montréal Métropolitain, June 2010 ( http). The purpose of this presentation was to present the Waledac malware family and describe its botnet, plus some attacks that were launched against it.

Emmanuel Hainry:

35th international symposium on Mathematical Foundations of Computer Science - MFCS 2010, in Brno (Czech Republic)[ http]

Visit to the Universidade do Algarve, Faro, Portugal in October, to work with Daniel Graça on the question of decidability in dynamical systems.

Mathieu Hoyrup participated in the following events:

organization, together with Cristóbal Rojas, of a workshop named “Computability in ergodic theorems” inside the event Dynamics and Computation, Luminy, February 2010.

workshop Logical Approaches to Barriers in Computing and Complexity, Greifswald (Germany), February 2010. The title of the talk was “Randomness and the ergodic decomposition”.

5th Conference on Logic, Computability and Randomness, South Bend (Indiana, US), May 2010.

Franco-Russian workshop on Algorithms, complexity and applications, Moscow, June 2010. The title of the talk was “Randomness, computability and the ergodic decomposition”.

Mathieu Hoyrup is now in charge of the organization of the SSS seminar.

Jean-Yves Marion:

co-chairman of STACS 2010 and the main organizer. He is editor of a special issue of TOCS.

in the program committee of DICE 2010 and FOPARA 2010.

co-chair of Malware 2010 and its main organizer. He was one of the panelists at Malware on cloud computing security.

invited in Torino and gave a talk at PICS Concerto workshop.

talk at the INRIA centre Lille Nord Europe for the local members and local industrial partners of the Agence nationale de la sécurité des systèmes d'information (ANSSI).

selected to participate in the French-Japanese symposium “Frontiers of Science”. This invitation was declined.

participated in the habilitation thesis committee of Pierre Valarcher, U. Paris 12.

participated in the PhD committees of Romain Demangeon (ENS Lyon) and Jonathan Rouzaud-Cornabas (U. Orléans).

member of the AERES visiting committee at Caen (Greyc).

together with Matthieu Kaczmarek, Jean-Yves Marion wrote a paper on computer viruses in Pour la Science.

Emmanuel Hainry, Mathieu Hoyrup, Jean-Yves Marion and Romain Péchoux participated in the E-JUST Workshop on Computer Science Research, Alexandria, October 2010. They gave the following talks:

Emmanuel Hainry: “Robust computations with dynamical systems”

Mathieu Hoyrup: “Decidability over infinite sequences”

Jean-Yves Marion: “Ramified information flow”

Romain Péchoux: “Interpretation of stream programs”

Invited researchers:

John Case (University of Delaware) was invited for two weeks in July by Jean-Yves Marion.

Florian Deloup, from the department of mathematics of the University Paul Sabatier - Toulouse III, was invited for one week in February. Guillaume Bonfante and Florian Deloup initiated new research on the topological theory of finite state automata.

Walid Gomaa, from the University of Alexandria, visited for three weeks in July and one week in October. We worked on computational complexity over the real numbers and, more generally, over mathematical structures from analysis. We submitted an Imhotep project (the Franco-Egyptian Partenariat Hubert Curien) to continue this collaboration in 2011.

Daniel Graça, Universidade do Algarve, Faro, Portugal, came for one week in May.

**Diffusion in the media.**

Interview for Interstice: http.

Interview for France Culture (10/10/2010) http.

Press on the new results: “Setting up ...” with Joan Calvet: atelier.net, lemondeinformatique.fr

and an interview on RTV Suisse.

Press on LHS opening: La Croix, Science et Vie Junior, le Figaro, les Échos, la Tribune, France Info, Micro Hebdo.

Guillaume Bonfante is giving the following courses at Ecole des Mines:

theory of computing for second year students,

security of information systems for third year students,

model driven architecture for second year students,

Java programming for first year students.

Isabelle Gnaedig is coordinator of the course on “Design of Safe Software” at ESIAL, 3rd year. In this context, she also gave courses and supervised practical works on “Rule-based Programming”. She is also part of the ESIAL admission committee.

Emmanuel Hainry teaches courses on operating systems, algorithmics, object-oriented programming, functional programming and databases at IUT Nancy-Brabois.

Jean-Yves Marion is giving the following courses:

Computer security for third year students,

Theoretical computer science for second year students,

Logic for computer scientists for second year students.

Jean-Yves Marion gave a course (3 hours) at Sciences Po, Paris, on computer security.

Romain Péchoux teaches at Université Nancy 2 the following courses:

Preparation for the C2i certificate (L1),

Object-oriented programming languages (L3 MIAGE),

Advanced Java (M1 MIAGE),

Databases (L3 IAE),

UML (continuing education).