The Spades project-team aims to help meet the challenge of
designing and programming dependable embedded systems in an
increasingly distributed and dynamic context. Specifically, by
exploiting formal methods and techniques, Spades aims to answer three
key questions:
These questions are not new, but answering them in the context
of modern embedded systems, which are increasingly distributed, open
and dynamic in nature 55, makes them more
pressing and more difficult to address: the targeted system properties
– dynamic modularity, time-predictability, energy efficiency, and
fault-tolerance – are largely antagonistic (e.g., having a highly
dynamic software structure is at variance with ensuring that resource
and behavioral constraints are met). Tackling these questions
together is crucial to address this antagonism, and constitutes a key
point of the Spades research program.
A few remarks are in order:
The SPADES research program is organized around three main themes,
Design and Programming Models, Certified real-time
programming, and Fault management and causal analysis, that
seek to answer the three key questions identified in
Section 2. We plan to do so by developing and/or
building on programming languages and techniques based on formal
methods and formal semantics (hence the use of “sound
programming” in the project-team title). In particular, we seek to
support design where correctness is obtained by construction, relying
on proven tools and verified constructs, with programming languages
and programming abstractions designed with verification in mind.
Work on this theme aims to develop models, languages and tools to support a “correct-by-construction” approach to the development of embedded systems.
On the programming side, we focus on the definition of domain specific programming models and languages supporting static analyses for the computation of precise resource bounds for program executions. We propose dataflow models supporting dynamicity while enjoying effective analyses. In particular, we study parametric extensions and dynamic reconfigurations where properties such as liveness and boundedness remain statically analyzable.
On the design side, we focus on the definition of component-based models
for software architectures combining distribution, dynamicity, real-time and fault-tolerant
aspects.
Component-based construction has long been advocated as a key approach
to the “correct-by-construction” design of complex embedded
systems 43. Witness component-based toolsets such
as Ptolemy 33, BIP 26, or
the modular architecture frameworks used, for instance, in the
automotive industry (AUTOSAR) 24. For building large,
complex systems, a key feature of component-based construction is the
ability to associate with components a set of contracts, which
can be understood as rich behavioral types that can be composed and
verified to guarantee a component assemblage will meet desired
properties.
Formal models for component-based design are an active area of
research. However, we are
still missing a comprehensive formal model and its associated
behavioral theory able to deal at the same time with different
forms of composition, dynamic component structures, and quantitative
constraints (such as timing, fault-tolerance, or energy consumption).
We plan to develop our component theory by progressing on two fronts:
a semantical framework and domain-specific programming models.
The work on the semantical framework should, in the longer term,
provide abstract mathematical models for the more operational and
linguistic analysis afforded by component calculi. Our work on
component theory will find its application in the development of a
Coq-based toolchain for the certified design and construction of
dependable embedded systems, which constitutes our first main
objective for this axis.
Programming real-time systems (i.e., systems whose correct behavior
depends on meeting timing constraints) requires appropriate languages
(as exemplified by the family of synchronous
languages 28), but also the support of
efficient scheduling policies, execution time and schedulability
analyses to guarantee real-time constraints (e.g., deadlines) while
making the most effective use of available (processing, memory, or
networking) resources. Schedulability analysis involves analyzing the
worst-case behavior of real-time tasks under a given scheduling
algorithm and is crucial to guarantee that time constraints are met in
any possible execution of the system. Reactive programming and
real-time scheduling and schedulability for multiprocessor systems are
old subjects, but they are nowhere near as mature as their uniprocessor
counterparts, and still feature a number of open research
questions 25, 32, in particular in
relation with mixed criticality systems. The main goal in this theme
is to address several of these open questions.
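To make the notion of schedulability analysis concrete, the following minimal sketch implements the classical response-time analysis for preemptive fixed-priority scheduling on a single processor. It is only an illustration of the well-understood uniprocessor case, not of the multiprocessor analyses targeted in this theme. The function name `response_times`, the task tuple layout, and the assumption of constrained deadlines (deadline no larger than period) are choices made for this example.

```python
def response_times(tasks):
    """Worst-case response times under preemptive fixed-priority scheduling.

    tasks: list of (wcet, period, deadline) tuples with positive integers,
    sorted from highest to lowest priority. Assumes a single processor,
    independent tasks and deadlines no larger than periods.
    Returns one entry per task: the worst-case response time, or None
    if the analysis shows that the deadline can be missed.
    """
    results = []
    for i, (c_i, _period_i, d_i) in enumerate(tasks):
        r = c_i  # start the fixed-point iteration from the task's own WCET
        while True:
            # Interference from each higher-priority task j released in [0, r):
            # ceil(r / t_j) jobs, each costing c_j (integer ceiling division).
            interference = sum(-(-r // t_j) * c_j for c_j, t_j, _ in tasks[:i])
            nxt = c_i + interference
            if nxt > d_i:
                results.append(None)  # deadline exceeded: not schedulable
                break
            if nxt == r:
                results.append(r)  # fixed point reached
                break
            r = nxt
    return results


if __name__ == "__main__":
    # Hypothetical task set: (WCET, period, deadline), highest priority first.
    print(response_times([(1, 4, 4), (2, 6, 6), (3, 12, 12)]))
```

The iteration stops either at a fixed point, which bounds the response time in every possible execution, or as soon as the bound exceeds the deadline.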
We intend to focus on two issues: multicriteria scheduling on
multiprocessors, and schedulability analysis for real-time
multiprocessor systems. Beyond real-time aspects, multiprocessor
environments, and multicore ones in particular, are subject to several
constraints in conjunction, typically involving real-time,
reliability and energy-efficiency constraints, making the scheduling
problem more complex for both the offline and the online
cases. Schedulability analysis for multiprocessor systems, in
particular for systems with mixed criticality tasks, is still very
much an open research area.
Distributed reactive programming is rightly singled out as a major open issue in the recent, but heavily biased (it essentially ignores recent research in synchronous and dataflow programming), survey by Bainomugisha et al. 25. For our part, we intend to focus on devising synchronous programming languages for distributed systems and precision-timed architectures.
Managing faults is a clear and present necessity in networked embedded systems. At the hardware level, modern multicore architectures are manufactured using inherently unreliable technologies 29, 41. The evolution of embedded systems towards increasingly distributed architectures highlighted in the introductory section means that dealing with partial failures, as in Web-based distributed systems, becomes an important issue.
In this axis we intend to address the question of how to cope
with faults and failures in embedded systems. We will tackle this
question by exploiting reversible programming models and by developing
techniques for fault ascription and explanation in component-based
systems.
A common theme in this axis is the use and exploitation of causality
information. Causality, i.e., the logical dependence of an effect on a
cause, has long been studied in disciplines such as
philosophy 49, natural sciences,
law 50, and statistics 51, but it
has only recently emerged as an important focus of research in
computer science. The analysis of logical causality has applications
in many areas of computer science. For instance, tracking and
analyzing logical causality between events in the execution of a
concurrent system is required to ensure
reversibility 46, to allow the diagnosis of faults
in a complex concurrent system 42, or to enforce
accountability 44, that is, designing systems in
such a way that it can be determined without ambiguity whether a
required safety or security property has been violated, and why. More
generally, the goal of fault-tolerance can be understood as being to
prevent certain causal chains from occurring by designing systems such
that each causal chain either has its premises outside of the fault
model (e.g., by introducing redundancy 36), or is
broken (e.g., by limiting fault propagation 54).
Our applications are in the embedded system area, typically: transportation, energy production, robotics, telecommunications, the Internet of things (IoT), systems on chip (SoC). In some areas, safety is critical, and motivates the investment in formal methods and techniques for design. But even in less critical contexts, like telecommunications and multimedia, these techniques can be beneficial in improving the efficiency and the quality of designs, as well as the cost of the programming and the validation processes.
Industrial acceptance of formal techniques, as well as their
deployment, goes necessarily through their usability by specialists of
the application domain, rather than of the formal techniques
themselves. Hence, we are looking to propose domain-specific (but
generic) realistic models, validated through experience (e.g., control
tasks systems), based on formal techniques with a high degree of
automation (e.g., synchronous models), and tailored for concrete
functionalities (e.g., code generation).
We also consider the development of formal tools that can certify the result of industrial applications (see e.g., CertiCAN in Sec. 7.2.2).
Regarding applications and case studies with industrial end-users of our techniques, we cooperate with Orange Labs on software architecture for cloud services. We also collaborate with RTaW regarding the integration of our CAN-bus analysis certifier (CertiCAN) in the RTaW-Pegase program suite.
We have not yet computed the footprint of our research activities in 2022. A tool designed in collaboration with Labo 1.5 should be available next year. For the time being, it allows us to compute the carbon footprint due to individual travel (both professional trips and commuting) and to new hardware, but not the footprint due to our share of the Inria services (data centers, networks, ...) or to building usage. Finally, only the carbon footprint can be computed, not the usage of other resources (water, raw materials) or the resulting pollution (impact on biodiversity).
Our research on certification and fault-tolerance aims at making embedded systems safer. Certified systems also tend to be simpler, less dependent on updates and therefore less prone to obsolescence. A potential major application of causality analysis is to help establish liability for accidents caused by software errors.
On the other hand, this type of research may contribute to making many problematic systems (IoT, drones, avionics, autonomous vehicles, ...) more acceptable, or even to promoting them, with a potentially strong negative environmental impact.
Sophie Quinton and Éric Tannier (from the BEAGLE team in Lyon), with the help of many colleagues, including some in the SPADES team, have set up a series of one-day workshops called “Ateliers SEnS” (for Sciences-Environnements-Sociétés), which offer a venue for members of the research community (in particular, but not limited to, researchers) to reflect on the social and environmental implications of their research. Around 50 Ateliers SEnS have taken place so far, all across France and beyond INRIA and the computer science field. Participants in a workshop can replicate it, and quite a few have already done so. SPADES organized its own Atelier SEnS in 2022. Sophie Quinton facilitated 8 Ateliers SEnS in 2022.
Research into the connection between ICT (Information and Communication Technologies) and the environmental crisis started in 2020 within the SPADES team; see Section 7.4.
The multiplication of models, languages, APIs and tools for cloud and network configuration management raises heterogeneity issues that can be tackled by introducing a reference model. A reference model provides a common basis for interpretation for various models and languages, and for bridging different APIs and tools. The Cloudnet Computational Model formally specifies, in the Alloy specification language, a reference model for cloud configuration management. The Cloudnet software formally interprets several configuration languages in it, including the TOSCA configuration language, the OpenStack Heat Orchestration Template and the Docker Compose configuration language.
The use of the software shows, for example, how the Alloy formalization allowed us to discover several classes of errors in the OpenStack HOT specification.
Application of the Cloudnet model developed by Inria to software network deployment and reconfiguration description languages.
The Cloudnet model allows syntax and type checking for cloud configuration templates as well as their visualization (network diagram, UML deployment diagram). Three languages are addressed for the moment with the modules:
* Cloudnet TOSCA toolbox for TOSCA, including NFV description
* cloudnet-hot for HOT (Heat Orchestration Template) from OpenStack
* cloudnet-compose for Docker Compose
The software can be used directly from an Orange web portal: https://toscatoolbox.orange.com
Dataflow Models of Computation (MoCs) are widely used in embedded systems, including multimedia processing, digital signal processing, telecommunications, and automatic control. One of the first and most popular dataflow MoCs, Synchronous Dataflow (SDF), provides static analyses to guarantee boundedness and liveness, which are key properties for embedded systems. However, SDF and most of its variants lack the capability to express the dynamism needed by modern streaming applications.
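As an illustration of the kind of static analysis SDF permits, the sketch below checks the consistency of an SDF graph by solving its balance equations: for every edge, the number of tokens produced per iteration must equal the number consumed. Consistency is the standard prerequisite for bounded-memory execution; liveness additionally depends on the initial tokens, which this sketch does not model. The function name `repetition_vector` and the edge encoding are assumptions made for the example, and `math.lcm` with several arguments requires Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm


def repetition_vector(actors, edges):
    """Solve the SDF balance equations q[src] * prod == q[dst] * cons.

    actors: list of actor names (the graph is assumed to be connected).
    edges: list of (src, prod_rate, dst, cons_rate) with positive integers.
    Returns a dict mapping each actor to its number of firings per
    iteration (smallest positive integer solution), or None if the graph
    is inconsistent, i.e. no non-trivial solution exists.
    """
    adjacency = {a: [] for a in actors}
    for src, prod, dst, cons in edges:
        adjacency[src].append((dst, Fraction(prod, cons)))
        adjacency[dst].append((src, Fraction(cons, prod)))

    # Propagate relative firing rates from an arbitrary first actor.
    rate = {actors[0]: Fraction(1)}
    stack = [actors[0]]
    while stack:
        a = stack.pop()
        for b, ratio in adjacency[a]:
            expected = rate[a] * ratio
            if b not in rate:
                rate[b] = expected
                stack.append(b)
            elif rate[b] != expected:
                return None  # two paths impose contradictory rates

    if len(rate) != len(actors):
        raise ValueError("the graph is not connected")

    # Scale the rational rates to the smallest integer vector.
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(r * scale) for a, r in rate.items()}


if __name__ == "__main__":
    # Hypothetical three-actor pipeline: A produces 2, B consumes 3, etc.
    print(repetition_vector(["A", "B", "C"],
                            [("A", 2, "B", 3), ("B", 1, "C", 2)]))
```

In the example pipeline, firing A three times, B twice and C once returns every channel to its initial state.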
For many years, the Spades team has been working on more expressive and dynamic models that nevertheless allow static analyses for boundedness and liveness. We have proposed several parametric
dataflow models of computation (MoCs) (SPDF 35 and BPDF 27), we have
written a survey providing a comprehensive description of the existing
parametric dataflow MoCs 30, and we have studied
symbolic analyses of dataflow graphs 31. We have proposed an original method to deal with lossy
communication channels in dataflow graphs 34.
More recently, we have studied models allowing dynamic reconfigurations
of the topology of the dataflow graphs. This is required by
many modern streaming applications that have a strong need for
reconfigurability, for instance to accommodate changes in the input
data, the control objectives, or the environment.
We have proposed a new MoC called Reconfigurable Dataflow
(RDF) 3. RDF extends SDF with
transformation rules that specify how the topology and actors of
the graph may be reconfigured. Starting from an initial RDF graph
and a set of transformation rules, an arbitrary number of
new RDF graphs can be generated at runtime. The major
feature and advantage of RDF is that it can be statically analyzed
to guarantee that all possible graphs
generated at runtime will be connected, consistent, and live. To the
best of our knowledge, RDF is the only dataflow MoC allowing an
arbitrary number of topological reconfigurations while remaining
statically analyzable.
We have also worked on a practical way to integrate statically parameterized SDF graphs into the PREESM tool 18. While this work does not provide a theoretical analysis of the evolution of the graph according to the parameters as in SPDF, it supports a wider range of parameterized expressions, including complex mathematical operations. On the few applications tested, we have shown that a simple design-space exploration on a subset of all possible parameter configurations provides reasonable approximations of the impact of each parameter on throughput, latency and energy.
Up to now, the main application domain of dataflow models has been streaming multimedia applications but they also seem particularly well suited to the efficient implementation of neural networks. We have started an exploratory action (see Section 9.2) to study the potential of dataflow MoCs for the implementation of neural networks. We expect advances in the form of a better time efficiency and a lower memory and energy consumption.
We are currently working on the reduction of the memory footprint of task graphs scheduled on unicore processors. This is motivated by the fact that some recent neural networks such as GPT-3, seen as task graphs, use too much memory and cannot fit on a single GPU. Using several exact and heuristic techniques, we are able to produce schedules that minimize the memory requirement of a sequential dataflow application, and therefore of task graphs as well. Another technique used by memory-greedy neural networks is activity and gradient checkpointing (a.k.a. rematerialization), which recomputes intermediate values rather than keeping them in memory. We are now studying rematerialization in the more general dataflow context.
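The influence of the schedule on peak memory can be illustrated with a small sketch. It assumes a deliberately simple memory model that is not taken from our work: each task allocates its output when it runs, and an output is freed once all its consumers have run. The exhaustive search over topological orders only serves to show the problem on tiny graphs; it is not one of the exact or heuristic techniques mentioned above. All names (`peak_memory`, `best_schedule`, the example graph) are hypothetical.

```python
from itertools import permutations


def peak_memory(order, out_size, preds):
    """Peak memory of running the tasks sequentially in the given order.

    out_size[t]: size of the output of task t.
    preds[t]: collection of tasks whose outputs t consumes.
    Assumed model: an output is allocated when its task runs (inputs are
    still held at that point) and freed once all its consumers have run.
    """
    consumers_left = {t: 0 for t in order}
    for t in order:
        for p in preds[t]:
            consumers_left[p] += 1

    live = peak = 0
    for t in order:
        live += out_size[t]
        peak = max(peak, live)
        for p in preds[t]:
            consumers_left[p] -= 1
            if consumers_left[p] == 0:
                live -= out_size[p]  # last consumer has run
        if consumers_left[t] == 0:
            live -= out_size[t]  # sink: its result leaves memory
    return peak


def is_topological(order, preds):
    """Check that every task appears after all its predecessors."""
    seen = set()
    for t in order:
        if not set(preds[t]) <= seen:
            return False
        seen.add(t)
    return True


def best_schedule(out_size, preds):
    """Brute-force search for a sequential schedule with minimal peak."""
    best = None
    for order in permutations(out_size):
        if is_topological(order, preds):
            m = peak_memory(order, out_size, preds)
            if best is None or m < best[0]:
                best = (m, order)
    return best


if __name__ == "__main__":
    # Two chains with large intermediate results, joined by a sink z.
    sizes = {"x1": 5, "x2": 1, "y1": 5, "y2": 1, "z": 1}
    deps = {"x1": [], "x2": ["x1"], "y1": [], "y2": ["y1"], "z": ["x2", "y2"]}
    print(peak_memory(("x1", "y1", "x2", "y2", "z"), sizes, deps))
    print(best_schedule(sizes, deps))
```

In this example, finishing one chain before starting the other avoids holding both large intermediate results at the same time.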
In 2022, we completed our work on a very general model of real-time systems, made of a single-core processor equipped with dynamic voltage and frequency scaling (DVFS) and an infinite sequence of preemptive real-time jobs. Each job is characterized by its inter-arrival time, its actual size (actual execution time, AET), and its relative deadline. The setting is non-clairvoyant, meaning that, at release time, only statistical information on the jobs' characteristics is available: release time, AET, and relative deadline.
In this context, we have proposed a Markov Decision Process (MDP) solution to compute the optimal online speed policy guaranteeing that each job completes before its deadline while minimizing the energy consumption. To the best of our knowledge, our MDP solution is the first to be optimal. We have also provided counterexamples to prove that the two previous state-of-the-art algorithms, namely OA 56 and PACE 47, are both sub-optimal. Finally, we have proposed a new heuristic online speed policy called Expected Load (EL) that incorporates an aggregated term representing the future expected jobs into a speed equation similar to that of OA. A journal paper is currently under review.
Simulations show that our MDP solution outperforms the existing online solutions (OA, PACE, and EL), and can be very attractive in particular when the mean value of the execution time distribution is far from the WCET.
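For reference, the speed equation of OA (Optimal Available), as it is usually stated in the speed-scaling literature, selects at each instant the largest "density" of pending work over the available deadlines. The sketch below is a minimal rendering of that rule under stated assumptions: the remaining work of each pending job is taken as known (or upper-bounded by its worst case), and the function name `oa_speed` and the job encoding are choices made for this example. It does not implement the MDP solution or EL.

```python
def oa_speed(now, pending):
    """Processor speed selected by the OA rule at time `now`.

    pending: list of (remaining_work, absolute_deadline) pairs for the
    jobs that have been released and are not yet finished.
    The speed is the maximum, over all deadlines d, of the total work
    due by d divided by the time left until d.
    """
    speed = 0.0
    cumulative = 0.0
    for work, deadline in sorted(pending, key=lambda job: job[1]):
        if deadline <= now:
            raise ValueError("a pending job has already missed its deadline")
        cumulative += work  # all work that must be completed by `deadline`
        speed = max(speed, cumulative / (deadline - now))
    return speed


if __name__ == "__main__":
    # Hypothetical pending jobs: (remaining work, absolute deadline).
    print(oa_speed(0, [(2, 4), (3, 5), (1, 10)]))
```

In the example, the binding constraint is the second deadline: 5 units of work must be completed within 5 time units, which sets the speed.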
This was the topic of Stephan Plassart's PhD 52, 37, 39, 38, funded by the Caserm Persyval project and defended in June 2020.
We contribute to
Prosa 23, a Coq library of reusable concepts and proofs
for real-time systems analysis. A key scientific challenge is to
achieve a modular structure of proofs, e.g., for response time
analysis. Our goal is to use this library for:
the certification of (results of) existing analysis techniques or tools.
In the recent past, we have developed CertiCAN, a tool produced using the Coq proof assistant, allowing the formal certification of CAN bus analysis results. CertiCAN is able to certify the results of industrial CAN analysis tools, even for large systems. We have described this work in a long journal article to appear 11.
We have completed our work on a formal connection between Network Calculus (NC) and Response Time Analysis (RTA) 19. This enables specialists of both formalisms to get increased confidence in their models (or to discover errors, as has happened). The presented mathematical results are all mechanically checked with the interactive theorem prover Coq, building on existing formalizations of RTA (namely Prosa) and NC (namely NCCoq). Establishing such a link between NC and RTA paves the way for improved real-time analyses obtained by combining both theories to enjoy their respective strengths (e.g., multicore analyses for RTA or clock drifts for NC).
The work on the formalization in Prosa of Compositional Performance Analysis is still ongoing.
Model-Based Diagnosis of discrete event systems (DES) usually aims at
detecting failures and isolating faulty event occurrences based on a
behavioural model of the system and an observable execution log. The
strength of a diagnostic process is to determine what
happened that is consistent with the observations. In order to go a
step further and explain why the observed outcome occurred,
we borrow techniques from causal analysis. We are currently exploring techniques that are able to extract, from an execution trace, the causally relevant part for a property
violation.
In particular, as part of the SEC project, we are investigating how such techniques can be extended to classes of hybrid systems. As a first result we have studied the problem of explaining faults in real-time systems 48. We have provided a formal definition of causal explanations on dense-time models, based on the well-studied formalisms of timed automata and zone-based abstractions. We have proposed a symbolic formalization to effectively construct such explanations, which we have implemented in a prototype tool. Basically, our explanations identify the parts of a run that move the system closer to the violation of an expected safety property, where safe alternative moves would have been possible.
We have recently generalized the work of 48 and defined robustness functions as a family of mappings from system states to a scalar that, intuitively, associate with each state its distance to the violation of a given safety requirement, e.g., in terms of the remaining number of bad system moves or of the time remaining to react. An explanation then summarizes the portions of the execution on which robustness decreases.
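The idea of summarizing the portions of an execution on which robustness decreases can be sketched on a simplified setting. The example below assumes that the robustness of each visited state has already been computed and is given as a plain list of numbers; it then extracts the maximal segments of the run along which robustness strictly decreases. The function name `decreasing_segments` and the sample values are hypothetical, and the sketch ignores the timed-automata and zone-based machinery used in our actual formalization.

```python
def decreasing_segments(robustness):
    """Maximal portions of a run on which robustness strictly decreases.

    robustness: list of scalar values, one per visited state, where a
    smaller value means the state is closer to violating the property.
    Returns a list of (start, end) index pairs: robustness decreases at
    every step from state `start` to state `end`.
    """
    segments = []
    start = None
    for i in range(1, len(robustness)):
        if robustness[i] < robustness[i - 1]:
            if start is None:
                start = i - 1  # a new decreasing portion begins
        elif start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(robustness) - 1))
    return segments


if __name__ == "__main__":
    # Hypothetical robustness values along a run ending in a violation (0).
    print(decreasing_segments([5, 5, 4, 3, 3, 4, 2, 0]))
```

Steps on which robustness stays constant or increases are left out of the explanation, since they do not move the system closer to the violation.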
However, as our instantiation of robustness in 48 is defined on a discrete abstraction, robustness may decrease in discrete steps once some timing threshold is crossed, thus exonerating the preceding absence of action. We are currently working on a truly hybrid definition of robustness functions that “anticipate” such thresholds, hence ensuring a smooth decrease indicating early when a dangerous event is approaching.
As part of the DCore project on causal debugging of concurrent programs, the goal of Aurélie Kong Win Chang's PhD thesis is to investigate the use of abstractions to construct causal explanations for Erlang programs. We are interested in developing abstractions that "compose well" with causal analyses, and in understanding precisely how explanations found on the abstraction relate to explanations on the concrete system. It is worth noting that the presence of abstraction, which inherently comes with some induction and extrapolation processes, completely recasts the issue of reasoning about causality. Causal traces no longer describe only potential scenarios in the concrete semantics, but also mix in some approximation steps coming from the computation of the abstraction itself. Therefore, not all explanations are replayable counter-examples: they may contain some steps witnessing some lack of accuracy in the analysis. Conversely, a research question to be addressed is how to define causal analyses that have a well-understood behavior under abstraction.
We are currently working on a formalization of an abstract Erlang semantics that allows for a finite abstraction while still accounting for the exchanges of messages and signals between processes.
Concurrent and distributed debugging is a promising application of the notion of reversible computation 40. As part of the ANR DCore project, we have contributed to the theory behind, and the development of, the CauDEr reversible debugger for the Erlang programming language. In 14, we have shown how to automate, using the Maude rewriting logic environment, the generation of a reversible semantics for a concurrent language such as Erlang, in effect implementing in Maude the theory developed by Lanese and Medic 45. In the same paper we have also shown how to automatically generate the semantics of the imperative rollback primitive which is at the core of the CauDEr debugger. In 15, we have extended the reversible semantics of Erlang and CauDEr to take into account imperative constructs in Erlang that allow Erlang processes to access a shared map of process names and process identifiers. This is a first instance of dealing with reversibility in the presence of a form of shared memory.
This year, we have also started two novel threads of activity: studying reversibility for distributed programs and studying reversibility for concurrent programs based on shared memory. For the time being we only have preliminary results on these two threads. On the first one, we have devised a small distributed process calculus featuring located processes with location and link failures with recovery, and are currently working on its behavioral theory. This small calculus is intended as a reasonably faithful abstraction of the behavior of Erlang systems in the presence of node and link failures. On the second thread, we have introduced a modular operational framework for the definition of shared memory concurrency models, and shown that we can capture in our framework several forms of weak memory models, including models of transactional memory.
Digital technologies are often presented as a powerful ally in the fight against climate change (see e.g., the discourse around the “convergence of the digital and the ecological transitions”).
In 9, we have detailed limitations of state of the art assessments of such claims. First, most papers do not provide enough details on the scenarios underlying their evaluations: which hypotheses they are based on and why, and why specific scenarios are chosen as the baseline. This is a key point because it may lead to overestimating the current or potential benefits of digital solutions. Second, results are rarely discussed in the context of global strategies for
greenhouse gas (GHG) emissions reduction. This leaves open how the proposed
technologies would fit into a realistic plan for meeting current GHG
reduction goals. To overcome the underlined limitations, we propose a set of guidelines that all studies on digital solutions for mitigating GHG emissions should satisfy, point out overlooked research directions, and provide concrete examples and initial results for the specific case of ridesharing.
We are now working on estimating the potential of ridesharing as a solution for reducing the GHG emissions of commuting.
During her short internship in the group, Ludmila Courtillat-Piazza addressed the issue of how to choose a research topic in computer science taking into account the systemic nature of the environmental impact of ICT. In addition to a literature review, she experimented with a case study on web design.
The SPADES team has started working together on a project proposal to investigate the current role played by ICT in the Anthropocene as well as new approaches to their design. We have identified the following main challenges: How do local measures meant to reduce the environmental impact of ICT relate (or not) to global effects? What can we learn from, and what are the limits of, current quantitative approaches for environmental impact assessment and their use for public debate and policy making? Which criteria could/should we take into account to design more responsible computer systems (other than efficiency, which is already well covered and subject to huge rebound effects in the case of digital technologies)? To come up with a solid research agenda, we are thus studying the state of the art of many new topics, including STS (Science and Technology Studies), low-tech software and hardware, life cycle assessment, (digital) commons... A new network of collaborations is also in the making, in particular with colleagues from social sciences.
RT-proofs is an ANR/DFG project between Inria, MPI-SWS, Onera, TU Braunschweig and Verimag, running from 2018 until 2022.
The overall objective of the RT-proofs project was to lay the foundations for computer-assisted formal verification of timing analysis results. More precisely, the goal was to provide:
The results obtained in 2022 in connection with the RT-proofs project are described in Section 7.2.2.
DCore is an ANR project between Inria project teams Antique, Focus and Spades, and the Irif lab, running from 2019 to 2024.
The overall objective of the project is to develop a semantically
well-founded, novel form of concurrent debugging, which we call causal debugging, that aims to alleviate the deficiencies of
current debugging techniques for large concurrent software systems.
The causal debugging technology developed by DCore will comprise and
integrate two main novel engines:
LiberAbaci is a project between Inria project teams
Cambium, Camus, Gallinette, Spades, Stamp, Toccata, and the Laboratoire d’Informatique de Paris-Nord.
The overall objective is to study how the Coq proof assistant could be used in a university mathematics course to help teach proofs.
At Spades, Martin Bodin is working with IREM de Grenoble to involve math teachers and didactics researchers in the project.
The DF4DL action is funded by Inria's DGDS. It aims at exploring the use of the dataflow model of computation to better program deep neural networks, in particular a variant called dynamic neural networks. As a first step, we have studied the problem of minimizing the peak memory requirement for the execution of a dataflow graph. This is of paramount importance for deep neural networks since the largest ones cannot fit on a single core due to their very high memory requirement. It turns out that this has been an open problem since 1995 53, and the new algorithms we proposed in 2022 have allowed us to significantly lower the memory peak as well as the compute time required to find it. For instance, we managed to lower the optimal memory peak of the satellite benchmark in 53 from 1.920 units (obtained after 4 days of compute time) to 1.680 units (obtained after 10 milliseconds).