Languages, compilers, and run-time systems are among the most important components bridging the gap between applications and hardware.
With the continuously increasing power of computers, expectations are evolving, with ever more ambitious, computationally intensive, and complex applications.
As desktop PCs become a niche and servers mainstream, three categories of computing impose themselves for the next decade:
mobile, cloud, and super-computing.
Diversity, heterogeneity (even on a single chip), and hardware virtualization thus put more and more pressure on both compilers and run-time systems.
Moreover, because of the energy wall, architectures are becoming more and more complex and parallelism ubiquitous at every level.
Unfortunately, the memory-CPU gap continues to widen and energy consumption remains an important issue for future platforms.
To address the performance and energy-consumption challenge raised by silicon companies, compilers and run-time systems must evolve and, in particular, interact, taking into account the complexity of the target architecture.
The overall objective of Corse is to address this challenge by combining static and dynamic compilation techniques, with a more interactive embedding of programs and the compiler environment in the run-time system.
One of the characteristics of Corse is to base our research on diverse advanced mathematical tools.
Compiler optimization requires several tools from discrete mathematics:
combinatorial optimization, algorithmics, and graph theory.
The aim of Corse is to tackle optimization not only for general purpose but also for domain specific applications.
In addition to run-time and compiler techniques for program instrumentation, hybrid analysis and compilation advances will be mainly based on polynomial and linear algebra.
The other specificity of Corse is to address technical challenges related to compiler technology, run-time systems, and hardware characteristics.
This implies mastering the details of each.
This is especially important as any optimization is based on a reasonably accurate model.
Compiler expertise will be used in modeling applications (e.g. through automatic analysis of memory and computational complexity);
Run-time expertise will be used in modeling the concurrent activities and overhead due to contention (including memory management);
Hardware expertise will be extensively used in modeling physical resources and hardware mechanisms (including synchronization, pipelines, etc.).
The core foundation of the team is the combination of static and dynamic techniques for compilation and run-time systems. We believe this to be essential in addressing the high-performance and low-energy challenges raised by current application, software, and architecture trends.
Our project is structured along three main directions. The first direction belongs to the area of program analysis and optimization. This direction breaks down into:
The second direction belongs to the area of runtime monitoring, verification, and enforcement. This direction breaks down into:
The third direction belongs to the area of teaching and tutoring of programming. This direction breaks down into:
The main industrial sector related to the research activities of Corse is that of semiconductors (programmable architectures spanning from embedded systems to servers).
Any computing application that aims to exploit the resources of its host architecture as much as possible (in terms of high performance but also low energy consumption) stands to benefit from advances in compiler and run-time technology.
Such applications are built on numerical kernels (linear algebra, FFT, convolution...) that can be adapted to a large spectrum of architectures.
More specifically, an important activity concerns the optimization of machine learning applications for some high-performance accelerators.
Members of Corse already maintain fruitful and strong collaborations with several companies such as STMicroelectronics, Atos/Bull, Orange, Kalray.
As expected after the COVID pandemic, team members kept travel activities quite low compared to before the pandemic. Whenever long-distance meetings (such as conference PCs) could be held virtually, travel has been avoided. Team members also try to make better use of existing hardware instead of replacing it with new purchases.
Because of the rebound effect, improving efficiency does not necessarily reduce environmental impact. It is thus crucial to think about how our community can have an actual impact on sustainable computing, that is, influence better design ("R" friendly) and better usage (consume less) of our compute resources. For this purpose, we organize panels with the objective of raising our community's awareness of this important problem. We expect some of our future research projects to address the challenge of sustainable computing not by focusing solely on energy efficiency, but by considering the global systemic impact as much as possible.
The main two challenges of sustainable computing are:
Compiler analysis, programming infrastructure, hardware modeling, teaching tools, HIM, etc. are at the heart of those challenges.
After humongous efforts, the SSA book is finally available. Twelve years were necessary to give birth to this book, composed of 24 chapters and written by 31 authors.
It provides readers with a single-source reference to static single assignment (SSA)-based compiler design. It is the first (and so far only) book that covers in a deep and comprehensive way how an optimizing compiler can be designed using the SSA form. After introducing vanilla SSA and its main properties, the authors describe several compiler analyses and optimizations under this form. They illustrate how compiler design can be made simpler and more efficient thanks to the SSA form. The book also serves as a valuable text/reference for lecturers, making the teaching of compilers simpler and more effective. Coverage also includes advanced topics, such as code generation, aliasing, predication and more, making this book a valuable reference for advanced students and practicing engineers.
Pipedream reverse engineers the following performance characteristics: (1) Instruction latency – The number of cycles an instruction requires to execute. (2) Peak micro-op retirement rate – How many fused micro-ops the CPU can retire per cycle. (3) Micro-fusion – The number of fused micro-ops an instruction decomposes into. (4) Micro-op decomposition and micro-op port usage – The list of unfused micro-ops every instruction decomposes into and the list of execution ports every one of these micro-ops can execute on.
The first step of the reverse-engineering process consists of generating a number of microbenchmarks. Pipedream then runs these benchmarks, measuring their performance using hardware counters. The latency, throughput, and micro-fusion of different instructions can then be read directly from these measurements.
The process of finding port mappings, i.e. micro-op decompositions and micro-op port usage, however, is more involved. For this purpose, we have defined a variation of the maximum flow problem which we call the "instruction flow problem". We have developed a linear program (LP) formulation of the instruction flow problem which can be used to calculate the peak IPC and micro-operations per cycle (MPC) a benchmark kernel can theoretically achieve with a given port mapping. The actual port mapping of the underlying hardware is then determined by finding the mapping for which the throughput predicted by instruction flow best matches the actual measured IPC and MPC.
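The shape of such an instruction flow LP can be sketched as follows. This is a minimal illustration only, with a hypothetical two-port machine and made-up micro-op counts, not the actual Pipedream formulation: flow variables distribute each micro-op over its admissible ports, port capacities bound the flow per cycle, and the objective maximizes the kernel's execution rate.

```python
# Toy "instruction flow" LP: given a port mapping, compute the peak
# micro-ops per cycle (MPC) a kernel can theoretically sustain.
from scipy.optimize import linprog

# micro-op kind -> (count per kernel iteration, set of admissible ports)
uops = {
    "add": (2, {0, 1}),   # 2 ALU micro-ops, may execute on port 0 or 1
    "mul": (1, {0}),      # 1 multiply micro-op, port 0 only
}
ports = [0, 1]

# Variables: x (kernel iterations/cycle), then one flow var per (uop, port).
pairs = [(u, p) for u, (_, ps) in uops.items() for p in sorted(ps)]
n = 1 + len(pairs)

# Flow conservation per micro-op kind: sum_p f[u,p] - count[u] * x = 0
A_eq, b_eq = [], []
for u, (count, _) in uops.items():
    row = [0.0] * n
    row[0] = -count
    for i, (u2, _p) in enumerate(pairs):
        if u2 == u:
            row[1 + i] = 1.0
    A_eq.append(row)
    b_eq.append(0.0)

# Port capacity: each port retires at most 1 micro-op per cycle.
A_ub, b_ub = [], []
for p in ports:
    row = [0.0] * n
    for i, (_u, p2) in enumerate(pairs):
        if p2 == p:
            row[1 + i] = 1.0
    A_ub.append(row)
    b_ub.append(1.0)

# Maximize x == minimize -x.
res = linprog([-1.0] + [0.0] * len(pairs), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
x = res.x[0]     # kernel iterations per cycle
mpc = 3 * x      # this toy kernel has 3 micro-ops per iteration
```

Here the multiply saturates part of port 0, so the adds must share the remaining capacity; the solver finds x = 2/3 iterations/cycle. Reversing the process, the actual hardware mapping is the one whose predicted peak best matches the measured IPC and MPC.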
Our current efforts with regard to code optimization follow two directions.
Tensor computations such as Sparse Matrix Multi-vector multiplication, Sampled Dense-Dense Matrix Multiplication, Dense Matrix Multiplication, Tensor Contraction, and Convolution are important kernels used in many domains such as Fluid Dynamics, Data Analytics, Economic Modelling, and Machine Learning. Developing highly optimized code for such kernels requires combining highly tuned register/instruction-level micro-kernels with appropriate multi-level tiling. In this context we developed a hybrid (analytical/statistical) performance-based optimization scheme along with a code generator for DNNs.
Addressing the problem of automatic generation of optimized operators raises two challenges. The first is associated with the design of a domain-specific code generation framework able to output high-quality binary code. The second is to carefully bound the search space and choose an optimizing objective function that neither leads to yet another combinatorial optimization problem, nor to a too approximate performance objective. This work tackles those two challenges by:
1. revisiting the usual belief that packing should enable stride-1 accesses at every level, which allows making packing optional;
2. highlighting the importance of considering the packing decision and shape as part of the optimization problem;
3. revisiting the usual belief that register spilling should be avoided if possible, which allows considering other (more packing-friendly) micro-kernels as good candidates;
4. revisiting the misleading intuition that convolution dimensions should be brought to the innermost level, which allows more freedom for memory reuse at outer dimensions;
5. showing that the optimization problem can be decoupled into: finding a small set of good micro-kernel candidates using an exhaustive search; finding a good schedule (loop tiling/permutation) and associated packing using operational research; and finding the best tile sizes using auto-tuning;
6. designing a single-pass micro-kernel generation algorithm that emits code for any choice of register blocking dimensions, unrolling factor, and packing decisions;
7. designing a lowering scheme for abstract iterators, compatible with diverse packing and tiling strategies, thrifty with integer arithmetic and loop control;
8. designing a packing algorithm compatible with various choices of transposition and subviews;
9. implementing a code generator based on these algorithms, driven by a simple and modular configuration language.
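To make the interplay of tiling, packing, and micro-kernels concrete, here is a minimal sketch in Python (purely illustrative; the actual code generator emits optimized binary code): a tiled matrix multiplication where a fixed-shape micro-kernel operates on a packed, contiguous copy of each B tile.

```python
import numpy as np

MR, NR, KC = 4, 4, 64  # register-block shape of the micro-kernel, cache tile

def micro_kernel(a, b_packed, c):
    # Fixed-shape MR x NR update: the unit a register-level code generator
    # would fully unroll. Here: c += a @ b (a is MR x kc, b_packed kc x NR).
    c += a @ b_packed

def tiled_matmul(A, B):
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for kc in range(0, k, KC):          # cache-level tiling along k
        for jr in range(0, n, NR):      # column panels of B
            # Pack the current B tile into a contiguous buffer so the
            # micro-kernel reads it with unit stride.
            Bp = np.ascontiguousarray(B[kc:kc + KC, jr:jr + NR])
            for ir in range(0, m, MR):  # row panels of A
                micro_kernel(A[ir:ir + MR, kc:kc + KC], Bp,
                             C[ir:ir + MR, jr:jr + NR])
    return C
```

The sketch assumes the matrix dimensions divide the tile sizes evenly; a real generator also handles remainder tiles and, as argued above, treats the packing decision itself (whether, what, and in which shape to pack) as part of the optimization problem rather than as a fixed rule.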
Part of this work led to a paper that has been accepted for publication in ACM TACO 1 and that will be presented at HiPEAC 2023.
Performance modeling is a critical component for program optimization, assisting compilers as well as developers in predicting the performance of code variations ahead of time. Performance models can be obtained through different approaches that span from precise and complex simulation of a hardware description (Zesto, GEM5, PTLSim) to application-level analytical formulations. An interesting approach for modeling the CPU of modern pipelined, super-scalar, out-of-order processors trades off simulation time against accuracy by separately characterizing both the latency and the throughput of instructions. This approach is suitable both for optimizing compilers and for hand-tuning critical kernels written in assembly (see Section 8.1.1). It is used by performance-analysis tools such as CQA, Intel IACA, OSACA, MIAMI or llvm-mca. Cycle-approximate simulators such as ZSim or MCsimA can also take advantage of such an instruction characterization. In this context, we developed two tools: PALMED and GUS (see new software section).
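A minimal illustration of this family of models (with made-up instruction characteristics, not PALMED or GUS output): given per-instruction latencies and admissible ports, a per-iteration cycle estimate for a straight-line kernel is the maximum of a throughput bound (pressure on the busiest port) and a latency bound (longest dependence chain).

```python
from collections import defaultdict

# (name, latency in cycles, admissible ports, names of dependencies)
# Hypothetical characterization of a 4-instruction kernel.
kernel = [
    ("load",  4, (2, 3), ()),
    ("mul",   3, (0,),   ("load",)),
    ("add",   1, (0, 1), ("mul",)),
    ("store", 1, (4,),   ("add",)),
]

# Throughput bound: split each instruction evenly over its admissible
# ports (a common simplifying heuristic) and take the busiest port.
pressure = defaultdict(float)
for _name, _lat, ports, _deps in kernel:
    for p in ports:
        pressure[p] += 1.0 / len(ports)
tp_bound = max(pressure.values())

# Latency bound: longest dependence chain (kernel listed in topological order).
finish = {}
for name, lat, _ports, deps in kernel:
    finish[name] = lat + max((finish[d] for d in deps), default=0.0)
lat_bound = max(finish.values())

cycles = max(tp_bound, lat_bound)  # simple per-iteration estimate
```

Real tools refine both bounds considerably (micro-op decomposition, out-of-order overlap across iterations, frontend limits), but this max-of-bounds structure is the core of the approach.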
This work has been done in the context of the European project CPS4EU (see Section 9.3.1).
This section overviews our ongoing efforts on the topics of runtime monitoring, verification, and enforcement. More specifically, our work can be categorized into the following topics:
We use runtime verification (RV) to check various specifications in a smart apartment. The specifications can be broken down into three types: behavioral correctness of the apartment sensors, detection of specific user activities (known as activities of daily living), and composition of specifications of the previous types. The context of the smart apartment provides us with a complex system with a large number of components with two different hierarchies to group specifications and sensors: geographically within the same room, floor or globally in the apartment, and logically following the different types of specifications. We leverage a recent approach to decentralized RV of decentralized specifications, where monitors have their own specifications and communicate together to verify more general specifications. We leverage the hierarchies, modularity and re-use afforded by decentralized specifications to: (1) scale beyond existing centralized RV techniques, and (2) greatly reduce computation and communication costs.
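The decentralization idea can be sketched as follows — a deliberately tiny example with hypothetical sensors and a much simpler specification language than the actual framework: each room runs a local monitor for its own specification, and the apartment-level monitor consumes only the local verdicts instead of every raw sensor event.

```python
# Decentralized runtime verification sketch: local monitors check
# room-level specifications; a top-level monitor combines their verdicts.

class LocalMonitor:
    """Checks 'motion implies light on' within one room, latching violations."""
    def __init__(self, room):
        self.room = room
        self.verdict = True
    def observe(self, events):
        # events: set of (room, sensor) pairs seen during this time step
        motion = (self.room, "motion") in events
        light = (self.room, "light_on") in events
        if motion and not light:
            self.verdict = False        # local violation, remembered forever
        return self.verdict

class GlobalMonitor:
    """Apartment-level spec: every room's local specification must hold."""
    def __init__(self, monitors):
        self.monitors = monitors
    def step(self, events):
        # Only one boolean verdict per room is communicated upward,
        # not the raw event stream -- the source of the savings.
        verdicts = [m.observe(events) for m in self.monitors]
        return all(verdicts)

rooms = ["kitchen", "bedroom"]
g = GlobalMonitor([LocalMonitor(r) for r in rooms])
ok1 = g.step({("kitchen", "motion"), ("kitchen", "light_on")})  # satisfied
ok2 = g.step({("bedroom", "motion")})                           # violated
```

The hierarchy generalizes: room monitors can themselves aggregate per-sensor monitors, mirroring the geographic and logical groupings described above.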
Kotlin was introduced to Android as the recommended language for development. One of the unique functionalities of Kotlin is that of co-routines, which are lightweight tasks that can run concurrently inside threads. Programming using co-routines is difficult, among other things, because they can move between threads and behave unexpectedly. We introduce runtime verification in Kotlin. We provide a language to write properties and produce runtime monitors tailored to verify Kotlin co-routines. We identify, formalize and runtime verify seven properties about common runtime errors that are not easily identifiable by static analysis. To demonstrate the acceptability of the technique in real applications, we apply our framework to an in-house Android app and micro-benchmarks and measure the execution time and memory overheads.
This work has been published in RV 10.
Ensuring the correctness of distributed cyber-physical systems can be done at runtime by monitoring properties over their behavior. In a decentralized setting, such behavior consists of multiple local traces, each offering an incomplete view of the system events to the local monitors, as opposed to the standard centralized setting with a unique global trace. We introduce the first monitoring framework for timed properties described by timed regular expressions over a distributed network of monitors. First, we define functions to rewrite expressions according to partial knowledge for both the centralized and decentralized cases. Then, we define decentralized algorithms for monitors to evaluate properties using these functions, as well as proofs of soundness and eventual completeness of said algorithms. Finally, we implement and evaluate our framework on synthetic timed regular expressions, giving insights on the cost of the centralized and decentralized settings and when to best use each of them.
In this work, we leverage static verification to reduce monitoring overhead when runtime verifying a property. We present a sound and efficient analysis to statically find safe execution paths in the control flow at the intra-procedural level of programs. Such paths are guaranteed to preserve the monitored property and thus can be ignored at runtime. Our analysis guides an instrumentation tool to select program points that should be observed at runtime. The monitor is left to perform residual runtime verification for parts of the program that the analysis could not statically prove safe. Our approach does not depend on dataflow analysis, thus separating the task of residual analysis from static analysis; allowing for seamless integration with many RV frameworks and development pipelines. We implement our approach within BISM, which is a recent tool for bytecode-level instrumentation of Java programs. Our experiments on the DaCapo benchmark show a reduction in instrumentation points by a factor of 2.5 on average (reaching 9), and accordingly, a reduction in the number of runtime events by a factor of 1.8 on average (reaching 6).
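The flavor of the analysis can be sketched with a toy example (hypothetical events and CFG, not the actual BISM-based implementation): a block that cannot emit any property-relevant event cannot change the monitor state, so paths made only of such blocks are safe and need no probes.

```python
# Residual-instrumentation sketch: select only the CFG blocks that can
# affect the monitored property; all other blocks stay uninstrumented.

# Alphabet of the monitored property (e.g. a lock-discipline property).
PROP_EVENTS = {"acquire", "release"}

# Toy intra-procedural CFG: block -> (events it may emit, successor blocks)
cfg = {
    "entry":  (set(),        ["loop"]),
    "loop":   ({"log"},      ["body", "exit"]),  # only logging: safe
    "body":   ({"acquire"},  ["unlock"]),
    "unlock": ({"release"},  ["loop"]),
    "exit":   (set(),        []),
}

def instrumentation_points(cfg):
    # A block needs a probe iff it may emit a property-relevant event;
    # the monitor performs residual verification on those blocks only.
    return {b for b, (events, _succs) in cfg.items() if events & PROP_EVENTS}

probes = instrumentation_points(cfg)
```

In this toy CFG only two of five blocks are observed, which is the kind of reduction in instrumentation points (and hence runtime events) reported above; the real analysis additionally reasons about paths and the property automaton, not just per-block alphabets.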
This work has been published in VSTT 8.
DECENT is a benchmark for evaluating decentralized enforcement. It implements two enforcement algorithms that differ in their strategy for correcting the execution: the first one explores all alternatives to perform a globally optimal correction, while the second follows an incremental strategy based on locally optimal choices. DECENT allows comparing these algorithms with a centralized enforcement algorithm in terms of computational metrics and metrics for decentralized monitoring, such as the number and size of messages or the required computation on each component. Our experiments show that (i) the number of messages sent and the internal memory usage are much smaller with the decentralized algorithms, and (ii) the locally optimal algorithm performs closely to the globally optimal one.
This work has been published in RV 11.
Runtime enforcement (RE) is a monitoring technique to ensure that a system obeys a set of formal requirements (properties). RE employs an enforcer (a safety wrapper for the system) which modifies the (untrustworthy) output by performing actions such as delaying (by storing/buffering) and suppressing events when needed. In this work, to handle practical applications with memory constraints, we propose a new RE paradigm where the memory of the enforcer is bounded/finite. Besides the property to be enforced, the user specifies a bound on the enforcer memory. Bounding the memory poses various challenges, such as how to handle the situation when the memory is full, and how to optimally discard events from the buffer to accommodate new events and let the enforcer continue operating. We define the bounded-memory RE problem and develop a framework for any regular property. The proposed framework has been implemented, and its performance, evaluated on examples from application scenarios, indicates that the enforcer has a reasonable execution-time overhead.
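The core of such an enforcer can be sketched as follows — with a toy property (strict alternation of "a" and "b") and a deliberately naive drop-oldest discard policy, not the optimal policy developed in this work: events that cannot yet be released are buffered, and when the bounded buffer is full the oldest buffered event is evicted so the enforcer can keep operating.

```python
from collections import deque

class BoundedEnforcer:
    """Enforce strict alternation a,b,a,b,... by buffering out-of-order
    events in a bounded FIFO; when full, the oldest event is dropped."""
    def __init__(self, capacity):
        # deque(maxlen=...) silently evicts from the head on append: this
        # is the (naive) discard policy used when memory is full.
        self.buf = deque(maxlen=capacity)
        self.expected = "a"
        self.output = []

    def _flush(self):
        # Release buffered events as long as they extend a correct output.
        while self.buf and self.buf[0] == self.expected:
            self.output.append(self.buf.popleft())
            self.expected = "b" if self.expected == "a" else "a"

    def receive(self, event):
        if event == self.expected and not self.buf:
            self.output.append(event)       # safe to release immediately
            self.expected = "b" if self.expected == "a" else "a"
        else:
            self.buf.append(event)          # delay; may evict the oldest
        self._flush()

e = BoundedEnforcer(capacity=2)
for ev in ["b", "a", "b", "a"]:
    e.receive(ev)
```

Every prefix of `e.output` satisfies the property (soundness), but with a full buffer some delayed events are lost: choosing *which* events to discard so as to preserve as much of the execution as possible is precisely the optimization problem studied in this work.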
Industrial automation is a complex process involving various stakeholders. The international standard IEC 61499 helps specify distributed automation using a generic architectural model, targeting the technical development of the automation. However, analyzing the correctness of IEC 61499 models remains a challenge because of their informal semantics and distributed logic. We propose new verification techniques for IEC 61499 applications. These techniques rely on the concept of runtime enforcement, which can be applied to systems to prevent bad behaviors from happening. The main idea of our approach is to integrate an enforcer into the application, allowing it to respect specific properties during execution. The techniques begin with the definition of a property, in a language supporting features such as discarding and replacing events. Next, this property is used to synthesize an enforcer in the form of a function block. Finally, the synthesized enforcer is integrated into the application. Our approach is illustrated on a realistic example and fully automated.
This work has been published in SEFM 7.
In this work, we present an extension of the Java bytecode instrumentation tool BISM that captures and prepares a model abstracting the program behavior at the intra-procedural level. We analyze the program methods we are interested in monitoring and construct a control-flow-graph automaton whose states represent actions of the program that produce events. Directed towards monitoring general behavioral properties at runtime, the resulting model is exposed to users so that they can write static analyzers and combine static and runtime verification.
Business Process Model and Notation (BPMN) is a standard modeling language for workflow-based processes. Building an optimized process with this language is not easy for non-expert users due to the lack of support at design time. This work presents a lightweight modeling tool to support such users in building optimized processes. First, the user defines the tasks involved in the process and possibly gives a partial order between tasks. The tool then generates an abstract graph, which serves as a simplified version of the process being specified. Next, the user can refine this graph using the minimum and maximum execution time of the whole graph computed by the tool. Once the user is satisfied with a specific abstract graph, the tool synthesizes a BPMN process corresponding to that graph. Our tool is called WEASY and is available as an open-source web application.
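The minimum and maximum execution times the tool reports can be computed directly from the task graph. A minimal sketch with hypothetical task durations, under the usual assumptions: the minimum is the critical-path length (unbounded parallelism), the maximum is fully sequential execution.

```python
# Min/max execution time of a partially ordered set of tasks.
durations = {"A": 3, "B": 2, "C": 4, "D": 1}          # task -> time units
deps = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}  # prerequisites

def min_time(durations, deps):
    # With unlimited parallelism, the minimum is the critical-path length:
    # earliest finish of a task = its duration + latest prerequisite finish.
    finish = {}
    def f(t):
        if t not in finish:
            finish[t] = durations[t] + max((f(d) for d in deps[t]), default=0)
        return finish[t]
    return max(f(t) for t in durations)

def max_time(durations):
    # Fully sequential execution: sum of all task durations.
    return sum(durations.values())

# Critical path A -> C -> D gives 3 + 4 + 1; sequential gives 3 + 2 + 4 + 1.
```

These two bounds are what lets the user iteratively refine the abstract graph (e.g. by adding ordering constraints) before the tool synthesizes the final BPMN process.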
Business Process Model and Notation (BPMN) is a standard business process modeling language that allows users to describe a set of structured tasks, which results in a service or product. Before running a BPMN process, the user often has no clear idea of the probability of executing some task or specific combination of tasks. This is, however, of prime importance for adjusting resources associated with tasks and thus optimizing costs. In this work, we define an approach to perform probabilistic model checking of BPMN models at runtime. To do so, we first transform the BPMN model into a Labeled Transition System (LTS). Then, by analyzing the execution traces obtained when running multiple instances of the process, we can compute the probability of executing each transition in the LTS model, and thus generate a Probabilistic Transition System (PTS). Finally, we perform probabilistic model checking for verifying that the PTS model satisfies a given probabilistic property. This verification loop is applied periodically to update the results according to the execution of the process instances. All these steps are implemented in a tool chain, which was applied successfully to several realistic BPMN processes.
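The PTS construction step can be sketched as follows, on a toy LTS with made-up traces (the actual tool chain derives the LTS from the BPMN model): the probability of each transition is estimated as its frequency among all transitions leaving the same state, and a simple probabilistic property is then checked against those estimates.

```python
from collections import Counter

# Toy execution traces: sequences of LTS states visited by process instances.
traces = [
    ["init", "pay", "ship", "done"],
    ["init", "pay", "cancel"],
    ["init", "pay", "ship", "done"],
    ["init", "cancel"],
]

# Count observed transitions and the total number of exits per state.
trans = Counter()
outgoing = Counter()
for t in traces:
    for s, s2 in zip(t, t[1:]):
        trans[(s, s2)] += 1
        outgoing[s] += 1

# Probabilistic Transition System: P(s -> s2) estimated by frequency.
pts = {(s, s2): c / outgoing[s] for (s, s2), c in trans.items()}

# Example probabilistic property: "at most half of payments are cancelled".
holds = pts.get(("pay", "cancel"), 0.0) <= 0.5
```

Re-running this estimation periodically, as new process instances complete, is what keeps the verification results up to date during execution.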
This work has been published in iFM 9.
Business process optimization is a strategic activity in organizations because of its potential to increase profit margins and reduce operational costs. One of the main challenges in this context is concerned with the problem of optimizing the allocation and sharing of resources. In this work, processes are described using the BPMN notation extended with an explicit description of execution time and resources associated with tasks, and can be concurrently executed multiple times. First, a simulation-based approach for computing certain metrics of interest, such as average execution time or resource usage, is presented. This approach applies off-line and is static in the sense that the number of resources does not evolve over the time of the simulation. In a second step, an alternative approach is presented, which works online, thus requiring the instrumentation of an existing platform for retrieving information of interest during the processes’ execution. This second approach is dynamic because the number of resource replicas is updated over the time of the execution. This work aims at stressing pros and cons of both approaches, and at showing how they complement each other.
This work has been published in WRLA 5.
The approach for probabilistic model checking of BPMN models at runtime described above has also been published in 6.
This domain is a new axis of the CORSE team. Our goal here is to combine our expertise in compilation and teaching to help teachers and learners in computer-science fields such as programming, algorithms, data structures, automata, debugging, or more generally computing literacy. This axis comprises two projects: EasyTracker, a library that helps build tools to visualize program execution and data structures; and Agdbentures, a game built on EasyTracker that helps learners gain debugging skills.
Learning to program involves building a mental representation of how a machine executes instructions and stores information in memory. To help students, teachers often use visual representations to illustrate executions of programs or particular concepts in their lectures. As a famous example, references/pointers are very often represented with arrows pointing to objects or memory locations. While these visual representations are most of the time hand-drawn, they nowadays tend to be supplemented by tool-generated ones. These tools have the advantage of being usable by learners, empowering them with the ability to validate their own understanding of the concept the tool aims at representing. However, building such a tool from scratch requires a lot of effort and a high level of technical expertise, and the ones that already exist are difficult to adapt to different contexts. In this work we developed EasyTracker, a library that assists teachers of programming courses in building tools that generate representations tuned to their needs from actual programs. At its core, EasyTracker provides ways of controlling the execution and inspecting the state of programs. The control and inspection are driven and customized through a Python interface. The controlled program itself can be written either in Python or in any GDB-supported language like C. This work showcases two tools built on EasyTracker that are used in a teaching context to explain the notions of stack and heap and to visualize recursion, as well as Agdbentures, a game prototype to help students learn debugging, presented in the next section.
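The underlying mechanism — controlling execution and inspecting program state from Python — can be illustrated with the standard `sys.settrace` hook. This is a sketch of the principle only, not EasyTracker's actual interface (which also drives GDB for languages like C): a tracer records, at each executed line, the state a visualization tool would turn into a drawing.

```python
import sys

def collect_states(func, *args):
    """Run func, recording (relative line, local variables) at each step:
    the raw material a visualization front-end turns into pictures."""
    states = []
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is func.__code__:
            states.append((frame.f_lineno - func.__code__.co_firstlineno,
                           dict(frame.f_locals)))   # snapshot of the state
        return tracer
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)   # always remove the hook
    return states

def demo(n):
    total = 0
    for i in range(n):
        total += i
    return total

states = collect_states(demo, 3)
final_locals = states[-1][1]   # locals just before the return statement
```

A teacher-facing tool built on such a hook decides which of these snapshots to show and how (e.g. drawing `total` as a box and loop progress on a timeline), which is exactly the customization EasyTracker's Python interface is designed for.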
This work has been submitted for publication at ACM ITICSE 2023.
Debugging is an important task in software development and can be the source of much frustration and lost time. However, it is not often taught explicitly in computer-science curricula, even at university level. For these reasons, we developed Agdbentures, a debugging-practice game where “levels” consist of programs containing bugs that the learner needs to fix to advance in the game.
In Agdbentures, the level programs are executed using EasyTracker, which allows us to present a live visual representation of the program state during execution in the form of a 2D RPG-like world. For instance, the “player_x” and “player_y” variables in the level code are inspected at runtime and used to place a character representing the player on a graphical 2D map. The interest is three-fold. First, this makes the game appealing, as the player/learner is plunged into a “real” game. Second, it showcases the importance of having information on the state of the program being executed in order to debug it. Third, it completely separates the graphical code, which can be very complex and is hidden from players, from the level code, which is given to players: this allows us to simplify the source code so that novice programmers are not put off. The levels share a common codebase that increases in size and complexity as the player advances in the game. It initially only controls the main character's position; then more features are added, such as interactive objects, NPCs (non-playable characters), and level logic (activating levers, collecting items...). This allows the player to get familiar with the codebase over time, so we can present more difficult bugs that could arise in real-life development. It also allows us to create “fun” levels where bugs have interesting or amusing effects on the visual representation, and where finding the solution (fixing the bugs) is rewarding.
Although there are currently only about ten levels, the first experiments we conducted are very encouraging regarding the engagement of students at the L2 university level. All were eager to participate and declared that they would really like to continue playing Agdbentures on their own with more levels.
This work has been done in the context of the AI4HI Inria exploratory project and has been submitted for publication at ACM ITICSE 2023.