BENAGIL - 2025

2025 Activity Report, Project-Team BENAGIL

RNSR: 202324438T
  • Research center: Inria Saclay Centre at Institut Polytechnique de Paris
  • In partnership with: Institut Polytechnique de Paris, TELECOM SUDPARIS
  • Team name: Efficient and safe distributed systems
  • In collaboration with: Services répartis, Architectures, MOdélisation, Validation, Administration des Réseaux

Creation of the Project-Team: 2023 September 01

Keywords

Computer Science and Digital Science

  • A1.1.1. Multicore, Manycore
  • A1.1.4. High performance computing
  • A1.1.13. Virtualization
  • A1.3.5. Cloud

Other Research Topics and Application Domains

  • B6.1.1. Software engineering

1 Team members, visitors, external collaborators

Research Scientist

  • Gael Thomas [Team leader, INRIA, Senior Researcher, HDR]

Faculty Members

  • Mathieu Bacou [TELECOM SUDPARIS, Associate Professor]
  • Elisabeth Brunet [TELECOM SUDPARIS, Associate Professor]
  • Valentin Honore [ENSIIE, Associate Professor]
  • Alexandre Nolin [TELECOM SUDPARIS, Associate Professor, from Nov 2025]
  • Pierre Sutra [TELECOM SUDPARIS, Professor, HDR]
  • Francois Trahay [TELECOM SUDPARIS, Professor, HDR]

Post-Doctoral Fellows

  • Nicolas Derumigny [TELECOM SUDPARIS, Post-Doctoral Fellow]
  • Ayush Pandey [TELECOM SUDPARIS, Post-Doctoral Fellow, from Apr 2025]

PhD Students

  • Tara Aggoun [TELECOM SUDPARIS, from Sep 2025]
  • Mickaël Boichot [TELECOM SUDPARIS, until Jul 2025]
  • Adam Chader [TELECOM SUDPARIS]
  • Jean-Francois Dumollard [TELECOM SUDPARIS]
  • Catherine Guelque [TELECOM SUDPARIS]
  • Boubacar Kane [TELECOM SUDPARIS, until Jan 2025]
  • Harena Rakotondratsima [TELECOM SUDPARIS, from Sep 2025]
  • Marie Reinbigler [TELECOM SUDPARIS, until Sep 2025]
  • Jules Risse [INRIA]
  • Jana Toljaga [TELECOM SUDPARIS]
  • Guillermo Toyos Marfurt [TELECOM SUDPARIS, from Aug 2025]
  • Nguyen Tung [TELECOM SUDPARIS, from Aug 2025]
  • Lucas Van Lanker [CEA]
  • Nevena Vasilevska [TELECOM SUDPARIS]

Interns and Apprentices

  • Tara Aggoun [TELECOM SUDPARIS, Intern, from Mar 2025 until Sep 2025]
  • Joni Dervishi [Telecom SudParis, Intern, from Mar 2025 until May 2025]
  • Harena Rakotondratsima [TELECOM SUDPARIS, Intern, from Mar 2025 until Sep 2025]

Administrative Assistant

  • Julienne Moukalou [INRIA]

2 Overall objectives

Distributed systems are pivotal to many applications used in our daily life: AI, data analytics, online gaming, social networks, web services, healthcare, etc. Because they have to sustain massive workloads, these systems scatter computation across many units. These units coordinate to store the input, perform the computation and return results to the application in a usable form. Inefficiencies in these infrastructures hinder the ability to handle large computations and waste energy and hardware resources. Errors at runtime may result in painful data losses and exploitable security loopholes. As a consequence, designing and implementing such systems efficiently and safely is essential, and all major IT companies are strongly committed to it.

The Benagil team works on the design and implementation of more efficient and safer distributed systems. To that end, the team focuses on the core system components at the frontier with the hardware: hypervisors, operating systems, language runtimes, storage systems and communication libraries. Improving the efficiency and safety of distributed systems is challenging. Modern distributed systems manage large pools of machines, serve a plethora of users and process very large datasets. Consequently, they are inherently complex, and both their design and their implementation are notoriously hard. Complexity arises from the software stack, from the algorithms at the core of these systems, and from the hardware itself:

  • System software level. A typical modern computer system runs many software components: hypervisors (e.g., KVM/Qemu), operating systems (e.g., Linux), container systems (e.g., Linux containers), language runtimes (e.g., the Java virtual machine) and specialized runtimes for HPC (e.g., MPI), data analytics (e.g., Spark) or AI (e.g., PyTorch). Such software is very large today. For instance, the latest version of the Linux kernel comprises over 22,000,000 lines of code.
  • Distributed system level. As pointed out above, modern systems are distributed and involve many machines. These machines are connected by heterogeneous networks, ranging from fast local networks (e.g., InfiniBand or 10 Gb Ethernet) to high-latency planet-scale connections. Many of these systems must be highly available, that is, responsive 99.999% of the time. This requires complex monitoring mechanisms and replication algorithms that balance availability against performance. Distributed systems also need to make fine-grained task and data placement choices. They aggregate resources, must use them efficiently, and must provide sufficient isolation between the multiple applications that share them.
  • Machine level. Internally, each machine is a very complex entity. It is composed of multiple processors, memory banks and devices interconnected by a complex network. A processor contains tens of cores with finely tunable cache hierarchies and out-of-order execution pipelines. Each core is a very dense unit of computation, as testified by the specification of Intel Skylake, which covers more than 4,800 pages. A machine also often includes multiple heterogeneous accelerators and specialized hardware. Examples include persistent memory that provides durability at the nanosecond scale, GPUs specialized for massively parallel computations, FPGAs used to offload complex computations from the CPUs, and TPUs specialized in deep neural network computation. Access to these components is non-uniform in terms of both bandwidth and latency, so heterogeneity must be taken into account at multiple levels of the system stack. This makes data access optimization especially challenging. This complexity also opens security breaches, such as cache timing attacks, code timing attacks and data access pattern attacks. Preventing these attacks requires solving complex trade-offs between performance, security and usability.
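One way to see what the 99.999% availability target implies is to convert it into a yearly downtime budget. The snippet below does this arithmetic, assuming a 365.25-day year. The function name is ours.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumption: a 365.25-day year


def downtime_minutes_per_year(availability: float) -> float:
    """Maximum yearly downtime allowed by an availability level in [0, 1]."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must lie in [0, 1]")
    return (1.0 - availability) * MINUTES_PER_YEAR


for level in (0.999, 0.9999, 0.99999):
    print(f"{level:.3%}: {downtime_minutes_per_year(level):.2f} min/year")
```

Five nines therefore leaves a budget of roughly 5.26 minutes of downtime per year. That budget includes failures, upgrades and failovers.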

The inherent complexity of distributed systems makes analyzing their performance and safety difficult. This difficulty is compounded by complex and unexpected interactions between software and hardware components. Moreover, understanding and improving system components in the context of distributed systems requires expertise in many areas: hypervisors, operating systems, containerization, language runtimes, compilation, networking, architecture, the web, databases, data analytics runtimes, cloud runtimes and distributed algorithms. As an example, in previous work, we observed a large performance degradation in a data analytics application written in Scala (namely, PageRank in Apache Spark). This degradation was caused by poor memory placement performed by the Java virtual machine on a non-uniform memory architecture. The issue was aggravated by the use of a (system) virtual machine that blindly allocates memory from any memory bank. Another source of inefficiency was the hypervisor, which continuously moved memory without notifying the virtual machine. All in all, understanding and solving the performance bottleneck at each level of the system stack took us 8 years and involved 3 PhD students and 6 researchers with expertise in different system areas.

3 Research program

The Benagil team works on improving the performance and safety of the core system components of distributed systems. To achieve this goal, we propose a systematic approach. We first profile and analyze current distributed systems to identify their limits in terms of efficiency and/or safety when they execute large distributed applications. Then, building upon this analysis, we develop new algorithms, mechanisms and components to improve them.

The Benagil team is structured along three main axes that articulate the above approach. The first axis is devoted to performance profiling and analysis. In this axis, we introduce new tools and techniques to automatically analyze the performance of large distributed systems. Based on this analysis, we identify performance issues, which serve as input to the two other axes, whose goal is to improve performance. These two axes study two aspects of system components. In the system components for cloud infrastructure axis, we devise new system techniques to improve the performance and safety of two core system components used in cloud infrastructures: virtualization and storage. In the system components for emerging computing models axis, we propose new system mechanisms and interfaces for two pivotal upcoming programming models: serverless and edge computing.

3.1 Performance analysis

Due to the high complexity of modern large-scale distributed applications, understanding performance problems is a tedious task, even for the most experienced programmers. A performance bottleneck may arise from various interactions, either between hardware and software or between different software components. Even a single contended lock, or a falsely shared cache line, in one of the system components may lead to a dramatic slowdown.

Because of this complexity, manually identifying the root cause of a performance bottleneck is notoriously difficult. In this axis, we propose to help developers by designing new profiling tools that handle the complexity of hardware and software stacks and scale with the size of the system.

3.2 System components for the cloud

In this axis, we aim at studying and designing the next generation of systems for cloud infrastructures. Today, these infrastructures are undergoing major changes at the hardware level. Ultra-fast networks operating at the microsecond scale (e.g., RDMA) are becoming widespread, as are fast storage devices (e.g., NVMe or non-volatile memory). Their joint arrival requires radically revisiting the way we design two core system components of any cloud infrastructure: the virtualization system and the storage system.

3.3 System components for emerging computing models

At a higher level of the system stack, we are witnessing the arrival of two new computing models: serverless computing and edge computing. These computing models deeply change the assumptions under which current system components were built. Current system components assume long-running applications and powerful computing infrastructures, which is no longer the case with these two new computing models. In serverless computing, applications are split into short-lived tasks. In edge computing, applications execute at the edge of the network, on low-performance hardware.

4 Application domains

Overall, the Benagil team mostly specializes in the low-level components of distributed systems. This specialization lies at the frontier of security, hardware, high-performance computing (HPC), machine learning, data analytics and databases. With respect to security, the team studies some system aspects, such as trusted execution environments (e.g., Intel SGX) to protect applications, or data replication to improve availability. However, Benagil is not a security team per se. Regarding hardware, the team has a strong background in using modern hardware such as persistent memory or GPUs. This knowledge is crucial to use the hardware efficiently in system components. However, the team only consumes hardware and does not directly design it. The same holds for HPC, machine learning and data analytics. The team understands the system requirements of these highly demanding applications and uses them to benchmark its system components, but it only rarely contributes to these runtimes themselves. The team also has strong knowledge of the storage system components used in databases. This includes the algorithmic and implementation concerns related to data distribution, consistency, replication and persistence. However, the team is not specialized in databases in general.

5 Highlights of the year

  • Four PhD students of the team defended their PhD in 2025: Boubacar Kane, Mickaël Boichot, Marie Reinbigler, and Adam Chader
  • The team obtained three new grants: the ANR JCJC VHS, the ANR Centeanes, and the PIA Camelia

6 Latest software developments, platforms, open data

6.1 Latest software developments

6.1.1 EZTrace

  • Keywords:
    MPI communication, Execution trace, Traces, High performance computing, Performance analysis, HPC, OpenMP, CUDA
  • Functional Description:

    Improving the performance of parallel applications (e.g., numerical simulations) is an important phase of their development. It requires detecting the various phases of the application and understanding their performance.

    The automatic generation of execution traces allows developers to quickly and simply detect the various phases of the application and understand its behavior.

  • URL:
  • Publications:
  • Contact:
    Francois Trahay
  • Participants:
    2 anonymous participants

6.1.2 Pallas

  • Keywords:
    Performance analysis, HPC, High performance computing, Execution trace
  • Functional Description:
    Pallas is a generic trace format tailored for conducting various post-mortem performance analyses of traces describing large executions of HPC applications. During the execution of the application, Pallas collects events and detects their repetitions on the fly. When storing the trace to disk, Pallas groups the data from similar events or groups of events together in order to speed up later trace reading. The Pallas format allows faster trace analysis than other trace formats.
  • URL:
  • Contact:
    Francois Trahay
  • Participant:
    an anonymous participant

6.1.3 numamma

  • Keywords:
    NUMA, Memory Allocation, Profiling
  • Functional Description:
    NumaMMa is both a NUMA memory profiler/analyzer and a NUMA application execution engine. The profiler runs an application while gathering information about its memory accesses. The analyzer visually reports information about the memory behavior of the application, making it possible to identify memory access patterns. Based on the results of the analyzer, the execution engine can execute the application efficiently by carefully choosing where memory pages are allocated.
  • URL:
  • Publications:
  • Contact:
    Francois Trahay
  • Participant:
    an anonymous participant
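The sketch below illustrates how an execution engine could turn a memory profile into a page placement. It is a hypothetical simplification and not NumaMMa's interface. It assumes the profile gives, for each page, the number of accesses issued from each NUMA node. Each page goes to the node that accesses it most. Pages with no dominant node are interleaved round-robin, using a dominance threshold we chose arbitrarily.

```python
from typing import Dict, List


def place_pages(profile: Dict[int, List[int]],
                balance_threshold: float = 0.6) -> Dict[int, int]:
    """Map each page to a NUMA node.

    profile[page][node] is the (hypothetical) number of accesses to `page`
    issued by threads running on `node`.
    """
    placement: Dict[int, int] = {}
    interleave_next = 0
    for page in sorted(profile):
        counts = profile[page]
        total = sum(counts)
        best = max(range(len(counts)), key=counts.__getitem__)
        if total and counts[best] / total >= balance_threshold:
            # One node dominates: co-locate the page with its main user.
            placement[page] = best
        else:
            # Shared or never-accessed page: interleave to spread bandwidth.
            placement[page] = interleave_next % len(counts)
            interleave_next += 1
    return placement


profile = {0: [900, 100], 1: [50, 950], 2: [500, 500], 3: [480, 520]}
print(place_pages(profile))
```

A real engine would then apply the chosen node when the page is first allocated, for instance through the Linux NUMA memory-policy interfaces such as `mbind`. That step is omitted here.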

6.1.4 ForkNox

  • Name:
    ForkNox: a micro-hypervisor to protect Linux
  • Keywords:
    Virtualization, Security
  • Functional Description:
    ForkNox is a micro-hypervisor designed to protect Linux. By leveraging virtualization techniques, ForkNox can revoke read, write, and execute permissions for specific memory regions of Linux. This ensures that, even if Linux is under attack, the attacker cannot modify those parts of the system.
  • Release Contributions:
    Initial version of the software.
  • News of the Year:
    Initial version of the software.
  • URL:
  • Contact:
    Gael Thomas
  • Participants:
    4 anonymous participants

6.1.5 VoliMem

  • Name:
    VoliMem: a lightweight virtualization for processes
  • Keyword:
    Virtualization
  • Functional Description:
    VoliMem is a small library that remaps a native process inside a virtual machine. This gives the process access to low-level hardware primitives, such as a page table in user space or fast inter-processor interrupts.
  • Release Contributions:
    Initial version of the prototype
  • News of the Year:
    Initial implementation of the software.
  • URL:
  • Contact:
    Gael Thomas
  • Participants:
    3 anonymous participants

6.1.6 Tele-GC

  • Name:
    Tele-GC: a garbage collector for disaggregated memory
  • Keywords:
    Garbage Collection, Java, Disaggregated memory
  • Functional Description:
    Tele-GC is a garbage collector specifically designed for disaggregated memory. It runs the application on the compute node while the garbage collector operates on the memory node. Tele-GC leverages the discrepancy between the cache on the compute node and the memory on the memory node to avoid any synchronization during a collection.
  • URL:
  • Contact:
    Gael Thomas

6.1.7 FaaSLoad

  • Keywords:
    Cloud computing, Serverless, Function-as-a-Service, Measures, Resource utilization, Workload injection, Performance measure
  • Scientific Description:
    FaaSLoad is a tool to gather fine-grained data about the performance and resource usage of programs that run on Function-as-a-Service cloud platforms. It considers individual function instances and collects hardware and operating-system performance information by monitoring them while injecting a workload. FaaSLoad helps build datasets of function executions to train machine learning models. It also helps study the behavior of function runtimes at a fine grain, and replay real workload traces for in situ observations.
  • Functional Description:
    Invoke functions in a Function-as-a-Service platform, and gather data about their performance and resource usage to understand their behavior in serverless environments.
  • Release Contributions:
    Stabilization and opening to external users.
  • News of the Year:
    Release of public version 2.0 (and then 2.1.0), the first version mature enough to be useful to external users. Published in a dedicated scientific paper at OPODIS'24.
  • URL:
  • Publications:
  • Contact:
    Mathieu Bacou
  • Participant:
    an anonymous participant

7 New results

This year, the Benagil team carried out research projects along three axes.

In the performance analysis axis, the Benagil team studied: (i) the optimization of performance trace representations to reduce analysis time, (ii) methods for measuring energy consumption at a fine granularity, and (iii) the prediction of an application's performance when the hardware changes.

In the system components for cloud infrastructures axis, the Benagil team studied: (i) how to simplify the use of persistent memory by relying on a page table to identify the dirty set of a transaction, (ii) the protection of the internal data structures of the Linux kernel with virtualization techniques, and (iii) the garbage collection of a large heap in a disaggregated context.

In the system components for emerging computing models axis, the Benagil team worked on: (i) analyzing large images on a modest cluster, and (ii) adjusting data consistency to the actual needs of an application.

7.1 Performance analysis

7.1.1 Scalable trace format

Participants: Catherine Guelque, Valentin Honoré, Philippe Swartvegher [Inria TOPAL], François Trahay.

Identifying performance bottlenecks in a parallel application is tedious, especially because bottlenecks may have several causes and symptoms, so the behavior of various software components must be analyzed. Detecting a performance problem means investigating the execution of an application and applying several performance analysis techniques. To do so, one can use a tracing tool to collect information describing the behavior of the application. At the end of the execution, a trace file in a specific format is available to the user, who can use it to conduct a complete post-mortem investigation. When analyzing the performance of an application running at a large scale, the post-mortem analysis needs to load thousands of trace files into memory and process them. This quickly becomes impractical for large-scale applications, as memory gets exhausted and the number of open files exceeds the system limit.

As part of the Exa-SofT project, Catherine Guelque proposes Pallas, a generic trace format tailored for conducting various post-mortem performance analyses of traces describing large executions of HPC applications [7]. During the execution of the application, Pallas collects events and detects their repetitions on the fly. When storing the trace to disk, Pallas groups the data from similar events or groups of events together in order to speed up later trace reading. We conducted large-scale experiments on the Jean-Zay supercomputer to evaluate Pallas. Our experiments show that the Pallas format allows faster trace analysis than the other evaluated trace formats. Overall, the Pallas trace format enables the interactive analysis that a user needs when investigating a performance problem. These results were presented at IPDPS'25 [7].
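A small offline sketch can show the idea of storing repeated event sequences only once. It is neither the Pallas algorithm nor its API: Pallas detects repetitions on the fly during execution, while this sketch works after the fact. The function names are hypothetical. `fold_repetitions` greedily replaces immediately repeated subsequences of one thread's event stream with a (pattern, count) token. `group_durations` keeps all durations of an event type in one contiguous list, so a reader can scan, say, every `MPI_Send` duration at once.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def fold_repetitions(events: Sequence[str],
                     max_period: int = 4) -> List[Tuple[Tuple[str, ...], int]]:
    """Greedily replace immediate repetitions by (pattern, count) tokens."""
    events = list(events)
    out, i, n = [], 0, len(events)
    while i < n:
        best_p, best_reps = 1, 1
        for p in range(1, max_period + 1):
            if i + p > n:
                break
            pattern = events[i:i + p]
            reps = 1
            while events[i + reps * p:i + (reps + 1) * p] == pattern:
                reps += 1
            # Keep the candidate loop that covers the most events.
            if reps > 1 and reps * p > best_reps * best_p:
                best_p, best_reps = p, reps
        out.append((tuple(events[i:i + best_p]), best_reps))
        i += best_p * best_reps
    return out


def group_durations(records: Sequence[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Store the durations of each event type contiguously."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for name, duration in records:
        groups[name].append(duration)
    return dict(groups)


trace = ["MPI_Send", "MPI_Recv"] * 3 + ["MPI_Barrier"]
print(fold_repetitions(trace))
```

On this trace, the seven events fold into two tokens: the send/receive pair repeated three times, followed by one barrier.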

7.1.2 Fine-grain energy measurement

Participants: Jules Risse, Amina Guermouche [Inria STORM], François Trahay.

The power consumption of supercomputers is, and will remain, a major concern. For instance, Frontier, one of the fastest supercomputers in the world, consumes around 20 MW. Reducing the power consumption of HPC applications is therefore mandatory. The first step towards reducing the power consumption of programs is to be able to monitor their energy consumption. Servers usually contain wattmeters able to measure the power consumption of the CPU, the memory, the GPU, etc. However, these wattmeters only provide coarse-grained energy measurements, with a typical measurement period of dozens of milliseconds. During this period, the application may execute hundreds of tasks. As a result, analyzing the power consumption of an application at the microsecond scale is difficult.

As part of the Exa-SofT project, Jules Risse's PhD investigates fine-grained energy measurement in StarPU. Since StarPU executes many instances of a few types of tasks, it should be possible to build an energy consumption model of each type of task. This model can then be provided to StarPU so that task scheduling takes into account both the performance of tasks and their energy consumption. In this project, we measure the energy consumption of a server (its CPU, GPU, etc.) at a coarse grain (typically, one sample every 20 ms), and we log which tasks were executed during each sampling period. By repeating this many times, we build a linear system that can be solved to model the energy consumption of microsecond-scale tasks. We show that the model can accurately predict the energy consumption of fine-grained tasks running on CPUs. We conducted similar experiments on GPUs, where the accuracy is lower due to erroneous power consumption metrics reported by the GPU. These results were presented at Cluster'25 [9].
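The linear-system idea can be made concrete with a least-squares fit. Assume sampling period i yields a measured energy E_i and the counts n_{i,t} of tasks of each type t run during that period. We then look for a per-period baseline b and per-task energies e_t such that E_i ≈ b + Σ_t n_{i,t}·e_t.

This is a sketch under our own simplifications. The baseline term is our assumption. Tasks that straddle two periods are ignored. The function names and the task labels `gemm` and `potrf` are hypothetical. The solver uses the normal equations to stay dependency-free. A library least-squares routine would be numerically preferable.

```python
from typing import Dict, List, Sequence, Tuple


def solve_least_squares(A: List[List[float]], b: List[float]) -> List[float]:
    """Minimize ||A x - b||_2 via the normal equations (A^T A) x = A^T b."""
    n = len(A[0])
    ata = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(n)]
    m = [ata[i] + [atb[i]] for i in range(n)]        # augmented matrix
    for col in range(n):                              # Gaussian elimination
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system: not enough distinct task mixes")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n                                     # back substitution
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_task_energy(samples: Sequence[Tuple[Dict[str, int], float]],
                    task_types: Sequence[str]) -> Tuple[float, Dict[str, float]]:
    """samples: (task counts in one sampling period, measured energy in J)."""
    A = [[1.0] + [float(counts.get(t, 0)) for t in task_types] for counts, _ in samples]
    b = [energy for _, energy in samples]
    x = solve_least_squares(A, b)
    return x[0], dict(zip(task_types, x[1:]))


# Synthetic periods generated with baseline 0.5 J, gemm 2 mJ, potrf 1 mJ.
samples = [({"gemm": 100, "potrf": 50}, 0.75), ({"gemm": 200, "potrf": 10}, 0.91),
           ({"gemm": 50, "potrf": 300}, 0.90), ({"potrf": 100}, 0.60)]
print(fit_task_energy(samples, ["gemm", "potrf"]))
```

With noise-free synthetic data the fit recovers the generating values. With real wattmeter samples, many more periods than unknowns are needed, and the residual indicates how well the linear model holds.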

7.1.3 Performance prediction

Participants: Lucas Van Lanker, Hugo Taboada [CEA/DAM], Mickaël Boichot, Adrien Roussel [CEA/DAM], Patrick Carribault [CEA/DAM], Elisabeth Brunet, François Trahay.

With the advent of heterogeneous systems that combine CPUs and GPUs, designing a supercomputer is becoming increasingly complex. The hardware characteristics of GPUs significantly impact performance. Choosing the GPU that maximizes performance within a limited budget is difficult because it requires predicting performance on a hardware platform that is not available for measurement.

During his PhD, Mickaël Boichot studied the relation between the expressed parallelism and the memory footprint of loops. The goal was to extrapolate which data sizes provide enough parallelism to fully load a new GPU architecture. For the case of memory oversubscription, his work focused on how to efficiently exploit the new unified memory features of GPUs in order to place data where it is most often reused. These results are detailed in his thesis and in [6].

Lucas Van Lanker's PhD explores means of predicting the performance of kernels running on GPUs. We propose a methodology that analyzes the behavior of an application running on an existing platform and projects its performance onto another GPU based on the target hardware characteristics. The performance projection relies on a hierarchical roofline model. It also relies on a comparison of the kernel's assembly instructions on both GPUs to estimate the operational intensity on the target GPU. Our experiments show that performance can be predicted accurately at a low cost.
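The sketch below shows the basic projection step that a hierarchical roofline model enables. Consider a kernel that performs W floating-point operations and moves Q_l bytes at memory level l. Its attainable throughput is bounded by P = min(π, min_l (W/Q_l)·β_l), where π is the peak compute throughput and β_l the bandwidth of level l. The predicted time is W/P.

All GPU figures below are invented placeholders. The sketch also assumes the kernel moves the same bytes on both GPUs. The method above estimates the operational intensity on the target by comparing assembly instructions, and this sketch does not attempt that step.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Gpu:
    peak_gflops: float               # compute roof, GFLOP/s
    bandwidth_gbs: Dict[str, float]  # memory roofs, GB/s per level


def attainable_gflops(flops: float, bytes_moved: Dict[str, float], gpu: Gpu) -> float:
    """Hierarchical roofline bound: lowest of the compute roof and every memory roof."""
    roofs = [gpu.peak_gflops]
    for level, nbytes in bytes_moved.items():
        intensity = flops / nbytes                            # FLOP per byte
        roofs.append(intensity * gpu.bandwidth_gbs[level])   # FLOP/B * GB/s = GFLOP/s
    return min(roofs)


def predicted_seconds(flops: float, bytes_moved: Dict[str, float], gpu: Gpu) -> float:
    return flops / (attainable_gflops(flops, bytes_moved, gpu) * 1e9)


# Hypothetical kernel profile and GPU characteristics (placeholders, not measurements).
kernel_flops = 2e9
kernel_bytes = {"DRAM": 4e9, "L2": 8e9}
current = Gpu(peak_gflops=10_000, bandwidth_gbs={"DRAM": 1_000, "L2": 4_000})
target = Gpu(peak_gflops=20_000, bandwidth_gbs={"DRAM": 2_000, "L2": 4_000})
for name, gpu in (("current", current), ("target", target)):
    print(name, predicted_seconds(kernel_flops, kernel_bytes, gpu))
```

Here the kernel is DRAM-bound on the current GPU. On the target, doubling DRAM bandwidth halves the predicted time, but the L2 roof is then reached. Any further DRAM improvement would therefore not help.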

7.2 System components for the cloud

7.2.1 VoliPMem: transparently using persistent memory

Participants: Jana Toljaga, Tara Aggoun, Gaël Thomas, Mathieu Bacou, Nicolas Derumigny.

Handling persistent memory is complex because the application can fail at any time, leaving the persistent memory in an inconsistent state. To avoid inconsistencies, the developer must use transactions, whose writes are applied with all-or-nothing semantics at the end of the transaction. To achieve this, persistent memory writes are first executed in volatile memory and only applied to persistent memory when the transaction ends. Unfortunately, to propagate the writes, the developer currently has to explicitly indicate the modified memory locations, which is cumbersome and error-prone. With VoliPMem (PhD thesis of Jana Toljaga), we propose to transparently identify, on behalf of the developer, the memory locations modified inside a transaction. To achieve this, we rely on a library called VoliMem, which executes a process in a lightweight virtual machine instead of natively. Running inside a virtual machine allows the application to directly manage a secondary page table within its address space. VoliPMem uses this page table to automatically identify the modified memory locations: it traverses the page table to find the dirty pages. At the end of a transaction, VoliPMem collects these dirty pages and copies them to persistent memory with all-or-nothing semantics. Thanks to these abstractions, using persistent memory becomes straightforward. The developer simply indicates the boundaries of the transaction and no longer has to annotate each write.
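The commit path described above can be mimicked in a few lines. This is a hypothetical model, not VoliPMem's code. A Python set stands in for the dirty bits that the MMU sets in page-table entries, and dictionaries stand in for volatile and persistent memory. All-or-nothing semantics come from a redo log followed by a commit record, which is one common way to make a multi-page copy atomic. The description above does not say which mechanism VoliPMem uses.

```python
PAGE_SIZE = 4096


def apply_log(persistent, log):
    """Shared by commit and crash recovery: replay only a fully committed log."""
    if len(log) == 2 and log[1] == "COMMIT":
        persistent.update(log[0])


class TransactionalHeap:
    """Volatile working copy of a persistent image, tracked per page."""

    def __init__(self, persistent):
        self.persistent = persistent                  # page -> bytes (stands in for PMEM)
        self.pages = {p: bytearray(d) for p, d in persistent.items()}
        self.dirty = set()                            # stands in for PTE dirty bits

    def write(self, addr, data):
        page, off = divmod(addr, PAGE_SIZE)
        if off + len(data) > PAGE_SIZE:
            raise ValueError("sketch limitation: a write may not cross a page")
        self.pages[page][off:off + len(data)] = data
        self.dirty.add(page)                          # the MMU would do this implicitly

    def commit(self, log):
        # 1. "Walk the page table": collect every page written in this transaction.
        redo = {p: bytes(self.pages[p]) for p in sorted(self.dirty)}
        # 2. Persist the redo entries first, then the commit record
        #    (a real system would insert a persistence fence in between).
        log.clear()
        log.append(redo)
        log.append("COMMIT")
        # 3. Apply the log to the persistent image, then retire it.
        apply_log(self.persistent, log)
        log.clear()
        self.dirty.clear()


pmem = {0: bytes(PAGE_SIZE), 1: bytes(PAGE_SIZE)}
heap = TransactionalHeap(pmem)
heap.write(10, b"hi")            # the developer only marks transaction boundaries
heap.commit(log=[])
print(pmem[0][8:14])
```

A crash before the commit record leaves the log without `"COMMIT"`, so `apply_log` ignores it during recovery and the persistent image keeps its previous state.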

7.2.2 ForkNox: protecting the internal data structures of Linux

Participants: Jean-François Dumollard, Harena Rakotondratsima, Gaël Thomas, Mathieu Bacou, Nicolas Derumigny.

Linux is designed as a monolithic kernel, which leaves it vulnerable as soon as an attacker can execute code in system mode. Technically, an attacker who can execute code in system mode can modify any part of Linux. The attacker can alter Linux's code, modify any data structure, and disable any security mechanism installed by Linux. With ForkNox (PhD thesis of Jean-François Dumollard), we propose a new technique to enforce Linux's security even if an attacker is able to execute code inside the kernel. To do so, we introduce a new protection ring by leveraging the processor's virtualization features. Specifically, ForkNox is a Linux module that runs as a hypervisor while Linux runs as a guest operating system. By leveraging virtualization, ForkNox can revoke read, write, or execute permissions for important Linux memory regions. This allows ForkNox to protect Linux against an attacker capable of executing code inside the kernel.
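The protection relies on the second-stage address translation provided by hardware virtualization, which Intel calls EPT. Every guest-physical page carries read, write and execute permissions that the guest kernel cannot change, and a violating access exits to the hypervisor. The toy model below illustrates only that idea. The class and method names are hypothetical and do not reflect ForkNox's implementation. The bit positions follow the Intel EPT entry layout: bit 0 is read, bit 1 is write, and bit 2 is execute.

```python
EPT_READ, EPT_WRITE, EPT_EXEC = 1 << 0, 1 << 1, 1 << 2   # Intel EPT entry bits 0-2
PAGE_SIZE = 4096


class EptViolation(Exception):
    """Raised where real hardware would exit to the hypervisor."""


class SecondStageTable:
    def __init__(self, npages):
        # Initially the guest kernel may read, write and execute every page.
        self.perms = [EPT_READ | EPT_WRITE | EPT_EXEC] * npages

    def revoke(self, first_page, npages, bits):
        """Hypervisor-only operation: drop `bits` on a range of guest pages."""
        for page in range(first_page, first_page + npages):
            self.perms[page] &= ~bits

    def check(self, guest_phys_addr, access):
        """What the MMU does on each guest access, after guest page-table translation."""
        page = guest_phys_addr // PAGE_SIZE
        if (self.perms[page] & access) != access:
            raise EptViolation(f"access {access:#x} denied on guest page {page}")


# Hypothetical layout: kernel text on pages 2-4 becomes read+execute only.
table = SecondStageTable(npages=16)
table.revoke(first_page=2, npages=3, bits=EPT_WRITE)
table.check(2 * PAGE_SIZE, EPT_READ | EPT_EXEC)      # allowed
try:
    table.check(3 * PAGE_SIZE + 8, EPT_WRITE)        # a compromised kernel patches its code
except EptViolation as exc:
    print("blocked:", exc)
```

The key design point is that `revoke` is reachable only from the hypervisor. Code running inside the guest kernel, however privileged, cannot restore the bits it needs.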

7.2.3 Tele-GC: a garbage collector for disaggregated memory

Participants: Adam Chader, Nevena Vasilevska, Yohan Pipereau [Engineer at Gandi], Gaël Thomas, Mathieu Bacou, Nicolas Derumigny.

A disaggregated infrastructure simplifies hardware resource management. The cloud system can dynamically adjust the hardware resources allocated to a virtual machine to its actual use by allocating resources from specialized blades. Designing a garbage collector in this context is challenging because of the high memory latency. With TéléGC (PhD of Adam Chader), we propose a new garbage collector (GC) for disaggregated infrastructures. TéléGC runs on the memory node while the application runs on the CPU node. It runs concurrently with the application while avoiding most synchronization. To achieve this, we introduce the write-back barrier. TéléGC does not synchronously execute a barrier when the application writes to the heap. Instead, it executes the barrier asynchronously, later, when the CPU node writes back a page to the memory node. Thanks to this, the application does not pay the cost of synchronizing with the GC, which boosts its performance. Our evaluation on a disaggregated infrastructure shows that TéléGC significantly reduces both completion time and pause time compared to Mako and G1, two state-of-the-art GCs for HotSpot.
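The toy model below shows where the write-back barrier moves the barrier work. Application stores only update the compute node's cached copy. The memory-node collector processes a page's new references when that page is written back. This is our own simplification, not TéléGC's design. It uses an incremental-update policy that re-traces already-marked objects whose references changed. It assumes every dirty page is written back before marking terminates, and it ignores changes to roots. The names are hypothetical.

```python
from collections import deque


class MemoryNodeMarker:
    """Concurrent marker running on the memory node (toy model)."""

    def __init__(self, refs, roots):
        self.refs = {obj: list(t) for obj, t in refs.items()}  # memory-node heap image
        self.marked = set()
        self.grey = deque(roots)

    def mark_step(self, budget):
        """Trace up to `budget` objects; runs concurrently with the application."""
        while self.grey and budget > 0:
            obj = self.grey.popleft()
            if obj in self.marked:
                continue
            self.marked.add(obj)
            self.grey.extend(self.refs.get(obj, []))
            budget -= 1

    def on_write_back(self, page_objects):
        """Write-back barrier: invoked once per flushed page, not once per store.

        page_objects maps each object on the flushed page to its current references.
        """
        for obj, targets in page_objects.items():
            self.refs[obj] = list(targets)
            if obj in self.marked:
                # Traced earlier with stale fields: trace the current targets too.
                self.grey.extend(targets)

    def done(self):
        return not self.grey


marker = MemoryNodeMarker({1: [2], 2: [], 3: [], 4: []}, roots=[1])
marker.mark_step(budget=1)        # object 1 is traced while the application runs
# On the compute node, the application stores a reference to 3 into object 1;
# nothing reaches the GC at store time. Later, the page holding 1 is flushed:
marker.on_write_back({1: [2, 3]})
marker.mark_step(budget=10)
print(sorted(marker.marked))
```

Without the call to `on_write_back`, object 3 would never be marked and would be wrongly reclaimed. This is why the assumption that all dirty pages are flushed before marking ends matters.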

7.3 System components for emerging computing models

7.3.1 Efficient Pyramidal Analysis of Gigapixel Images on a Decentralized Modest Computer Cluster

Participants: Marie Reinbigler, Rishi Sharma [EPFL], Rafael Pires [EPFL], Elisabeth Brunet, Anne-Marie Kermarrec [EPFL], Catalin Fetita [Telecom SudParis].

Analyzing gigapixel images is computationally demanding. In this work, we introduce PyramidAI, a technique for analyzing gigapixel images at reduced computational cost 8. The approach analyzes the image gradually: it starts at low resolutions and progressively focuses on regions of interest, which are then examined in detail at higher resolutions. We investigated two strategies for tuning the trade-off between accuracy and computation when implementing this adaptive resolution selection, and validated them on the Camelyon16 dataset of biomedical images. Our results show that PyramidAI reduces the amount of data processed during analysis by up to 2.65x on a single computer, while preserving accuracy in identifying relevant sections. To democratize gigapixel image analysis, we evaluated whether mainstream computers can perform the computation by exploiting the parallelism inherent in the approach. Using a simulator, we estimated the best data distribution and load balancing algorithm according to the number of workers. We then implemented the selected algorithms, and the real-world deployment confirmed the simulation results. Analysis time drops from more than an hour to a few minutes with 12 modest workers, offering a practical solution for efficient large-scale image analysis.
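The Python sketch below illustrates the coarse-to-fine idea. It assumes a quadtree pyramid, in which each tile at one level has four child tiles at the next, finer level. A hypothetical `score` function stands in for the classifier. The fixed `threshold` is one simple knob for the accuracy-computation trade-off; it is not necessarily either of the two strategies studied in the paper.

```python
def children(tile):
    """The four tiles covering `tile` at the next, finer level."""
    x, y = tile
    return [(2 * x + dx, 2 * y + dy) for dy in (0, 1) for dx in (0, 1)]


def pyramidal_analysis(num_levels, score, threshold):
    """Analyze from level 0 (one tile, whole image) down to level num_levels - 1.

    Only tiles whose score reaches `threshold` are refined further.
    Returns the selected finest-level tiles and the number of tiles analyzed.
    """
    frontier, analyzed = [(0, 0)], 0
    for level in range(num_levels):
        kept = []
        for tile in frontier:
            analyzed += 1
            if score(level, tile) >= threshold:
                kept.append(tile)
        if level == num_levels - 1:
            return kept, analyzed
        frontier = [child for tile in kept for child in children(tile)]
    return [], analyzed


# Toy "image": 4 levels, so the finest level is an 8x8 grid of tiles, with a
# single region of interest at finest tile (5, 2).
LEVELS, ROI = 4, (5, 2)


def toy_score(level, tile):
    shift = LEVELS - 1 - level  # log2 of how many finest tiles one tile spans
    return 1.0 if (ROI[0] >> shift, ROI[1] >> shift) == tile else 0.0


found, analyzed = pyramidal_analysis(LEVELS, toy_score, threshold=0.5)
print(found, analyzed)  # [(5, 2)] 13  (vs. 64 tiles at full resolution)
```

Lowering `threshold` refines more tiles, which improves recall at the cost of analyzing more data. Because the tiles of one level are independent, each level's frontier is also a natural unit of work to distribute across workers.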

7.3.2 Efficient and Principled Approaches to Scalable Programming

Participants: Boubacar Kane, Tung Nguyen, Pierre Sutra.

Parallel programs require software support to coordinate access to shared data. For this purpose, modern programming languages provide strongly consistent shared objects. To accommodate many usages, these objects offer a large API. In practice, however, each program calls only a tiny fraction of this interface. Building on this observation, we propose to tailor a shared object to a specific usage. We call this principle adjusted objects.
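The Python sketch below illustrates the principle; it is not code from DEGO, which targets Java. It contrasts a general-purpose counter, whose reads are linearizable at any time, with a counter adjusted to a program that only increments concurrently and reads the total once all updates are done. Dropping a guarantee that the program never uses lets each thread update a private cell without contention. The JDK's `java.util.concurrent.atomic.LongAdder` follows a similar striping idea. Because of Python's global interpreter lock, this sketch shows the narrowed interface, not the speedup.

```python
import threading


class GeneralCounter:
    """Full-featured counter: every operation synchronizes on one lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self._value += 1

    def get(self):
        with self._lock:
            return self._value


class AdjustedCounter:
    """Counter adjusted to an 'increment, then read once at the end' usage.

    Each thread increments its own cell, so increments never contend.
    `total` is exact only once all increments have completed.
    """

    def __init__(self):
        self._cells = []                 # one single-writer cell per thread
        self._local = threading.local()
        self._register = threading.Lock()

    def increment(self):
        cell = getattr(self._local, "cell", None)
        if cell is None:                 # first increment by this thread
            cell = self._local.cell = [0]
            with self._register:
                self._cells.append(cell)
        cell[0] += 1                     # only this thread writes this cell

    def total(self):
        with self._register:
            return sum(cell[0] for cell in self._cells)


def hammer(counter, threads=4, per_thread=10_000):
    def work():
        for _ in range(per_thread):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()


general, adjusted = GeneralCounter(), AdjustedCounter()
hammer(general)
hammer(adjusted)
print(general.get(), adjusted.total())  # 40000 40000
```

Cells are kept in a list rather than keyed by thread identifier, because Python may reuse thread identifiers once a thread exits.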

Adjusted objects already exist in the wild, and our work provides their first systematic study. We explain how everyday programmers already adjust common shared objects (such as queues, maps, and counters) for better performance. We present the formal foundations of adjusted objects using a new tool to characterize scalability, the indistinguishability graph. Building on this study, we introduce DEGO, a library to inject adjusted objects into a Java program. In micro-benchmarks, objects from the DEGO library outperform standard JDK shared objects by up to two orders of magnitude. We also evaluate DEGO with a Retwis-like benchmark modeled after a social network application. On a modern server-class machine, DEGO improves the benchmark's performance by up to 1.7x. This work was conducted during the PhD of Boubacar Kane, who successfully defended in January 2025 11.

A key question in concurrent programming is determining the synchronization power of a shared object. An object has consensus number n when n is the largest number of processes for which consensus can be solved using copies of this object and registers. The indistinguishability graph can be used to characterize the consensus number of a shared object. However, this characterization is incomplete, as it covers only readable objects. In a seminal work, Herlihy and Ruppert provide an exact characterization of the consensus number of deterministic one-shot objects (objects that each process can access at most once). In 12, we extend the study of Herlihy and Ruppert to deterministic two-shot objects, which each process can access at most twice, in a two-process system. We introduce three disjoint classes of two-shot objects. The first class is similar to one-shot objects, in the sense that the first operation call gives enough information to solve consensus. Objects in the second class do not provide any useful information after a process's first call. The last class contains objects for which calling the object twice is always necessary. In this class, the second operation is chosen adaptively, so different schedules may use different operations. For instance, the second operation used in a solo run might differ from the one called when processes interleave. We show that these three classes provide an exact characterization of the two-shot deterministic objects able to solve two-process consensus.
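For readers unfamiliar with consensus numbers, the textbook example below shows test-and-set, an object with consensus number 2, solving two-process consensus together with registers. It is not the construction from 12. Each process calls the test-and-set object once, and that single call tells it whether it won, which matches the idea that the first operation call gives enough information to decide. The lock only models the atomicity of the shared object. The names `make_consensus` and `run_once` are hypothetical.

```python
import threading


class TestAndSet:
    """Deterministic object: the first call returns True, later calls False."""

    def __init__(self):
        self._set = False
        self._lock = threading.Lock()  # models the object's atomicity only

    def test_and_set(self):
        with self._lock:
            won = not self._set
            self._set = True
            return won


def make_consensus():
    proposals = [None, None]  # two single-writer registers
    tas = TestAndSet()

    def propose(pid, value):
        proposals[pid] = value        # 1. announce my proposal
        if tas.test_and_set():        # 2. a single call to the shared object
            return value              # the winner decides its own value
        return proposals[1 - pid]     # the loser adopts the winner's value

    return propose


def run_once(a, b):
    propose = make_consensus()
    decisions = [None, None]

    def process(pid, value):
        decisions[pid] = propose(pid, value)

    threads = [threading.Thread(target=process, args=(0, a)),
               threading.Thread(target=process, args=(1, b))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions


results = [run_once("x", "y") for _ in range(200)]
print(all(d[0] == d[1] and d[0] in ("x", "y") for d in results))  # True
```

Agreement holds because the winner writes its proposal before calling `test_and_set`. The loser's call happens after the winner's, so the loser always reads an already-announced value.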

8 Bilateral contracts and grants with industry

8.1 Bilateral contracts with industry

Participants: Mickaël Boichot, Lucas Van Lanker.

  • Contracts with CEA for the PhDs of Mickaël Boichot (2021-2025) and Lucas Van Lanker (2024-2027)
  • Adobe Research gift to support our research activities.

9 Partnerships and cooperations

9.1 National initiatives

PEPR NumPex – Exa-SofT

Participants: Catherine Guelque, Jules Risse, Élisabeth Brunet, Valentin Honoré, François Trahay.

Partners: Université Paris Saclay, Télécom SudParis, CEA, CNRS, Inria

Coordinator: Raymond Namyst, Inria Bordeaux

Funding: 453 k€

Date: 2023-2028

Summary: Although significant efforts have been devoted to implementing and optimizing several crucial parts of a typical HPC software stack, most HPC experts agree that exascale supercomputers will raise new challenges. This is mostly because exascale compute-node hardware is trending toward heterogeneity and scalability: compute nodes of future systems will combine regular CPUs and accelerators (typically GPUs), with a diversity of GPU architectures. Meeting the needs of complex parallel applications and the requirements of exascale architectures raises numerous challenges that are still unaddressed. As a result, several parts of the software stack must evolve to better support these architectures. More importantly, the links between these parts must be strengthened to form a coherent, tightly integrated software suite. Our project aims to consolidate the exascale software ecosystem by providing a coherent, exascale-ready software stack featuring breakthrough research advances enabled by multidisciplinary collaborations between researchers. The main scientific challenges we intend to address are productivity, performance portability, heterogeneity, scalability and resilience, and performance and energy efficiency.

PEPR Cloud – DiVa

Participants: Jana Toljaga, Nevena Vasilevska, Tara Aggoun, Mathieu Bacou, Nicolas Derumigny, Gaël Thomas.

Partners: LIP6, LIG, IRIT, Inria Paris, Benagil/Telecom SudParis

Coordinator: Gaël Thomas, Télécom SudParis

Funding: 864 k€

Date: 2023-2030

Summary: The DiVa project investigates new virtualization mechanisms tailored for disaggregated infrastructures and for infrastructures composed of small edge sites connected to powerful data centers. For disaggregated clouds, the DiVa project will focus on virtualization interfaces, scheduling, the use of programmable networks, and replication mechanisms. For the edge-cloud continuum, the DiVa project will focus on migration between heterogeneous machines, edge/edge and edge/data center network optimizations, and virtualization interfaces for micro virtual machines.

PEPR Cloud – Archi-CESAM

Participants: Jean-François Dumollard, Harena Rakotondratsima, Mathieu Bacou, Nicolas Derumigny, Gaël Thomas.

Partners: Université de Rennes, Benagil/Telecom SudParis, Institut Polytechnique de Grenoble, CEA, Inria

Coordinator: Denis Dutoit, CEA

Funding: 580 k€

Date: 2023-2030

Summary: European sovereignty in the cloud also means sovereignty over hardware, especially processors and accelerators. Dennard scaling has ended and Moore's law is slowing down. In this lasting technological context, improving processor performance will require hardware architectures that evolve towards more parallelism (multi-core), more specialization (accelerators), a closer relationship between computing and memory, and new types of interconnections between components. At the same time, by decoupling logical resources from hardware resources (computing, memory, interconnection), virtualization facilitates the deployment of converged architectures that bring together computing, storage, and network infrastructure. The cloud thus gains modularity, speed, and agility in deploying new services with optimal resource usage. However, hardware disaggregation on the one hand and resource virtualization on the other make the intermediate adaptation layer increasingly complex, difficult to validate, and prone to failure. The Archi-CESAM project proposes to rethink the hardware (computing, memory, and interconnection) so that it is co-designed with the application, from the perspective of converged architecture and trust, in an environment known for the abundance of data to be processed. The Archi-CESAM project addresses this major evolution of the cloud with a global, coordinated approach spanning distributed architectures, acceleration, interconnection, and security building blocks, without forgetting design methods.

ANR PRC – FrugalDinet

Participants: Gaël Thomas.

Partners: LIP6, LISTIC, Benagil/Inria Saclay, New York University Shanghai

Coordinator: Pierre Sens, LIP6

Funding: 171 k€

Date: 2024-2028

Summary: In recent years, innovative hardware technologies have emerged to enhance distributed computations in datacenters. Programmable switches enable user-defined processing of packets in transit. Similarly, SmartNIC DPUs offload data-centric computations from host CPUs. At the same time, the urgency of the climate and energy crises has emphasized the need for frugal architectures. These technologies present an opportunity to reduce the overall network traffic of distributed services by offloading computations from CPUs to the network itself. They should be integrated into the design of fundamental distributed system components such as failure detectors, group membership, reliable broadcast, and consensus. We propose FrugalDinet, a framework for building reliable, low-cost distributed services. It leverages these technologies to minimize CPU usage in datacenters and, consequently, their energy consumption. Our holistic approach extends key algorithms needed to build reliable services, such as leader election, group membership, and broadcast. We intend not only to offload algorithmic logic onto network elements, but also to make opportunistic use of the information available at the switch level. We also plan to introduce a new high-level programming language that facilitates transparent use of these frugal, reliable distributed services. The resulting frugal algorithms and programming abstractions will be applied to design a distributed transaction system.

ANR PRCE – Centeanes

Participants: Pierre Sutra.

Partners: Télécom SudParis, Université Paris Cité, École Polytechnique, Université de Paris 6.

Coordinator: Pierre Sutra

Funding: 196 k€

Date: 2025-2029

Summary: Cloud computing in the past was concerned with managing infrastructure resources, e.g., servers, VMs, or containers. Today, serverless computing promises to abstract this concern away. In this new paradigm, the unit of computation is the function: a function-as-a-service platform automatically manages the deployment of functions, executing them on demand and at scale. This greatly simplifies access to the cloud, letting application developers focus on getting the application code right and ignore infrastructure issues.

Unfortunately, serverless computing remains difficult to use and to reason about. The serverless environment is inherently unpredictable and non-deterministic, which makes it hard to understand and to control. Being distributed, serverless must cope with concurrency, unpredictable failures, and the impossibility of consensus. On top of that, serverless poses new challenges to the application programmer. Events may trigger multiple invocations of the same function and/or terminate it before it has finished. Functions are stateless and start afresh every time, yet they often must access an external storage service, which exposes them to stale or inconsistent state. Finally, existing platforms suffer from inefficiencies, such as excessive data movement or random placement.
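One common way to tolerate duplicated invocations is to make each function idempotent. The Python sketch below shows this under our own assumptions; it is not the Centeanes design. The caller attaches a unique request identifier, and the function commits its whole effect with a single conditional write keyed by that identifier, so a retried or duplicated run is a no-op. `KeyValueStore.put_if_absent` stands in for an atomic conditional write offered by an external storage service. It is hypothetical and does not represent any specific product's API.

```python
import uuid


class KeyValueStore:
    """In-memory stand-in for an external storage service (hypothetical)."""

    def __init__(self):
        self.data = {}

    def put_if_absent(self, key, value):
        # Atomic conditional write: keeps the first value ever written for `key`.
        return self.data.setdefault(key, value)


def deposit(store, request_id, account, amount):
    """Serverless function body: safe to run any number of times per request."""
    # The whole effect is one conditional write keyed by the request id, so a
    # duplicated run changes nothing, and a run that died before the write
    # leaves nothing behind for the retry to conflict with.
    store.put_if_absent((account, request_id), amount)


def balance(store, account):
    return sum(v for (acct, _), v in store.data.items() if acct == account)


store = KeyValueStore()
req = str(uuid.uuid4())
deposit(store, req, "alice", 100)
deposit(store, req, "alice", 100)   # the platform re-invokes the same event
print(balance(store, "alice"))      # 100
deposit(store, str(uuid.uuid4()), "alice", 50)
print(balance(store, "alice"))      # 150
```

The balance is derived from the recorded deposits instead of being updated in place. This keeps each function's effect a single atomic write: an in-place update followed by a separate "done" marker could be interrupted between the two steps.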

The Centeanes project aims to address these challenges from the perspectives of correctness, efficiency, and expressivity, in a real application context. It will develop tools for specifying, programming, and running correct-by-design serverless applications. In detail, we propose a formal framework to study the foundations of serverless computing, including function composition and fault tolerance. This framework is implemented in a lightweight runtime environment where stateful operations and data locality are first-class citizens. We also build a toolchain to program and verify serverless applications executing in this runtime. This verification toolchain simplifies application programming and helps enforce correctness. The design is informed by, and will be validated against, benchmarks and full-scale industrial cloud or edge applications built with Eclipse Zenoh.

ANR PRC – Maplurinum

Participants: Adam Chader, Mathieu Bacou, Gaël Thomas.

Partners: INPG, Inria Rennes, CEA, Benagil/Telecom SudParis

Coordinator: Gaël Thomas, Telecom SudParis

Funding: 184 k€

Date: 2021-2025

Summary: High-performance architectures are increasingly heterogeneous and often incorporate specialized hardware. We first saw GPUs become widespread in the most powerful machines, followed a few years later by the introduction of FPGAs. More recently, many other accelerators have emerged, such as tensor processing units (TPUs) for DNNs or variable-precision FPUs. Recent hardware manufacturing trends make it very likely that specialization will not only persist but increase in future supercomputers. Manually managing this heterogeneity in each application is complex and not maintainable. This project therefore proposes to revisit how we design both hardware and operating systems, in order to better hide the heterogeneity from supercomputer users. In summary, we propose to rethink the hardware/software boundary in order to hide the heterogeneity behind a common minimal instruction set and a unified address space.

ANR JCJC – VHS

Participants: Valentin Delis, François Trahay.

Partners: CEA/DAM, Benagil/Telecom SudParis

Coordinator: Valentin Delis, ensIIE

Funding: 225 k€

Date: 2025-2029

Summary: Magnetic tapes have been used to store computer data since the 1950s, so laypeople often consider them an outdated technology. However, tape storage is still, and will remain, essential in many fields such as academic research, international organisations, or cloud companies, thanks to strong practical benefits: low cost per TB, low energy consumption, longevity, etc. This dependency on tapes has motivated industrial efforts to improve the technology, resulting in much faster growth of data density on tape than on disk. After recent breakthroughs in materials, tape capacity is expected to make a massive leap in the coming years, reaching several hundred TB per tape. This evolution will amplify the main benefits of tape storage.

However, tapes have primarily been considered for archiving cold data because of their main drawback: it takes around a minute to mount a tape from its shelf into a drive and position the read head before data can be read. This explains the current lack of academic effort to optimize relatively frequent data accesses. Nevertheless, more and more research projects need to handle tremendous volumes of data that are not only archived but also regularly accessed for scientific analysis. Budget constraints impose the use of tape storage, so optimizing tape data access is becoming increasingly important, beyond improving archive retrieval.

The general idea of the VHS project is to propose new interactions between resource management and tape systems. Using filesystems on tapes, we plan to design novel data placement strategies that provide efficient data accesses by treating tapes as a level of the storage hierarchy and optimizing their operational cost. Our methodology starts from the tapes themselves, to better understand the physical processes involved in the different operations. We will then leverage this knowledge to derive interactions between tape and disk storage systems in order to improve data placement.
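The toy Python cost model below makes the cost of mounts concrete for a single drive. It relies only on the "around a minute" mount figure quoted above; the seek and read rates are made-up placeholders. Request batching is a classic scheduling technique, shown here for illustration; it is not the placement strategy VHS will design. Each request is a hypothetical `(tape, offset_gb, size_gb)` tuple.

```python
MOUNT_S = 60.0        # mount + initial positioning: "around a minute"
SEEK_S_PER_GB = 0.5   # placeholder: positioning time per GB of tape skipped
READ_S_PER_GB = 2.5   # placeholder: streaming read time per GB


def service_time(requests):
    """Seconds for one drive to serve `requests` in the given order."""
    mounted, position, total = None, 0.0, 0.0
    for tape, offset, size in requests:
        if tape != mounted:                  # swap in another tape
            total += MOUNT_S
            mounted, position = tape, 0.0
        total += abs(offset - position) * SEEK_S_PER_GB + size * READ_S_PER_GB
        position = offset + size
    return total


def batched(requests):
    """Group requests per tape and read each tape in offset order."""
    return sorted(requests, key=lambda r: (r[0], r[1]))


reqs = [("T1", 10, 1), ("T2", 5, 1), ("T1", 2, 1), ("T2", 0, 1)]
print(service_time(reqs))           # 258.5 (four mounts)
print(service_time(batched(reqs)))  # 136.5 (two mounts)
```

Mounts dominate the total even in this tiny example, which is why the placement of frequently accessed data across tapes matters. Batching favors throughput over fairness, so a practical scheduler would bound how long any request can be postponed.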

PIA Camelia

Participants: Élisabeth Brunet, Gaël Thomas.

Partners: CEA, Inria, CNRS, IMT, UGA, ECL, SU, INSA Rennes, UM, UB, IJL, INL, IM2NP, UJM, Mines Paris, UniStra, UPVD, UBO

Coordinator: C. Auliac and O. Santieys

Funding: 319 k€

Date: 2026-2032

Summary: This project aims to design and develop an environment and a software stack for training and inference of large neural networks in resource-demanding environments such as the near-edge, the cloud, or HPC. To this end, it must make it possible to take full advantage of the hardware accelerators developed in projects 1 (digital accelerators), 2 (analog accelerators), and 3 (hardware co-integration platform) of the program. The solutions developed must also be flexible enough to allow later exploitation of hardware targets outside the program, in particular French or European industrial solutions such as those from SiPearl, STMicroelectronics, and Kalray. In addition, two aspects are key to the project's success and will be studied closely. The first is ease of use for AI engineers and researchers, who are often unfamiliar with the hardware or the low-level software layers they rely on. The second is compatibility with emerging AI concepts.

Chist-ERA – Redonda

Participants: Pierre Sutra.

Partners: Institut Mines-Télécom, IMDEA Software Institute, University of Surrey, Royal Holloway College - University of London, University of Neuchâtel

Coordinator: Pierre Sutra

Funding: 320 k€

Date: 2023-2026

Summary: The Redonda project aims to design a next-generation replication protocol for blockchains. To achieve this, the project taps into recent advances in networking, secure computing, and distributed systems. At the scale of a datacenter, the protocol relies on two recent technologies, RDMA and TEEs. Both are leveraged to create a sub-microsecond consensus layer that tolerates Byzantine failures. TEEs are also used in a novel upgradable and portable smart contract engine to execute blockchain transactions across a variety of infrastructures and hardware. Between datacenters, the protocol relies on leaderless state-machine replication. This recent approach decomposes transaction ordering into two sub-tasks that can execute in parallel, without a central coordinator that would bottleneck the system. To ensure security and safety at runtime, the Redonda project builds the blockchain protocol by composing mechanically verified building blocks. The new protocol is assessed on real hardware against benchmarks and publicly available traces. Our target is for it to scale across hundreds of geo-distributed nodes while offering 100k+ transactions per second and split-second latency.
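The summary does not detail Redonda's leaderless protocol. For orientation only, the Python sketch below shows the execution step of Egalitarian Paxos (EPaxos), a well-known leaderless state-machine replication protocol. Once commands are committed with their dependencies, every replica executes the strongly connected components of the dependency graph in dependency order, breaking ties inside a component by sequence number. The input format (`deps`, `seq`) is our assumption, and every dependency is assumed to be committed already.

```python
def execution_order(deps, seq):
    """Deterministic execution order for committed commands, EPaxos style.

    deps: command -> set of commands it depends on (all assumed committed)
    seq:  command -> sequence number used to order commands inside a cycle
    """
    index, low, on_stack, stack, order = {}, {}, set(), [], []
    counter = 0

    def strongconnect(v):
        # Tarjan's algorithm: a component is emitted only after every
        # component it depends on, i.e. dependencies execute first.
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in sorted(deps.get(v, ())):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            order.extend(sorted(component, key=lambda c: (seq[c], c)))

    for v in sorted(deps):
        if v not in index:
            strongconnect(v)
    return order


# A and B were proposed concurrently and depend on each other; C depends on A.
deps = {"A": {"B"}, "B": {"A"}, "C": {"A"}, "D": set()}
seq = {"A": 1, "B": 2, "C": 3, "D": 1}
print(execution_order(deps, seq))  # ['A', 'B', 'C', 'D']
```

Iterating in sorted order makes every replica compute the same order from the same committed graph. No leader is involved: ordering is derived from dependencies that each command's proposer gathered.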

10 Dissemination

10.1 Promoting scientific activities

10.1.1 Scientific events: organisation

  • Gaël Thomas : annual Inria Defi OS workshop (11/2025), annual PEPR DiVa workshop (05/2025)
  • Mathieu Bacou : annual thematic workshop of the working group "Virtualization" of CNRS's GDR RSD about virtualization of systems and networks (12/2025)
Member of the organizing committees
  • François Trahay : participation in the organization of the Per3S workshop as part of the steering committee

10.1.2 Scientific events: selection

Member of the steering committees
  • Gaël Thomas : chair of the steering committee of Compas (French)
  • François Trahay : member of the steering committee of Compas (French)
  • Pierre Sutra : member of the steering committee of PaPoC
Member of the conference program committees
  • Gaël Thomas : member of the USENIX ATC 2025, EuroSys 2025, APSys 2025, and Resdis 2025 program committees.
  • François Trahay : member of the ISC 2025 program committee.
  • Élisabeth Brunet : member of the PDS 2025, Compas 2025, and SC 2025 program committees.
  • Valentin Delis : member of the Cluster 2025 and ESA 2025 program committees.
  • Pierre Sutra : TPC member for Middleware 2025, ICDCS 2025, PaPoC 2025, and SRDS 2025.
Reviewer - reviewing activities
  • Valentin Delis : reviewer for TPDS

10.1.3 Invited talks

  • Gaël Thomas
    • 06/2025, invited talk at Epita, TéléGC: a barrier-free garbage collector for disaggregated memory
    • 07/2025, invited talk at Sushi Seminar, TéléGC: a barrier-free garbage collector for disaggregated memory
  • François Trahay
    • ECLAT workshop
    • scientific day of Institut Polytechnique de Paris
    • tutorial on performance analysis with EZTrace, as part of the Compas conference
  • Pierre Sutra
    • keynote at LADC '25

10.1.4 Scientific expertise

  • François Trahay was a member of the selection committee for an Associate Professor position at Télécom SudParis, May 2025.
  • Élisabeth Brunet was a member of the selection committee for an Associate Professor position at INSA Lyon in 2025.
  • Elisabeth Brunet was a member of the committee awarding the Gilles Kahn PhD thesis prize of the SIF (Société Informatique de France) in 2025.
  • Mathieu Bacou was a member of the selection committee for twin Associate Professor positions at Université de Lille, May 2025.

10.1.5 Research administration

  • François Trahay : head of the research action "Energy Efficiency" of the Energy4Climate interdisciplinary center.
  • François Trahay : head of the working group "Large Scale Computing" of CNRS's GDR C4P.
  • Mathieu Bacou : co-head of the working group "Virtualization" of CNRS's GDR RSD.

10.2 Teaching - Supervision - Juries - Educational and pedagogical outreach

  • Master: François Trahay is the head of the master of Computer Science at Institut Polytechnique de Paris
  • Master: Pierre Sutra and Gaël Thomas are the heads of the Parallel & Distributed Systems master track at Institut Polytechnique de Paris
  • Engineering: Élisabeth Brunet is in charge of the AI 3rd year track at Télécom SudParis
  • Engineering: Pierre Sutra is in charge of the ASR 3rd year track at Télécom SudParis
  • Engineering: Valentin Delis is in charge of the CIDM HPC track at ensIIE (2nd & 3rd year of the engineering program). Holder of the Chair "Technologies avancées & émergentes pour la Souveraineté Numérique" between ensIIE and CEA. 330h of teaching duties (including administrative duties) at ensIIE from 1st to 3rd year, in both initial and apprenticeship training programs. Teaching in the CPES Data Science course at Lycée International de Saclay (course leader: Maria Boritchev, Télécom Paris). Jury member for the oral examination of the Concours Mines-Télécom.

10.2.1 Supervision

PhD in progress:

  • Jean-François Dumollard, "Virtualization techniques to enforce the security of an operating system", supervised by G. Thomas, M. Bacou and N. Derumigny
  • Catherine Guelque, "Large scale performance analysis", supervised by F. Trahay and V. Delis
  • Martin Horth, "Static analysis methods for obfuscated software reverse engineering", supervised by F. Trahay and O. Levillain
  • Jules Risse, "Fine-grain energy consumption measurement", supervised by F. Trahay and A. Guermouche
  • Jana Toljaga, "Virtualization techniques for persistent memory", supervised by G. Thomas, M. Bacou and N. Derumigny
  • Guillermo Toyos Marfurt, "A Next-Generation State-Machine Replication Protocol for Blockchain", supervised by P. Sutra and P. Kuznetsov
  • Lucas Van Lanker, "Performance projection of GPU applications", supervised by F. Trahay, E. Brunet, and H. Taboada
  • Nevena Vasilevska, "Hardware cache controlled by software for memory disaggregation", supervised by G. Thomas, J. Dumas, and N. Derumigny
  • Tara Aggoun, "Design and implementation of a disaggregated Java virtual machine", supervised by G. Thomas and J.-P. Lozi
  • Harena Rakotondratsima, "Design and implementation of in-process isolation mechanisms", supervised by G. Thomas and N. Derumigny
  • Minh Tung Nguyen, "Computability and Complexity in Mixed-Trust Distributed Systems", supervised by P. Sutra

Defended PhD:

  • Mickaël Boichot, "Characterizing parallel applications for porting to multi-GPU systems", supervised by P. Carribault and E. Brunet
  • Adam Chader, "Large-scale garbage collectors", supervised by G. Thomas and M. Bacou
  • Marie Reinbigler, "Frugal multiresolution analysis of gigapixel images: application to biomedical data and beyond", supervised by C. Fetita and E. Brunet
  • Boubacar Kane, "Adjusted objects: a well-founded and efficient approach to concurrent programming", supervised by P. Sutra

10.2.2 Juries

  • Gaël Thomas
    • Reviewer of the PhDs of Papa Assane Fall, Nahuel Palumbo, Xiaoxiang (William) Wu (Australia), Lana Scravaglieri, Simon Lambert, Adrian Khelili, Aghiles Ait Messaoud, Guillermo Polito (HdR)
    • Examiner of the PhDs of Léo Cosseron, Eduardo Tomasi Ribeiro, Himadri Pandya, Matthieu Bettinger, Ayush Pandey
  • François Trahay
    • President of the PhD committee for Boubacar Kane, Institut Polytechnique de Paris
    • Reviewer of the PhDs of Aymeric Millan, Louis Boulanger, Himadri Pandya
  • Pierre Sutra
    • President of the PhD committee for Luciano Freitas de Souza, Institut Polytechnique de Paris
  • Élisabeth Brunet : examiner of the PhD of Youssouph Faye.
  • Mathieu Bacou
    • Expert member of the jury awarding the VAE "Expert en Sécurité des Systèmes d'Information (ESSI)" of ANSSI
    • Examiner of the PhD of Jean-Baptiste Decourcelle

11 Scientific production

11.1 Major publications

  • 1 Catherine Guelque, Valentin Honoré, Philippe Swartvagher, Gaël Thomas and François Trahay. PALLAS: a generic trace format for large HPC trace analysis. IPDPS 2025: 39th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Milan, Italy, 2025.
  • 2 Adjusted Objects: An Efficient and Principled Approach to Scalable Programming. MIDDLEWARE '25: 26th International Middleware Conference, Nashville (Tennessee), United States, ACM, December 2025, 215-227.
  • 3 An Exact Characterization of the Two-shot Deterministic Objects Solving Two-process Consensus. PODC '25: ACM Symposium on Principles of Distributed Computing, Santa María Huatulco, Mexico, ACM, June 2025, 477-487.

11.2 Publications of the year

International journals

International peer-reviewed conferences

Conferences without proceedings

Edition (books, proceedings, special issue of a journal)

  • 11 Adjusted Objects: An Efficient and Principled Approach to Scalable Programming. MIDDLEWARE '25: 26th International Middleware Conference, Nashville (Tennessee), United States, ACM, December 2025, 215-227.
  • 12 An Exact Characterization of the Two-shot Deterministic Objects Solving Two-process Consensus. PODC '25: ACM Symposium on Principles of Distributed Computing, Santa María Huatulco, Mexico, ACM, June 2025, 477-487.
  • 13 Brief Announcement: Revisiting Lower Bounds for Two-Step Consensus. PODC '25: ACM Symposium on Principles of Distributed Computing, Santa María Huatulco, Mexico, ACM, June 2025, 58-61.
  • 14 Making Democracy Work: Fixing and Simplifying Egalitarian Paxos. 29th International Conference on Principles of Distributed Systems (OPODIS 2025), Iaşi, Romania, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2026.

Doctoral dissertations and habilitation theses

  • 15 Boubacar Kane. Adjusted objects: An efficient and principled approach to scalable programming. Institut Polytechnique de Paris, January 2025.

Reports & preprints