CAMUS - 2025

2025 Activity report: Project-Team CAMUS

RNSR: 200920957V​​
  • Research center Inria Branch​​​‌ at the University of‌ Strasbourg
  • In partnership with: Université de Strasbourg
  • Team​​ name: Compilation for multi-processor​​​‌ and multi-core architectures
  • In collaboration with: Laboratoire des sciences de l'ingénieur, de l'informatique et de l'imagerie

Creation of the Project-Team:‌ 2023 October 01

Keywords

Computer Science and​​ Digital Science

  • A1.1.1. Multicore,​​​‌ Manycore
  • A1.1.2. Hardware accelerators​ (GPGPU, FPGA, etc.)
  • A1.1.4.​‌ High performance computing
  • A2.1.1.​​ Semantics of programming languages​​​‌
  • A2.1.6. Concurrent programming
  • A2.1.7.​ Distributed programming
  • A2.1.10. Domain-specific​‌ languages
  • A2.2.1. Static analysis​​
  • A2.2.4. Parallel architectures
  • A2.2.5.​​​‌ Run-time systems
  • A2.2.6. GPGPU,​ FPGA...
  • A2.2.7. Adaptive compilation​‌
  • A2.2.8. Code generation
  • A4.5.​​ Formal method for verification,​​​‌ reliability, certification

Other Research​ Topics and Application Domains​‌

  • B4.5.1. Green computing
  • B6.1.1.​​ Software engineering
  • B6.6. Embedded​​​‌ systems

1 Team members,​ visitors, external collaborators

Research​‌ Scientists

  • Bérenger Bramas [​​INRIA, Researcher]​​​‌
  • Arthur Charguéraud [INRIA​, Senior Researcher,​‌ HDR]
  • Jens Gustedt​​ [INRIA, Senior​​​‌ Researcher, HDR]​
  • Thomas Koehler [CNRS​‌, Researcher]

Faculty​​ Members

  • Philippe Clauss [​​​‌Team leader, UNIV​ STRASBOURG, Professor,​‌ HDR]
  • Cedric Bastoul​​ [UNIV STRASBOURG,​​​‌ Professor, from Apr​ 2025]
  • Stephane Genaud​‌ [UNIV STRASBOURG,​​ Professor, HDR]​​​‌
  • Alain Ketterlin [UNIV​ STRASBOURG, Associate Professor​‌]
  • Vincent Loechner [​​UNIV STRASBOURG, Associate​​​‌ Professor]
  • Eric Violard​ [UNIV STRASBOURG,​‌ Associate Professor, HDR​​]

Post-Doctoral Fellow

  • Clément​​​‌ Flint [INRIA,​ Post-Doctoral Fellow, until​‌ Jul 2025]

PhD​​ Students

  • Ugo Battiston [​​​‌INRIA]
  • Guillaume Bertholon​ [UNIV STRASBOURG,​‌ until Aug 2025]​​
  • Raphael Colin [INRIA​​​‌]
  • Tom Hammer [​UNIV STRASBOURG]
  • Atoli​‌ Huppe [INRIA]​​
  • Yanni Lefki [INRIA​​​‌, from Oct 2025​]
  • Valeran Maytie [UNIV STRASBOURG, from Oct 2025]
  • Clément​​​‌ Rossetti [UNIV STRASBOURG​, until Oct 2025​‌]

Technical Staff

  • Erwan​​ Auer [INRIA,​​​‌ Engineer]
  • Antoine Pierquin​ [UNIV STRASBOURG,​‌ Engineer]
  • Adilla Susungi​​ [UNIV STRASBOURG,​​​‌ Engineer, from Feb​ 2025]

Interns and​‌ Apprentices

  • Julien De Curieres​​ De Castelnau [INRIA​​, Intern, from​​​‌ Sep 2025]
  • Julien‌ Gaupp [INRIA,‌​‌ Intern, until Aug​​ 2025]
  • Ilyas Kermad​​​‌ [INRIA, Intern‌, from Jun 2025‌​‌ until Jul 2025]​​
  • Yanni Lefki [INRIA​​​‌, Intern, from‌ Mar 2025 until Aug‌​‌ 2025]
  • Valeran Maytie​​ [INRIA, Intern​​​‌, from Mar 2025‌ until Aug 2025]‌​‌
  • Elian Morel [INRIA​​, Intern, from​​​‌ May 2025]
  • Marceau‌ Noury [INRIA,‌​‌ Intern, until Jan​​ 2025]

Administrative Assistants​​​‌

  • Marine Dufourmantelle [INRIA‌]
  • Sylvie Hilbert [‌​‌CNRS]

2 Overall​​ objectives

The CAMUS team focuses on developing, adapting and extending automatic and semi-automatic parallelization and optimization techniques, as well as proof and certification methods, in order to accelerate applications through the efficient use of current and future multi-processor and multicore hardware platforms.

The team's research‌​‌ activities are organized into​​ three main axes which​​​‌ are: (1) semi-automatic and‌ assisted code optimization, (2)‌​‌ fully-automatic code optimization, and​​ (3) fundamental algorithms and​​​‌ mathematical tools. Axes (1)‌ and (2) include two‌​‌ sub-axes each: (1.1) interactive​​ program transformation, (1.2) new​​​‌ language constructs, (2.1) runtime‌ systems and dynamic analysis‌​‌ & optimization, and (2.2)​​ static analysis & optimization.​​​‌ Every axis may include‌ some activities related to‌​‌ interdisciplinary collaborations focusing on​​ high performance computing.

3​​​‌ Research program

While trusted‌ and fully automatic code‌​‌ optimizations are generally the​​ most convenient solutions for​​​‌ developers, the growing complexity‌ of software and hardware‌​‌ obviously impacts their scope​​ and effectiveness. Although fully​​​‌ automatic techniques can be‌ successfully applied in restricted‌​‌ contexts, it is often​​ beneficial to let expert​​​‌ developers make some decisions‌ on their own. Moreover,‌​‌ some expert knowledge, contextual​​ requirements, and hardware novelties​​​‌ cannot be immediately integrated‌ into automatic tools.

Thus, besides automatic optimizers, which undoubtedly play an important role, semi-automatic optimizers that assist expert developers are also essential for reaching high performance. Ideally, such semi-automatic tools invoke fully automatic sub-parts, including dependence analyzers, code generators, correctness checkers or performance evaluators, in order to relieve the user of these tasks and to expand the scope of the tools. Fully automatic tools may be used either as standalone solutions, when targeting the corresponding restricted codes, or as satellite tools for semi-automatic environments. Fully automatic mechanisms are thus the elementary building blocks of any more ambitious semi-automatic optimizing tool.

Figure 1: General view of CAMUS' research objectives. The schematic shows that the CAMUS team focuses on both fully automatic methods (including runtime systems, dynamic tools, and static analysis/optimization) and semi-automatic methods (such as interactive transformations and the introduction of new language constructs).

CAMUS' main research axes are depicted in Figure 1. Semi-automatic methods for code optimization will be implemented either as interactive transformation tools, or as language extensions allowing users to control the way programs are transformed. Both approaches will be supported by fully automatic processes devoted to baseline code analysis and transformation schemes. Such schemes may be either static, i.e., applied at compile-time, or dynamic, i.e., applied while the target code runs. These characteristics are not mutually exclusive: one optimization process may combine a static and a dynamic part. The invoked fully automatic processes may also be very ambitious frameworks in their own right, for instance when they implement advanced speculative optimization strategies.

Strong advances in code analysis and transformation are often due to fundamental algorithms and mathematical tools that enable the extraction of important properties of programs through constructive conceptual modeling. We believe that investment in core mathematics and computer science research must be sustained in the following directions:

  • Mathematics is a rich source of modeling and computing methods that may have a high impact in the field of program analysis and transformation. Such mathematical results must then be adapted and transformed into algorithms that are usable for our purpose. This task may require some mathematical extensions and the creation of fast and reliable algorithms and implementations.
  • Some new contexts​​​‌ of use require the​ conception of new algorithms​‌ dedicated to well-known fundamental​​ and essential tasks. For​​​‌ instance, many standard code​ analysis and transformation algorithms,​‌ originally developed to be​​ exclusively used at compile-time,​​​‌ need to be revised​ to be used at​‌ runtime. Indeed, their respective​​ execution times may not​​​‌ be acceptable when analyzing​ and optimizing code on-the-fly.​‌ The time-overhead must be​​ dramatically lowered, while the​​​‌ ambitions may be adjusted​ to the new context.​‌ Typically, “optimal” solutions resulting​​ from time-consuming computations may​​​‌ not be the final​ goal of runtime optimization​‌ strategies. Sub-optimal solutions may​​ suffice, since the performance​​​‌ of a dynamically optimized​ code includes the time​‌ overhead of the runtime​​ optimization process.
  • It is​​​‌ always useful to identify​ a restricted class of​‌ programs to which very​​ efficient optimizations may be​​​‌ applied. Such a restricted​ class usually takes advantage​‌ of an accurate model.​​ Conversely, it may also​​​‌ be fruitful to target​ the removal of some​‌ restrictions regarding the class​​ of programs that are​​​‌ candidates for efficient optimizations.​
  • Other scientific disciplines may also provide fundamental strategies to tackle code optimization issues. However, they may also require some prior adaptation. For instance, machine learning techniques are increasingly considered in the area of code optimization.

Collaborations with researchers whose applications require high performance will be developed. Besides offering our expertise, we will especially use their applications as an inspiration for new developments of optimization techniques. Those colleagues from other teams will also play the role of beta testers for our semi-automatic code optimizers. Most research axes of CAMUS will include such collaborations. The local scientific environment is particularly favorable to establishing such interactions. For example, we participate in the interdisciplinary institute IRMIA++ of the University of Strasbourg, which facilitates collaborations with mathematicians developing high performance numerical simulations.

3.1 Semi-automatic​‌ and assisted code optimization​​

Programming languages, as they​​ are used in modern​​​‌ compute-intensive software, are relatively‌ poor in their possibilities‌​‌ to describe all known​​ properties of a particular​​​‌ code. On the one‌ hand, a language construct‌​‌ may over-specify the semantics​​ of the program, for​​​‌ example, imposing a specific‌ execution order for the‌​‌ iterations of a loop​​ whereas any order would​​​‌ have been correct. On‌ the other hand, a‌​‌ language construct may under-specify​​ the semantics of the​​​‌ program, for example, lacking‌ the ability to describe‌​‌ the fact that two​​ pointers must be distinct,​​​‌ or that a given‌ integer value is always‌​‌ less than a small​​ constant.

Modern tools that rewrite code for optimization, be it internally as optimizing compiler passes or externally as source-to-source transformations, offer programmers few opportunities to annotate the code and integrate their knowledge of it. As a consequence, fully-automatic tools are not easily brought to their full capacity, and one-shot platform-specific programmer intervention is required.

To advance this‌​‌ field, we will develop​​ re-usable and traceable features​​​‌ that provide the ability‌ for programmers to specify‌​‌ and control code transformations​​ and to annotate functional​​​‌ interfaces and code blocks‌ with all the meta-knowledge‌​‌ they have.

3.2 Fully-automatic​​ code optimization

We will focus on two main code optimization and parallelization approaches: the polyhedral model, based on a geometrical representation and transformation of loops; and the task-based model, based on a runtime resolution of the dependencies between the tasks. Note that these two approaches can potentially be mixed.

The polyhedral model is a great source of new developments regarding fundamental mathematical tools dedicated to code analysis and transformation. This model was originally based exclusively on linear algebra. We have proposed in the past some extensions to polynomials, and we are currently investigating extensions to algebraic expressions. In the meantime, we also focus on runtime approaches that allow polyhedral-related techniques to be applied to codes that are not usually well-suited candidates. The motivation for such extensions is to propose new compilation techniques with enlarged scope and better efficiency, that are either static, i.e., applied at compile-time, or dynamic, i.e., applied at runtime.

We will also keep‌​‌ studying the task-based method​​ which is complementary to​​​‌ the polyhedral model, and‌ beneficial in scenarios that‌​‌ are not adapted to​​ the polyhedral model. For​​​‌ example, this method can‌ work when the description‌​‌ of the parallelism is​​ entirely performed at runtime,​​​‌ and it is able‌ to parallelize sections with‌​‌ arbitrary structures (i.e., not​​ necessarily loop nests).
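
To make the runtime resolution of dependencies more concrete, the following minimal Python sketch derives a task graph from the data that each task declares it reads and writes, in insertion order. It is only an illustration under stated assumptions, not the mechanism of any CAMUS tool: the function `build_dependencies`, the task names and the data names are all hypothetical.

```python
from collections import defaultdict


def build_dependencies(tasks):
    """Derive predecessor sets from declared data accesses.

    tasks: list of (name, reads, writes) tuples, in insertion order.
    Returns a dict mapping each task name to the set of tasks it must wait for.
    """
    last_writer = {}                        # data name -> last task that wrote it
    readers_since_write = defaultdict(list)  # data name -> readers since that write
    deps = {}
    for name, reads, writes in tasks:
        preds = set()
        for data in reads:
            if data in last_writer:
                preds.add(last_writer[data])         # read-after-write
        for data in writes:
            if data in last_writer:
                preds.add(last_writer[data])         # write-after-write
            preds.update(readers_since_write[data])  # write-after-read
        deps[name] = preds
        # Record the accesses of this task for the tasks inserted later.
        for data in writes:
            last_writer[data] = name
            readers_since_write[data] = []
        for data in reads:
            if data not in writes:
                readers_since_write[data].append(name)
    return deps


# Hypothetical task list: A produces x, B and C consume x, D overwrites x.
tasks = [
    ("A", [], ["x"]),
    ("B", ["x"], ["y"]),
    ("C", ["x"], ["z"]),
    ("D", ["y", "z"], ["x"]),
]
deps = build_dependencies(tasks)
```

In this example `B` and `C` depend only on `A` and could therefore run in parallel, while `D` has to wait for all three. Nothing in the sketch relies on loop structure, which is why this style of parallelization also applies to code that is not a loop nest.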

In​​​‌ our project, we attempt‌ to bridge the gap‌​‌ between the task-based method​​ and the compiler by​​​‌ designing a novel automatic‌ parallelization mechanism with static‌​‌ source-to-source transformations. We also​​ work on improving the​​​‌ scheduling strategies or the‌ description of the parallelism‌​‌ by designing speculative execution​​ models that operate at​​​‌ runtime.

3.3 Fundamental algorithms‌ & mathematical tools

Regarding‌​‌ our fundamental and theoretical​​ studies, we plan to​​​‌ focus on three main‌ topics: (1) Trahrhe expressions,‌​‌ (2) mechanized metatheory and​​​‌ interactive program verification, and​ (3) programmable polyhedral scheduling.​‌

4 Application domains

High​​ performance computing plays a​​​‌ crucial role in the​ resolution of important problems​‌ of science and industry.​​ Additionally, software development companies,​​​‌ and software developers in​ general, are strongly constrained​‌ by the time-to-market issue,​​ while facing growing complexities​​​‌ related to hardware and​ correctness of the developed​‌ programs. Computers become more​​ and more powerful by​​​‌ integrating numerous and specialized​ processor cores, and programs​‌ taking advantage of such​​ hardware are more and​​​‌ more exposed to correctness​ issues.

Our goal is to provide automatic and semi-automatic tools that will significantly lower the burden on developers. With tools that ensure the reliable production of correct and well-performing software, developers can concentrate on the implemented functionality and produce quality software in reasonable time.

Our scientific contributions are most often supported by dedicated software, or by an extension of existing software. The role of this software is to demonstrate the automation of the proposed analysis and optimization techniques, to show their effectiveness by exhibiting performance improvements on baseline benchmark programs, and to facilitate their application to any program targeted by potential users. Thus, our software tools must be made as accessible as possible to users from science and industry, so that they can experiment with the implemented optimization procedures on their own programs. As such, we usually offer free non-commercial use through an open-source software licence. While the software is made available in a form that allows fully autonomous use, we expect interested users to contact us for deeper exchanges related to their specific goals. Such exchanges may be the start of fruitful collaborations. Publishing our proposals in top-rated conferences and journals may also have a real impact on their adoption and on the use of the related software.

Our contributions in program analysis and optimization techniques may find interested users in many international companies, from semiconductor industry actors, like ARM, SiPearl or STMicroelectronics, to big companies developing high performance or deep learning applications. At a national or local level, any company whose innovative developments require compute or data intensive applications, like Nyx, or dedicated support tools, like Atos, may be interested in our work, and potentially collaborate with us on more specific and dedicated research. Since the project-team is hosted by the University of Strasbourg, contacts with many local companies are made easier thanks to the hiring of former students, and to their involvement in teaching duties and supervision of internship students.

5 Highlights of​ the year

  • The third edition of "Modern C" by Jens Gustedt [35] has been published by Manning and overall has had about 185,000 downloads on HAL.

6 Latest​​​‌ software developments, platforms, open​ data

6.1 Latest software​‌ developments

6.1.1 TRAHRHE

  • Name:​​
    Trahrhe expressions and applications​​​‌ in loop optimization
  • Keywords:​
    Polyhedral compilation, Code optimisation,​‌ Source-to-source compiler
  • Functional Description:​​
    This software includes a mathematical kernel for computing Trahrhe expressions related to iteration domains, as well as extensions implementing source-to-source transformations of loops for applying optimizations based on Trahrhe expressions.
  • News of‌​‌ the Year:
    A more robust way of computing the ranking polynomials has been implemented. A new version of the software, written in C/C++, has been published.
  • URL:
  • Publications:
  • Contact:
    Philippe​​​‌ Clauss
  • Participants:
    Clément Rossetti,‌ Philippe Clauss, Marceau Noury‌​‌
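
As an illustration of the kind of computation involved, the sketch below ranks the iterations of a triangular loop nest and inverts that ranking in closed form, which allows the two nested loops to be collapsed into a single loop. It assumes, as a simplification, that a Trahrhe expression can be viewed as the inverse of a ranking polynomial. The iteration domain, the 0-based ranking convention and the function names are hypothetical and are not output of the TRAHRHE software.

```python
from math import isqrt


def rank(i, j):
    """0-based position of (i, j) in lexicographic order over 0 <= j <= i.

    Rows 0 .. i-1 contain 1 + 2 + ... + i = i*(i+1)/2 points in total.
    """
    return i * (i + 1) // 2 + j


def unrank(pc):
    """Inverse of rank: recover (i, j) from the collapsed counter pc.

    i is the largest integer with i*(i+1)/2 <= pc, obtained by solving the
    quadratic in closed form with an exact integer square root.
    """
    i = (isqrt(8 * pc + 1) - 1) // 2
    j = pc - i * (i + 1) // 2
    return i, j


def nested_iterations(n):
    """Reference order: the original two-level triangular loop nest."""
    return [(i, j) for i in range(n) for j in range(i + 1)]


def collapsed_iterations(n):
    """Same iterations, produced by a single loop over pc."""
    total = n * (n + 1) // 2
    return [unrank(pc) for pc in range(total)]
```

Because every value of `pc` yields its `(i, j)` pair independently of the others, the single collapsed loop can be split evenly between threads, which is not the case for the outer loop of the triangular nest.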

6.1.2 openCARP

  • Name:
    Cardiac​​ Electrophysiology Simulator
  • Keyword:
    Cardiac​​​‌ Electrophysiology
  • Functional Description:
    openCARP‌ is an open cardiac‌​‌ electrophysiology simulator for in-silico​​ experiments. Its source code​​​‌ is public and the‌ software is freely available‌​‌ for academic purposes. openCARP​​ is easy to use​​​‌ and offers single cell‌ as well as multiscale‌​‌ simulations from ion channel​​ to organ level. Additionally,​​​‌ openCARP includes a wide‌ variety of functions for‌​‌ pre- and post-processing of​​ data as well as​​​‌ visualization.
  • News of the‌ Year:
    Improvements to the code generation of ionic models (limpetMLIR): generation of CUDA and AMD kernels. Building of all targets embedded in functions for the runtime interface. StarPU interface to the kernels. Benchmarks (execution time, energy consumption).
  • URL:
  • Publications:
  • Contact:
    Vincent Loechner
  • Participants:‌
    Vincent Loechner, Stephane Genaud,‌​‌ Antoine Pierquin, Adilla Susungi,​​ 3 anonymous participants
  • Partner:​​​‌
    Karlsruhe Institute of Technology‌

6.1.3 SPECX

  • Name:
    SPEculative‌​‌ eXecution task-based runtime system​​
  • Keywords:
    HPC, Parallelization, Task-based​​​‌ algorithm
  • Functional Description:
    Specx (previously SPETABARU) is a task-based runtime system for multi-core architectures that includes speculative execution models. It is written in pure C++11 with no external dependency. It uses advanced meta-programming and allows for easy customization of the scheduler. It can also generate execution traces in SVG to better understand the behavior of applications.
  • News​​ of the Year:
    In 2025, the paper presenting the multi-GPU and MPI version of Specx was published: "Specx: a C++ task-based runtime system for heterogeneous distributed architectures", Paul Cardosi, Bérenger Bramas, PeerJ CS.
  • URL:​​​‌
  • Publication:
  • Contact:‌
    Bérenger Bramas

6.1.4 Autovesk‌​‌

  • Keywords:
    HPC, Vectorization, Source-to-source​​ compiler
  • Functional Description:
    Autovesk is a tool that produces vectorized implementations from static kernels.
  • News of​​ the Year:
    In 2025,​​​‌ Autovesk has been updated‌ to support more complex‌​‌ benchmarks.
  • URL:
  • Contact:​​
    Bérenger Bramas
  • Participant:
    Bérenger​​​‌ Bramas

6.1.5 PolyLib

  • Name:‌
    The Polyhedral Library
  • Keywords:‌​‌
    Rational polyhedra, Library, Polyhedral​​ compilation
  • Scientific Description:
    A​​​‌ C library used in‌ polyhedral compilation, as a‌​‌ basic tool used to​​ analyze, transform, optimize polyhedral​​​‌ loop nests. It has‌ been shipped in the‌​‌ polyhedral tools Cloog and​​ Pluto.
  • Functional Description:
    PolyLib​​​‌ is a C library‌ of polyhedral functions, that‌​‌ can manipulate unions of​​ rational polyhedra of any​​​‌ dimension. It was the‌ first to provide an‌​‌ implementation of the computation​​ of parametric vertices of​​​‌ a parametric polyhedron, and‌ the computation of an‌​‌ Ehrhart polynomial (expressing the​​ number of integer points​​​‌ contained in a parametric‌ polytope) based on an‌​‌ interpolation method.
  • Release Contributions:​​​‌
    Functions to manipulate LBLs​ (linearly bounded lattices) have​‌ been added in 2025.​​ MIT Licence.
  • News of​​​‌ the Year:
    Maintenance, upgrade​ of the user interface,​‌ upgrade of the build​​ process.
  • URL:
  • Publication:​​​‌
  • Contact:
    Vincent Loechner​
  • Participant:
    Vincent Loechner
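
The idea of obtaining a point-counting polynomial by interpolation can be sketched on a toy case. The code below counts the integer points of a parametric domain by brute force for a few small parameter values, then fits the polynomial that passes through these samples. This is a simplified illustration and not PolyLib's implementation: the domain is hypothetical and has integer vertices, so a plain polynomial of degree 3 suffices, whereas the general case requires periodic coefficients.

```python
from fractions import Fraction


def count_points(n):
    """Brute-force count of integer points with 0 <= k <= j <= i <= n."""
    return sum(1 for i in range(n + 1)
                 for j in range(i + 1)
                 for k in range(j + 1))


def interpolate(samples):
    """Return the Lagrange interpolating polynomial through (x, y) samples.

    Exact rational arithmetic avoids any rounding in the coefficients.
    """
    def poly(x):
        total = Fraction(0)
        for a, (xa, ya) in enumerate(samples):
            term = Fraction(ya)
            for b, (xb, _) in enumerate(samples):
                if a != b:
                    term *= Fraction(x - xb, xa - xb)
            total += term
        return total
    return poly


# The domain is 3-dimensional, so 4 samples determine a degree-3 polynomial.
samples = [(n, count_points(n)) for n in range(4)]
ehrhart = interpolate(samples)
```

Once fitted, `ehrhart(n)` gives the number of iterations for any `n` without enumerating the domain; here it coincides with the closed form (n+1)(n+2)(n+3)/6.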

6.1.6​‌ APOLLO

  • Name:
    Automatic speculative​​ POLyhedral Loop Optimizer
  • Keyword:​​​‌
    Automatic parallelization
  • Scientific Description:​
    APOLLO - Automatic speculative​‌ POLyhedral Loop Optimizer is​​ a compiler framework dedicated​​​‌ to automatic, dynamic and​ speculative parallelization and optimization​‌ of programs' loop nests.​​ This framework allows a​​​‌ user to mark in​ a C/C++ source code​‌ some nested loops of​​ any kind (for, while​​​‌ or do-while loops) in​ order to be handled​‌ by a speculative parallelization​​ process, to take advantage​​​‌ of the underlying multi-core​ processor architecture. The framework​‌ is composed of two​​ main parts: extensions to​​​‌ the CLANG-LLVM compiler and​ a runtime system.
  • Functional​‌ Description:
    APOLLO is dedicated to automatic, dynamic and speculative parallelization of loop nests that cannot be handled efficiently at compile-time. It is composed of a static part consisting of specific passes in the LLVM compiler suite, plus a modified Clang frontend, and a dynamic part consisting of a runtime system. It can apply any kind of polyhedral transformation on-the-fly, including tiling, and can handle nonlinear loops, such as while-loops referencing memory through pointers and indirections. Extensions enabling dynamic multi-versioning were implemented in 2020.
  • News of the​​​‌ Year:
    Apollo has been​ upgraded to LLVM 17​‌ and to pluto 0.12.​​
  • URL:
  • Publications:
  • Contact:
    Philippe Clauss
  • Participants:​​
    Aravind Sukumaran Rajam, Erwan​​​‌ Auer, Raphael Colin, Juan​ Manuel Martinez Caamano, Manuel​‌ Selva, Philippe Clauss

6.1.7​​ OptiTrust

  • Name:
    OptiTrust
  • Keywords:​​​‌
    Code optimisation, Verification
  • Functional​ Description:
    The OptiTrust framework​‌ provides programmers with means​​ of optimizing their programs​​​‌ via user-guided source-to-source transformations.​ It leverages Separation Logic​‌ for checking that both​​ input and output programs​​​‌ satisfy the desired specification.​ The transformations maintain separation​‌ logic derivations, following the​​ concept of proof-carrying code.​​​‌
  • News of the Year:​
    OptiTrust has been extended​‌ to support validation of​​ functional correctness on the​​​‌ output code, following the​ proof-carrying code approach. A​‌ new case study on​​ LLM inference has been​​​‌ developed.
  • URL:
  • Contact:​
    Arthur Charguéraud
  • Participants:
    Arthur​‌ Charguéraud, Thomas Koehler, Guillaume​​ Bertholon

6.1.8 APAC

  • Keywords:​​​‌
    Source-to-source compiler, Automatic parallelization,​ Parallel programming
  • Scientific Description:​‌
    APAC is a compiler for automatic parallelization that transforms C++ source code to make it parallel by inserting tasks. It uses the tasks+dependencies paradigm and relies on OpenMP as runtime system. Internally, it is based on OptiTrust (and Clang-LLVM).
  • Functional​​​‌ Description:
    Automatic task-based parallelization​ compiler
  • News of the​‌ Year:
    Additional case studies​​ have been developed.
  • URL:​​​‌
  • Contact:
    Bérenger Bramas​
  • Participants:
    Marek Felsoci, Bérenger​‌ Bramas, Stephane Genaud

6.1.9​​ Rise & Shine

  • Keywords:​​​‌
    Programming language, Compilation
  • Functional​ Description:
    Programming language and​‌ compiler for array computing.​​ Programs are expressed at​​​‌ a high level in​ the RISE language. Programs​‌ are transformed using a​​ set of rewrite rules​​ that encode implementation and​​​‌ optimization choices. The Shine‌ compiler generates high-performance parallel‌​‌ C or OpenCL code​​ while preserving the optimization​​​‌ choices made during rewriting.‌
  • News of the Year:‌​‌
    A prototype Rise to​​ C compiler executable that​​​‌ preserves floating-point semantics was‌ added by Thomas Koehler,‌​‌ as part of his​​ collaboration with Eva Darulova​​​‌ (Uppsala University).
  • URL:
  • Contact:
    Thomas Koehler
  • Participant:‌​‌
    Thomas Koehler
  • Partner:
    Technische​​ Universität Berlin

6.1.10 egg-sketches​​​‌

  • Keyword:
    Program rewriting techniques‌
  • Functional Description:
    egg-sketches is‌​‌ a library adding support​​ for program sketches on​​​‌ top of the egg‌ (e-graphs good) library, an‌​‌ e-graph library optimized for​​ equality saturation. Sketches are​​​‌ program patterns that are‌ satisfied by a family‌​‌ of programs. They can​​ also be seen as​​​‌ incomplete or partial programs‌ as they can leave‌​‌ details unspecified. With egg-sketches,​​ it is possible to​​​‌ perform Guided Equality Saturation:‌ a semi-automatic technique that‌​‌ allows programmers to guide​​ rewriting via program sketches.​​​‌
  • News of the Year:‌
    Upgraded to latest egg‌​‌ dependency, added an extra​​ sketch construct, fixed two​​​‌ bugs.
  • URL:
  • Contact:‌
    Thomas Koehler
  • Participant:
    Thomas‌​‌ Koehler
  • Partner:
    TU Darmstadt​​
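
The notion of a sketch as a pattern satisfied by a family of programs can be illustrated with a small matcher over plain terms. The sketch below is a toy under stated assumptions: it works on nested Python tuples rather than on e-graphs, it supports only three constructs ("anything", "contains" and an operator with child sketches), and none of the names correspond to the API of the egg-sketches library. The operator names `parallel_for` and `vector_add` are likewise hypothetical.

```python
ANY = ("any",)  # sketch satisfied by every term


def contains(sketch):
    """Sketch satisfied by terms that have a subterm satisfying `sketch`."""
    return ("contains", sketch)


def node(op, *children):
    """Sketch satisfied by terms rooted at `op` whose children satisfy `children`."""
    return ("node", op, children)


def subterms(term):
    """Yield the term itself and all of its nested subterms."""
    yield term
    if isinstance(term, tuple):
        for child in term[1:]:
            yield from subterms(child)


def satisfies(term, sketch):
    """Check whether a term (nested tuples, string leaves) satisfies a sketch."""
    kind = sketch[0]
    if kind == "any":
        return True
    if kind == "contains":
        return any(satisfies(t, sketch[1]) for t in subterms(term))
    if kind == "node":
        _, op, children = sketch
        if isinstance(term, str):
            return term == op and not children
        return (term[0] == op
                and len(term) - 1 == len(children)
                and all(satisfies(t, s) for t, s in zip(term[1:], children)))
    raise ValueError(f"unknown sketch construct: {kind}")


# "Somewhere, a parallel loop whose body contains a vector addition."
goal = contains(node("parallel_for", contains(node("vector_add", ANY, ANY))))

optimized = ("seq", ("parallel_for", ("for", ("vector_add", "a", "b"))))
unoptimized = ("for", ("vector_add", "a", "b"))
```

Here `optimized` satisfies `goal` and `unoptimized` does not, although the sketch says nothing about the operands of the addition or about the loops surrounding it. That is the sense in which a sketch leaves details unspecified.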

6.1.11 slotted-egraphs

  • Keyword:
    Term​​​‌ Rewriting Systems
  • Functional Description:‌
    Implementation of the slotted e-graph data structure, an extension of e-graphs that uniquely represents terms differing only in the names of their variables. With slotted-egraphs, users of languages with variables can perform equality saturation by: (1) defining the term language, representing variables and binders via slots, (2) defining rewrite rules, without having to worry about naming collisions, and leveraging built-in mechanisms for freshness predicates and substitutions, (3) performing equality saturation by initializing a slotted e-graph, growing it by applying rewrites, and extracting from it.
  • News of the Year:​​
    Development started in February 2024, led by Rudi Schneider from TU Berlin, and was instigated and supervised by Thomas Koehler and Michel Steuwer. A paper was accepted at PLDI 2025.
  • URL:
  • Publication:
  • Contact:
    Thomas Koehler
  • Participant:​​​‌
    Thomas Koehler
  • Partners:
    Technische‌ Universität Berlin, TU Darmstadt‌​‌
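
The equivalence that such a representation exploits, namely terms that differ only in the names of their variables, can be illustrated on plain terms. The following sketch splits a term into a name-independent shape and the list of names that fill its slots. It is a deliberately reduced illustration: it ignores binders, e-classes and rewriting altogether, and the function `canonicalize` and the term encoding are hypothetical rather than taken from the slotted-egraphs library.

```python
def canonicalize(term):
    """Split a term into (shape, names).

    Terms are nested tuples: ("var", name), ("num", value) or (op, child, ...).
    Variables are replaced by numbered slots in order of first occurrence, so
    two terms get the same shape exactly when they differ only by a consistent
    renaming of their variables.
    """
    names = []

    def go(t):
        if t[0] == "var":
            if t[1] not in names:
                names.append(t[1])
            return ("slot", names.index(t[1]))
        if t[0] == "num":
            return t
        return (t[0],) + tuple(go(child) for child in t[1:])

    shape = go(term)
    return shape, tuple(names)


# x + y * x  and  a + b * a  share one shape; x + x * y does not.
shape1, names1 = canonicalize(("add", ("var", "x"), ("mul", ("var", "y"), ("var", "x"))))
shape2, names2 = canonicalize(("add", ("var", "a"), ("mul", ("var", "b"), ("var", "a"))))
shape3, names3 = canonicalize(("add", ("var", "x"), ("mul", ("var", "x"), ("var", "y"))))
```

Storing the shape once and keeping the names as arguments means the first two terms occupy a single entry, which conveys the intuition behind representing such terms uniquely.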

6.1.12 Pesto

  • Name:
    Polyhedral​​ flExible loop-neST Optimizer
  • Keyword:​​​‌
    Optimizing compiler
  • Functional Description:‌
    This tool makes it easy to apply polyhedral transformations to C code, in particular algebraic tiling. It is divided into two parts: the command-line interface and the library.
  • News of the Year:‌
    The first usable version‌​‌ of Pesto is now​​ available.
  • URL:
  • Contact:​​​‌
    Clément Rossetti

6.1.13 StrasGPT‌

  • Keywords:
    Polyhedral compilation, LLM,‌​‌ Automatic parallelization, Vectorization
  • Functional​​ Description:
    This program is a direct C implementation of LLM transformer architectures, including Qwen3, LLaMa 3.x and Mistral, reusing the tokenizer and the sampler of Andrej Karpathy's llama2.c project and of its fork llama3.c by James Delancey. Given an input prompt, StrasGPT can generate a text that continues it. It was initially designed as a parallel programming project for master students in 2025 (students had to parallelize it with OpenMP + MPI). It is now being further developed for (polyhedral) compiler research.
  • News​​​‌ of the Year:
    Creation‌
  • Contact:
    Cedric Bastoul
  • Participant:‌​‌
    Cedric Bastoul

7 New​​​‌ results

7.1 Semi-automatic and​ assisted code optimization

7.1.1​‌ OptiTrust: Producing Trustworthy High-Performance​​ Code via Source-to-Source Transformations​​​‌

Participants: Arthur Charguéraud, Guillaume Bertholon, Thomas Koehler, Elian Morel, Julien de Castelnau.

In 2025, we​ pursued the development of​‌ the OptiTrust prototype framework​​ for producing high-performance code​​​‌ via source-to-source transformations, with​ formal guarantees of correctness.​‌

We have extended the framework to support full functional correctness assertions. OptiTrust thereby becomes a modern implementation of Necula's concept of proof-carrying code. We generalized two prior case studies, namely OpenCV's box-blur and TVM's matrix multiply, to full functional correctness. This work is described as part of the PhD thesis of Guillaume Bertholon [57]. A 60-page journal article describing OptiTrust was recently submitted for publication to a top-tier journal. In addition, Guillaume presented a description of OptiTrust's bidirectional translation between C and its internal lambda-calculus at the JFLA'25 national workshop [22].

The Master​‌ internship of Elian Morel​​ contributed a case study​​​‌ on an LLM inference​ code. Preliminary results show​‌ that OptiTrust supports code​​ transformation on such a​​​‌ complex, realistic program. Elian​ also contributed an important​‌ technical addition to the​​ typechecker: an elaboration phase​​​‌ to automatically infer the​ numerous annotations (known as​‌ “ghost operations for focusing”),​​ which are needed to​​​‌ typecheck array-manipulating operations in​ separation logic.

The ongoing Master internship of Julien de Castelnau aims to extend OptiTrust to support refinement from CPU to GPU code, optimization at the GPU level, and extraction from GPU code to CUDA syntax.

7.1.2​‌ Sketch-Guided Polyhedral Compilation

Participants: Valéran Maytié, Thomas Koehler, Cedric Bastoul.

As part of Valéran Maytié's internship and PhD, we developed a new semi-automatic, sketch-guided compilation approach. It enables users to write sketches that guide the compiler towards key optimizations by describing the desired structure of the optimized code, without worrying about how to get there. We introduce a sketch language that enables expressing the result of imperative loop transformations and a new polyhedral algorithm capable of generating code constrained by both a sketch and a computation specification. This work was presented at the IMPACT'26 workshop 27, and we are working towards a full conference paper. This work has been done in collaboration with Christophe Alias (Inria CASH).

7.1.3 Specx: A​ C++ Task-Based Runtime System​‌ for Heterogeneous Distributed Architectures​​

Participants: Bérenger Bramas.​​​‌

Bérenger Bramas and Paul Cardosi completed and submitted the paper presenting Specx several years after the end of Paul Cardosi's contract; the paper was published in 2025 16. Specx is now capable of executing task graphs on heterogeneous distributed architectures. It provides an elegant way to define task graphs and describe objects that the runtime system can move or send.

7.1.4 Using​​​‌ the Discrete Wavelet Transform​ for Scientific Data Compression​‌

Participants: Atoli Huppé,​​ Clément Flint, Bérenger​​​‌ Bramas, Stéphane Genaud​, Philippe Helluy.​‌

As a follow-up to Clément Flint's thesis work 59, in which we worked on compressing simulation data for a Lattice Boltzmann application, Atoli Huppé's PhD work aims to propose a general compressor for scientific data. It addresses the use case in which the data to compress are generated by a scientific application on the GPU, and the compression is performed directly on the data in GPU memory. The original data, compressed into blocks, can then be decompressed when they are needed for further computations, or saved from the GPU to the disk through the CPU.

This GPU implementation based on the Discrete Wavelet Transform is, to the best of our knowledge, the first full GPU, single-kernel implementation using this compression model. Our performance evaluation, conducted at the end of 2025, shows that our compressor achieves a higher compression ratio than the state-of-the-art compressor cuSZp2 60 with comparable compression and decompression throughputs. A preliminary version of this work was presented at COMPAS 32, and our latest results will be submitted to an international conference.
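The general idea of wavelet-based lossy compression can be sketched on the CPU as follows. This is only an illustration, not the team's GPU kernel: it assumes a single level of the Haar wavelet and a hard threshold on detail coefficients, and the names `haar_forward`, `haar_inverse`, `compress_block` and the threshold value are hypothetical.

```python
def haar_forward(block):
    """One level of the (unnormalized) Haar transform on an even-length block."""
    half = len(block) // 2
    averages = [(block[2 * i] + block[2 * i + 1]) / 2 for i in range(half)]
    details = [(block[2 * i] - block[2 * i + 1]) / 2 for i in range(half)]
    return averages, details


def haar_inverse(averages, details):
    """Rebuild the samples from averages and details."""
    out = []
    for a, d in zip(averages, details):
        out.extend([a + d, a - d])
    return out


def compress_block(block, threshold):
    """Lossy step: detail coefficients no larger than the threshold are zeroed."""
    averages, details = haar_forward(block)
    details = [d if abs(d) > threshold else 0.0 for d in details]
    return averages, details


block = [4.0, 4.2, 8.0, 8.1, 1.0, 3.0, 5.0, 5.0]
averages, details = compress_block(block, threshold=0.2)
restored = haar_inverse(averages, details)
```

In this sketch, zeroed detail coefficients are what an entropy coder would later store cheaply, and the reconstruction error of each sample is bounded by the threshold.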

7.1.5​​ Exploiting Ray Tracing Technology​​​‌ Through OptiX to Compute‌ Particle Interactions with Cutoff‌​‌ in a 3D Environment​​ on GPUs

Participants: Bérenger​​​‌ Bramas.

Bérenger Bramas‌ and David Algis worked‌​‌ on utilizing OptiX for​​ neighbor finding in n-body​​​‌ simulations. Several methods were‌ implemented, including two novel‌​‌ approaches based on new​​ geometric patterns. A preprint​​​‌ demonstrates that these methods‌ can achieve significant speedups‌​‌ when the grid is​​ sparse (i.e., when particles​​​‌ are not uniformly distributed)‌ 15.

7.1.6 Real-time‌​‌ ocean simulation

Participants: Bérenger​​ Bramas.

Bérenger Bramas​​​‌ collaborated with Emmanuelle Darles,‌ Lilian Aveneau and David‌​‌ Algis on real-time ocean​​ simulation. This work led​​​‌ to the development of‌ the Arc Blanc framework‌​‌ 13, 21,​​ a fully described GPU/CPU​​​‌ real-time pipeline for simulating‌ the free ocean surface‌​‌ and solid–fluid interactions while​​ preserving physical realism at​​​‌ large scale. The framework‌ includes improvements such as‌​‌ real-time computation of fluid​​ velocities at arbitrary depth​​​‌ and enhanced solid-to-fluid coupling.‌ In addition, we supported‌​‌ the integration of these​​ simulations into Unity by​​​‌ developing an open-source interoperability‌ tool between Unity compute‌​‌ shaders and CUDA, enabling​​ access to advanced GPU​​​‌ programming features not available‌ in Unity's native environment‌​‌ 14.

7.2 Fully-automatic​​ code optimization

7.2.1 Algebraic​​​‌ tiling

Participants: Clément Rossetti‌, Philippe Clauss.‌​‌

We propose a new loop tiling approach based on the volumes of the tiles, i.e., the number of iterations delimited by the tiles, instead of the sizes of standard (hyper-)rectangular tiles, i.e., the sizes of the edges of the tiles. In the proposed approach, tiles are dynamically generated and have almost equal volumes, even if their shape and edge sizes may differ. The iteration domain is well covered by a minimum number of tiles that are all almost full. Since the bounds of the generated tiles are not linear but are defined by algebraic mathematical expressions, we call this loop tiling technique algebraic tiling. It uses the mathematical engine TRAHRHE, also developed in the team.

Algebraic tiles are built by successive hierarchical slicing of the initial iteration domain, from the outermost to the innermost depth dimensions of the target loop nest, in a way that ensures that all slices have quasi-equal volumes. The bounds of the loop nests that are handled must be constants, or linear functions of the surrounding loop iterators and of unknown parameters, which are typically related to the data input size. Such loops are also called polyhedral loops since they may be handled using the polyhedral model. Quasi-perfect load balancing is achieved when each parallel loop is sliced using as many slices of quasi-equal volumes as parallel threads, and when most of the iterations have close execution times. Thus, this dynamic slicing strategy makes the resulting parallel loop scalable with respect to the number of threads. Good data locality is reached by slicing the non-parallelized loops profitably, and by slicing the parallel loops into a number of parts equal to a multiple of the number of parallel threads. The number of generated slices for each dimension may remain a parameter at compile time, making algebraic tiling a parameterized loop tiling technique, and allowing the produced code to adapt to the number of parallel threads and data layout. Our experiments show that algebraic tiling significantly outperforms (hyper-)rectangular tiling when parallelizing loops with OpenMP using static scheduling, and mostly provides similar or lower execution times when compared to traditionally tiled loops parallelized using the dynamic scheduling of OpenMP. Thus, algebraic tiling makes dynamic scheduling largely unnecessary for the handled loop nests.
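To make the notion of quasi-equal volumes concrete, here is a minimal sketch for the outer dimension of a triangular domain `0 <= j <= i < n`. It is an assumption-laden illustration, not the Pesto or TRAHRHE implementation: the closed-form square root stands in for the algebraic bounds, and the names `slice_bounds` and `volume` are hypothetical.

```python
import math


def slice_bounds(n, num_slices):
    """Outer-loop bounds cutting {(i, j) : 0 <= j <= i < n} into slices
    holding quasi-equal numbers of iterations."""
    total = n * (n + 1) // 2
    bounds = [0]
    for k in range(1, num_slices):
        target = k * total / num_slices
        # Rows 0..i-1 hold i*(i+1)/2 iterations; solve i*(i+1)/2 = target.
        i = round((math.sqrt(1 + 8 * target) - 1) / 2)
        bounds.append(min(max(i, bounds[-1]), n))
    bounds.append(n)
    return bounds


def volume(lo, hi):
    """Number of iterations (i, j) with lo <= i < hi and 0 <= j <= i."""
    return hi * (hi + 1) // 2 - lo * (lo + 1) // 2


n, threads = 1000, 4
bounds = slice_bounds(n, threads)
volumes = [volume(lo, hi) for lo, hi in zip(bounds, bounds[1:])]
# Equal-width (rectangular-style) slicing of the same outer loop, for comparison.
rect = [volume(k * n // threads, (k + 1) * n // threads) for k in range(threads)]
```

With these values, the four algebraic slices each hold about 125,000 iterations, whereas the equal-width slices range from 31,375 to 218,875 iterations, which is the imbalance a static OpenMP schedule would inherit.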

Algebraic tiling has been​ implemented in a source-to-source​‌ automatic code optimizer called​​ Pesto (6.1.12)​​​‌ by Clément Rossetti, who​ defended his thesis on​‌ the 18th of December​​ 2025.

7.2.2 Connecting Kokkos​​​‌ with the Polyhedral Model​

Participants: Ugo Battiston,​‌ Philippe Clauss.

The increasing complexity of HPC hardware forces scientists to shift towards performance-portable parallel programming models. Modern C++ libraries, such as Kokkos, have become essential: they allow developers to write a single code that runs efficiently on heterogeneous hardware (CPUs or GPUs).

However, this abstraction​‌ comes at a cost.​​ The heavy use of​​​‌ C++ templates and lambda​ functions inside Kokkos hides​‌ the control flow and​​ memory access patterns from​​​‌ the compiler. Consequently, advanced​ static analyzers, specifically those​‌ based on the polyhedral​​ model like LLVM's Polly​​​‌ extension, fail to detect​ optimization opportunities such as​‌ loop tiling or fusion,​​ leaving significant performance on​​​‌ the table.

We propose a novel approach to bridge the gap between high-level C++ abstractions and low-level polyhedral optimizations. We present a co-design strategy involving modifications to both the Kokkos library and Polly. First, we modify and instrument Kokkos to produce a cleaner intermediate representation (IR) structure and expose loop and data structures at the LLVM IR level. Second, we extend Polly to recognize these constructs and apply aggressive loop optimizing and parallelizing transformations on Kokkos kernels.

We show​​ that this pipeline enables​​​‌ the automatic application of‌ polyhedral transformations on standard‌​‌ Kokkos codes. Our evaluation​​ on loop kernels from​​​‌ the Polybench benchmark suite,‌ rewritten using Kokkos, shows‌​‌ significant speedups reaching up​​ to 12.3 times over​​​‌ baseline Kokkos usage.

This‌ work has been presented‌​‌ by Ugo Battiston at​​ the Conférence francophone d'informatique​​​‌ en Parallélisme, Architecture et‌ Système (COMPAS 2025). A‌​‌ paper has been recently​​ submitted to an international​​​‌ conference.

7.2.3 Automatic Multi-Versioning‌ of Computation Kernels

Participants:‌​‌ Raphaël Colin, Erwan​​ Auer, Philippe Clauss​​​‌.

Compute-intensive scientific applications usually combine multiple compute kernels. They can go through various execution phases, where the kernels may operate with different parameters and in different execution contexts. As a result, standard compilers and general-purpose optimization tools fail to fully optimize compute kernels for the execution contexts that the application goes through.

Multi-versioning,‌​‌ iterative compilation and auto-tuning​​ are optimization techniques that​​​‌ aim to specialize the‌ optimization parameters of the‌​‌ target code. By using​​ feedback obtained from performance​​​‌ measurements at runtime, they‌ are able to choose‌​‌ the best implementation variant,​​ the best compiler optimizations,​​​‌ or the best set‌ of values for some‌​‌ parameters. However, these techniques​​ mostly generate code by​​​‌ relying on static information,‌ or by relying on‌​‌ the user to provide​​ different implementations of the​​​‌ same kernel, or to‌ reference relevant parameters to‌​‌ tune.

We are currently​​ designing a multi-versioning system​​​‌ which generates different efficient‌ versions of compute kernels‌​‌ at runtime, and selects​​ the best performing one​​​‌ for each encountered execution‌ context. The different versions‌​‌ that are generated result​​ from applying automatic loop​​​‌ optimizing and parallelizing transformations‌ that are based on‌​‌ the polyhedral model to​​ the LLVM intermediate representation.​​​‌ The latter is then‌ compiled on-the-fly using the‌​‌ LLVM Just-In-Time compiler. This​​ multi-versioning system is currently​​​‌ being implemented in the‌ Apollo dynamic parallelizer (‌​‌6.1.6).

The system​​ requires very few annotations​​​‌ from the user, and‌ uses information about the‌​‌ execution context that is​​ gathered at runtime, in​​​‌ order to guide the‌ automatic transformations of the‌​‌ compute kernels.
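The selection part of such a multi-versioning scheme can be sketched as follows. This is a simplified illustration under stated assumptions: the versions are ordinary Python functions rather than LLVM IR variants compiled just in time, the execution context is an opaque hashable key, and the names `MultiVersionKernel`, `dot_loop` and `dot_builtin` are hypothetical and unrelated to Apollo's actual interfaces.

```python
import time


class MultiVersionKernel:
    """Keeps several versions of a kernel and remembers the fastest per context."""

    def __init__(self, versions):
        self.versions = versions  # version name -> callable
        self.best = {}            # execution context -> name of fastest version

    def __call__(self, context, *args):
        if context not in self.best:
            # First encounter of this context: time every version once.
            timings = {}
            for name, fn in self.versions.items():
                start = time.perf_counter()
                fn(*args)
                timings[name] = time.perf_counter() - start
            self.best[context] = min(timings, key=timings.get)
        # Known context: run the version selected for it.
        return self.versions[self.best[context]](*args)


def dot_loop(a, b):
    total = 0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_builtin(a, b):
    return sum(x * y for x, y in zip(a, b))


kernel = MultiVersionKernel({"loop": dot_loop, "builtin": dot_builtin})
result = kernel(("length", 3), [1, 2, 3], [4, 5, 6])
```

Here the context key is the vector length; a real system would derive it from information gathered at runtime, and would generate the versions themselves instead of receiving them from the user.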

This work​​ has been presented by​​​‌ Raphaël Colin at the‌ Conférence francophone d'informatique en‌​‌ Parallélisme, Architecture et Système​​ (COMPAS 2025) 31.​​​‌

7.2.4 Dynamic Task Scheduling‌ with Multiple Priorities on‌​‌ Heterogeneous Computing Systems

Participants:​​ Hayfa Tayeb, Bérenger​​​‌ Bramas.

In the‌ context of Albert d'Aviau‌​‌ de Piolant and Hayfa​​ Tayeb's PhD work, Bérenger​​​‌ Bramas collaborated with Mathieu‌ Faverge, Abdou Guermouche, and‌​‌ Amina Guermouche to optimize​​ energy consumption in StarPU-based​​​‌ applications 29, addressing‌ a central challenge in‌​‌ high-performance computing (HPC), namely​​ improving energy efficiency without​​​‌ sacrificing too much performance.‌ A key lever explored‌​‌ in this work is​​ GPU power capping, a​​​‌ technique that enforces a‌ fixed upper power limit‌​‌ on devices such as​​ CPUs and GPUs, with​​​‌ the objective of reducing‌ energy usage while preserving‌​‌ acceptable throughput. The activity​​​‌ focused on evaluating the​ impact of static GPU​‌ power caps in heterogeneous​​ HPC environments, where multiple​​​‌ accelerators with potentially different​ performance characteristics are used​‌ concurrently, and where the​​ performance/energy trade-off becomes a​​​‌ scheduling and resource allocation​ problem rather than a​‌ simple per-device tuning problem.​​ The study first conducted​​​‌ an extensive characterization on​ a compute-intensive reference kernel—GEMM​‌ (matrix multiplication)—across multiple Nvidia​​ GPU architectures, in order​​​‌ to quantify how reducing​ the available power budget​‌ affects both execution time​​ and energy consumption. The​​​‌ results highlight that compute-bound​ kernels can become significantly​‌ more energy-efficient under moderate​​ power constraints: in particular,​​​‌ setting the GPU power​ limit in the range​‌ of 55-70% of the​​ Thermal Design Power (TDP)​​​‌ can yield up to​ 30% energy efficiency improvement​‌ with only limited performance​​ degradation. 
Building on these​​​‌ observations, the work then​ investigated how applying distinct​‌ power caps to different​​ GPUs within the same​​​‌ heterogeneous node can improve​ the global energy efficiency​‌ of real HPC workloads,​​ focusing on dense linear​​​‌ algebra task-based computations including​ matrix multiplication and Cholesky​‌ factorization. Importantly, the study​​ also demonstrated that the​​​‌ runtime scheduler (StarPU) can​ automatically adapt scheduling decisions​‌ to exploit the resulting​​ heterogeneity induced by different​​​‌ GPU power limits, thereby​ aligning task placement with​‌ each device's effective compute​​ capability under capping. Overall,​​​‌ on a platform equipped​ with four GPUs, applying​‌ power capping across all​​ devices led to substantial​​​‌ end-to-end efficiency gains, improving​ energy efficiency for matrix​‌ multiplication by up to​​ 24.3% in double precision​​​‌ and 33.78% in single​ precision, confirming that power-aware​‌ runtime-driven execution is a​​ practical and effective approach​​​‌ to reduce the energy​ footprint of heterogeneous HPC​‌ applications.

7.2.5 Scheduling multiple​​ task-based applications on distributed​​​‌ heterogeneous computing nodes

Participants:​ Jean-Etienne Ndamlabin, Bérenger​‌ Bramas.

The size, complexity and cost of supercomputers continue to grow, making any waste more critical than in the past. Consequently, we need methods to reduce the waste coming from the users' choices, badly optimized applications or heterogeneous workloads during executions. In this context, we worked on the scheduling of several task-based applications on given hardware resources. Specifically, we created load balancing heuristics to distribute the task graph over the processing units. We validated our approach by implementing a super-scheduler in StarPU 18.

7.2.6 Automatic task-based parallelization​​​‌

Participants: Bérenger Bramas,​ Marek Felosci, Julien​‌ Gaupp, Stéphane Genaud​​.

We extended our​​​‌ approach to automatically parallelize​ any application using a​‌ task-based method. We reimplemented​​ APAC using OptiTrust (developed​​​‌ by our team), which​ enables source code transformations​‌ to be expressed compactly.​​ We addressed several challenges​​​‌ related to explicit synchronizations,​ dependency specifications for arrays,​‌ and code duplication required​​ to maintain both sequential​​​‌ and parallel versions. In​ addition, we created a​‌ purely LLVM-based version, which​​ supports a broader subset​​​‌ of the C++ language,​ but at the cost​‌ of more complex transformation​​ implementations 25, 30​​​‌.

7.2.7 Ionic Models​ Code Generation for Heterogeneous​‌ Architectures

Participants: Vincent Loechner​​, Stephane Genaud,​​ Cedric Bastoul, Adilla​​​‌ Susungi, Antoine Pierquin‌.

We participate in the research and development of a cardiac electrophysiology simulator called openCARP (6.1.2) in the context of the MICROCARD-2 European project (8.3.1). Our team provides its optimizing compiler expertise to build a bridge from a high-level DSL convenient for ionic model experts (EasyML) to code that will run efficiently on exascale supercomputers, using the MLIR compiler framework. We have extended the capabilities of openCARP for generating multiple parallel versions of the ionic currents computation, hence enabling the exploitation of the various parallel computing units that are available in the target architecture nodes (multicore CPUs with vector units, GPUs, etc.). We have collaborated with members of the STORM team (Inria Bordeaux), also involved in the MICROCARD-2 project, to extend the capability of executing simulations simultaneously on multiple CPUs and GPUs.

In 2025,​​​‌ we improved the openCARP‌ software to:

  • make the compilation of MLIR-generated code more robust;
  • optimize the code​​​‌ to avoid memory transfers‌ between GPUs and main‌​‌ node memory;
  • provide new​​ functions to access local​​​‌ variables of the ionic‌ model and permit the‌​‌ implementation of SDC (spectral​​ deferred correction) methods, in​​​‌ collaboration with our European‌ partner from ZIB (Berlin);‌​‌
  • investigate how to replace the fast linear interpolation used to approximate complex formulas with polynomial interpolation, using the Sollya library.

7.2.8​​ Machine Learning Guided Equality​​​‌ Saturation

Participants: Thomas Koehler‌.

In joint work‌​‌ led by Nicole Heinimann​​ (PhD at TU Berlin​​​‌ supervised by Michel Steuwer),‌ Thomas Koehler has been‌​‌ exploring the idea of​​ guiding the equality saturation​​​‌ optimization technique through machine‌ learning. Equality saturation has‌​‌ successfully been applied in​​ many domains. Yet, scaling​​​‌ issues hold back its‌ success in even more‌​‌ applications. Thomas' prior work​​ proposed Guided Equality Saturation​​​‌ as a solution that‌ breaks challenging rewrite problems‌​‌ into a sequence of​​ equality saturations. However, this​​​‌ prior work relied on‌ human experts to provide‌​‌ insights in the form​​ of guides that describe​​​‌ when to stop one‌ equality saturation and start‌​‌ the next. The ongoing​​ effort, presented at the​​​‌ EGRAPHS'25 workshop 54,‌ attempts to reduce the‌​‌ reliance on human experts.​​ The goal of Machine​​​‌ Learning Guided Equality Saturation‌ is to automatically generate‌​‌ guides using a machine​​ learning model. The training​​​‌ setup and machine learning‌ model went through multiple‌​‌ design iterations already, and​​ experiments are ongoing to​​​‌ assess how effective this‌ approach is on challenging‌​‌ workloads.

7.2.9 Combining Optimization​​ and Numerical Analysis of​​​‌ Functional Array Programs

Participants:‌ Thomas Koehler.

In joint work with Eva Darulova (Uppsala University, Sweden), Thomas Koehler is working towards combining program optimization with numerical analysis. Eva and Thomas co-supervised two Master students at Uppsala University who finished their theses in 2025. Simon Björklund wrote his thesis on Numerical Analysis of Highly Performant Functional Array Programs 58. Filip von Knorring wrote his thesis on Exploring Accuracy and Performance Trade-offs in Functional Array Programs 64. Eva and Thomas are now working towards an international-level publication for this work.

7.3 Fundamental​​​‌ algorithms & mathematical tools​

7.3.1 Trahrhe expressions

Participants:​‌ Philippe Clauss, Clément​​ Rossetti, Marceau Noury​​​‌.

In the mid-1990s,​ Philippe Clauss and Vincent​‌ Loechner introduced the mathematical​​ theory of Ehrhart polynomials​​​‌ in computer science for​ the quantitative analysis of​‌ iterative programs 3,​​ 6. These special​​​‌ mathematical objects give the​ exact number of integer​‌ points contained in a​​ polyhedron depending linearly on​​​‌ parameters. In the context​ of polyhedral modeling of​‌ nested loops, this number​​ can correspond to the​​​‌ total number of iterations,​ the number of parallel​‌ iterations, the number of​​ accessed data, etc.

A particular use of these Ehrhart polynomials is ranking polynomials. Such polynomials give the position, or rank, of an iteration of a loop nest, according to the lexicographic order of execution of the iterations. These polynomials are determined by calculating the number of integer points lexicographically smaller than any given point in the polyhedral domain of the iterations. Philippe Clauss showed a first application of such polynomials to data layout transformation for optimal spatial locality in 2000.

More recently, we have been interested in inverting such ranking polynomials, in order to be able to determine, for a given rank, the corresponding loop indices. This unranking problem is particularly challenging from both a theoretical and a practical point of view. Thanks to the specific properties of ranking polynomials, we have developed a method for inverting such polynomials by solving univariate polynomial equations and propagating the integer floors of the roots to lower dimensions 2.
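A small worked instance may help fix ideas. The sketch below handles only the triangular nest `for i in range(n - 1): for j in range(i + 1, n)`, for which the ranking polynomial and its inverse have a closed form; it is an illustration written for this example, not output of the TRAHRHE software, and the function names `rank` and `unrank` are hypothetical.

```python
import math


def rank(i, j, n):
    """0-based position of iteration (i, j) in
    `for i in range(n - 1): for j in range(i + 1, n)`."""
    return i * (2 * n - i - 1) // 2 + (j - i - 1)


def unrank(r, n):
    """Recover (i, j) from a valid rank r by solving a univariate quadratic in i."""
    m = 2 * n - 1
    disc = m * m - 8 * r                  # discriminant of i*(m - i)/2 = r
    ceil_sqrt = math.isqrt(disc - 1) + 1  # exact integer ceiling of sqrt(disc)
    i = (m - ceil_sqrt) // 2              # integer floor of the smaller real root
    j = r - i * (m - i) // 2 + i + 1      # propagate i to the inner index
    return i, j


n = 6
# A single collapsed loop over ranks visits the same iterations in the same order.
collapsed = [unrank(r, n) for r in range(n * (n - 1) // 2)]
original = [(i, j) for i in range(n - 1) for j in range(i + 1, n)]
```

The outer index is the integer floor of a root of a univariate equation, and the inner index is then obtained by substituting it back, which mirrors the propagation to lower dimensions described above. Integer square roots are used here to avoid floating-point rounding near exact roots.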

Since 2019, the mathematical engine computing Trahrhe expressions has been developed as software (TRAHRHE) (6.1.1) usable for several loop optimization purposes, such as non-rectangular loop collapsing 2 or algebraic loop tiling 66. A completely revised version written in C++ and implementing many improvements has been developed by Marceau Noury, Clément Rossetti and Philippe Clauss. It is now available from the website.

7.3.2 Z-Polyhedra​​​‌ and LBLs in PolyLib​

Participants: Vincent Loechner.​‌

Z-polyhedra were first introduced in PolyLib (6.1.5) in 2000, but this implementation suffered from several limitations. Since then, significant advances have been made in defining a solid mathematical foundation and a sound normal form for Z-polyhedra, LBLs (linearly bounded lattices) and their unions. We extended this theoretical work to enable the manipulation of arbitrary unions of LBLs or Z-polyhedra in PolyLib, using efficient algorithms to perform set operations and transformations of unions of LBLs. When implementing the LBLs in PolyLib we took special care to ensure safe and efficient memory allocations, to write efficient and robust functions, and to validate them on a broad range of verified test examples. This work was presented at the IMPACT workshop in January 2026 34.
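As a reminder of what these objects are, the sketch below enumerates small LBLs by brute force, assuming the usual definition of an LBL as the image, by an affine function, of the integer points of a polyhedron. It does not use or imitate the PolyLib API; the function `lbl_points` and its bounding-box argument are hypothetical conveniences for this example.

```python
from itertools import product


def lbl_points(matrix, offset, constraints, box):
    """Enumerate {matrix * z + offset : z integer in box, every constraint >= 0}.

    Each constraint is (coeffs, const), meaning coeffs . z + const >= 0.
    """
    points = set()
    for z in product(*(range(lo, hi + 1) for lo, hi in box)):
        if all(sum(c * v for c, v in zip(coeffs, z)) + const >= 0
               for coeffs, const in constraints):
            points.add(tuple(sum(m * v for m, v in zip(row, z)) + o
                             for row, o in zip(matrix, offset)))
    return points


# Triangle j >= 0, i - j >= 0 in the parameter space z = (i, j).
triangle = [((1, -1), 0), ((0, 1), 0)]
scale = [(2, 0), (0, 2)]

# Points of 0 <= y <= x <= 6 with both coordinates even, then both odd.
even = lbl_points(scale, (0, 0), triangle, [(0, 3), (0, 3)])
odd = lbl_points(scale, (1, 1), triangle, [(0, 2), (0, 2)])
union = even | odd
```

The union of the two lattices is not itself a single lattice of this shape, which is why a library needs to manipulate unions of LBLs explicitly and symbolically rather than by enumeration as done here.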

7.3.3 Polyhedral Scheduling​‌

Participants: Tom Hammer,​​ Vincent Loechner, Stephane​​ Genaud, Alain Ketterlin​​​‌, Cedric Bastoul,‌ Bérenger Bramas.

Scheduling is the central operation in the polyhedral compilation chain: it finds the best execution order of loop iterations for parallelizing and optimizing the code. Discovering the best polyhedral schedules remains a challenge due to the huge search space. Moreover, current classes of polyhedral schedulers proceed from outer to inner loops, making them impractical for enforcing efficient vectorization in innermost loops. We have shown those limitations in our survey on polyhedral compilers 28 presented at the HiPEAC 2025 conference.

The PhD work of Tom Hammer is currently investigating whether using the results produced by an auto-vectorizer can help choose a schedule that enables both thread-parallelism and vectorization. To that end, we have extended Autovesk 67, developed in our team, which implements a Superword Level Parallelism (SLP) algorithm, to track how vectorized instructions relate to the original statement instruction instances. We then use a modified version of the Pluto algorithm to build a parallel schedule under the extra constraints discovered by Autovesk. We finally check whether the schedule taking into account the vectorization dimension is legal before generating a transformed loop nest. We have tested this approach on the Polybench/C suite and our findings so far are that it does increase the number of vector instructions generated when compiling with standard compilers (GCC, Clang). However, the benefits of vectorization can be outweighed by losses in data locality, which is favored by the standard Pluto schedule. This work was presented at the IMPACT'26 Workshop 26.

7.3.4​​ Integer Polynomials and Polynomial​​​‌ Loops

Participants: Alain Ketterlin‌.

For some time now we have been working on a specific representation of integer polynomials, which has proved to be well suited to characterizing properties of polyhedral programs (like the counts and ranks of their instructions) 63. The same representation has also been used to extend our previous work on loop recognition in traces 9, which is now able to produce “polynomial loops” thanks to very efficient polynomial interpolation techniques 33. Both of these research lines have introduced the notion of “polynomial loops”, i.e., loops where all bounds and values are multivariate polynomials in the surrounding loop counters, a model that is, in its full generality, too expressive for our current analysis and optimization abilities, but nicely extends the classical polyhedral model.

This year's work has focused on three more aspects of loops involving integer polynomials in their control and/or computations. The first is their ability to be systematically turned into perfect loops (i.e., loops whose bodies are single constructs, either a sub-loop or a single instruction), effectively turning control into computation. Besides the theoretical interest of such loops, we expect this to have an impact on how such loops are executed, especially in the case where dedicated hardware is produced. The second aspect is the efficient execution of polynomial loops, which we have proved is only moderately more costly than their affine, non-perfect counterparts, and may even be as efficient provided enough hardware is available. The third and last aspect of the use of integer polynomials inside loops is their compilation on general purpose processors, where the compiler is in charge of detecting their presence inside the code, and of optimizing their computation. We expect this last aspect to have an impact on many numeric computations, but also on programs manipulating multi-dimensional arrays, where non-linear address computations are pervasive.
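One classical way to evaluate an integer polynomial cheaply at successive loop counter values is the method of forward differences, which replaces multiplications by additions inside the loop. The sketch below illustrates that textbook technique only; the report does not state which optimizations are actually applied, and the function names are hypothetical.

```python
def poly_eval(coeffs, x):
    """Horner evaluation; coeffs are ordered from highest to lowest degree."""
    acc = 0
    for c in coeffs:
        acc = acc * x + c
    return acc


def poly_values_by_differences(coeffs, count):
    """Return p(0), p(1), ..., p(count - 1) using only additions in the main loop."""
    degree = len(coeffs) - 1
    # Start from the first degree + 1 values, then turn them into
    # the forward differences p(0), Δp(0), Δ²p(0), ...
    diffs = [poly_eval(coeffs, x) for x in range(degree + 1)]
    for level in range(1, degree + 1):
        for k in range(degree, level - 1, -1):
            diffs[k] -= diffs[k - 1]
    values = []
    for _ in range(count):
        values.append(diffs[0])
        # Advance every difference by one step of the loop counter.
        for k in range(degree):
            diffs[k] += diffs[k + 1]
    return values


coeffs = [2, -3, 0, 5]  # 2x^3 - 3x^2 + 5, e.g. a non-linear address expression
values = poly_values_by_differences(coeffs, 10)
```

For a polynomial of degree d, each iteration then costs d additions, which is the kind of strength reduction a compiler could apply once it has detected the polynomial.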

7.3.5 Slotted E-Graphs

Participants:​​ Thomas Koehler.

In​​​‌ joint work led by​ Rudi Schneider (PhD at​‌ TU Berlin supervised by​​ Michel Steuwer), Thomas Koehler​​​‌ and his collaborators have​ been working on efficiently​‌ representing (bound) variables in​​ e-graphs. An e-graph is​​​‌ a data structure at​ the heart of powerful​‌ optimization and reasoning techniques​​ such as equality saturation,​​​‌ that space-efficiently represents equal​ sub-terms uniquely. In their​‌ paper published at PLDI'25​​ 20, they present​​​‌ a novel approach to​ representing bound variables in​‌ e-graphs by making them​​ a first-class built-in feature​​​‌ of the data structure.​ Their slotted e-graph represents​‌ terms that differ only​​ by (bound or free)​​​‌ variable names uniquely. Slotted​ e-graphs are evaluated on​‌ two case studies from​​ compiler optimization and theorem​​​‌ proving to show that​ performing equality saturation for​‌ languages with bound variables​​ is greatly simplified and​​​‌ that it becomes possible​ to solve practically relevant​‌ problems that could not​​ be solved with e-graphs​​​‌ using names or de​ Bruijn indices.

7.3.6 Improvements​‌ of the C programming​​ language

Participants: Jens Gustedt​​​‌.

The C standards committee JTC1/SC22/WG14 is now discussing changes for the next version of the C standard, currently called C2y. The discussion on these new features took place in two face-to-face meetings in Graz, Austria, and Brno, Czech Republic.

In 2025 we contributed several papers to the future revision, on the following subjects:

  • improvement of syntax and semantics for arrays 55;
  • type-safe minimum and maximum 39;
  • improvement of the preprocessor 52;
  • improvement of some problem spots concerning undefined behavior 53, 42, 43, 45, 48;
  • revision of the thread and atomics features 38, 40, 49, 50, 46;
  • continued work on function attributes 44, 51;
  • the new defer feature 47;
  • C semantics for contracts 41.

In addition to the C standard 62, the technical specification TS 6010 for a sound and verifiable memory model that is based on provenance 61 has now been published. Jens Gustedt was an editor of and major contributor to this specification.

To promote the new C standard, we also published a C23 edition of the book Modern C 35, which finally appeared in print in 2025. By keeping the rights for this edition as well, we were able to maintain a free online version on HAL, which has again been a great success, with now (Jan. 2026) more than 185,000 downloads in total.

7.3.7 Towards​​​‌ Pen-and-Paper-Style Equational Reasoning in‌ Interactive Theorem Provers by‌​‌ Equality Saturation

Participants: Thomas​​ Koehler.

Equations are ubiquitous in mathematical reasoning. Often, however, they only hold under certain conditions. As these conditions are usually clear from context, mathematicians regularly omit them when performing equational reasoning on paper. In contrast, interactive theorem provers pedantically insist on every detail to be convinced that a theorem holds, hindering equational reasoning at the more abstract level of pen-and-paper mathematics.

In joint​​ work led by Marcus​​​‌ Rossel (PhD at TU‌ Darmstadt supervised by Andrés‌​‌ Goens), we address this​​ issue by raising the​​​‌ level of equational reasoning‌ to enable pen-and-paper style‌​‌ in interactive theorem provers.​​ We achieve this by​​​‌ interpreting theorems as conditional‌ rewrite rules, and use‌​‌ equality saturation to automatically​​ derive equational proofs. Conditions​​​‌ that cannot be automatically‌ proven may be surfaced‌​‌ as proof obligations. Concretely,​​ we present how to​​​‌ interpret theorems as conditional‌ rewrite rules for a‌​‌ significant class of theorems.​​ Handling these theorems goes​​​‌ beyond simple syntactic rewriting,‌ and deals with aspects‌​‌ like propositional conditions and​​ type classes. We evaluate​​​‌ our approach by implementing‌ it as a tactic‌​‌ in Lean, using the​​ egg library for equality​​​‌ saturation with e-graphs. We‌ show four use cases‌​‌ demonstrating the efficacy of​​ this higher level of​​​‌ abstraction for equational reasoning.‌ This work is published‌​‌ at POPL'26 19.​​
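To illustrate what reading a theorem as a conditional rewrite rule means, here is a small Lean 4 sketch, assuming Mathlib is available. It uses only the standard Mathlib lemmas `div_self` and `mul_one` and ordinary `rw`; it does not use or reproduce the tactic developed in this work.

```lean
import Mathlib

-- The equation `x / x = 1` holds only under the side condition `x ≠ 0`,
-- which a pen-and-paper proof would usually leave implicit.
example (x : ℝ) (hx : x ≠ 0) : x / x = 1 := div_self hx

-- Read as a conditional rewrite rule: `x / x` rewrites to `1` provided `x ≠ 0`.
-- With plain `rw`, the user must supply the condition `hx` by hand.
example (x y : ℝ) (hx : x ≠ 0) : y * (x / x) = y := by
  rw [div_self hx, mul_one]
```

In the approach described above, such conditions would be discharged automatically during equality saturation where possible, and surfaced as proof obligations otherwise.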

7.3.8 Formal Proof of​​​‌ Space Bounds for Concurrent,‌ Garbage-Collected Programs

Participants: Arthur‌​‌ Charguéraud, Alexandre Moine​​.

Alexandre Moine, co-advised by Arthur Charguéraud and François Pottier (Inria Paris), has presented a novel, high-level program logic for establishing space bounds in Separation Logic, for programs that execute with a garbage collector. A key challenge is to design sound, modular, lightweight mechanisms for establishing the unreachability of a block. In the setting of a high-level, ML-style language, a key problem is to identify and reason about the memory locations that the garbage collector considers as roots. Our recent work has focused on generalizing our previous results to handle concurrent programs. A key challenge is to handle the fact that if an allocation lacks free space, then it is blocked until all other threads exit their critical section. Only at that point may a GC execute and free the requested space. To handle this challenge, we propose to combine two language constructs: protected sections (during which the GC cannot be triggered) and polling points (where a thread pauses if other threads request a GC execution). Our article describing the results has appeared in the premier journal TOPLAS 17.

7.3.9 Typechecking of Overloading‌

Participants: Arthur Charguéraud.‌​‌

In joint work with​​ Martin Bodin from Inria​​​‌ Grenoble and Jana Dunfield‌ from Queen's School of‌​‌ Computing (Canada), Arthur Charguéraud​​ has been working on​​​‌ a typechecking algorithm for‌ resolving overloaded symbols. Overloading‌​‌ consists of using the​​ same symbol to refer​​​‌ to several functions, or‌ the same name to‌​‌ refer to several constants.​​​‌ Overloading is ubiquitous in​ mathematics. It also appears​‌ in numerous programming languages​​ that resolve overloading statically,​​​‌ as opposed to languages​ that rely on dynamic​‌ dispatch during program execution.​​ A key question is​​​‌ how to determine, for​ every occurrence of an​‌ overloaded symbol, which function​​ it refers to. Static​​​‌ resolution of overloading is​ intrinsically intertwined with typechecking:​‌ overloading resolution depends on​​ types, but the types​​​‌ of the overloaded symbols​ depend on how they​‌ are resolved.

We present the first overloading resolution algorithm accompanied by a polynomial complexity bound. The bound is expressed in terms of the size of the description of the instances, as well as of the size of the typed tree to which the program resolves. In our algorithm, resolution is guided not only by the type of function arguments, but also by the type expected by the context. We allow candidate instances to have dependencies (assumptions). As in certain previously proposed algorithms, we take a non-backtracking approach, which avoids exponential search.
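To make the idea of resolution guided by both argument types and the expected type more concrete, here is a minimal Python sketch. It is not the algorithm of the paper: the instance table `INSTANCES`, the `Lit` and `Call` node types, and the `resolve` function are all hypothetical names introduced for illustration, types are plain strings, and dependencies between instances are not modelled. Each occurrence is visited once, from left to right, with no backtracking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    value: object
    type: str


@dataclass(frozen=True)
class Call:
    symbol: str
    args: tuple


# Hypothetical instance table: symbol -> [(argument types, result type, resolved name)].
INSTANCES = {
    "+": [(("int", "int"), "int", "int_add"),
          (("float", "float"), "float", "float_add")],
    "zero": [((), "int", "int_zero"),
             ((), "float", "float_zero")],
}


class ResolutionError(Exception):
    """Raised when an occurrence has no instance or several remaining instances."""


def resolve(expr, expected=None):
    """Return (type, resolved code) for expr; each node is visited exactly once."""
    if isinstance(expr, Lit):
        if expected is not None and expr.type != expected:
            raise ResolutionError(f"expected {expected}, got {expr.type}")
        return expr.type, repr(expr.value)
    # Keep the instances compatible with the arity and with the expected type.
    cands = [c for c in INSTANCES[expr.symbol]
             if len(c[0]) == len(expr.args) and expected in (None, c[1])]
    resolved_args = []
    for i, arg in enumerate(expr.args):
        # If all remaining candidates agree on argument i, pass that type down.
        arg_types = {c[0][i] for c in cands}
        hint = next(iter(arg_types)) if len(arg_types) == 1 else None
        arg_type, arg_code = resolve(arg, hint)
        # Use the argument's type to prune the candidates (no backtracking).
        cands = [c for c in cands if c[0][i] == arg_type]
        resolved_args.append(arg_code)
    if len(cands) != 1:
        raise ResolutionError(f"{len(cands)} instances remain for {expr.symbol!r}")
    _, result_type, name = cands[0]
    code = f"{name}({', '.join(resolved_args)})" if resolved_args else name
    return result_type, code
```

In this sketch, `resolve(Call("+", (Lit(1, "int"), Call("zero", ()))))` picks `int_add` from the first argument and then resolves `zero` through the expected type `int`, whereas `resolve(Call("zero", ()))` alone is rejected because nothing disambiguates it.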

Our implementation parses OCaml-style syntax where functions, constants, constructors and record fields can be overloaded. We assume explicit quantification of polymorphic type variables. If all overloaded symbols can be unambiguously resolved, our tool produces standard OCaml code, in which every overloaded symbol is replaced with the value or name that it resolves to. Preliminary results have been presented at the JFLA'25 French workshop [23]. An article submission is under preparation.

7.3.10 Binding​​ Boolean Expressions and Extended​​​‌ Pattern Matching

Participants: Arthur​ Charguéraud, Yanni Lefki​‌.

Functional programming languages​​ include various pattern matching​​​‌ features, such as guarded​ patterns, matching by custom​‌ predicate, active patterns, synonymous​​ patterns, etc. Besides, several​​​‌ languages include mechanisms for​ binding names as part​‌ of a boolean expression​​ that appears in either​​​‌ an if-statement, a while-loop​ condition, or a pattern​‌ guard. These names may​​ be bound either with​​​‌ a simple let-binding or​ via a test performed​‌ using pattern-matching. All these​​ features are useful in​​​‌ practice, yet it appears​ that no mainstream language​‌ supports them all at​​ once. In this work,​​​‌ we present a core​ language that consists of​‌ a small number of​​ constructs that suffice to​​​‌ encode and combine all​ the desired features of​‌ pattern matching and binding​​ boolean expressions. Thereby, we​​​‌ hope to consolidate existing​ knowledge on the topics​‌ of pattern matching and​​ generalized forms of boolean​​​‌ expressions, through a streamlined​ presentation. We expect it​‌ to be useful not​​ only for pedagogical purposes,​​​‌ but also potentially for​ simplifying the work of​‌ compiler developers. This work​​ has been presented at​​​‌ the ML family workshop​ (ML'25) colocated with ICFP.​‌ An article submission is​​ under preparation.

8 Partnerships​​​‌ and cooperations

8.1 International​ initiatives

8.1.1 Participation in​‌ other International Programs

CrOptAI​​ (Sophie-Germain Program)

 

Participants: Thomas​​​‌ Koehler, Valeran Maytie​, Cedric Bastoul.​‌

  • Title:
    Cross-Stack Optimisation for​​ AI
  • Partner Institutions:
    LIB​​​‌ UR 7534, Université Bourgogne​ Europe, France; University of​‌ Edinburgh, United Kingdom.
  • Date/Duration:​​
    from November 1, 2025​​ to October 31, 2026​​​‌ (1 Year).
  • Principal Investigators:‌
    Thomas Koehler, Annabelle Gillet‌​‌ (LIB), Eric Leclercq (LIB)​​
  • Funding Impact:
    This funding​​​‌ will enable collaboration and‌ synchronisation between the partners‌​‌ and their PhD researchers:​​ research visits, conference trips,​​​‌ hardware acquisition. We will‌ lay the foundations for‌​‌ further collaboration and funding.​​
  • Research Project:
    Improving the​​​‌ efficiency of artificial intelligence‌ computing is critical. Further,‌​‌ best performance is achieved​​ through optimisation decisions that​​​‌ cut through the entire‌ software stack, from high-level‌​‌ algorithmic choices down to​​ hardware execution choices. Our​​​‌ project is to explore‌ novel approaches to cross-stack‌​‌ optimisation, in order to​​ improve artificial intelligence performance​​​‌ while lowering engineering costs.‌

8.2 International research visitors‌​‌

8.2.1 Visits of international​​ scientists

  • Bastian Köpcke (postdoc​​​‌ at TU Berlin, Germany)‌ visited CAMUS to collaborate‌​‌ with Julien De Castelnau,​​ Thomas Koehler and Arthur​​​‌ Charguéraud on verified code‌ optimization for GPUs (1‌​‌ week research stay).
  • Reuben​​ Carolan (PhD at University​​​‌ of Edinburgh, UK) visited‌ CAMUS to collaborate with‌​‌ Valéran Maytie, Thomas Koehler​​ and Cedric Bastoul on​​​‌ sketch-guided polyhedral compilation (1‌ week research stay).

8.2.2‌​‌ Visits to international teams​​

  • Thomas Koehler visited Eva Darulova and her Datalogi team at Uppsala University, Sweden for one week in September 2025. Eva and Thomas are working towards a publication on combining program optimization with numerical analysis. This comes after co-supervising two Master students, Simon Björklund and Filip von Knorring, who finished their theses in 2025.

8.3 European initiatives‌

8.3.1 Horizon Europe

MICROCARD-2‌​‌ Centre of Excellence (EuroHPC​​ and ANR)

 

Participants: Vincent​​​‌ Loechner, Stephane Genaud‌, Cedric Bastoul,‌​‌ Adilla Susungi, Antoine​​ Pierquin.

  • Title:
    MICROCARD-2:​​​‌ numerical modeling of cardiac‌ electrophysiology at the cellular‌​‌ scale
  • Duration:
    from November​​ 1, 2024 to April​​​‌ 30, 2027
  • Partners:
    Inria,‌ France; Karlsruher Institut Für‌​‌ Technologie, Germany; Megware, Germany;​​ Simula Research Laboratory (Simula),​​​‌ Norway; Technical University München‌ (TUM), Germany; Università degli‌​‌ Studi di Pavia, Italy;​​ Università di Trento (UTrento),​​​‌ Italy; Université de Bordeaux,‌ France; Université de Strasbourg,‌​‌ France.
  • Coordinator:
    Mark Potse​​, Université de Bordeaux​​​‌
  • WP4 leader:
    Vincent Loechner‌
  • Summary:

    The MICROCARD-2 project‌​‌ is coordinated by Université​​ de Bordeaux and involves​​​‌ the Inria teams Carmen,‌ Cardamom, Storm and TADaaM‌​‌ in Bordeaux, and CAMUS​​ in Strasbourg, among a​​​‌ total of ten partner‌ institutions in France, Germany,‌​‌ Italy, and Norway. This​​ Centre of Excellence for​​​‌ numerical modeling of cardiac‌ electrophysiology at the cellular‌​‌ scale builds on the​​ MICROCARD(-1) project (2021–2024), and​​​‌ has the same website‌.

    The modelling of cardiac electrophysiology at the cellular scale requires thousands of model elements per cell, of which there are billions in a human heart. Even for small tissue samples, such models require at least exascale supercomputers. In addition, the production of meshes of the complex tissue structure is extremely challenging, even more so at this scale. MICROCARD-2 works, in concert, on every aspect of this problem: tailored numerical schemes, linear-system solvers, and preconditioners; dedicated compilers to produce efficient system code for different CPU and GPU architectures (including the EPI and other ARM architectures); mitigation of energy usage; mesh production and partitioning; simulation workflows; and benchmarking.

    The contribution of the CAMUS team concerns code optimization of the ionic models, and involves the MLIR compiler frontend and SIMD code generation for CPUs, plus GPU (Nvidia and AMD) code generation. An engineer and a junior researcher were hired starting in Jan./Feb. 2025.

8.4​​​‌ National initiatives

8.4.1 ANR​ OptiTrust

Participants: Arthur Charguéraud​‌, Thomas Koehler,​​ Guillaume Bertholon, Elian​​​‌ Morel, Julien François​ de Castelnau, Jens​‌ Gustedt.

Turning a high-level, unoptimized algorithm into high-performance code can take weeks, if not months, for an expert programmer. The challenge is to take full advantage of vectorized instructions, of all the cores and all the servers available, as well as to optimize the data layout, maximize data locality, and avoid saturating the memory bandwidth. In general, annotating the code with "pragmas" is insufficient, and domain-specific languages are too restrictive. Thus, in most cases, the programmer needs to write, by hand, low-level code that combines dozens of optimizations. This approach is not only tedious and time-consuming, it also degrades code readability, harms code maintenance, and can result in the introduction of bugs. A promising approach consists of deriving HPC code via a series of source-to-source transformations guided by the programmer. This approach has been successfully applied in niche domains, such as image processing and machine learning. We aim to generalize this approach to optimize arbitrary code. Furthermore, the OptiTrust project aims at obtaining formal guarantees on the output code. A number of these transformations are correct only under specific hypotheses. We will formalize these hypotheses, and investigate which of them can be verified by means of static analysis. To handle the more complex hypotheses, we will transform not just code but also formal invariants attached to the code. Doing so will allow exploiting invariants expressed on the original code for justifying transformations performed at the n-th step of the transformation chain.
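As a small illustration of a transformation that is correct only under a specific hypothesis, the sketch below shows a loop before and after a simple form of tiling, written in Python for brevity. It does not use the OptiTrust tool or its API; the function names and the divisibility hypothesis are assumptions chosen for the example.

```python
def axpy_reference(a, x, y):
    # Original loop: out[i] = y[i] + a * x[i] for every index i.
    out = list(y)
    for i in range(len(x)):
        out[i] += a * x[i]
    return out


def axpy_tiled(a, x, y, tile):
    # Tiled loop. Hypothesis of this simple form of tiling: the tile size
    # divides the trip count, so no remainder loop is needed.
    n = len(x)
    if tile <= 0 or n % tile != 0:
        raise ValueError("tiling hypothesis violated: tile must divide len(x)")
    out = list(y)
    for start in range(0, n, tile):
        for i in range(start, start + tile):
            out[i] += a * x[i]
    return out
```

When the hypothesis holds, both versions visit the same indices in the same order and therefore compute the same result. When it does not hold, the sketch refuses to run rather than silently skipping or overrunning elements, which mirrors the need to check such hypotheses before applying a transformation.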

  • Funding: ANR
  • Start:​​​‌ October 2022
  • End: September​ 2028
  • Coordinator: Arthur Charguéraud​‌ (Inria)
  • Partners: Inria team​​ Camus (Strasbourg), Inria team​​​‌ MACARON (formerly TONUS) (Strasbourg),​ Inria team Cambium (Paris),​‌ Inria team CASH (Lyon),​​ CEA team LIST

8.4.2​​​‌ ANR AUTOSPEC

Participants: Bérenger​ Bramas, Philippe Clauss​‌, Stéphane Genaud,​​ Marek Felosci, Anastasios​​​‌ Souris.

The AUTOSPEC project aims to create methods for automatic task-based parallelization and to improve this paradigm by increasing the degree of parallelism using speculative execution. The project will focus on source-to-source transformations for automatic parallelization, speculative execution models, DAG scheduling, and the activation mechanisms for speculative execution. With this aim, the project will rely on a source-to-source compiler that targets the C++ language, a runtime system with speculative execution capabilities, and an editor (IDE) to enable compiler-guided development. The outcomes from the project will be open-source with the objective of developing a user community. The benefits will be of great interest both for developers who want to use an automatic parallelization method and for high-performance programming experts who will benefit from improvements to task-based programming. The results of this project will be validated in various applications such as protein complex simulation software and widely used open-source software. The aim will be to cover a wide range of applications to demonstrate the potential of the methods derived from this project while trying to establish their limitations to open up new research perspectives.

  • Funding: ANR‌​‌ (JCJC)
  • Start: October 2021​​
  • End: September 2025
  • Coordinator:​​​‌ Bérenger Bramas

8.4.3 Exa-SofT‌ project, PEPR NumPEx

Participants:‌​‌ Bérenger Bramas, Philippe​​ Clauss, Raphael Colin​​​‌, Ugo Battiston,‌ Erwan Auer.

Though significant efforts have been devoted to the implementation and optimization of several crucial parts of a typical HPC software stack, most HPC experts agree that exascale supercomputers will raise new challenges, mostly because the trend in exascale compute-node hardware is toward heterogeneity and scalability. Compute nodes of future systems will have a combination of regular CPUs and accelerators (typically GPUs), along with a diversity of GPU architectures. Meeting the needs of complex parallel applications and the requirements of exascale architectures raises numerous challenges which are still left unaddressed. As a result, several parts of the software stack must evolve to better support these architectures. More importantly, the links between these parts must be strengthened to form a coherent, tightly integrated software suite. The Exa-SofT project aims at consolidating the exascale software ecosystem by providing a coherent, exascale-ready software stack featuring breakthrough research advances enabled by multidisciplinary collaborations between researchers. The main scientific challenges we intend to address are: productivity, performance portability, heterogeneity, scalability and resilience, performance, and energy efficiency.

Philippe Clauss is managing the work package 2 "Just-in-Time code optimization with continuous feedback loop" of this project. He is also involved in two major tasks of this package, devoted (1) to the integration of polyhedral optimization techniques in the Kokkos framework and (2) to the development of a dynamic multi-versioning system.

  • Funding: PEPR NumPEx‌​‌
  • Start: September 2023
  • End:​​ August 2028
  • Coordinator: Raymond​​​‌ Namyst (Inria STORM)
  • WP2‌ co-leader: Philippe Clauss

8.4.4‌​‌ PEPR CAMELIA

Participants: Thomas​​ Koehler, Cedric Bastoul​​​‌, Arthur Charguéraud.‌

  • Funding:
    PEPR (3rd type)‌​‌
  • Title:
    Composants pour l'Accélération​​ Matérielle et Logicielle de​​​‌ l'IA
  • Duration:
    from March‌ 2026 to 2032 (6‌​‌ years).
  • Coordinators:
    Cédric Auliac​​ (CEA), Olivier Sentieys (Inria​​​‌ TARAN)
  • WP4 Coordinators:
    Fabrice‌ Rastello (Inria CORSE), H.P.‌​‌ Charles (CEA)
  • WP4.2 Coordinator:​​
    Thomas Koehler
  • Summary:

    The French government requires sovereign access to key components required for AI and its acceleration. In this context, the ASIC and Numeric program agencies backed by CEA and Inria were trusted with proposing a research and attractiveness strategy. This research program complements other national initiatives, with a focus on developing modular hardware acceleration components and their software stack. WP4 focuses on the software aspect of the project. WP4.2 tackles program representation and compilation challenges: from high-level domain-specific languages down to low-level hardware targets, their runtime and ISAs.

    The contribution​​ of the CAMUS team​​​‌ is to coordinate WP4.2​ and to develop new​‌ compilation techniques that facilitate​​ prototyping AI code optimizations​​​‌ at all abstraction levels,​ from tensor expressions down​‌ to hardware ISAs. Ideally,​​ these techniques enable producing​​​‌ highly optimized AI code​ for new accelerators without​‌ having to rewrite hand-optimized​​ libraries or to redesign​​​‌ optimizing compilers.

9 Dissemination​

9.1 Promoting scientific activities​‌

9.1.1 Scientific events: organisation​​

General chair, scientific chair​​​‌
  • Thomas Koehler : Séminaire​ Pile Logicielle et Compilation​‌ pour l'IA, Aussois

9.1.2​​ Scientific events: selection

Member​​​‌ of the conference program​ committees
  • Arthur Charguéraud :​‌ SPAA'25 (ACM Symposium on​​ Parallelism in Algorithms and​​​‌ Architectures)
  • Arthur Charguéraud :​ ML'25 (ML family workshop,​‌ colocated with ICFP)
  • Arthur​​ Charguéraud : PLDI'25 (ACM​​​‌ Conference on Programming Language​ Design and Implementation)
  • Thomas​‌ Koehler : PLDI'25 (ACM​​ Conference on Programming Language​​​‌ Design and Implementation)
  • Cedric​ Bastoul : CC'26 (Intl​‌ Conference on Compiler Construction)​​
Reviewer
  • Thomas Koehler :​​​‌ EGRAPHS'25 workshop at PLDI'25​

9.1.3 Journal

Reviewer -​‌ reviewing activities
  • Thomas Koehler​​ : TACO (ACM)
  • Vincent​​​‌ Loechner : TACO (ACM),​ Journal of Symbolic Computation​‌ (Elsevier)
  • Philippe Clauss :​​ Journal of Supercomputing
  • Arthur​​​‌ Charguéraud : Journal of​ Functional Programming

9.1.4 Invited​‌ talks

  • Philippe Clauss has​​ been invited as keynote​​​‌ speaker to the 15th​ International Workshop on Polyhedral​‌ Compilation Techniques (IMPACT 2025),​​ January 22, 2025, Barcelona,​​​‌ Spain : Counting-based Loop​ Optimization.
  • Thomas Koehler​‌ : Guided Equality Saturation,​​ AST Lab, ETH Zürich,​​​‌ Switzerland
  • Thomas Koehler :​ A Case For Interactive​‌ Optimization Assistants, User-Schedulable Languages​​ Workshop, ASPLOS, Rotterdam, Netherlands​​​‌
  • Arthur Charguéraud : Binding​ Boolean Expressions and Extended​‌ Pattern Matching, Inria Cambium​​ team, Paris, France.

9.1.5​​​‌ Scientific expertise

  • Arthur Charguéraud​ has been reviewer for​‌ 2 ANR projects.
  • Jens Gustedt is a member of the ISO/IEC working groups ISO/IEC JTC1/SC22/WG14 and WG21 for the standardization of the C and C++ programming languages, respectively.

9.1.6 Research administration

  • Stéphane​‌ Genaud is the head​​ of the ICPS team​​​‌ for the ICube lab.​ Arthur Charguéraud is vice-head.​‌
  • Jens Gustedt is deputy​​ director of the ICube​​​‌ lab, responsible for the​ IT and CS policy​‌ and for the coordination​​ between the lab and​​​‌ the Inria center. In​ that function, he also​‌ represents ICube on the​​ board of the project​​​‌ committee of the Inria​ Centre at Université de​‌ Lorraine.
  • Jens Gustedt is a member of the steering committee of the interdisciplinary institute IRMIA++ of Strasbourg University.
  • Jens Gustedt is​​ (together with Philippe Helluy​​​‌ of the IRMA lab)​ responsible for the Inria​‌ PIQ program for the​​ Strasbourg site.
  • Arthur Charguéraud​​​‌ is a member of​ the COMIPERS jury for​‌ PhD and postdoc grants​​ at Inria Nancy Grand-Est.​​​‌
  • Arthur Charguéraud represents Inria​ at the meetings of​‌ the MSII doctoral school​​ (Mathématiques, Sciences de l'Information​​​‌ et de l'Ingénieur, ED269)​ in Strasbourg.
  • Bérenger Bramas​‌ is a member of​​ the CDT and IES​​ committee at Inria Nancy​​​‌ Grand-Est.

9.2 Teaching -‌ Supervision - Juries -‌​‌ Educational and pedagogical outreach​​

9.2.1 Teaching

  • Licence:
    • Philippe Clauss, Computer architecture, 18h, L2, Université de Strasbourg, France
    • Vincent Loechner, Algorithmics and programming, 82h, L1, Université de Strasbourg, France
    • Vincent Loechner, System administration, 40h, Licence Pro, Université de Strasbourg, France
    • Vincent Loechner, Parallel programming, 18h, M1, Université de Strasbourg, France
    • Bérenger Bramas, System programming, 24h, L2, UFAZ, France-Azerbaijan
    • Alain Ketterlin, Culture et pratique de l'Informatique, L1 Math-Info, 48h, Université de Strasbourg, France
    • Alain Ketterlin, Programmation système, L2 Math-Info, 40h, Université de Strasbourg, France
    • Alain Ketterlin, Algorithmique et programmation, L1 Math-Info, 66h, Université de Strasbourg, France
    • Alain Ketterlin, Software Engineering (in English), L2 Math-Info, 64h, Université de Strasbourg, France
    • Stéphane Genaud, Algorithmics and programming, 82h, L1, Université de Strasbourg, France
    • Stéphane Genaud, Data Structures & Algorithms 2, 25h, L2, UFAZ, France-Azerbaijan
    • Stéphane Genaud, Parallel programming, 30h, M1, Université de Strasbourg, France
  • Master:​​​‌
    • Philippe Clauss, Compilation, 132h, M1, Université de Strasbourg, France
    • Philippe Clauss, Real-time programming and system, 37h, M1, Université de Strasbourg, France
    • Philippe Clauss, Code optimization and transformation, 31h, M1, Université de Strasbourg, France
    • Vincent Loechner, Real-time systems, 12h, M1, Université de Strasbourg, France
    • Bérenger Bramas, Compilation and Performance, 24h, M2, Université de Strasbourg, France
    • Bérenger Bramas, Compilation, 24h, M1, Université de Strasbourg, France
    • Cedric Bastoul, Parallel programming, 10h, M1, Université de Strasbourg, France
    • Cedric Bastoul, Compilation, 36h, M1, Université de Strasbourg, France
    • Cedric Bastoul, Research & Development Project, 20h, M2, Université de Strasbourg, France
    • Stéphane Genaud, Cloud and Virtualization, 12h, M1, Université de Strasbourg, France
    • Stéphane Genaud, Large-Scale Data Processing, 15h, M1, Université de Strasbourg, France
    • Stéphane Genaud, Distributed Storage and Processing, 15h, M2, Université de Strasbourg, France
  • Eng. School:
    • Vincent Loechner, Parallel programming, 20h, Telecom Physique Strasbourg - 3rd year, Université de Strasbourg, France
    • Stéphane Genaud, Introduction to Operating Systems, 16h, Telecom Physique Strasbourg - 1st year, Université de Strasbourg, France
    • Stéphane Genaud, Object-Oriented Programming, 60h, Telecom Physique Strasbourg - 1st year, Université de Strasbourg, France
  • Free online course: Arthur‌ Charguéraud has made publicly‌​‌ available the solutions to​​ the 125+ exercises of​​​‌ his all-in-Rocq course on‌ the Foundations of Separation‌​‌ Logic.
  • DU IRMIA++ interdisciplinary​​ seminar, as well as​​​‌ seminar of the doctoral‌ school ED269 : Arthur‌​‌ Charguéraud , Introductory course​​ to Interactive Program Verification​​​‌ (3h), Université de Strasbourg,‌ France
  • Corps des mines:‌​‌ Arthur Charguéraud , Design​​ and Implementation of Educational​​​‌ Software (10h), Paris, France‌

Teaching tracks:

  • Philippe Clauss‌​‌ is in charge of​​ the master's degree in​​​‌ Computer Science of the‌ University of Strasbourg, since‌​‌ Sept. 2020.
  • Stéphane Genaud is in charge of the Bachelor in Computer Science (since Aug. 2023) and co-head of the Master in Data Science and Artificial Intelligence (since Aug. 2024) at UFAZ (Baku, Azerbaijan), which delivers Unistra diplomas.
  • Cedric Bastoul is​ in charge of the​‌ Software Science and Engineering​​ track of the Master's​​​‌ degree in Computer Science​ of the University of​‌ Strasbourg, since Sept. 2025.​​

9.2.2 Supervision

PhD completed:​​​‌

  • PhD defended in 2025: Guillaume Bertholon, Interactive Compilation via Trustworthy Source-to-Source Transformations, advised by Arthur Charguéraud, since Sept 2022.
  • PhD defended in 2025: Clément Rossetti, Algebraic Tiling: Volume-guided Tiling of Parallel Loops for Near-Perfect Load Balancing, advised by Philippe Clauss, since Oct 2022.
  • PhD defended in 2025: Hayfa Tayeb, Efficient scheduling strategies for the task-based parallelization, advised by Bérenger Bramas, Abdou Guermouche (Inria project-team TOPAL), Mathieu Faverge (Inria project-team TOPAL), since Nov 2021.
  • PhD defended in 2025: David Algis, Hybridization of the Tessendorf method and Smoothed Particle Hydrodynamics for real-time ocean simulation, advised by Bérenger Bramas, Emmanuelle Darles (XLim), Lilian Aveneau (XLim lab), since Oct 2022.

PhD in​​​‌ progress:

  • PhD in progress: Yanni Lefki, Foundational Verification of Interactively Optimized Programs, advised by Arthur Charguéraud, since Sept 2025.
  • PhD in progress: Raphaël Colin, Runtime multi-versioning of parallel tasks, advised by Philippe Clauss and Thierry Gautier (Inria project-team Avalon), since Oct. 2023.
  • PhD in progress: Ugo Battiston, C++ complexity disambiguation for advanced optimizing and parallelizing code transformations, advised by Philippe Clauss and Marc Pérache (CEA), since Oct. 2023.
  • PhD in progress: Tom Hammer, Synergie entre ordonnancement et optimisation des accès mémoire dans le modèle polyédrique, advised by Stéphane Genaud and Vincent Loechner, since Sept 2023.
  • PhD in progress: Valéran Maytie, Optimizing LLMs with Sketch-Guided Polyhedral Compilation, advised by Thomas Koehler and Cedric Bastoul, since Oct 2025.

9.2.3 Juries

  • Cedric​​ Bastoul has been reviewer​​​‌ and member of the​ jury for the PhD​‌ thesis of Vincent Alba​​ at the University of​​​‌ Bordeaux
  • Cedric Bastoul has​ been president of the​‌ jury for the PhD​​ thesis of Lana Scravaglieri​​​‌ at the University of​ Bordeaux
  • Cedric Bastoul has​‌ been president of the​​ jury for the Habilitation​​​‌ thesis of Quentin Bramas​ at the University of​‌ Strasbourg
  • Arthur Charguéraud has​​ been reviewer and member​​​‌ of the jury for​ the PhD thesis of​‌ Josué Moreau at the​​ University Paris–Saclay
  • Arthur Charguéraud has been guarantor and member of the jury for the Habilitation thesis of Bérenger Bramas at the University of Strasbourg
  • Jens Gustedt has been​‌ reviewer and member of​​ the jury for the​​​‌ thesis of Sébastien Michelland​ at the Université Grenoble​‌ Alpes

9.3 Popularization

9.3.1​​ Specific official responsibilities in​​​‌ science outreach structures

  • Arthur Charguéraud is co-founder and vice-president of the non-profit organization France-ioi. This organization is in charge of the French participation in international olympiads in informatics. It also organizes numerous contests, such as the Concours Castor, Concours Algorea, Concours Alkindi, and the French Olympiads in Informatics.
  • Arthur Charguéraud is a co-organizer of the Concours Castor informatique. The purpose of the Concours Castor is to introduce pupils, from CM1 to Terminale, to computer science. 650,000 teenagers played with the interactive exercises in November and December 2025.

10 Scientific production‌​‌

10.1 Major publications

  • 1 inproceedings: Umut A. Acar, Vitaly Aksenov, Arthur Charguéraud and Mike Rainey. Provably and Practically Efficient Granularity Control. PPoPP 2019 - Principles and Practice of Parallel Programming, Washington DC, United States, February 2019.
  • 2 inproceedings: Philippe Clauss, Ervin Altintas and Matthieu Kuhn. Automatic Collapsing of Non-Rectangular Loops. Parallel and Distributed Processing Symposium (IPDPS), 2017 IEEE International, Orlando, United States, May 2017, 778-787.
  • 3 inproceedings: Philippe Clauss. Counting Solutions to Linear and Nonlinear Constraints Through Ehrhart Polynomials: Applications to Analyze and Transform Scientific Programs. ICS, International Conference on Supercomputing, ACM International Conference on Supercomputing 25th Anniversary Volume, Munich, Germany, 2014.
  • 4 article: Philippe Clauss, Federico Javier Fernández, Diego Garbervetsky and Sven Verdoolaege. Symbolic polynomial maximization over convex sets and its application to memory requirement estimation. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 17(8), August 2009, 983-996.
  • 5 article: Pierre-Nicolas Clauss and Jens Gustedt. Iterative Computations with Ordered Read-Write Locks. Journal of Parallel and Distributed Computing 70(5), 2010, 496-504.
  • 6 article: Philippe Clauss and Vincent Loechner. Parametric Analysis of Polyhedral Iteration Spaces. Journal of Signal Processing Systems 19(2), July 1998, 179-194.
  • 7 book: Jens Gustedt. Modern C. Manning, November 2019.
  • 8 article: Alexandra Jimborean, Philippe Clauss, Jean-François Dollinger, Vincent Loechner and Juan Manuel Martinez. Dynamic and Speculative Polyhedral Parallelization Using Compiler-Generated Skeletons. International Journal of Parallel Programming 42(4), August 2014, 529-545.
  • 9 inproceedings: Alain Ketterlin and Philippe Clauss. Prediction and trace compression of data access addresses through nested loop recognition. Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization, Boston, United States, ACM, April 2008, 94-103.
  • 10 inproceedings: Alain Ketterlin and Philippe Clauss. Profiling Data-Dependence to Assist Parallelization: Framework, Scope, and Optimization. MICRO-45, The 45th Annual IEEE/ACM International Symposium on Microarchitecture, Vancouver, Canada, December 2012.
  • 11 article: Benoit Pradelle, Alain Ketterlin and Philippe Clauss. Polyhedral parallelization of binary code. ACM Transactions on Architecture and Code Optimization 8(4), January 2012, 39:1-39:21.
  • 12 article: Aravind Sukumaran-Rajam and Philippe Clauss. The Polyhedral Model of Nonlinear Loops. ACM Transactions on Architecture and Code Optimization 12(4), January 2016.

10.2​​​‌ Publications of the year​

International journals

Invited conferences

International peer-reviewed​‌ conferences

Conferences without proceedings​​​‌

Scientific books​​​‌

Doctoral​‌ dissertations and habilitation theses​​

Reports & preprints​​​‌

Other scientific publications

  • 56 inproceedings: Antoine Gicquel, Olivier Coulaud and Bérenger Bramas. Towards a composable abstraction of hierarchical methods for matrix-vector product acceleration. COMPAS 2025 - Conférence francophone d'informatique en Parallélisme, Architecture et Système, Bordeaux, France, June 2025.

10.3 Cited publications​​

  • 57 phdthesis: Guillaume Bertholon. Interactive compilation via trustworthy source-to-source transformations. Université de Strasbourg, September 2025.
  • 58 mastersthesis: Simon Björklund. Numerical Analysis of Highly Performant Functional Array Programs. MA Thesis, Uppsala University, Department of Information Technology, 2025, 37.
  • 59 phdthesis: Clément Flint. Efficient data compression for high-performance PDE solvers. Université de Strasbourg, October 2024.
  • 60 inproceedings: Yafan Huang, Sheng Di, Guanpeng Li and Franck Cappello. cuSZp2: A GPU Lossy Compressor with Extreme Throughput and Optimized Compression Ratio. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC '24, Atlanta, GA, USA, IEEE Press, 2024. URL: https://doi.org/10.1109/SC41406.2024.00021
  • 61 book: ISO/IEC IS 9899:2024: Programming languages - C - A provenance-aware memory object model for C. ISO, May 2025, 23.
  • 62 book: ISO/IEC IS 9899:2024: Programming languages - C. ISO, October 2024, 758.
  • 63 inproceedings: Alain Ketterlin. Easy Counting and Ranking for Simple Loops. IMPACT 2024 - 14th International Workshop on Polyhedral Compilation Techniques, Munich, Germany, January 2024.
  • 64 misc: Filip von Knorring. Exploring Accuracy and Performance Trade-offs in Functional Array Programs. Uppsala University, Computing Science, 2025.
  • 65 inproceedings: Valeran Maytié, Reuben Carolan, Christophe Alias, Cedric Bastoul and Thomas Koehler. Towards Optimising Programs with Sketch-Guided Polyhedral Compilation. IMPACT 2026 - International Workshop on Polyhedral Compilation Techniques, Kraków, Poland, January 2026.
  • 66 inproceedings: Clément Rossetti and Philippe Clauss. Algebraic Tiling. IMPACT 2023, 13th International Workshop on Polyhedral Compilation Techniques, Toulouse, France, January 2023.
  • 67 article: Hayfa Tayeb, Ludovic Paillat and Bérenger Bramas. Autovesk: Automatic Vectorized Code Generation from Unstructured Static Kernels Using Graph Transformations. ACM Trans. Archit. Code Optim. 21(1), December 2023. URL: https://doi.org/10.1145/3631709