CAMUS

2025 Activity report, Project-Team CAMUS

RNSR: 200920957V

Research center Inria Branch‌ at the University of‌ Strasbourg
In partnership with: Université de Strasbourg
Team name: Compilation for multi-processor‌ and multi-core architectures
In collaboration with: Laboratoire des sciences de l'ingénieur, de l'informatique et de l'imagerie

Creation of the Project-Team:‌ 2023 October 01

Keywords

Computer Science and Digital Science

A1.1.1. Multicore,‌ Manycore
A1.1.2. Hardware accelerators (GPGPU, FPGA, etc.)
A1.1.4.‌ High performance computing
A2.1.1. Semantics of programming languages‌
A2.1.6. Concurrent programming
A2.1.7. Distributed programming
A2.1.10. Domain-specific‌ languages
A2.2.1. Static analysis
A2.2.4. Parallel architectures
A2.2.5.‌ Run-time systems
A2.2.6. GPGPU, FPGA...
A2.2.7. Adaptive compilation‌
A2.2.8. Code generation
A4.5. Formal method for verification,‌ reliability, certification

1 Team members, visitors, external collaborators

Research‌ Scientists

Bérenger Bramas [INRIA, Researcher]‌
Arthur Charguéraud [INRIA, Senior Researcher,‌ HDR]
Jens Gustedt [INRIA, Senior‌ Researcher, HDR]
Thomas Koehler [CNRS‌, Researcher]

Faculty Members

Philippe Clauss [‌Team leader, UNIV STRASBOURG, Professor,‌ HDR]
Cedric Bastoul [UNIV STRASBOURG,‌ Professor, from Apr 2025]
Stephane Genaud‌ [UNIV STRASBOURG, Professor, HDR]‌
Alain Ketterlin [UNIV STRASBOURG, Associate Professor‌]
Vincent Loechner [UNIV STRASBOURG, Associate‌ Professor]
Eric Violard [UNIV STRASBOURG,‌ Associate Professor, HDR]

Post-Doctoral Fellow

Clément‌ Flint [INRIA, Post-Doctoral Fellow, until‌ Jul 2025]

PhD Students

Ugo Battiston [‌INRIA]
Guillaume Bertholon [UNIV STRASBOURG,‌ until Aug 2025]
Raphael Colin [INRIA‌]
Tom Hammer [UNIV STRASBOURG]
Atoli‌ Huppe [INRIA]
Yanni Lefki [INRIA‌, from Oct 2025]
Valeran Maytie [‌ UNIV STRASBOURG, from Oct 2025]
Clément‌ Rossetti [UNIV STRASBOURG, until Oct 2025‌]

Technical Staff

Erwan Auer [INRIA,‌ Engineer]
Antoine Pierquin [UNIV STRASBOURG,‌ Engineer]
Adilla Susungi [UNIV STRASBOURG,‌ Engineer, from Feb 2025]

Interns and‌ Apprentices

Julien De Curieres De Castelnau [INRIA, Intern, from‌ Sep 2025]
Julien‌ Gaupp [INRIA,‌‌ Intern, until Aug 2025]
Ilyas Kermad‌ [INRIA, Intern‌, from Jun 2025‌‌ until Jul 2025]
Yanni Lefki [INRIA‌, Intern, from‌ Mar 2025 until Aug‌‌ 2025]
Valeran Maytie [INRIA, Intern‌, from Mar 2025‌ until Aug 2025]‌‌
Elian Morel [INRIA, Intern, from‌ May 2025]
Marceau‌ Noury [INRIA,‌‌ Intern, until Jan 2025]

Administrative Assistants‌

Marine Dufourmantelle [INRIA‌]
Sylvie Hilbert [‌‌CNRS]

2 Overall objectives

The CAMUS team‌ is focusing on developing,‌ adapting and extending automatic‌‌ and semi-automatic parallelization and optimization techniques, as well‌ as proof and certification‌ methods, for accelerating applications‌‌ with the efficient use of current and future‌ multi-processor and multicore hardware‌ platforms.

The team's research‌‌ activities are organized into three main axes which‌ are: (1) semi-automatic and‌ assisted code optimization, (2)‌‌ fully-automatic code optimization, and (3) fundamental algorithms and‌ mathematical tools. Axes (1)‌ and (2) include two‌‌ sub-axes each: (1.1) interactive program transformation, (1.2) new‌ language constructs, (2.1) runtime‌ systems and dynamic analysis‌‌ & optimization, and (2.2) static analysis & optimization.‌ Every axis may include‌ some activities related to‌‌ interdisciplinary collaborations focusing on high performance computing.

3‌ Research program

While trusted‌ and fully automatic code‌‌ optimizations are generally the most convenient solutions for‌ developers, the growing complexity‌ of software and hardware‌‌ obviously impacts their scope and effectiveness. Although fully‌ automatic techniques can be‌ successfully applied in restricted‌‌ contexts, it is often beneficial to let expert‌ developers make some decisions‌ on their own. Moreover,‌‌ some expert knowledge, contextual requirements, and hardware novelties‌ cannot be immediately integrated‌ into automatic tools.

Thus, besides automatic optimizers, which undoubtedly play an important role, semi-automatic optimizers providing helpful assistance to expert developers are also essential for reaching high performance. Note that such semi-automatic tools should ideally invoke fully automatic sub-parts, including dependence analyzers, code generators, correctness checkers or performance evaluators, in order to save the user from the burden of these tasks and expand the scope of the tools. Fully automatic tools may either be used as standalone solutions, when targeting the corresponding restricted codes, or used as satellite tools for semi-automatic environments. Fully automatic mechanisms are the elementary pieces of any more ambitious semi-automatic optimizing tool.

Figure 1:‌ General view of CAMUS'‌‌ research objectives.

CAMUS' main research axes are depicted in Figure 1. Semi-automatic methods for code optimization will be implemented either as interactive transformation tools, or as language extensions allowing users to control the way programs are transformed. Both approaches will be supported by fully automatic processes devoted to baseline code analysis and transformation schemes. Such schemes may be either static, i.e., applied at compile-time, or dynamic, i.e., applied while the target code runs. Note that these characteristics are not mutually exclusive: one optimization process may simultaneously include a static and a dynamic part. Note also that the invoked fully automatic processes may be very ambitious frameworks on their own, for instance implementing advanced speculative optimization strategies.

Strong advances in code analysis and‌ transformation are often due to fundamental algorithms and‌ mathematical tools, that enable the extraction of important‌ properties of programs, through a constructive conceptual modeling.‌ We believe that the investment in core mathematics‌ and computer science research must be permanent in‌ the following directions:

Mathematics is obviously a great pool of modeling and computing methods that may have a high impact in the field of program analysis and transformation. Additionally, mathematical results must be adapted and transformed into algorithms which are usable for our purpose. This task may require some mathematical extensions and the creation of fast and reliable algorithms and implementations.
Some new contexts‌ of use require the conception of new algorithms‌ dedicated to well-known fundamental and essential tasks. For‌ instance, many standard code analysis and transformation algorithms,‌ originally developed to be exclusively used at compile-time,‌ need to be revised to be used at‌ runtime. Indeed, their respective execution times may not‌ be acceptable when analyzing and optimizing code on-the-fly.‌ The time-overhead must be dramatically lowered, while the‌ ambitions may be adjusted to the new context.‌ Typically, “optimal” solutions resulting from time-consuming computations may‌ not be the final goal of runtime optimization‌ strategies. Sub-optimal solutions may suffice, since the performance‌ of a dynamically optimized code includes the time‌ overhead of the runtime optimization process.
It is‌ always useful to identify a restricted class of‌ programs to which very efficient optimizations may be‌ applied. Such a restricted class usually takes advantage‌ of an accurate model. Conversely, it may also‌ be fruitful to target the removal of some‌ restrictions regarding the class of programs that are‌ candidates for efficient optimizations.
Other scientific disciplines may also provide fundamental strategies to tackle code optimization issues. However, they may also require some prior adaptation. For instance, machine learning techniques are increasingly considered in the area of code optimization.

Collaborations with researchers whose applications require high‌ performance will be developed. Besides offering our expertise,‌ we will especially use their applications as an‌ inspiration for new developments of optimization techniques. Those‌ colleagues from other teams will also play the‌ role of beta testers for our semi-automatic code‌ optimizers. Most research axes of CAMUS will include‌ such collaborations. The local scientific environment is particularly‌ favorable to the setting of interactions. For example,‌ we participate in the inter-disciplinary institute IRMIA++ of‌ the University of Strasbourg, that facilitates collaborations with‌ mathematicians developing high performance numerical simulations.

3.1 Semi-automatic‌ and assisted code optimization

Programming languages, as they are used in modern‌ compute-intensive software, are relatively‌ poor in their possibilities‌‌ to describe all known properties of a particular‌ code. On the one‌ hand, a language construct‌‌ may over-specify the semantics of the program, for‌ example, imposing a specific‌ execution order for the‌‌ iterations of a loop whereas any order would‌ have been correct. On‌ the other hand, a‌‌ language construct may under-specify the semantics of the‌ program, for example, lacking‌ the ability to describe‌‌ the fact that two pointers must be distinct,‌ or that a given‌ integer value is always‌‌ less than a small constant.
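As an illustration of the under-specification case, the following C sketch shows how a programmer might hand such knowledge to a compiler today. The `restrict` qualifier is standard C; the `ASSUME` macro and both function names are hypothetical, and the macro relies on the GCC/Clang builtin `__builtin_unreachable`, so it is a non-portable assumption rather than a language feature.

```c
#include <assert.h>
#include <stddef.h>

/* `restrict` states that dst and src never overlap, so the compiler may
   reorder or vectorize the iterations without a runtime overlap check. */
void scale_add(float *restrict dst, const float *restrict src,
               float a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += a * src[i];
}

/* Hypothetical ASSUME macro: tells GCC/Clang that cond always holds.
   On other compilers it expands to nothing (assumption, not standard C). */
#if defined(__GNUC__) || defined(__clang__)
#define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#define ASSUME(cond) ((void)0)
#endif

/* The caller knows that k is always less than the small constant 8. */
unsigned table_lookup(const unsigned char table[8], unsigned k)
{
    ASSUME(k < 8);
    return table[k];
}
```

Both annotations are promises made by the programmer: if they are violated, the behavior is undefined, which is one reason why traceable and checkable ways of expressing such meta-knowledge are desirable.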

Modern tools that rewrite code for optimization, be it internally as optimizing compiler passes or externally as source-to-source transformations, miss a lot of opportunities for the programmer to annotate and integrate their knowledge of the code. As a consequence, fully automatic tools are not easily brought to their full capacity, and one-shot platform-specific programmer intervention is required.

To advance this‌‌ field, we will develop re-usable and traceable features‌ that provide the ability‌ for programmers to specify‌‌ and control code transformations and to annotate functional‌ interfaces and code blocks‌ with all the meta-knowledge‌‌ they have.

3.2 Fully-automatic code optimization

We will focus on two main code optimization and parallelization approaches: the polyhedral model, based on a geometrical representation and transformation of loops; and the task-based model, based on a runtime resolution of the dependencies between the tasks. Note that these two approaches can potentially be mixed.

The polyhedral model is a great source of new developments regarding fundamental mathematical tools dedicated to code analysis and transformation. This model was originally exclusively based on linear algebra. We have proposed in the past some extensions to polynomials, and we are currently investigating extensions to algebraic expressions. In the meantime, we also focus on runtime approaches that allow polyhedral-related techniques to be applied to codes that are not usually well-suited candidates. The motivation of such extensions is to propose new compilation techniques with enlarged scope and better efficiency, that are either static, i.e., applied at compile-time, or dynamic, i.e., applied at runtime.
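To make the geometrical view concrete, here is a minimal C sketch of a loop nest whose iteration domain is a triangle, together with an interchanged version that scans the same domain in a different order. The array size `N` and the function names are illustrative assumptions, not taken from any CAMUS tool.

```c
#include <assert.h>
#include <string.h>

#define N 6

/* Original nest: the iteration domain is the triangle
   { (i, j) | 0 <= i < N, 0 <= j <= i }, given by affine inequalities. */
void visit_original(int hits[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j <= i; j++)
            hits[i][j]++;
}

/* Loop interchange: j becomes the outer loop. The new bounds follow
   from the same inequalities: 0 <= j < N and j <= i < N. */
void visit_interchanged(int hits[N][N])
{
    for (int j = 0; j < N; j++)
        for (int i = j; i < N; i++)
            hits[i][j]++;
}

/* Returns 1 when both nests visit exactly the same points once each. */
int same_domain(void)
{
    int a[N][N], b[N][N];
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    visit_original(a);
    visit_interchanged(b);
    return memcmp(a, b, sizeof a) == 0;
}
```

A polyhedral compiler derives bounds such as `j <= i < N` automatically from the inequalities of the domain; here they are written by hand for the sake of the example.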

We will also keep‌‌ studying the task-based method which is complementary to‌ the polyhedral model, and‌ beneficial in scenarios that‌‌ are not adapted to the polyhedral model. For‌ example, this method can‌ work when the description‌‌ of the parallelism is entirely performed at runtime,‌ and it is able‌ to parallelize sections with‌‌ arbitrary structures (i.e., not necessarily loop nests).

In‌ our project, we attempt‌ to bridge the gap‌‌ between the task-based method and the compiler by‌ designing a novel automatic‌ parallelization mechanism with static‌‌ source-to-source transformations. We also work on improving the‌ scheduling strategies or the‌ description of the parallelism‌‌ by designing speculative execution models that operate at‌ runtime.
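The task-based model described above can be sketched with OpenMP tasks, the runtime that the APAC entry of this report names as its target. The function below is a hypothetical example and not output of APAC: each task declares what it reads and writes, and the runtime resolves the dependencies when the program runs.

```c
#include <assert.h>

/* Hypothetical computation split into three tasks. The depend clauses
   describe the data each task produces or consumes; the runtime builds
   the task graph from them and may run the first two tasks in parallel. */
int pipeline(int x)
{
    int a = 0, b = 0, c = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: a) shared(a)
        a = x + 1;
        #pragma omp task depend(out: b) shared(b)
        b = x * 2;
        /* Runs only after both producers of a and b have completed. */
        #pragma omp task depend(in: a, b) depend(out: c) shared(a, b, c)
        c = a + b;
        #pragma omp taskwait
    }
    return c;
}
```

When the code is compiled without OpenMP support, the pragmas are ignored and the three statements run sequentially with the same result, which is what makes this style convenient for source-to-source insertion of tasks.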

3.3 Fundamental algorithms‌ & mathematical tools

Regarding‌‌ our fundamental and theoretical studies, we plan to‌ focus on three main‌ topics: (1) Trahrhe expressions,‌‌ (2) mechanized metatheory and‌ interactive program verification, and (3) programmable polyhedral scheduling.‌

4 Application domains

High performance computing plays a‌ crucial role in the resolution of important problems‌ of science and industry. Additionally, software development companies,‌ and software developers in general, are strongly constrained‌ by the time-to-market issue, while facing growing complexities‌ related to hardware and correctness of the developed‌ programs. Computers become more and more powerful by‌ integrating numerous and specialized processor cores, and programs‌ taking advantage of such hardware are more and‌ more exposed to correctness issues.

Our goal is‌ to provide automatic and semi-automatic tools that will‌ significantly lower the burden on developers. By ensuring‌ a secured production of correct and well-performing software,‌ developers can mostly concentrate on the implemented functionalities,‌ and produce quality software in reasonable time.

Our scientific contributions are most of the time supported by dedicated software, or by an extension of existing software. Its role is to highlight the automation of the proposed analysis and optimization techniques, to demonstrate their effectiveness by exhibiting performance improvements on baseline benchmark programs, and to facilitate their application to any program targeted by potential users. Thus, our software tools must be made as accessible as possible for users from science and industry, so that they can experiment with the implemented optimization procedures on their specific programs. As such, we usually offer free non-commercial use through an open-source software licence. While the software is made available in a shape that allows for its use in full autonomy, we expect interested users to contact us for deeper exchanges related to their specific goals. Such exchanges may be the start of fruitful collaborations. Publishing our proposals in top-rated conferences and journals may obviously also have an effective impact on their adoption and on the use of the related software.

Our contributions in‌ analysis and optimization techniques of programs may find‌ interested users in many international companies, from semi-conductor‌ industry actors, like ARM, SiPearl or STMicroelectronics, to‌ big companies developing high performance or deep learning‌ applications. At a national or local level, any‌ company whose innovative developments require compute or data‌ intensive applications, like Nyx, or dedicated support‌ tools, like Atos, may be interested in our‌ work, and potentially collaborate with us for more‌ specific and dedicated research. Since the project-team is‌ hosted by the University of Strasbourg, contacts with‌ many local companies are made easier thanks to‌ the hiring of former students, and to their‌ involvement in teaching duties and supervision of internship‌ students.

5 Highlights of the year

The third edition of "Modern C" by Jens Gustedt [35] has been published by Manning and has had about 185,000 downloads on HAL overall.

6 Latest‌ software developments, platforms, open data

6.1 Latest software‌ developments

6.1.1 TRAHRHE

Name:
Trahrhe expressions and applications‌ in loop optimization
Keywords:
Polyhedral compilation, Code optimisation,‌ Source-to-source compiler
Functional Description:
This software includes a mathematical kernel for computing Trahrhe expressions related to iteration domains, as well as extensions implementing source-to-source transformations of loops for applying optimizations based on Trahrhe expressions.
News of‌‌ the Year:
A more robust way of computing the ranking polynomials has been implemented. A largely new version of the software, written in C/C++, has been published.
URL:
https://webpages.gitlabpages.inria.fr/trahrhe
Publications:
hal-04379037‌‌, hal-03944790, hal-02425752, hal-01581081
Contact:
Philippe‌ Clauss
Participants:
Clément Rossetti,‌ Philippe Clauss, Marceau Noury‌‌

6.1.2 openCARP

Name:
Cardiac Electrophysiology Simulator
Keyword:
Cardiac‌ Electrophysiology
Functional Description:
openCARP‌ is an open cardiac‌‌ electrophysiology simulator for in-silico experiments. Its source code‌ is public and the‌ software is freely available‌‌ for academic purposes. openCARP is easy to use‌ and offers single cell‌ as well as multiscale‌‌ simulations from ion channel to organ level. Additionally,‌ openCARP includes a wide‌ variety of functions for‌‌ pre- and post-processing of data as well as‌ visualization.
News of the‌ Year:
Improvements to the code generation of ionic models (limpetMLIR): generation of CUDA and AMD kernels. Building of all targets embedded in functions for the runtime interface. StarPU interface to the kernels. Benchmarks (execution time, energy consumption).
URL:
https://opencarp.org/
Publications:
hal-04206195, hal-03977688‌
Contact:
Vincent Loechner
Participants:‌
Vincent Loechner, Stephane Genaud,‌‌ Antoine Pierquin, Adilla Susungi, 3 anonymous participants
Partner:‌
Karlsruhe Institute of Technology‌

6.1.3 SPECX

Name:
SPEculative‌‌ eXecution task-based runtime system
Keywords:
HPC, Parallelization, Task-based‌ algorithm
Functional Description:
Specx (previously SPETABARU) is a task-based runtime system for multi-core architectures that includes speculative execution models. It is a pure C++11 product without external dependency. It uses advanced meta-programming and allows for an easy customization of the scheduler. It is also capable of generating execution traces in SVG to better understand the behavior of the applications.
News of the Year:
In 2025, the paper that presents the multi-GPU and MPI version of Specx has been published: "Specx: a C++ task-based runtime system for heterogeneous distributed architectures", Paul Cardosi and Bérenger Bramas, PeerJ CS.
URL:‌
https://gitlab.inria.fr/bramas/specx
Publication:
hal-04191350
Contact:‌
Bérenger Bramas

6.1.4 Autovesk‌‌

Keywords:
HPC, Vectorization, Source-to-source compiler
Functional Description:
Autovesk is a tool to produce vectorized implementations from static kernels.
News of the Year:
In 2025,‌ Autovesk has been updated‌ to support more complex‌‌ benchmarks.
URL:
https://gitlab.inria.fr/bramas/autovesk
Contact:
Bérenger Bramas
Participant:
Bérenger‌ Bramas

6.1.5 PolyLib

Name:‌
The Polyhedral Library
Keywords:‌‌
Rational polyhedra, Library, Polyhedral compilation
Scientific Description:
A‌ C library used in‌ polyhedral compilation, as a‌‌ basic tool used to analyze, transform, optimize polyhedral‌ loop nests. It has‌ been shipped in the‌‌ polyhedral tools Cloog and Pluto.
Functional Description:
PolyLib‌ is a C library‌ of polyhedral functions, that‌‌ can manipulate unions of rational polyhedra of any‌ dimension. It was the‌ first to provide an‌‌ implementation of the computation of parametric vertices of‌ a parametric polyhedron, and‌ the computation of an‌‌ Ehrhart polynomial (expressing the number of integer points‌ contained in a parametric‌ polytope) based on an‌‌ interpolation method.
Release Contributions:‌
Functions to manipulate LBLs (linearly bounded lattices) have‌ been added in 2025. MIT Licence.
News of‌ the Year:
Maintenance, upgrade of the user interface,‌ upgrade of the build process.
URL:
http://icps.u-strasbg.fr/PolyLib/
Publication:‌
hal-05464193
Contact:
Vincent Loechner
Participant:
Vincent Loechner
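
To illustrate what an Ehrhart polynomial expresses, the following C sketch counts the integer points of a small parametric polytope by enumeration and compares the result with a closed-form polynomial in the parameter. It does not use the PolyLib API; the polytope and the function names are chosen for the example only.

```c
#include <assert.h>

/* Brute-force count of the integer points of the parametric polytope
   P(n) = { (i, j, k) | 0 <= k <= j <= i <= n }. */
long count_points(long n)
{
    long count = 0;
    for (long i = 0; i <= n; i++)
        for (long j = 0; j <= i; j++)
            for (long k = 0; k <= j; k++)
                count++;
    return count;
}

/* Closed form for the same count: a degree-3 polynomial in n,
   (n + 1)(n + 2)(n + 3) / 6. The product of three consecutive
   integers is divisible by 6, so the division is exact. */
long ehrhart(long n)
{
    return (n + 1) * (n + 2) * (n + 3) / 6;
}
```

The polynomial gives the count in constant time for any value of the parameter, whereas the enumeration costs as many steps as there are points.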

6.1.6‌ APOLLO

Name:
Automatic speculative POLyhedral Loop Optimizer
Keyword:‌
Automatic parallelization
Scientific Description:
APOLLO - Automatic speculative‌ POLyhedral Loop Optimizer is a compiler framework dedicated‌ to automatic, dynamic and speculative parallelization and optimization‌ of programs' loop nests. This framework allows a‌ user to mark in a C/C++ source code‌ some nested loops of any kind (for, while‌ or do-while loops) in order to be handled‌ by a speculative parallelization process, to take advantage‌ of the underlying multi-core processor architecture. The framework‌ is composed of two main parts: extensions to‌ the CLANG-LLVM compiler and a runtime system.
Functional‌ Description:
APOLLO is dedicated to automatic, dynamic and speculative parallelization of loop nests that cannot be handled efficiently at compile-time. It is composed of a static part consisting of specific passes in the LLVM compiler suite, plus a modified Clang frontend, and a dynamic part consisting of a runtime system. It can apply on-the-fly any kind of polyhedral transformations, including tiling, and can handle nonlinear loops, such as while-loops referencing memory through pointers and indirections. Some recent extensions enabling dynamic multi-versioning have been implemented in 2020.
News of the‌ Year:
APOLLO has been upgraded to LLVM 17 and to Pluto 0.12.
URL:
https://webpages.gitlabpages.inria.fr/apollo
Publications:
hal-01244464‌, hal-01533692, hal-01155172, hal-02457425, hal-01377656‌
Contact:
Philippe Clauss
Participants:
Aravind Sukumaran Rajam, Erwan‌ Auer, Raphael Colin, Juan Manuel Martinez Caamano, Manuel‌ Selva, Philippe Clauss

6.1.7 OptiTrust

Name:
OptiTrust
Keywords:‌
Code optimisation, Verification
Functional Description:
The OptiTrust framework‌ provides programmers with means of optimizing their programs‌ via user-guided source-to-source transformations. It leverages Separation Logic‌ for checking that both input and output programs‌ satisfy the desired specification. The transformations maintain separation‌ logic derivations, following the concept of proof-carrying code.‌
News of the Year:
OptiTrust has been extended‌ to support validation of functional correctness on the‌ output code, following the proof-carrying code approach. A‌ new case study on LLM inference has been‌ developed.
URL:
http://optitrust.inria.fr
Contact:
Arthur Charguéraud
Participants:
Arthur‌ Charguéraud, Thomas Koehler, Guillaume Bertholon

6.1.8 APAC

Keywords:‌
Source-to-source compiler, Automatic parallelization, Parallel programming
Scientific Description:‌
APAC is a compiler for automatic parallelization that transforms C++ source code to make it parallel by inserting tasks. It uses the tasks+dependencies paradigm and relies on OpenMP as runtime system. Internally, it is based on OptiTrust (and Clang-LLVM).
Functional‌ Description:
Automatic task-based parallelization compiler
News of the‌ Year:
Additional case studies have been developed.
URL:‌
https://gitlab.inria.fr/bramas/apac
Contact:
Bérenger Bramas
Participants:
Marek Felsoci, Bérenger‌ Bramas, Stephane Genaud

6.1.9 Rise & Shine

Keywords:‌
Programming language, Compilation
Functional Description:
Programming language and‌ compiler for array computing. Programs are expressed at‌ a high level in the RISE language. Programs‌ are transformed using a set of rewrite rules that encode implementation and‌ optimization choices. The Shine‌ compiler generates high-performance parallel‌‌ C or OpenCL code while preserving the optimization‌ choices made during rewriting.‌
News of the Year:‌‌
A prototype Rise to C compiler executable that‌ preserves floating-point semantics was‌ added by Thomas Koehler,‌‌ as part of his collaboration with Eva Darulova‌ (Uppsala University).
URL:
http://rise-lang.org‌
Contact:
Thomas Koehler
Participant:‌‌
Thomas Koehler
Partner:
Technische Universität Berlin

6.1.10 egg-sketches‌

Keyword:
Program rewriting techniques‌
Functional Description:
egg-sketches is‌‌ a library adding support for program sketches on‌ top of the egg‌ (e-graphs good) library, an‌‌ e-graph library optimized for equality saturation. Sketches are‌ program patterns that are‌ satisfied by a family‌‌ of programs. They can also be seen as‌ incomplete or partial programs‌ as they can leave‌‌ details unspecified. With egg-sketches, it is possible to‌ perform Guided Equality Saturation:‌ a semi-automatic technique that‌‌ allows programmers to guide rewriting via program sketches.‌
News of the Year:‌
Upgraded to latest egg‌‌ dependency, added an extra sketch construct, fixed two‌ bugs.
URL:
https://github.com/Bastacyclop/egg-sketches
Contact:‌
Thomas Koehler
Participant:
Thomas‌‌ Koehler
Partner:
TU Darmstadt

6.1.11 slotted-egraphs

Keyword:
Term‌ Rewriting Systems
Functional Description:‌
Implementation of the slotted‌‌ e-graph data structure, an extension of e-graphs representing‌ terms that differ only‌ in the names of‌‌ their variables uniquely. With slotted-egraphs, users of languages‌ with variables can perform‌ equality saturation by: (1)‌‌ defining the term language, representing variables and binders‌ via slots, (2) defining‌ rewrite rules, without having‌‌ to worry about naming collisions, and leveraging built-in‌ mechanisms for freshness predicates‌ and substitutions, (3) performing‌‌ equality saturation by initializing a slotted e-graph, growing‌ it by applying rewrites,‌ and extracting from it.‌‌
News of the Year:
Development started in February 2024, led by Rudi Schneider from TU Berlin. Instigated and supervised by Thomas Koehler and Michel Steuwer. A paper was accepted at PLDI 2025.
URL:
https://github.com/memoryleak47/slotted-egraphs
Publication:
hal-05127023
Contact:
Thomas Koehler
Participant:‌
Thomas Koehler
Partners:
Technische‌ Universität Berlin, TU Darmstadt‌‌

6.1.12 Pesto

Name:
Polyhedral flExible loop-neST Optimizer
Keyword:‌
Optimizing compiler
Functional Description:‌
This tool makes it easy to apply polyhedral transformations to C code, in particular algebraic tiling. It is divided into two parts: the command-line interface and the library.
News of the Year:‌
The first usable version‌‌ of Pesto is now available.
URL:
https://gitlab.inria.fr/crossett/pesto
Contact:‌
Clément Rossetti

6.1.13 StrasGPT‌

Keywords:
Polyhedral compilation, LLM,‌‌ Automatic parallelization, Vectorization
Functional Description:
This program is a direct C implementation of the Qwen3 / LLaMa 3.x / Mistral LLM transformer architecture amongst others, reusing the tokenizer and the sampler of Andrej Karpathy's llama2.c project and its fork llama3.c by James Delancey. Given an input prompt, StrasGPT can generate a text that continues it. It was initially designed as a parallel programming project for master students in 2025 (students had to parallelize it with OpenMP + MPI). It is now being continued for (polyhedral) compiler research.
News‌ of the Year:
Creation‌
Contact:
Cedric Bastoul
Participant:‌‌
Cedric Bastoul

7 New‌ results

7.1 Semi-automatic and assisted code optimization

7.1.1‌ OptiTrust: Producing Trustworthy High-Performance Code via Source-to-Source Transformations‌

Participants: Arthur Charguéraud, Guillaume Bertholon, Thomas‌ Koehler, Elian Morel, Julien de Castelnau‌.

In 2025, we pursued the development of‌ the OptiTrust prototype framework for producing high-performance code‌ via source-to-source transformations, with formal guarantees of correctness.‌

We have extended the framework to support full functional correctness assertions. OptiTrust thereby becomes a modern implementation of Necula's concept of proof-carrying code. We generalized two prior case studies, namely OpenCV's box-blur and TVM's matrix multiply, to full functional correctness. This work is described as part of the PhD thesis of Guillaume Bertholon [57]. A 60-page journal article describing OptiTrust has recently been submitted for publication to a top-tier journal. In addition, Guillaume has presented a description of OptiTrust's bidirectional translation between C and its internal lambda-calculus at the JFLA'25 national workshop [22].

The Master‌ internship of Elian Morel contributed a case study‌ on an LLM inference code. Preliminary results show‌ that OptiTrust supports code transformation on such a‌ complex, realistic program. Elian also contributed an important‌ technical addition to the typechecker: an elaboration phase‌ to automatically infer the numerous annotations (known as‌ “ghost operations for focusing”), which are needed to‌ typecheck array-manipulating operations in separation logic.

The ongoing Master internship of Julien de Castelnau aims to extend OptiTrust to support refinement from CPU to GPU code, optimization at the GPU level, and extraction from GPU code to CUDA syntax.

7.1.2‌ Sketch-Guided Polyhedral Compilation

Participants: Valeran Maytié, Thomas‌ Koehler, Cedric Bastoul.

As part of Valéran Maytié's internship and PhD, we developed a new semi-automatic, sketch-guided compilation approach. It enables users to write sketches that guide the compiler towards key optimizations by describing the desired structure of the optimised code, without worrying about how to get there. We introduce a sketch language that enables expressing the result of imperative loop transformations and a new polyhedral algorithm capable of generating code constrained by both a sketch and a computation specification. This work was presented at the IMPACT'26 workshop [27], and we are working towards a full conference paper. This work has been done in collaboration with Christophe Alias (Inria CASH).

7.1.3 Specx: A C++ Task-Based Runtime System‌ for Heterogeneous Distributed Architectures

Participants: Bérenger Bramas.‌

Bérenger Bramas and Paul Cardosi completed and submitted the paper presenting Specx several years after the end of Paul Cardosi's contract; this paper was published in 2025 [16]. Specx is now capable of executing task graphs on heterogeneous distributed architectures. It provides an elegant way to define task graphs and describe objects that the runtime system can move or send.

7.1.4 Using‌ the Discrete Wavelet Transform for Scientific Data Compression‌

Participants: Atoli Huppé, Clément Flint, Bérenger‌ Bramas, Stéphane Genaud, Philippe Helluy.‌

As a follow-up to Clément Flint's thesis work [59], in which we worked on compressing simulation data for a Lattice Boltzmann application, Atoli Huppé's PhD work aims to propose a general compressor for scientific data. It addresses the use case in which the data to compress is generated by a scientific application on the GPU, and the compression is then performed directly on the data in GPU memory. The original data compressed into blocks can then be decompressed when they are needed for further computations, or saved from the GPU to the disk through the CPU.

This GPU implementation based on the Discrete Wavelet Transform is, to the best of our knowledge, the first full-GPU, single-kernel implementation using this compression model. Our performance evaluation, conducted at the end of 2025, shows that our compressor achieves a higher compression ratio than the state-of-the-art compressor cuSZp2 [60] with comparable compression and decompression throughputs. A preliminary version of this work was presented at COMPAS [32], and our latest results will be submitted to an international conference.
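
For readers unfamiliar with wavelet-based compression, the following C sketch shows the general principle on the CPU. It assumes one level of the unnormalized Haar wavelet and a simple threshold on the detail coefficients; the report does not state which wavelet, quantization or encoding the actual GPU compressor uses, so every name and choice here is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One level of the unnormalized Haar transform on n samples (n even):
   out[0 .. n/2) holds pairwise averages, out[n/2 .. n) holds details. */
void haar_forward(const double *in, double *out, size_t n)
{
    size_t half = n / 2;
    for (size_t k = 0; k < half; k++) {
        out[k]        = (in[2 * k] + in[2 * k + 1]) / 2.0;
        out[half + k] = (in[2 * k] - in[2 * k + 1]) / 2.0;
    }
}

/* Exact inverse of haar_forward. */
void haar_inverse(const double *in, double *out, size_t n)
{
    size_t half = n / 2;
    for (size_t k = 0; k < half; k++) {
        out[2 * k]     = in[k] + in[half + k];
        out[2 * k + 1] = in[k] - in[half + k];
    }
}

/* Lossy step: zero every detail coefficient whose magnitude is below
   eps, and return how many were zeroed. Runs of zeros are what a
   subsequent encoder could store compactly. */
size_t threshold_details(double *coef, size_t n, double eps)
{
    size_t zeroed = 0;
    for (size_t k = n / 2; k < n; k++) {
        double mag = coef[k] < 0.0 ? -coef[k] : coef[k];
        if (mag < eps) {
            coef[k] = 0.0;
            zeroed++;
        }
    }
    return zeroed;
}
```

Smooth data yields small detail coefficients, so the threshold discards most of them while the reconstruction error of each sample stays below `eps`.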

7.1.5 Exploiting Ray Tracing Technology‌ Through OptiX to Compute‌ Particle Interactions with Cutoff‌‌ in a 3D Environment on GPUs

Participants: Bérenger‌ Bramas.

Bérenger Bramas‌ and David Algis worked‌‌ on utilizing OptiX for neighbor finding in n-body‌ simulations. Several methods were‌ implemented, including two novel‌‌ approaches based on new geometric patterns. A preprint‌ demonstrates that these methods‌ can achieve significant speedups‌‌ when the grid is sparse (i.e., when particles‌ are not uniformly distributed)‌ 15.

7.1.6 Real-time‌‌ ocean simulation

Participants: Bérenger Bramas.

Bérenger Bramas‌ collaborated with Emmanuelle Darles,‌ Lilian Aveneau and David‌‌ Algis on real-time ocean simulation. This work led‌ to the development of‌ the Arc Blanc framework‌‌ 13, 21, a fully described GPU/CPU‌ real-time pipeline for simulating‌ the free ocean surface‌‌ and solid–fluid interactions while preserving physical realism at‌ large scale. The framework‌ includes improvements such as‌‌ real-time computation of fluid velocities at arbitrary depth‌ and enhanced solid-to-fluid coupling.‌ In addition, we supported‌‌ the integration of these simulations into Unity by‌ developing an open-source interoperability‌ tool between Unity compute‌‌ shaders and CUDA, enabling access to advanced GPU‌ programming features not available‌ in Unity's native environment‌‌ 14.

7.2 Fully-automatic code optimization

7.2.1 Algebraic‌ tiling

Participants: Clément Rossetti‌, Philippe Clauss.‌‌

We propose a new loop tiling approach based on the volumes of the tiles, i.e., the number of iterations delimited by the tiles, instead of the sizes of standard (hyper-)rectangular tiles, i.e., the sizes of the edges of the tiles. In the proposed approach, tiles are dynamically generated and have almost equal volumes, even if their shapes and edge sizes may differ. The iteration domain is well covered by a minimum number of tiles that are all almost full. Since the bounds of the generated tiles are not linear but are defined by algebraic mathematical expressions, we call this loop tiling technique algebraic tiling. It uses the mathematical engine TRAHRHE, also developed in the team.

Algebraic tiles are built by successive hierarchical slicing of the initial iteration domain, from the outermost to the innermost dimensions of the target loop nest, in a way that ensures that all slices have quasi-equal volumes. The bounds of the loop nests that are handled must be constants, or linear functions of the surrounding loop iterators and of unknown parameters, which are typically related to the input data size. Such loops are also called polyhedral loops since they may be handled using the polyhedral model. Quasi-perfect load balancing is achieved when each parallel loop is sliced into as many slices of quasi-equal volume as there are parallel threads, and when most of the iterations have similar execution times. Thus, such a dynamic slicing strategy makes the resulting parallel loop scalable with respect to the number of threads. Good data locality is reached by slicing the non-parallelized loops profitably, and by slicing the parallel loops into a number of parts equal to a multiple of the number of parallel threads. The number of generated slices for each dimension may remain a parameter at compile time, making algebraic tiling a parameterized loop tiling technique, and allowing the produced code to adapt to the number of parallel threads and to the data layout. Our experiments show that algebraic tiling significantly outperforms (hyper-)rectangular tiling when parallelizing loops with OpenMP using static scheduling, and mostly provides similar or lower execution times when compared to traditionally tiled loops parallelized using OpenMP dynamic scheduling. Thus, algebraic tiling makes dynamic scheduling largely unnecessary for the handled loop nests.
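The slicing idea can be sketched by hand on a single triangular loop nest, `for i in range(n): for j in range(i + 1)`. The Python sketch below is only an illustration for this one loop shape, not the output of Pesto or TRAHRHE: it assumes that the volume of rows `0..i-1` is `i*(i+1)/2` and places each cut where this cumulative volume reaches the next fraction of the total. The names `slice_bounds` and `volumes` are hypothetical.

```python
from math import isqrt


def slice_bounds(n, num_slices):
    """Cut the outer loop of `for i in range(n): for j in range(i + 1)`
    into `num_slices` ranges of quasi-equal iteration volume."""
    total = n * (n + 1) // 2  # volume of the whole triangular domain
    bounds = [0]
    for k in range(1, num_slices):
        target = k * total // num_slices
        # Largest i such that i*(i+1)/2 <= target, i.e. the integer floor of
        # the positive root of i^2 + i - 2*target = 0 (a non-linear bound).
        bounds.append((isqrt(8 * target + 1) - 1) // 2)
    bounds.append(n)
    return bounds


def volumes(bounds):
    """Number of (i, j) iterations contained in each slice."""
    return [sum(i + 1 for i in range(lo, hi)) for lo, hi in zip(bounds, bounds[1:])]


bounds = slice_bounds(1000, 4)
slice_volumes = volumes(bounds)
```

For `n = 1000` and four slices, the slices span 499, 207, 159 and 135 outer iterations respectively, while their volumes stay within about 1% of one another. This is the property that makes a static distribution over four threads well balanced in this example.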

Algebraic tiling has been implemented in a source-to-source‌ automatic code optimizer called Pesto (6.1.12)‌ by Clément Rossetti, who defended his thesis on‌ the 18th of December 2025.

7.2.2 Connecting Kokkos‌ with the Polyhedral Model

Participants: Ugo Battiston,‌ Philippe Clauss.

The increasing complexity of HPC‌ hardware forces scientists to shift towards performance portable‌ parallel programming models. Modern C++ libraries, such as‌ Kokkos, have become essential: they allow developers to‌ write a single code that runs efficiently on‌ heterogeneous hardware (CPUs or GPUs).

However, this abstraction‌ comes at a cost. The heavy use of‌ C++ templates and lambda functions inside Kokkos hides‌ the control flow and memory access patterns from‌ the compiler. Consequently, advanced static analyzers, specifically those‌ based on the polyhedral model like LLVM's Polly‌ extension, fail to detect optimization opportunities such as‌ loop tiling or fusion, leaving significant performance on‌ the table.

We propose a novel approach to bridge the gap between high-level C++ abstractions and low-level polyhedral optimizations. We present a co-design strategy involving modifications to both the Kokkos library and Polly. First, we modify and instrument Kokkos to produce a cleaner intermediate representation (IR) structure and expose loop and data structures at the LLVM IR level. Second, we extend Polly to recognize these constructs and apply aggressive loop optimizing and parallelizing transformations on Kokkos kernels.

We show that this pipeline enables‌ the automatic application of‌ polyhedral transformations on standard‌‌ Kokkos codes. Our evaluation on loop kernels from‌ the Polybench benchmark suite,‌ rewritten using Kokkos, shows‌‌ significant speedups reaching up to 12.3 times over‌ baseline Kokkos usage.

This‌ work has been presented‌‌ by Ugo Battiston at the Conférence francophone d'informatique‌ en Parallélisme, Architecture et‌ Système (COMPAS 2025). A‌‌ paper has been recently submitted to an international‌ conference.

7.2.3 Automatic Multi-Versioning‌ of Computation Kernels

Participants:‌‌ Raphaël Colin, Erwan Auer, Philippe Clauss‌.

Compute-intensive scientific applications‌ usually combine multiple compute‌‌ kernels. They can go through various different execution‌ phases, where the kernels‌ may operate with different‌‌ parameters and in different execution contexts. As such,‌ standard compilers and generalist‌ optimization tools fail to‌‌ optimize compute kernels to the fullest regarding the‌ execution contexts that the‌ application goes through.

Multi-versioning,‌‌ iterative compilation and auto-tuning are optimization techniques that‌ aim to specialize the‌ optimization parameters of the‌‌ target code. By using feedback obtained from performance‌ measurements at runtime, they‌ are able to choose‌‌ the best implementation variant, the best compiler optimizations,‌ or the best set‌ of values for some‌‌ parameters. However, these techniques mostly generate code by‌ relying on static information,‌ or by relying on‌‌ the user to provide different implementations of the‌ same kernel, or to‌ reference relevant parameters to‌‌ tune.

We are currently designing a multi-versioning system‌ which generates different efficient‌ versions of compute kernels‌‌ at runtime, and selects the best performing one‌ for each encountered execution‌ context. The different versions‌‌ that are generated result from applying automatic loop‌ optimizing and parallelizing transformations‌ that are based on‌‌ the polyhedral model to the LLVM intermediate representation.‌ The latter is then‌ compiled on-the-fly using the‌‌ LLVM Just-In-Time compiler. This multi-versioning system is currently‌ being implemented in the‌ Apollo dynamic parallelizer (‌‌6.1.6).

The system requires very few annotations‌ from the user, and‌ uses information about the‌‌ execution context that is gathered at runtime, in‌ order to guide the‌ automatic transformations of the‌‌ compute kernels.

This work has been presented by‌ Raphaël Colin at the‌ Conférence francophone d'informatique en‌‌ Parallélisme, Architecture et Système (COMPAS 2025) 31.‌

7.2.4 Dynamic Task Scheduling‌ with Multiple Priorities on‌‌ Heterogeneous Computing Systems

Participants: Hayfa Tayeb, Bérenger‌ Bramas.

In the‌ context of Albert d'Aviau‌‌ de Piolant and Hayfa Tayeb's PhD work, Bérenger‌ Bramas collaborated with Mathieu‌ Faverge, Abdou Guermouche, and‌‌ Amina Guermouche to optimize energy consumption in StarPU-based‌ applications 29, addressing‌ a central challenge in‌‌ high-performance computing (HPC), namely improving energy efficiency without‌ sacrificing too much performance.‌ A key lever explored‌‌ in this work is GPU power capping, a‌ technique that enforces a‌ fixed upper power limit‌‌ on devices such as CPUs and GPUs, with‌ the objective of reducing‌ energy usage while preserving‌‌ acceptable throughput. The activity‌ focused on evaluating the impact of static GPU‌ power caps in heterogeneous HPC environments, where multiple‌ accelerators with potentially different performance characteristics are used‌ concurrently, and where the performance/energy trade-off becomes a‌ scheduling and resource allocation problem rather than a‌ simple per-device tuning problem. The study first conducted‌ an extensive characterization on a compute-intensive reference kernel—GEMM‌ (matrix multiplication)—across multiple Nvidia GPU architectures, in order‌ to quantify how reducing the available power budget‌ affects both execution time and energy consumption. The‌ results highlight that compute-bound kernels can become significantly‌ more energy-efficient under moderate power constraints: in particular,‌ setting the GPU power limit in the range‌ of 55-70% of the Thermal Design Power (TDP)‌ can yield up to 30% energy efficiency improvement‌ with only limited performance degradation. Building on these‌ observations, the work then investigated how applying distinct‌ power caps to different GPUs within the same‌ heterogeneous node can improve the global energy efficiency‌ of real HPC workloads, focusing on dense linear‌ algebra task-based computations including matrix multiplication and Cholesky‌ factorization. 
Importantly, the study also demonstrated that the‌ runtime scheduler (StarPU) can automatically adapt scheduling decisions‌ to exploit the resulting heterogeneity induced by different‌ GPU power limits, thereby aligning task placement with‌ each device's effective compute capability under capping. Overall,‌ on a platform equipped with four GPUs, applying‌ power capping across all devices led to substantial‌ end-to-end efficiency gains, improving energy efficiency for matrix‌ multiplication by up to 24.3% in double precision‌ and 33.78% in single precision, confirming that power-aware‌ runtime-driven execution is a practical and effective approach‌ to reduce the energy footprint of heterogeneous HPC‌ applications.

7.2.5 Scheduling multiple task-based applications on distributed‌ heterogeneous computing nodes

Participants: Jean-Etienne Ndamlabin, Bérenger‌ Bramas.

The size, complexity and cost of supercomputers continue to grow, making any waste more critical than in the past. Consequently, we need methods to reduce the waste coming from users' choices, poorly optimized applications or heterogeneous workloads during execution. In this context, we worked on the scheduling of several task-based applications on given hardware resources. Specifically, we created load balancing heuristics to distribute the task graph over the processing units. We validated our approach by implementing a super-scheduler in StarPU 18.

7.2.6 Automatic task-based parallelization‌

Participants: Bérenger Bramas, Marek Felosci, Julien‌ Gaupp, Stéphane Genaud.

We extended our‌ approach to automatically parallelize any application using a‌ task-based method. We reimplemented APAC using OptiTrust (developed‌ by our team), which enables source code transformations‌ to be expressed compactly. We addressed several challenges‌ related to explicit synchronizations, dependency specifications for arrays,‌ and code duplication required to maintain both sequential‌ and parallel versions. In addition, we created a‌ purely LLVM-based version, which supports a broader subset‌ of the C++ language, but at the cost‌ of more complex transformation implementations 25, 30‌.

7.2.7 Ionic Models Code Generation for Heterogeneous‌ Architectures

Participants: Vincent Loechner, Stephane Genaud, Cedric Bastoul, Adilla‌ Susungi, Antoine Pierquin‌.

We participate in the research and development of a cardiac electrophysiology simulator called openCARP (6.1.2) in the context of the MICROCARD-2 European project (8.3.1). Our team provides its optimizing compiler expertise to build a bridge from a high-level DSL convenient for ionic model experts (EasyML) to code that will run efficiently on exascale supercomputers, using the MLIR compiler framework. We have extended the capabilities of openCARP for generating multiple parallel versions of the ionic currents computation, hence enabling the exploitation of the various parallel computing units that are available in the target architecture nodes (multicore CPUs with vector units, GPUs, etc.). We have collaborated with members of the STORM team (Inria Bordeaux), also involved in the MICROCARD-2 project, to extend the capability of executing simulations simultaneously on multiple CPUs and GPUs.

In 2025,‌ we improved the openCARP‌ software to:

make the compilation of MLIR-generated code more robust;
optimize the code to avoid memory transfers between GPUs and main node memory;
provide new functions to access local variables of the ionic model and permit the implementation of SDC (spectral deferred correction) methods, in collaboration with our European partner from ZIB (Berlin);
investigate how to replace the fast linear interpolation used to approximate complex formulas with polynomial interpolation, using the Sollya library.

7.2.8 Machine Learning Guided Equality‌ Saturation

Participants: Thomas Koehler‌.

In joint work‌‌ led by Nicole Heinimann (PhD at TU Berlin‌ supervised by Michel Steuwer),‌ Thomas Koehler has been‌‌ exploring the idea of guiding the equality saturation‌ optimization technique through machine‌ learning. Equality saturation has‌‌ successfully been applied in many domains. Yet, scaling‌ issues hold back its‌ success in even more‌‌ applications. Thomas' prior work proposed Guided Equality Saturation‌ as a solution that‌ breaks challenging rewrite problems‌‌ into a sequence of equality saturations. However, this‌ prior work relied on‌ human experts to provide‌‌ insights in the form of guides that describe‌ when to stop one‌ equality saturation and start‌‌ the next. The ongoing effort, presented at the‌ EGRAPHS'25 workshop 54,‌ attempts to reduce the‌‌ reliance on human experts. The goal of Machine‌ Learning Guided Equality Saturation‌ is to automatically generate‌‌ guides using a machine learning model. The training‌ setup and machine learning‌ model went through multiple‌‌ design iterations already, and experiments are ongoing to‌ assess how effective this‌ approach is on challenging‌‌ workloads.

7.2.9 Combining Optimization and Numerical Analysis of‌ Functional Array Programs

Participants:‌ Thomas Koehler.

In joint work with Eva Darulova (Uppsala University, Sweden), Thomas Koehler is working towards combining program optimization with numerical analysis. Eva and Thomas co-supervised two Master students at Uppsala University who finished their theses in 2025. Simon Björklund wrote his thesis on Numerical Analysis of Highly Performant Functional Array Programs 58. Filip von Knorring wrote his thesis on Exploring Accuracy and Performance Trade-offs in Functional Array Programs 64. Eva and Thomas are now working towards an international-level publication for this work.

7.3 Fundamental‌ algorithms & mathematical tools

7.3.1 Trahrhe expressions

Participants:‌ Philippe Clauss, Clément Rossetti, Marceau Noury‌.

In the mid-1990s, Philippe Clauss and Vincent‌ Loechner introduced the mathematical theory of Ehrhart polynomials‌ in computer science for the quantitative analysis of‌ iterative programs 3, 6. These special‌ mathematical objects give the exact number of integer‌ points contained in a polyhedron depending linearly on‌ parameters. In the context of polyhedral modeling of‌ nested loops, this number can correspond to the‌ total number of iterations, the number of parallel‌ iterations, the number of accessed data, etc.
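As a concrete instance, consider the hypothetical loop nest `for i in range(n): for j in range(i + 1): for k in range(j + 1)`. Its iteration domain is the set of integer points with `0 <= k <= j <= i < n`, and the number of iterations is a degree-3 polynomial in the parameter `n`. The Python sketch below simply checks this closed form against brute-force enumeration; the function names are illustrative.

```python
def count_iterations(n):
    """Brute-force count of the integer points (i, j, k) with 0 <= k <= j <= i < n."""
    return sum(1 for i in range(n) for j in range(i + 1) for k in range(j + 1))


def iteration_polynomial(n):
    """Closed-form count for the same domain: n(n+1)(n+2)/6."""
    return n * (n + 1) * (n + 2) // 6


# The polynomial gives the exact count for every value of the parameter.
for n in range(20):
    assert count_iterations(n) == iteration_polynomial(n)
```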

A particular use of these Ehrhart polynomials is the construction of ranking polynomials. Such polynomials give the position, or rank, of an iteration of a loop nest, according to the lexicographic order of execution of the iterations. These polynomials are determined by calculating the number of integer points lexicographically smaller than any point in the polyhedral domain of the iterations. Philippe Clauss showed a first application of such polynomials to data layout transformation for optimal spatial locality in 2000.
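For the triangular nest `for i in range(n): for j in range(i + 1)`, a ranking polynomial can be written by hand: the number of iterations executed strictly before `(i, j)` is `i*(i+1)/2 + j`. The Python sketch below assumes this zero-based convention and checks the formula against the actual execution order; `rank` is an illustrative name.

```python
def rank(i, j):
    """Number of iterations executed strictly before (i, j) in lexicographic order."""
    return i * (i + 1) // 2 + j


n = 50
iterations = [(i, j) for i in range(n) for j in range(i + 1)]

# The rank of each iteration equals its position in the execution order.
assert all(rank(i, j) == position for position, (i, j) in enumerate(iterations))
```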

More recently, we have been interested in inverting such ranking polynomials, in order to determine, for a given rank, the corresponding loop indices. This unranking problem is particularly challenging from both a theoretical and a practical point of view. Thanks to the specific properties of ranking polynomials, we have developed a method for inverting such polynomials by solving univariate polynomial equations and propagating the integer floors of the roots to lower dimensions 2.
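The inversion can be illustrated by hand on the triangular nest `for i in range(n): for j in range(i + 1)`, whose zero-based rank is `r = i*(i+1)/2 + j`. In the Python sketch below, which is a hand-derived example and not code produced by our tools, `i` is recovered as the integer floor of the positive root of `i*(i+1)/2 = r`, and `j` follows by substitution. The names `rank` and `unrank` are illustrative.

```python
from math import isqrt


def rank(i, j):
    """Zero-based rank of iteration (i, j) in the triangular nest."""
    return i * (i + 1) // 2 + j


def unrank(r):
    """Recover (i, j) from a rank r."""
    # Solve the univariate equation i*(i+1)/2 = r and keep the floor of its root.
    i = (isqrt(8 * r + 1) - 1) // 2
    # Propagate i to the next dimension: j is what remains of the rank.
    j = r - i * (i + 1) // 2
    return i, j


# A single collapsed loop over the ranks visits the same iterations in the same order.
n = 40
collapsed = [unrank(r) for r in range(n * (n + 1) // 2)]
original = [(i, j) for i in range(n) for j in range(i + 1)]
```

Using the integer square root avoids floating-point rounding issues for large ranks. In the collapsed form, every value of `r` carries the same amount of work, which is what makes such a loop easy to distribute evenly.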

Since 2019, the mathematical engine computing Trahrhe expressions has been developed as a software tool (TRAHRHE) (6.1.1) usable for several loop optimization purposes, such as non-rectangular loop collapsing 2 or algebraic loop tiling 66. A completely revised version written in C++ and implementing many improvements has been developed by Marceau Noury, Clément Rossetti and Philippe Clauss. It is now available from the website.

7.3.2 Z-Polyhedra‌ and LBLs in PolyLib

Participants: Vincent Loechner.‌

Z-polyhedra were first introduced in PolyLib (6.1.5) in 2000, but this implementation suffered from several limitations. Since then, significant advances have been made in defining a solid mathematical foundation and a sound normal form for Z-polyhedra, LBLs (linearly bounded lattices) and their unions. We extended this theoretical work to enable the manipulation of arbitrary unions of LBLs or Z-polyhedra in PolyLib, using efficient algorithms to perform set operations and transformations of unions of LBLs. When implementing the LBLs in PolyLib we took special care to ensure safe and efficient memory allocations, to write efficient and robust functions, and to validate them on a broad range of verified test examples. This work was presented at the IMPACT workshop in January 2026 34.

7.3.3 Polyhedral Scheduling‌

Participants: Tom Hammer, Vincent Loechner, Stephane Genaud, Alain Ketterlin‌, Cedric Bastoul,‌ Bérenger Bramas.

Scheduling is the central operation in the polyhedral compilation chain: it aims to find the best execution order of loop iterations for parallelizing and optimizing the code. Discovering the best polyhedral schedules remains a challenge due to the huge search space. Moreover, current classes of polyhedral schedulers proceed from outer to inner loops, making them impractical for enforcing efficient vectorization in innermost loops. We have shown those limitations in our survey on polyhedral compilers 28 presented at the HiPEAC 2025 conference.

The PhD‌‌ work of Tom Hammer is currently investigating if‌ bringing the results produced‌ by an auto-vectorizer can‌‌ help choose a schedule that enables both thread-parallelism‌ and vectorization. To that‌ end, we have extended‌‌ Autovesk 67 developed in our team, which implements‌ a Superword Level Parallelism‌ (SLP) algorithm, to track‌‌ how vectorized instructions relate to the original statement‌ instruction instances. We then‌ use a modified version‌‌ of the Pluto algorithm to build a parallel‌ schedule under the extra‌ constraints discovered by Autovesk.‌‌ We finally check if the schedule taking into‌ account the vectorization dimension‌ is legal before generating‌‌ a transformed loop nest. We have tested this‌ approach on the Polybench/C‌ suite and our findings‌‌ so far are that it does increase the‌ number of vector instructions‌ generated when compiling with‌‌ standard compilers (GCC, Clang). However, the benefits of‌ vectorization can be outweighed‌ by losses in data‌‌ locality, which is favored by the standard Pluto‌ schedule. This work was‌ presented at the IMPACT'26‌‌ Workshop 26.

7.3.4 Integer Polynomials and Polynomial‌ Loops

Participants: Alain Ketterlin‌.

For some time now we have been working on a specific representation of integer polynomials, which has proved to be well suited for characterizing properties of polyhedral programs (like the counts and ranks of their instructions) 63. The same representation has also been used to extend our previous work on loop recognition in traces 9, which is now able to produce “polynomial loops” thanks to very efficient polynomial interpolation techniques 33. Both of these research lines have introduced the notion of “polynomial loops”, i.e., loops where all bounds and values are multivariate polynomials in the surrounding loop counters, a model that is, in its full generality, too expressive for our current analysis and optimization abilities, but nicely extends the classical polyhedral model.

This year's work has focused on three more aspects of loops involving integer polynomials in their control and/or computations. The first is their ability to be systematically turned into perfect loops (i.e., loops whose bodies are single constructs, either a sub-loop or a single instruction), effectively turning control into computation. Besides the theoretical interest of such loops, we expect this to have an impact on how such loops are executed, especially in the case where dedicated hardware is produced. The second aspect is the efficient execution of polynomial loops, which we have proved is only moderately more costly than that of their affine, non-perfect counterparts, and may even be as efficient provided enough hardware is available. The third and last aspect of the use of integer polynomials inside loops is their compilation on general purpose processors, where the compiler is in charge of detecting their presence inside the code, and of optimizing their computation. We expect this last aspect to have an impact on many numeric computations, but also on programs manipulating multi-dimensional arrays, where non-linear address computations are pervasive.
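To give an idea of why polynomial values inside loops need not be expensive, the Python sketch below uses the classical method of finite differences, which evaluates an integer polynomial at successive loop counters using additions only. It is shown as a textbook illustration and is not claimed to be the technique used in the work described above; the polynomial and the function names are hypothetical.

```python
def forward_differences(poly, degree):
    """Initial difference table [p(0), Δp(0), Δ²p(0), ...] of a polynomial of the given degree."""
    values = [poly(x) for x in range(degree + 1)]
    table = []
    while values:
        table.append(values[0])
        values = [b - a for a, b in zip(values, values[1:])]
    return table


def evaluate_incrementally(poly, degree, count):
    """Yield poly(0), ..., poly(count - 1), using `degree` additions per iteration."""
    diffs = forward_differences(poly, degree)
    for _ in range(count):
        yield diffs[0]
        # Advance every difference by one step; the highest-order one is constant.
        for k in range(degree):
            diffs[k] += diffs[k + 1]


def address(x):
    """A hypothetical non-linear address computation."""
    return x**3 - 2 * x + 5


incremental = list(evaluate_incrementally(address, 3, 10))
direct = [address(x) for x in range(10)]
```

After the initial table has been built, each iteration costs three additions for this cubic polynomial, instead of the multiplications of a direct evaluation.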

7.3.5 Slotted E-Graphs

Participants: Thomas Koehler.

In‌ joint work led by Rudi Schneider (PhD at‌ TU Berlin supervised by Michel Steuwer), Thomas Koehler‌ and his collaborators have been working on efficiently‌ representing (bound) variables in e-graphs. An e-graph is‌ a data structure at the heart of powerful‌ optimization and reasoning techniques such as equality saturation,‌ that space-efficiently represents equal sub-terms uniquely. In their‌ paper published at PLDI'25 20, they present‌ a novel approach to representing bound variables in‌ e-graphs by making them a first-class built-in feature‌ of the data structure. Their slotted e-graph represents‌ terms that differ only by (bound or free)‌ variable names uniquely. Slotted e-graphs are evaluated on‌ two case studies from compiler optimization and theorem‌ proving to show that performing equality saturation for‌ languages with bound variables is greatly simplified and‌ that it becomes possible to solve practically relevant‌ problems that could not be solved with e-graphs‌ using names or de Bruijn indices.

7.3.6 Improvements‌ of the C programming language

Participants: Jens Gustedt‌.

The C standards committee ISO/IEC JTC1/SC22/WG14 is now discussing changes for the next version of the C standard, currently dubbed C2y. The discussion on these new features took place in two face-to-face meetings in Graz, Austria, and Brno, Czech Republic.

In 2025 we contributed several papers to the future revision, on the following subjects:

improvement of syntax and semantics for arrays 55,
type-safe minimum and maximum 39,
improvement of the preprocessor 52,
improvement of some problem spots concerning undefined behavior 53, 42, 43, 45, 48,
revision of the thread and atomics features 38, 40, 49, 50, 46,
continued work on function attributes 44, 51,
the new defer feature 47,
C semantics for contracts 41.

In addition to the C standard 62, the technical specification TS 6010 for a sound and verifiable memory model that is based on provenance 61 has now been published. Jens Gustedt was an editor of and major contributor to this specification.

To promote the new C standard, we also published a C23 edition of the book Modern C 35, which finally appeared in print in 2025. By keeping the rights also for this edition, we were able to maintain a free online version on HAL which has again been a great success, with now (Jan. 2026) more than 185,000 downloads in total.

7.3.7 Towards‌ Pen-and-Paper-Style Equational Reasoning in‌ Interactive Theorem Provers by‌‌ Equality Saturation

Participants: Thomas Koehler.

Equations are ubiquitous in mathematical reasoning. Often, however, they only hold under certain conditions. As these conditions are usually clear from context, mathematicians regularly omit them when performing equational reasoning on paper. In contrast, interactive theorem provers pedantically insist on every detail to be convinced that a theorem holds, hindering equational reasoning at the more abstract level of pen-and-paper mathematics.

In joint work led by Marcus Rossel (PhD at TU Darmstadt supervised by Andrés Goens), we address this issue by raising the level of equational reasoning to enable pen-and-paper style in interactive theorem provers. We achieve this by interpreting theorems as conditional rewrite rules, and using equality saturation to automatically derive equational proofs. Conditions that cannot be automatically proven may be surfaced as proof obligations. Concretely, we present how to interpret theorems as conditional rewrite rules for a significant class of theorems. Handling these theorems goes beyond simple syntactic rewriting, and deals with aspects like propositional conditions and type classes. We evaluate our approach by implementing it as a tactic in Lean, using the egg library for equality saturation with e-graphs. We show four use cases demonstrating the efficacy of this higher level of abstraction for equational reasoning. This work is published at POPL'26 19.
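As a small illustration of what a conditional equation looks like in a prover, the Lean 4 snippet below (assuming Mathlib is available) states an equation that holds only under a side condition. It applies the existing Mathlib lemma `div_self` directly and does not use the tactic described above.

```lean
import Mathlib

-- On paper one simply writes x / x = 1.
-- The prover additionally requires the side condition x ≠ 0 as a hypothesis.
example (x : ℝ) (hx : x ≠ 0) : x / x = 1 :=
  div_self hx
```

Read as a conditional rewrite rule, this lemma rewrites `x / x` to `1` and leaves `x ≠ 0` as a condition to be discharged.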

7.3.8 Formal Proof of‌ Space Bounds for Concurrent,‌ Garbage-Collected Programs

Participants: Arthur‌‌ Charguéraud, Alexandre Moine.

Alexandre Moine, co-advised by Arthur Charguéraud and François Pottier (Inria Paris), has presented a novel, high-level program logic for establishing space bounds in Separation Logic, for programs that execute with a garbage collector. A key challenge is to design sound, modular, lightweight mechanisms for establishing the unreachability of a block. In the setting of a high-level, ML-style language, a key problem is to identify and reason about the memory locations that the garbage collector considers as roots. Our recent work has focused on generalizing our previous results to handle concurrent programs. A key challenge is to handle the fact that if an allocation lacks free space, then it is blocked until all other threads exit their critical section. Only at that point may a GC execute and free the requested space. To handle this challenge, we propose to combine two language constructs: protected sections (during which the GC cannot be triggered) and polling points (where a thread pauses if other threads request a GC execution). Our article describing the results has appeared in the premier journal TOPLAS 17.

7.3.9 Typechecking of Overloading‌

Participants: Arthur Charguéraud.‌‌

In joint work with Martin Bodin from Inria‌ Grenoble and Jana Dunfield‌ from Queen's School of‌‌ Computing (Canada), Arthur Charguéraud has been working on‌ a typechecking algorithm for‌ resolving overloaded symbols. Overloading‌‌ consists of using the same symbol to refer‌ to several functions, or‌ the same name to‌‌ refer to several constants.‌ Overloading is ubiquitous in mathematics. It also appears‌ in numerous programming languages that resolve overloading statically,‌ as opposed to languages that rely on dynamic‌ dispatch during program execution. A key question is‌ how to determine, for every occurrence of an‌ overloaded symbol, which function it refers to. Static‌ resolution of overloading is intrinsically intertwined with typechecking:‌ overloading resolution depends on types, but the types‌ of the overloaded symbols depend on how they‌ are resolved.

We present the first overloading resolution algorithm accompanied by a polynomial complexity bound. The bound is expressed in terms of the size of the description of the instances, as well as of the size of the typed tree to which the program resolves. In our algorithm, resolution is guided not only by the type of function arguments, but also by the type expected by the context. We allow candidate instances to have dependencies (assumptions). As in certain previously proposed algorithms, we take a non-backtracking approach, which avoids exponential search.

Our implementation parses OCaml-style syntax‌ where functions, constants, constructors and record fields can‌ be overloaded. We assume explicit quantification of polymorphic‌ type variables. If all overloaded symbols can be‌ unambiguously resolved, our tool produces standard OCaml code,‌ in which every overloaded symbol is replaced with‌ the value or name that it resolves to.‌ Preliminary results have been presented at the JFLA'25‌ French workshop 23. An article submission is‌ under preparation.

7.3.10 Binding Boolean Expressions and Extended‌ Pattern Matching

Participants: Arthur Charguéraud, Yanni Lefki‌.

Functional programming languages include various pattern matching‌ features, such as guarded patterns, matching by custom‌ predicate, active patterns, synonymous patterns, etc. Besides, several‌ languages include mechanisms for binding names as part‌ of a boolean expression that appears in either‌ an if-statement, a while-loop condition, or a pattern‌ guard. These names may be bound either with‌ a simple let-binding or via a test performed‌ using pattern-matching. All these features are useful in‌ practice, yet it appears that no mainstream language‌ supports them all at once. In this work,‌ we present a core language that consists of‌ a small number of constructs that suffice to‌ encode and combine all the desired features of‌ pattern matching and binding boolean expressions. Thereby, we‌ hope to consolidate existing knowledge on the topics‌ of pattern matching and generalized forms of boolean‌ expressions, through a streamlined presentation. We expect it‌ to be useful not only for pedagogical purposes,‌ but also potentially for simplifying the work of‌ compiler developers. This work has been presented at‌ the ML family workshop (ML'25) colocated with ICFP.‌ An article submission is under preparation.

8 Partnerships‌ and cooperations

8.1 International initiatives

8.1.1 Participation in‌ other International Programs

CrOptAI (Sophie-Germain Program)

Participants: Thomas‌ Koehler, Valeran Maytie, Cedric Bastoul.‌

Title:
Cross-Stack Optimisation for AI
Partner Institutions:
LIB‌ UR 7534, Université Bourgogne Europe, France; University of‌ Edinburgh, United Kingdom.
Date/Duration:
from November 1, 2025 to October 31, 2026‌ (1 Year).
Principal Investigators:‌
Thomas Koehler, Annabelle Gillet‌‌ (LIB), Eric Leclercq (LIB)
Funding Impact:
This funding‌ will enable collaboration and‌ synchronisation between the partners‌‌ and their PhD researchers: research visits, conference trips,‌ hardware acquisition. We will‌ lay the foundations for‌‌ further collaboration and funding.
Research Project:
Improving the‌ efficiency of artificial intelligence‌ computing is critical. Further,‌‌ best performance is achieved through optimisation decisions that‌ cut through the entire‌ software stack, from high-level‌‌ algorithmic choices down to hardware execution choices. Our‌ project is to explore‌ novel approaches to cross-stack‌‌ optimisation, in order to improve artificial intelligence performance‌ while lowering engineering costs.‌

8.2 International research visitors‌‌

8.2.1 Visits of international scientists

Bastian Köpcke (postdoc at TU Berlin, Germany) visited CAMUS to collaborate with Julien De Castelnau, Thomas Koehler and Arthur Charguéraud on verified code optimization for GPUs (one-week research stay).
Reuben Carolan (PhD student at the University of Edinburgh, UK) visited CAMUS to collaborate with Valéran Maytie, Thomas Koehler and Cedric Bastoul on sketch-guided polyhedral compilation (one-week research stay).

8.2.2‌‌ Visits to international teams

Thomas Koehler visited Eva Darulova and her Datalogi team at Uppsala University, Sweden, for one week in September 2025. Eva and Thomas are working towards a publication on combining program optimization with numerical analysis. This follows their co-supervision of two Master's students, Simon Björklund and Filip von Knorring, who completed their theses in 2025.

8.3 European initiatives‌

8.3.1 Horizon Europe

MICROCARD-2‌‌ Centre of Excellence (EuroHPC and ANR)

Participants: Vincent Loechner, Stéphane Genaud, Cedric Bastoul, Adilla Susungi, Antoine Pierquin.

Title:
MICROCARD-2:‌ numerical modeling of cardiac‌ electrophysiology at the cellular‌‌ scale
Duration:
from November 1, 2024 to April‌ 30, 2027
Partners:
Inria,‌ France; Karlsruher Institut Für‌‌ Technologie, Germany; Megware, Germany; Simula Research Laboratory (Simula),‌ Norway; Technical University München‌ (TUM), Germany; Università degli‌‌ Studi di Pavia, Italy; Università di Trento (UTrento),‌ Italy; Université de Bordeaux,‌ France; Université de Strasbourg,‌‌ France.
Coordinator:
Mark Potse, Université de Bordeaux‌
WP4 leader:
Vincent Loechner‌
Summary:

The MICROCARD-2 project is coordinated by Université de Bordeaux and involves the Inria teams Carmen, Cardamom, Storm and TADaaM in Bordeaux, and CAMUS in Strasbourg, among a total of ten partner institutions in France, Germany, Italy, and Norway. This Centre of Excellence for numerical modeling of cardiac electrophysiology at the cellular scale builds on the MICROCARD(-1) project (2021–2024) and shares its website.

The modelling of cardiac electrophysiology at the cellular scale requires thousands of model elements per cell, of which there are billions in a human heart. Even for small tissue samples, such models require at least exascale supercomputers. In addition, the production of meshes of the complex tissue structure is extremely challenging, even more so at this scale. MICROCARD-2 works in concert on every aspect of this problem: tailored numerical schemes, linear-system solvers, and preconditioners; dedicated compilers to produce efficient system code for different CPU and GPU architectures (including the EPI and other ARM architectures); mitigation of energy usage; mesh production and partitioning; simulation workflows; and benchmarking.

The contribution of the CAMUS team concerns code optimization of the ionic models, and involves the MLIR compiler frontend and SIMD code generation for CPUs, plus GPU (Nvidia and AMD) code generation. An engineer and a junior researcher were hired as of January/February 2025.

8.4‌ National initiatives

8.4.1 ANR OptiTrust

Participants: Arthur Charguéraud‌, Thomas Koehler, Guillaume Bertholon, Elian‌ Morel, Julien François de Castelnau, Jens‌ Gustedt.

Turning a high-level, unoptimized algorithm into high-performance code can take weeks, if not months, for an expert programmer. The challenge is to take full advantage of vectorized instructions, of all the cores and all the servers available, as well as to optimize the data layout, maximize data locality, and avoid saturating the memory bandwidth. In general, annotating the code with "pragmas" is insufficient, and domain-specific languages are too restrictive. Thus, in most cases, the programmer needs to write, by hand, low-level code that combines dozens of optimizations. This approach is not only tedious and time-consuming, it also degrades code readability, harms code maintenance, and can result in the introduction of bugs. A promising approach consists of deriving HPC code via a series of source-to-source transformations guided by the programmer. This approach has been successfully applied in niche domains, such as image processing and machine learning. We aim to generalize this approach to optimize arbitrary code. Furthermore, the OptiTrust project aims at obtaining formal guarantees on the output code. A number of these transformations are correct only under specific hypotheses. We will formalize these hypotheses, and investigate which of them can be verified by means of static analysis. To handle the more complex hypotheses, we will transform not just code but also formal invariants attached to the code. Doing so will allow exploiting invariants expressed on the original code for justifying transformations performed at the n-th step of the transformation chain.
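As an illustration of what one step of such a transformation chain can look like, the following minimal C sketch shows a loop nest before and after a hand-applied tiling step, together with the hypothesis under which the step preserves behaviour. It is not produced by OptiTrust and does not use its interface; the function names, the tile size `TILE`, and the checking helper `transpose_versions_agree` are hypothetical and chosen for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical starting point: a straightforward n x n matrix transpose. */
void transpose_naive(size_t n, const double *src, double *dst) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Illustrative tile size (an assumption, not a tuned value). */
enum { TILE = 4 };

/* Same computation after one loop-tiling step.
   Hypothesis that makes the step valid: src and dst do not overlap,
   so all iterations are independent and may be reordered. */
void transpose_tiled(size_t n, const double *src, double *dst) {
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t jj = 0; jj < n; jj += TILE)
            for (size_t i = ii; i < ii + TILE && i < n; i++)
                for (size_t j = jj; j < jj + TILE && j < n; j++)
                    dst[j * n + i] = src[i * n + j];
}

/* Runs both versions on the same input (n <= 16) and returns 1
   when every output element is identical, 0 otherwise. */
int transpose_versions_agree(size_t n) {
    double src[256], a[256], b[256];
    assert(n <= 16);
    for (size_t k = 0; k < n * n; k++)
        src[k] = (double)k;
    transpose_naive(n, src, a);
    transpose_tiled(n, src, b);
    for (size_t k = 0; k < n * n; k++)
        if (a[k] != b[k])
            return 0;
    return 1;
}
```

Comparing outputs on a test input, as `transpose_versions_agree` does, only gives empirical evidence that the step is correct. The formal guarantees targeted by the project would instead establish the non-overlap hypothesis once and for all inputs.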

Funding: ANR
Start:‌ October 2022
End: September 2028
Coordinator: Arthur Charguéraud‌ (Inria)
Partners: Inria team Camus (Strasbourg), Inria team‌ MACARON (formerly TONUS) (Strasbourg), Inria team Cambium (Paris),‌ Inria team CASH (Lyon), CEA team LIST

8.4.2‌ ANR AUTOSPEC

Participants: Bérenger Bramas, Philippe Clauss, Stéphane Genaud, Marek Felšöci, Anastasios Souris.

The AUTOSPEC project aims to create methods for automatic task-based parallelization and to improve this paradigm by increasing the degree of parallelism using speculative execution. The project will focus on source-to-source transformations for automatic parallelization, speculative execution models, DAG scheduling, and the activation mechanisms for speculative execution. With this aim, the project will rely on a source-to-source compiler that targets the C++ language, a runtime system with speculative execution capabilities, and an editor (IDE) to enable compiler-guided development. The outcomes of the project will be open-source, with the objective of developing a user community. The benefits will be of great interest both to developers who want to use an automatic parallelization method and to high-performance programming experts who will benefit from improvements to task-based programming. The results of this project will be validated on various applications, such as protein-complex simulation software and widely used open-source software. The aim will be to cover a wide range of applications to demonstrate the potential of the methods derived from this project while trying to establish their limitations to open up new research perspectives.
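The idea of gaining parallelism through speculation can be sketched as follows. This is a minimal, sequential C simulation written for this report, not the AUTOSPEC compiler or runtime: an "uncertain" task may or may not modify a datum, and a dependent task is run ahead of time on a snapshot, then kept or re-executed. All names (`data`, `task_maybe_write`, `task_read`, `run_speculative`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int value; } data;

/* Uncertain task: clamps the value to a threshold.
   Returns 1 if it actually modified the data, 0 otherwise. */
int task_maybe_write(data *d, int threshold) {
    if (d->value > threshold) {
        d->value = threshold;
        return 1;
    }
    return 0;
}

/* Dependent task: reads the data produced by the previous task. */
int task_read(const data *d) {
    return d->value * 2;
}

/* Speculative schedule, simulated sequentially:
   1. the reader runs on a snapshot, assuming the writer will not write
      (in a real runtime this could overlap with the writer);
   2. the writer runs on the real data;
   3. the speculative result is kept if the assumption held,
      otherwise the reader is re-executed on the updated data. */
int run_speculative(data *d, int threshold, int *reexecuted) {
    data snapshot = *d;
    int speculative_result = task_read(&snapshot);
    int modified = task_maybe_write(d, threshold);
    assert(reexecuted != NULL);
    if (!modified) {
        *reexecuted = 0;
        return speculative_result;
    }
    *reexecuted = 1;
    return task_read(d);
}
```

In this sketch the result is always the one a sequential execution would give. Speculation only changes when the dependent task can start, which is where the extra parallelism would come from in a task-based runtime.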

Funding: ANR‌‌ (JCJC)
Start: October 2021
End: September 2025
Coordinator:‌ Bérenger Bramas

8.4.3 Exa-SofT‌ project, PEPR NumPEx

Participants: Bérenger Bramas, Philippe Clauss, Raphaël Colin, Ugo Battiston, Erwan Auer.

Though significant efforts have been devoted to the implementation and optimization of several crucial parts of a typical HPC software stack, most HPC experts agree that exascale supercomputers will raise new challenges, mostly because the trend in exascale compute-node hardware is toward heterogeneity and scalability. Compute nodes of future systems will have a combination of regular CPUs and accelerators (typically GPUs), along with a diversity of GPU architectures. Meeting the needs of complex parallel applications and the requirements of exascale architectures raises numerous challenges which are still left unaddressed. As a result, several parts of the software stack must evolve to better support these architectures. More importantly, the links between these parts must be strengthened to form a coherent, tightly integrated software suite. The Exa-SofT project aims at consolidating the exascale software ecosystem by providing a coherent, exascale-ready software stack featuring breakthrough research advances enabled by multidisciplinary collaborations between researchers. The main scientific challenges we intend to address are: productivity, performance portability, heterogeneity, scalability and resilience, performance, and energy efficiency.

Philippe Clauss is managing work package 2, "Just-in-Time code optimization with continuous feedback loop", of this project. He is also involved in two major tasks of this package, devoted (1) to the integration of polyhedral optimization techniques in the Kokkos framework and (2) to the development of a dynamic multi-versioning system.
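To make the notion of dynamic multi-versioning with a feedback loop concrete, here is a minimal C sketch under our own assumptions. It does not describe the system developed in the project: several versions of one kernel are registered, each is tried once, and subsequent calls use the version with the lowest reported cost. The types and functions (`multiversion`, `mv_init`, `mv_select`, `mv_report`) are hypothetical, and the cost is supplied by the caller so that the example stays deterministic.

```c
#include <assert.h>
#include <stddef.h>

typedef double (*kernel_fn)(const double *x, size_t n);

/* Version 0: plain reduction. */
static double sum_simple(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Version 1: same reduction, unrolled by 4 with separate accumulators. */
static double sum_unrolled(const double *x, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; i++)
        s0 += x[i];
    return (s0 + s1) + (s2 + s3);
}

enum { NUM_VERSIONS = 2 };

typedef struct {
    kernel_fn version[NUM_VERSIONS];
    double cost[NUM_VERSIONS]; /* last observed cost; negative = not tried yet */
} multiversion;

void mv_init(multiversion *mv) {
    mv->version[0] = sum_simple;
    mv->version[1] = sum_unrolled;
    for (size_t v = 0; v < NUM_VERSIONS; v++)
        mv->cost[v] = -1.0;
}

/* Selection: try every version once, then pick the cheapest observed. */
size_t mv_select(const multiversion *mv) {
    size_t best = 0;
    for (size_t v = 0; v < NUM_VERSIONS; v++) {
        if (mv->cost[v] < 0.0)
            return v;
        if (mv->cost[v] < mv->cost[best])
            best = v;
    }
    return best;
}

/* Feedback: record the cost observed for the version that just ran
   (e.g. elapsed time measured by the caller). */
void mv_report(multiversion *mv, size_t v, double observed_cost) {
    assert(v < NUM_VERSIONS);
    mv->cost[v] = observed_cost;
}
```

A caller would loop over `mv_select`, run `mv.version[v]`, measure the run, and pass the measurement to `mv_report`. Because the most recent measurement replaces the previous one, the selection can change if a version becomes slower on new inputs.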

Funding: PEPR NumPEx‌‌
Start: September 2023
End: August 2028
Coordinator: Raymond‌ Namyst (Inria STORM)
WP2‌ co-leader: Philippe Clauss

8.4.4‌‌ PEPR CAMELIA

Participants: Thomas Koehler, Cedric Bastoul‌, Arthur Charguéraud.‌

Funding:
PEPR (3rd type)‌‌
Title:
Composants pour l'Accélération Matérielle et Logicielle de‌ l'IA
Duration:
from March‌ 2026 to 2032 (6‌‌ years).
Coordinators:
Cédric Auliac (CEA), Olivier Sentieys (Inria‌ TARAN)
WP4 Coordinators:
Fabrice‌ Rastello (Inria CORSE), H.P.‌‌ Charles (CEA)
WP4.2 Coordinator:
Thomas Koehler
Summary:

The French government requires sovereign access to key components required for AI and its acceleration. In this context, the ASIC and Numeric program agencies backed by CEA and Inria were entrusted with proposing a research and attractiveness strategy. This research program complements other national initiatives, with a focus on developing modular hardware acceleration components and their software stack. WP4 focuses on the software aspect of the project. WP4.2 tackles program representation and compilation challenges: from high-level domain-specific languages down to low-level hardware targets, their runtime and ISAs.

The contribution of the CAMUS team‌ is to coordinate WP4.2 and to develop new‌ compilation techniques that facilitate prototyping AI code optimizations‌ at all abstraction levels, from tensor expressions down‌ to hardware ISAs. Ideally, these techniques enable producing‌ highly optimized AI code for new accelerators without‌ having to rewrite hand-optimized libraries or to redesign‌ optimizing compilers.

9 Dissemination

9.1 Promoting scientific activities‌

9.1.1 Scientific events: organisation

General chair, scientific chair‌

Thomas Koehler : Séminaire Pile Logicielle et Compilation‌ pour l'IA, Aussois

9.1.2 Scientific events: selection

Member‌ of the conference program committees

Arthur Charguéraud :‌ SPAA'25 (ACM Symposium on Parallelism in Algorithms and‌ Architectures)
Arthur Charguéraud : ML'25 (ML family workshop,‌ colocated with ICFP)
Arthur Charguéraud : PLDI'25 (ACM‌ Conference on Programming Language Design and Implementation)
Thomas‌ Koehler : PLDI'25 (ACM Conference on Programming Language‌ Design and Implementation)
Cedric Bastoul : CC'26 (Intl‌ Conference on Compiler Construction)

Reviewer

Thomas Koehler :‌ EGRAPHS'25 workshop at PLDI'25

9.1.3 Journal

Reviewer -‌ reviewing activities

Thomas Koehler : TACO (ACM)
Vincent‌ Loechner : TACO (ACM), Journal of Symbolic Computation‌ (Elsevier)
Philippe Clauss : Journal of Supercomputing
Arthur‌ Charguéraud : Journal of Functional Programming

9.1.4 Invited‌ talks

Philippe Clauss has been invited as keynote‌ speaker to the 15th International Workshop on Polyhedral‌ Compilation Techniques (IMPACT 2025), January 22, 2025, Barcelona,‌ Spain : Counting-based Loop Optimization.
Thomas Koehler‌ : Guided Equality Saturation, AST Lab, ETH Zürich,‌ Switzerland
Thomas Koehler : A Case For Interactive‌ Optimization Assistants, User-Schedulable Languages Workshop, ASPLOS, Rotterdam, Netherlands‌
Arthur Charguéraud : Binding Boolean Expressions and Extended‌ Pattern Matching, Inria Cambium team, Paris, France.

9.1.5‌ Scientific expertise

Arthur Charguéraud has been reviewer for‌ 2 ANR projects.
Jens Gustedt is a member of the ISO/IEC working groups ISO/IEC JTC1/SC22/WG14 and WG21 for the standardization of the C and C++ programming languages, respectively.

9.1.6 Research administration

Stéphane‌ Genaud is the head of the ICPS team‌ for the ICube lab. Arthur Charguéraud is vice-head.‌
Jens Gustedt is deputy director of the ICube‌ lab, responsible for the IT and CS policy‌ and for the coordination between the lab and‌ the Inria center. In that function, he also‌ represents ICube on the board of the project‌ committee of the Inria Centre at Université de‌ Lorraine.
Jens Gustedt is a member of the steering committee of the interdisciplinary institute IRMIA++ of Strasbourg University.
Jens Gustedt is (together with Philippe Helluy‌ of the IRMA lab) responsible for the Inria‌ PIQ program for the Strasbourg site.
Arthur Charguéraud‌ is a member of the COMIPERS jury for‌ PhD and postdoc grants at Inria Nancy Grand-Est.‌
Arthur Charguéraud represents Inria at the meetings of‌ the MSII doctoral school (Mathématiques, Sciences de l'Information‌ et de l'Ingénieur, ED269) in Strasbourg.
Bérenger Bramas‌ is a member of the CDT and IES committee at Inria Nancy‌ Grand-Est.

9.2 Teaching -‌ Supervision - Juries -‌‌ Educational and pedagogical outreach

9.2.1 Teaching

Licence:
- Philippe‌ Clauss , Computer architecture,‌ 18h, L2, Université de‌‌ Strasbourg, France
- Vincent Loechner , Algorithmics and programming, 82h, L1, Université de Strasbourg, France
- Vincent Loechner‌‌ , System administration, 40h, Licence Pro, Université de‌ Strasbourg, France
- Vincent Loechner‌ , Parallel programming, 18h,‌‌ M1, Université de Strasbourg, France
- Bérenger Bramas , System programming, 24h, L2, UFAZ, France-Azerbaijan
- Alain Ketterlin‌‌ , Culture et pratique de l'Informatique, L1 Math-Info,‌ 48h, Université de Strasbourg,‌ France
- Alain Ketterlin ,‌‌ Programmation système, L2 Math-Info, 40h, Université de Strasbourg,‌ France
- Alain Ketterlin ,‌ Algorithmique et programmation, L1‌‌ Math-Info, 66h, Université de Strasbourg, France
- Alain Ketterlin , Software Engineering (in English), L2 Math-Info, 64h, Université de Strasbourg, France
- Stéphane Genaud , Algorithmics and programming, 82h, L1, Université de Strasbourg, France
- Stéphane Genaud , Data Structures & Algorithms 2, 25h, L2, UFAZ, France-Azerbaijan
- Stéphane Genaud , Parallel‌‌ programming, 30h, M1, Université de Strasbourg, France
Master:‌
- Philippe Clauss , Compilation,‌ 132h, M1, Université de‌‌ Strasbourg, France
- Philippe Clauss , Real-time programming and‌ system, 37h, M1, Université‌ de Strasbourg, France
- Philippe‌‌ Clauss , Code optimization and transformation, 31h, M1,‌ Université de Strasbourg, France‌
- Vincent Loechner , Real-time‌‌ systems, 12h, M1, Université de Strasbourg, France
- Bérenger‌ Bramas , Compilation and‌ Performance, 24h, M2, Université‌‌ de Strasbourg, France
- Bérenger Bramas , Compilation, 24h,‌ M1, Université de Strasbourg,‌ France
- Cedric Bastoul ,‌‌ Parallel programming, 10h, M1, Université de Strasbourg, France‌
- Cedric Bastoul , Compilation,‌ 36h, M1, Université de‌‌ Strasbourg, France
- Cedric Bastoul , Research & Development‌ Project, 20h, M2, Université‌ de Strasbourg, France
- Stéphane‌‌ Genaud , Cloud and Virtualization, 12h, M1, Université‌ de Strasbourg, France
- Stéphane‌ Genaud , Large-Scale Data‌‌ Processing, 15h, M1, Université de Strasbourg, France
- Stéphane‌ Genaud , Distributed Storage‌ and Processing, 15h, M2,‌‌ Université de Strasbourg, France
Eng. School:
- Vincent Loechner‌ , Parallel programming, 20h,‌ Telecom Physique Strasbourg -‌‌ 3rd year, Université de Strasbourg, France
- Stéphane Genaud‌ , Introduction to Operating‌ Systems, 16h, Telecom Physique‌‌ Strasbourg - 1st year, Université de Strasbourg, France‌
- Stéphane Genaud , Object-Oriented‌ Programming, 60h, Telecom Physique‌‌ Strasbourg - 1st year, Université de Strasbourg, France‌
Free online course: Arthur‌ Charguéraud has made publicly‌‌ available the solutions to the 125+ exercises of‌ his all-in-Rocq course on‌ the Foundations of Separation‌‌ Logic.
DU IRMIA++ interdisciplinary seminar, as well as‌ seminar of the doctoral‌ school ED269 : Arthur‌‌ Charguéraud , Introductory course to Interactive Program Verification‌ (3h), Université de Strasbourg,‌ France
Corps des mines:‌‌ Arthur Charguéraud , Design and Implementation of Educational‌ Software (10h), Paris, France‌

Teaching tracks:

Philippe Clauss‌‌ is in charge of the master's degree in‌ Computer Science of the‌ University of Strasbourg, since‌‌ Sept. 2020.
Stéphane Genaud is in charge of the Bachelor in Computer Science and co-head of the Master Data Science and Artificial Intelligence at UFAZ (Baku, Azerbaijan), which delivers Unistra diplomas, since Aug. 2023 and Aug. 2024 respectively.
Cedric Bastoul is in charge of the‌ Software Science and Engineering track of the Master's‌ degree in Computer Science of the University of‌ Strasbourg, since Sept. 2025.

9.2.2 Supervision

PhD completed:‌

PhD defended in 2025: Guillaume Bertholon, Interactive Compilation via Trustworthy Source-to-Source Transformations, advised by Arthur Charguéraud , since Sept 2022.
PhD defended in 2025:‌ Clément Rossetti, Algebraic Tiling: Volume-guided Tiling of Parallel‌ Loops for Near-Perfect Load Balancing, advised by Philippe‌ Clauss , since Oct 2022.
PhD defended in‌ 2025: Hayfa Tayeb, Efficient scheduling strategies for the‌ task-based parallelization, advised by Bérenger Bramas , Abdou‌ Guermouche (Inria project-team TOPAL), Mathieu Faverge (Inria project-team‌ TOPAL), since Nov 2021.
PhD defended in 2025: David Algis, Hybridization of the Tessendorf method and Smoothed Particle Hydrodynamics for real-time ocean simulation, advised by Bérenger Bramas , Emmanuelle Darles (XLim), Lilian Aveneau (XLim lab), since Oct 2022.

PhD in‌ progress:

PhD in progress: Yanni Lefki, Foundational Verification of Interactively Optimized Programs, advised by Arthur Charguéraud , since Sept 2025.
PhD in progress:‌ Raphaël Colin, Runtime multi-versioning of parallel tasks, advised‌ by Philippe Clauss and Thierry Gautier (Inria project-team‌ Avalon), since Oct. 2023.
PhD in progress: Ugo‌ Battiston, C++ complexity disambiguation for advanced optimizing and‌ parallelizing code transformations, advised by Philippe Clauss and‌ Marc Pérache (CEA), since Oct. 2023.
PhD in‌ progress: Tom Hammer, Synergie entre ordonnancement et optimisation‌ des accès mémoire dans le modèle polyédrique, advised‌ by Stéphane Genaud and Vincent Loechner , since‌ Sept 2023.
PhD in progress: Valéran Maytie ,‌ Optimizing LLMs with Sketch-Guided Polyhedral Compilation, advised by‌ Thomas Koehler and Cedric Bastoul , since Oct‌ 2025.

9.2.3 Juries

Cedric Bastoul has been reviewer‌ and member of the jury for the PhD‌ thesis of Vincent Alba at the University of‌ Bordeaux
Cedric Bastoul has been president of the‌ jury for the PhD thesis of Lana Scravaglieri‌ at the University of Bordeaux
Cedric Bastoul has‌ been president of the jury for the Habilitation‌ thesis of Quentin Bramas at the University of‌ Strasbourg
Arthur Charguéraud has been reviewer and member‌ of the jury for the PhD thesis of‌ Josué Moreau at the University Paris–Saclay
Arthur Charguéraud‌ has been garant and member of the jury‌ for the Habilitation thesis of Bérenger Bramas at‌ the University of Strasbourg
Jens Gustedt has been‌ reviewer and member of the jury for the‌ thesis of Sébastien Michelland at the Université Grenoble‌ Alpes

9.3 Popularization

9.3.1 Specific official responsibilities in‌ science outreach structures

Arthur Charguéraud is co-founder and vice-president of the non-profit organization France-ioi. This organization is in charge of the French participation in international olympiads in informatics. It also organizes numerous contests, such as the Concours Castor, Concours Algorea, Concours Alkindi, and the French Olympiads in Informatics.
Arthur Charguéraud is a co-organizer of the Concours Castor informatique. The purpose of the Concours Castor is to introduce pupils, from CM1 to Terminale, to computer science. 650,000 teenagers played with the interactive exercises in November and December 2025.

10 Scientific production‌‌

10.1 Major publications

1 [inproceedings] Umut A. Acar, Vitaly Aksenov, Arthur Charguéraud and Mike Rainey. Provably and Practically Efficient Granularity Control. PPoPP 2019 - Principles and Practice of Parallel Programming, Washington DC, United States, February 2019. HAL, DOI.
2 [inproceedings] Philippe Clauss, Ervin Altintas and Matthieu Kuhn. Automatic Collapsing of Non-Rectangular Loops. Parallel and Distributed Processing Symposium (IPDPS), 2017, IEEE International, Orlando, United States, May 2017, 778-787. HAL, DOI.
3 [inproceedings] Philippe Clauss. Counting Solutions to Linear and Nonlinear Constraints Through Ehrhart Polynomials: Applications to Analyze and Transform Scientific Programs. ICS, International Conference on Supercomputing, ACM International Conference on Supercomputing 25th Anniversary Volume, Munich, Germany, 2014. HAL, DOI.
4 [article] Philippe Clauss, Federico Javier Fernández, Diego Garbervetsky and Sven Verdoolaege. Symbolic polynomial maximization over convex sets and its application to memory requirement estimation. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 17(8), August 2009, 983-996. HAL, DOI.
5 [article] Pierre-Nicolas Clauss and Jens Gustedt. Iterative Computations with Ordered Read-Write Locks. Journal of Parallel and Distributed Computing 70(5), 2010, 496-504. HAL, DOI.
6 [article] Philippe Clauss and Vincent Loechner. Parametric Analysis of Polyhedral Iteration Spaces. Journal of Signal Processing Systems 19(2), July 1998, 179-194. HAL, DOI.
7 [book] Jens Gustedt. Modern C. Manning, November 2019. HAL.
8 [article] Alexandra Jimborean, Philippe Clauss, Jean-François Dollinger, Vincent Loechner and Martinez Juan Manuel. Dynamic and Speculative Polyhedral Parallelization Using Compiler-Generated Skeletons. International Journal of Parallel Programming 42(4), August 2014, 529-545. HAL.
9 [inproceedings] Alain Ketterlin and Philippe Clauss. Prediction and trace compression of data access addresses through nested loop recognition. Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization, Boston, United States, ACM, April 2008, 94-103. HAL, DOI.
10 [inproceedings] Alain Ketterlin and Philippe Clauss. Profiling Data-Dependence to Assist Parallelization: Framework, Scope, and Optimization. MICRO-45, The 45th Annual IEEE/ACM International Symposium on Microarchitecture, Vancouver, Canada, December 2012. HAL.
11 [article] Benoit Pradelle, Alain Ketterlin and Philippe Clauss. Polyhedral parallelization of binary code. ACM Transactions on Architecture and Code Optimization 8(4), January 2012, 39:1-39:21. HAL, DOI.
12 [article] Aravind Sukumaran-Rajam and Philippe Clauss. The Polyhedral Model of Nonlinear Loops. ACM Transactions on Architecture and Code Optimization 12(4), January 2016. HAL, DOI.

10.2‌ Publications of the year

International journals

13 [article] David Algis, Bérenger Bramas, Emmanuelle Darles and Lilian Aveneau. Arc Blanc: a real time ocean simulation framework. Journal of Computer Graphics Techniques 14(1), March 2025, 70-115. HAL.
14 [article] David Algis, Bérenger Bramas, Emmanuelle Darles and Lilian Aveneau. InteropUnityCUDA: A Tool for Interoperability Between Unity and CUDA. Software: Practice and Experience, February 2025. HAL, DOI.
15 [article] David Algis and Bérenger Bramas. Exploiting ray tracing technology through OptiX to compute particle interactions with cutoff in a 3D environment on GPU. International journal of advanced computer science and applications (IJACSA), Volume 16 Issue 2, March 2025. HAL.
16 [article] Paul Cardosi and Bérenger Bramas. Specx: a C++ task-based runtime system for heterogeneous distributed architectures. PeerJ Computer Science, July 2025. HAL, DOI.
17 [article] Alexandre Moine, Arthur Charguéraud and François Pottier. Will it Fit? Verifying Heap Space Bounds of Concurrent Programs under Garbage Collection. ACM Transactions on Programming Languages and Systems (TOPLAS), February 2025. HAL, DOI.
18 [article] Etienne Ndamlabin and Bérenger Bramas. RSCHED: An effective heterogeneous resources management for simultaneous execution of task-based applications. International journal of advanced computer science and applications (IJACSA), Volume 16 Issue 2, March 2025. HAL, DOI.
19 [article] Marcus Rossel, Rudi Schneider, Thomas Koehler, Michel Steuwer and Andrés Goens. Towards Pen-and-Paper-Style Equational Reasoning in Interactive Theorem Provers by Equality Saturation. Proceedings of the ACM on Programming Languages 10(POPL), January 2026. HAL, DOI.
20 [article] Rudi Schneider, Marcus Rossel, Amir Shaikhha, Andrés Goens, Thomas Kœhler and Michel Steuwer. Slotted E-Graphs: First-Class Support for (Bound) Variables in E-Graphs. Proceedings of the ACM on Programming Languages 9(PLDI), June 2025, 1888-1910. HAL, DOI.

Invited conferences

21 [inproceedings] David Algis, Bérenger Bramas, Emmanuelle Darles and Lilian Aveneau. Arc Blanc: A Real-Time Ocean Simulator. I3D 2025 - ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, Jersey City, NJ, United States, May 2025. HAL.

International peer-reviewed‌ conferences

22 [inproceedings] Guillaume Bertholon and Arthur Charguéraud. Bidirectional Translation between a C-like Language and an Imperative Lambda-calculus. 36es Journées Francophones des Langages Applicatifs (JFLA 2025), Roiffé, France, January 2025. HAL.
23 [inproceedings] Arthur Charguéraud, Martin Bodin and Louis Riboulet. Typechecking of Overloading in Programming Languages and Mechanized Mathematics. 36es Journées Francophones des Langages Applicatifs (JFLA 2025), Roiffé, France, January 2025. HAL.
24 [inproceedings] Camilla Fiorini, Clément Flint, Louis Fostier, Emmanuel Franck, Reyhaneh Hashemi, Victor Michel-Dansac and Wassim Tenachi. Generalizing the SINDy approach with nested neural networks. CEMRACS 2023 - Scientific Machine Learning, vol. 81, Marseille (CIRM, Centre International de Rencontres Mathématiques), France, October 2025, 168-192. HAL, DOI.
25 [inproceedings] Julien Gaupp and Bérenger Bramas. Contraintes d'OpenMP pour la parallélisation automatique à base de tâches. Conférence francophone d'informatique en Parallélisme, Architecture et Système (COMPAS 2025), Bordeaux, France, June 2025. HAL.
26 [inproceedings] Tom Hammer, Stéphane Genaud and Vincent Loechner. Guiding Polyhedral Scheduling for Vectorization through Constraints Generated from an SLP Algorithm. IMPACT '26 - 16th International Workshop on Polyhedral Compilation Techniques, Krakow, Poland, January 2026. HAL.
27 [inproceedings] Valeran Maytié, Reuben Carolan, Christophe Alias, Cedric Bastoul and Thomas Koehler. Towards Optimising Programs with Sketch-Guided Polyhedral Compilation. IMPACT 2026 - International Workshop on Polyhedral Compilation Techniques, Krakow, Poland, January 2026. HAL.
28 [inproceedings] Arun Thangamani, Vincent Loechner and Stéphane Genaud. Extending Polygeist to Generate OpenMP SIMD and GPU MLIR Code. LNCS, Euro-Par 2024 - 30th International European Conference on Parallel and Distributed Computing - PhD Symposium, Madrid, Spain, May 2025. HAL.
29 [inproceedings] Albert d'Aviau de Piolant, Hayfa Tayeb, Bérenger Bramas, Mathieu Faverge, Abdou Guermouche and Amina Guermouche. Improving energy efficiency of HPC applications using unbalanced GPU power capping. HCW (IPDPS workshop), Milan, Italy, June 2025. HAL.

Conferences without proceedings‌

30 [inproceedings] Bérenger Bramas, Marek Felšöci and Stéphane Genaud. Parallélisation automatique à base de tâches avec modélisation de performances. COMPAS 2025 - Conférence francophone d'informatique en Parallélisme, Architecture et Système, Bordeaux, France, June 2025. HAL.
31 [inproceedings] Raphaël Colin. Automatic Multi-Versioning of Computation Kernels. Conférence francophone d'informatique en Parallélisme, Architecture et Système (COMPAS 2025), Bordeaux, France, June 2025. HAL.
32 [inproceedings] Atoli Huppé, Bérenger Bramas, Clément Flint and Stéphane Genaud. A fast implementation of 3D wavelet compression on GPU. COMPAS 2025 - Conférence francophone d'informatique en Parallélisme, Architecture et Système, Bordeaux, France, October 2025. HAL.
33 [inproceedings] Alain Ketterlin. Polynomial Loop Recognition in Traces. IMPACT 2025 - 15th International Workshop on Polyhedral Compilation Techniques, Barcelona, Spain, January 2025. HAL.
34 [inproceedings] Vincent Loechner and Dhimiter Riza. Z-Polyhedra and LBLs in PolyLib. IMPACT 2026 - 16th International Workshop on Polyhedral Compilation Techniques, Krakow, Poland, January 2026. HAL.

Scientific books‌

35 [book] Jens Gustedt. Modern C: Covers the C23 standard. Manning, September 2025. HAL.

Doctoral‌ dissertations and habilitation theses

36 [thesis] Bérenger Bramas. High-Performance Computing: from Optimization to Automation. Unistra, October 2025. HAL.

Reports & preprints‌

37 [report] Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud and Antoine Gicquel. Unified and Composable Rank-Structured Hierarchical Matrix–Vector Multiplication. RR-9611, Inria, March 2026. HAL.
38 [report] Hans Boehm and Jens Gustedt. Retire the concept of consume operations. N3607, ISO JTC1/SC22/WG14, June 2025. HAL.
39 [report] Jens Gustedt. Add type-safe minimum and maximum type-generic macros. ISO JTC1/SC22/WG14, December 2025. HAL.
40 [report] Jens Gustedt. Another daemon: waiting for condition variables. N3559, ISO JTC1/SC22/WG14, May 2025. HAL.
41 [report] Jens Gustedt. C semantics for contracts. N3739, ISO JTC1/SC22/WG14, November 2025. HAL.
42 [report] Jens Gustedt. Chasing Ghosts I: constant expressions. N3558, ISO JTC1/SC22/WG14, May 2025. HAL.
43 [report] Jens Gustedt. Chasing Ghosts II: accessing allocated storage. N3448, ISO JTC1/SC22/WG14, January 2025. HAL.
44 [report] Jens Gustedt. Clarify status of non-returning functions with respect to function attributes. N3494, ISO JTC1/SC22/WG14, February 2025. HAL.
45 [report] Jens Gustedt. Clarify the specification of the width macros. N3496, ISO JTC1/SC22/WG14, February 2025. HAL.
46 [report] Jens Gustedt. Clean up atomics, non-normative changes. N3761, ISO JTC1/SC22/WG14, December 2025. HAL.
47 [report] Jens Gustedt. Even simpler defer for direct integration. N3497, ISO JTC1/SC22/WG14, February 2025. HAL.
48 [report] Jens Gustedt. Objects of known constant size. N3508, ISO JTC1/SC22/WG14, February 2025. HAL.
49 [report] Jens Gustedt. Properly specify the interaction of library calls for condition variables. N3764, ISO JTC1/SC22/WG14, December 2025. HAL.
50 [report] Jens Gustedt. Properly specify the interaction of library calls for mutexes. N3763, ISO JTC1/SC22/WG14, December 2025. HAL.
51 [report] Jens Gustedt. Reproducible expressions. N3499, ISO JTC1/SC22/WG14, February 2025. HAL.
52 [report] Jens Gustedt and Jeremy Rifkin. The __COUNTER__ predefined macro. N3457, ISO JTC1/SC22/WG14, January 2025. HAL.
53 [report] Jens Gustedt. static_assert without UB. N3525, ISO JTC1/SC22/WG14, April 2025. HAL.
54 [misc] Nicole Heinimann, Thomas Koehler and Michel Steuwer. Machine Learning Guided Equality Saturation. June 2025. HAL.
55 [report] Javier A. Múgica and Jens Gustedt. Array subscripting without decay. N3517, ISO JTC1/SC22/WG14, March 2025. HAL.

Other scientific publications

56 inproceedings Antoine Gicquel, Olivier Coulaud and Bérenger Bramas. Towards a composable abstraction of hierarchical methods for matrix-vector product acceleration. COMPAS 2025 - Conférence francophone d'informatique en Parallélisme, Architecture et Système, Bordeaux, France, June 2025.

10.3 Cited publications

57 phdthesis Guillaume Bertholon. Interactive compilation via trustworthy source-to-source transformations. Université de Strasbourg, September 2025.
58 mastersthesis Simon Björklund. Numerical Analysis of Highly Performant Functional Array Programs. MA Thesis, Uppsala University, Department of Information Technology, 2025, 37.
59 phdthesis Clément Flint. Efficient data compression for high-performance PDE solvers. Université de Strasbourg, October 2024.
60 inproceedings Yafan Huang, Sheng Di, Guanpeng Li and Franck Cappello. cuSZp2: A GPU Lossy Compressor with Extreme Throughput and Optimized Compression Ratio. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC '24, Atlanta, GA, USA, IEEE Press, 2024. URL: https://doi.org/10.1109/SC41406.2024.00021
61 book ISO/IEC IS 9899:2024: Programming languages - C - A provenance-aware memory object model for C. ISO, May 2025, 23.
62 book ISO/IEC IS 9899:2024: Programming languages - C. ISO, October 2024, 758.
63 inproceedings Alain Ketterlin. Easy Counting and Ranking for Simple Loops. IMPACT 2024 - 14th International Workshop on Polyhedral Compilation Techniques, Munich, Germany, January 2024.
64 misc Filip von Knorring. Exploring Accuracy and Performance Trade-offs in Functional Array Programs. Uppsala University, Computing Science, 2025.
65 inproceedings Valeran Maytié, Reuben Carolan, Christophe Alias, Cedric Bastoul and Thomas Koehler. Towards Optimising Programs with Sketch-Guided Polyhedral Compilation. IMPACT 2026 - International Workshop on Polyhedral Compilation Techniques, Krakow, Poland, January 2026.
66 inproceedings Clément Rossetti and Philippe Clauss. Algebraic Tiling. IMPACT 2023, 13th International Workshop on Polyhedral Compilation Techniques, Toulouse, France, January 2023.
67 article Hayfa Tayeb, Ludovic Paillat and Bérenger Bramas. Autovesk: Automatic Vectorized Code Generation from Unstructured Static Kernels Using Graph Transformations. ACM Trans. Archit. Code Optim. 21(1), December 2023. URL: https://doi.org/10.1145/3631709

