2025 Activity report: Project-Team CAMUS
RNSR: 200920957V
- Research center: Inria Branch at the University of Strasbourg
- In partnership with: Université de Strasbourg
- Team name: Compilation for multi-processor and multi-core architectures
- In collaboration with: Laboratoire des sciences de l'ingénieur, de l'informatique et de l'imagerie
Creation of the Project-Team: 2023 October 01
Keywords
Computer Science and Digital Science
- A1.1.1. Multicore, Manycore
- A1.1.2. Hardware accelerators (GPGPU, FPGA, etc.)
- A1.1.4. High performance computing
- A2.1.1. Semantics of programming languages
- A2.1.6. Concurrent programming
- A2.1.7. Distributed programming
- A2.1.10. Domain-specific languages
- A2.2.1. Static analysis
- A2.2.4. Parallel architectures
- A2.2.5. Run-time systems
- A2.2.6. GPGPU, FPGA...
- A2.2.7. Adaptive compilation
- A2.2.8. Code generation
- A4.5. Formal method for verification, reliability, certification
Other Research Topics and Application Domains
- B4.5.1. Green computing
- B6.1.1. Software engineering
- B6.6. Embedded systems
1 Team members, visitors, external collaborators
Research Scientists
- Bérenger Bramas [INRIA, Researcher]
- Arthur Charguéraud [INRIA, Senior Researcher, HDR]
- Jens Gustedt [INRIA, Senior Researcher, HDR]
- Thomas Koehler [CNRS, Researcher]
Faculty Members
- Philippe Clauss [Team leader, UNIV STRASBOURG, Professor, HDR]
- Cedric Bastoul [UNIV STRASBOURG, Professor, from Apr 2025]
- Stephane Genaud [UNIV STRASBOURG, Professor, HDR]
- Alain Ketterlin [UNIV STRASBOURG, Associate Professor]
- Vincent Loechner [UNIV STRASBOURG, Associate Professor]
- Eric Violard [UNIV STRASBOURG, Associate Professor, HDR]
Post-Doctoral Fellow
- Clément Flint [INRIA, Post-Doctoral Fellow, until Jul 2025]
PhD Students
- Ugo Battiston [INRIA]
- Guillaume Bertholon [UNIV STRASBOURG, until Aug 2025]
- Raphael Colin [INRIA]
- Tom Hammer [UNIV STRASBOURG]
- Atoli Huppe [INRIA]
- Yanni Lefki [INRIA, from Oct 2025]
- Valeran Maytie [UNIV STRASBOURG, from Oct 2025]
- Clément Rossetti [UNIV STRASBOURG, until Oct 2025]
Technical Staff
- Erwan Auer [INRIA, Engineer]
- Antoine Pierquin [UNIV STRASBOURG, Engineer]
- Adilla Susungi [UNIV STRASBOURG, Engineer, from Feb 2025]
Interns and Apprentices
- Julien De Curieres De Castelnau [INRIA, Intern, from Sep 2025]
- Julien Gaupp [INRIA, Intern, until Aug 2025]
- Ilyas Kermad [INRIA, Intern, from Jun 2025 until Jul 2025]
- Yanni Lefki [INRIA, Intern, from Mar 2025 until Aug 2025]
- Valeran Maytie [INRIA, Intern, from Mar 2025 until Aug 2025]
- Elian Morel [INRIA, Intern, from May 2025]
- Marceau Noury [INRIA, Intern, until Jan 2025]
Administrative Assistants
- Marine Dufourmantelle [INRIA]
- Sylvie Hilbert [CNRS]
2 Overall objectives
The CAMUS team focuses on developing, adapting and extending automatic and semi-automatic parallelization and optimization techniques, as well as proof and certification methods, in order to accelerate applications through the efficient use of current and future multi-processor and multicore hardware platforms.
The team's research activities are organized into three main axes: (1) semi-automatic and assisted code optimization, (2) fully-automatic code optimization, and (3) fundamental algorithms and mathematical tools. Axes (1) and (2) each include two sub-axes: (1.1) interactive program transformation, (1.2) new language constructs, (2.1) runtime systems and dynamic analysis & optimization, and (2.2) static analysis & optimization. Every axis may include activities related to interdisciplinary collaborations focusing on high performance computing.
3 Research program
While trusted and fully automatic code optimizations are generally the most convenient solutions for developers, the growing complexity of software and hardware obviously impacts their scope and effectiveness. Although fully automatic techniques can be successfully applied in restricted contexts, it is often beneficial to let expert developers make some decisions on their own. Moreover, some expert knowledge, contextual requirements, and hardware novelties cannot be immediately integrated into automatic tools.
Thus, besides automatic optimizers, which undoubtedly play an important role, semi-automatic optimizers that assist expert developers are also essential for reaching high performance. Such semi-automatic tools should ideally invoke fully automatic sub-parts, including dependence analyzers, code generators, correctness checkers or performance evaluators, in order to relieve the user of these tasks and to expand the scope of the tools. Fully automatic tools may either be used as standalone solutions, when targeting the corresponding restricted codes, or as satellite tools for semi-automatic environments. Fully automatic mechanisms are the elementary pieces of any more ambitious semi-automatic optimizing tool.
Figure 1: Schematic illustrating how the CAMUS team focuses on both fully automatic methods (including runtime systems, dynamic tools, and static analysis/optimization) and semi-automatic methods (such as interactive transformations and the introduction of new language constructs).
CAMUS' main research axes are depicted in Figure 1. Semi-automatic methods for code optimization will be implemented either as interactive transformation tools, or as language extensions allowing users to control the way programs are transformed. Both approaches will be supported by fully automatic processes devoted to baseline code analysis and transformation schemes. Such schemes may be either static, i.e., applied at compile-time, or dynamic, i.e., applied while the target code runs. These characteristics are not mutually exclusive: one optimization process may include both a static and a dynamic part. The invoked fully automatic processes may also be very ambitious frameworks in their own right, for instance when they implement advanced speculative optimization strategies.
Strong advances in code analysis and transformation are often due to fundamental algorithms and mathematical tools that enable the extraction of important properties of programs through constructive conceptual modeling. We believe that the investment in core mathematics and computer science research must be permanent in the following directions:
- Mathematics is a great pool of modeling and computing methods that may have a high impact in the field of program analysis and transformation. However, mathematical results must be adapted and transformed into algorithms that are usable for our purpose. This task may require some mathematical extensions and the creation of fast and reliable algorithms and implementations.
- Some new contexts of use require the conception of new algorithms dedicated to well-known fundamental and essential tasks. For instance, many standard code analysis and transformation algorithms, originally developed to be exclusively used at compile-time, need to be revised to be used at runtime. Indeed, their respective execution times may not be acceptable when analyzing and optimizing code on-the-fly. The time-overhead must be dramatically lowered, while the ambitions may be adjusted to the new context. Typically, “optimal” solutions resulting from time-consuming computations may not be the final goal of runtime optimization strategies. Sub-optimal solutions may suffice, since the performance of a dynamically optimized code includes the time overhead of the runtime optimization process.
- It is always useful to identify a restricted class of programs to which very efficient optimizations may be applied. Such a restricted class usually takes advantage of an accurate model. Conversely, it may also be fruitful to target the removal of some restrictions regarding the class of programs that are candidates for efficient optimizations.
- Other scientific disciplines may also provide fundamental strategies to tackle code optimization issues. However, they may also require some prior adaptation. For instance, machine learning techniques are increasingly considered in the area of code optimization.
Collaborations with researchers whose applications require high performance will be developed. Besides offering our expertise, we will especially use their applications as an inspiration for new developments of optimization techniques. Those colleagues from other teams will also play the role of beta testers for our semi-automatic code optimizers. Most research axes of CAMUS will include such collaborations. The local scientific environment is particularly favorable to setting up such interactions. For example, we participate in the interdisciplinary institute IRMIA++ of the University of Strasbourg, which facilitates collaborations with mathematicians developing high performance numerical simulations.
3.1 Semi-automatic and assisted code optimization
Programming languages, as they are used in modern compute-intensive software, are relatively poor in their possibilities to describe all known properties of a particular code. On the one hand, a language construct may over-specify the semantics of the program, for example, imposing a specific execution order for the iterations of a loop whereas any order would have been correct. On the other hand, a language construct may under-specify the semantics of the program, for example, lacking the ability to describe the fact that two pointers must be distinct, or that a given integer value is always less than a small constant.
Modern tools that rewrite code for optimization, be it internally as optimizing compiler passes or externally as source-to-source transformations, offer programmers few opportunities to annotate the code and integrate their knowledge of it. As a consequence, fully-automatic tools are not easily brought to their full capacity, and one-shot platform-specific programmer intervention is required.
To advance this field, we will develop re-usable and traceable features that provide the ability for programmers to specify and control code transformations and to annotate functional interfaces and code blocks with all the meta-knowledge they have.
3.2 Fully-automatic code optimization
We will focus on two main code optimization and parallelization approaches: the polyhedral model, based on a geometrical representation and transformation of loops; and the task-based model, based on a runtime resolution of the dependencies between the tasks. Note that these two approaches can potentially be mixed.
The polyhedral model is a great source of new developments regarding fundamental mathematical tools dedicated to code analysis and transformation. This model was originally based exclusively on linear algebra. We have proposed in the past some extensions to polynomials, and we are currently investigating extensions to algebraic expressions. In the meantime, we also focus on runtime approaches that allow polyhedral-related techniques to be applied to codes that are not usually well-suited candidates. The motivation for such extensions is to propose new compilation techniques with enlarged scope and better efficiency that are either static, i.e., applied at compile-time, or dynamic, i.e., applied at runtime.
We will also keep studying the task-based method which is complementary to the polyhedral model, and beneficial in scenarios that are not adapted to the polyhedral model. For example, this method can work when the description of the parallelism is entirely performed at runtime, and it is able to parallelize sections with arbitrary structures (i.e., not necessarily loop nests).
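To make the runtime resolution of dependencies concrete, here is a minimal sketch in Python, assuming tasks are inserted in sequential order and each declares the data it reads and writes. The names `build_task_graph` and `schedule_in_waves` and the example tasks are hypothetical illustrations; they are not the interface of Specx, OpenMP or any other runtime system mentioned in this report.

```python
from collections import defaultdict

def build_task_graph(tasks):
    """Derive dependencies from declared accesses (hypothetical helper).

    tasks: list of (name, reads, writes) given in sequential insertion order.
    Returns a dict mapping each task name to the set of tasks it must wait for.
    """
    last_writer = {}                          # datum -> last task that wrote it
    readers_since_write = defaultdict(list)   # datum -> readers since that write
    preds = {}
    for name, reads, writes in tasks:
        deps = set()
        for datum in reads:
            if datum in last_writer:
                deps.add(last_writer[datum])          # read-after-write
        for datum in writes:
            if datum in last_writer:
                deps.add(last_writer[datum])          # write-after-write
            deps.update(readers_since_write[datum])   # write-after-read
        deps.discard(name)
        preds[name] = deps
        for datum in writes:
            last_writer[datum] = name
            readers_since_write[datum] = []
        for datum in reads:
            if datum not in writes:
                readers_since_write[datum].append(name)
    return preds

def schedule_in_waves(preds):
    """Group tasks into waves; tasks within one wave are mutually independent."""
    remaining = dict(preds)
    done, waves = set(), []
    while remaining:
        ready = sorted(t for t, p in remaining.items() if p <= done)
        if not ready:
            raise ValueError("cyclic dependencies")
        waves.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return waves

# Hypothetical example: two initializations can run in parallel.
tasks = [
    ("init_a", [], ["a"]),
    ("init_b", [], ["b"]),
    ("sum_ab", ["a", "b"], ["c"]),
    ("scale_a", ["a"], ["a"]),
    ("report", ["a", "c"], []),
]
waves = schedule_in_waves(build_task_graph(tasks))
print(waves)
```

Because the graph is built from the access declarations alone, the same mechanism applies to code with arbitrary structure, not only to loop nests. A real runtime would execute ready tasks as soon as their predecessors finish rather than in synchronized waves.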
In our project, we attempt to bridge the gap between the task-based method and the compiler by designing a novel automatic parallelization mechanism with static source-to-source transformations. We also work on improving the scheduling strategies or the description of the parallelism by designing speculative execution models that operate at runtime.
3.3 Fundamental algorithms & mathematical tools
Regarding our fundamental and theoretical studies, we plan to focus on three main topics: (1) Trahrhe expressions, (2) mechanized metatheory and interactive program verification, and (3) programmable polyhedral scheduling.
4 Application domains
High performance computing plays a crucial role in the resolution of important problems of science and industry. Additionally, software development companies, and software developers in general, are strongly constrained by the time-to-market issue, while facing growing complexities related to hardware and correctness of the developed programs. Computers become more and more powerful by integrating numerous and specialized processor cores, and programs taking advantage of such hardware are more and more exposed to correctness issues.
Our goal is to provide automatic and semi-automatic tools that will significantly lower the burden on developers. By ensuring a secured production of correct and well-performing software, developers can mostly concentrate on the implemented functionalities, and produce quality software in reasonable time.
Our scientific contributions are most of the time supported by newly developed software, or by an extension of existing software. Its role is to demonstrate the automation of the proposed analysis and optimization techniques, to show their effectiveness by exhibiting performance improvements on baseline benchmark programs, and to facilitate their application to any program targeted by potential users. Thus, our software tools must be made as accessible as possible to users from science and industry, so that they can experiment with the implemented optimization procedures on their specific programs. As such, we usually offer free non-commercial use through an open-source software licence. While the software is made available in a form that allows it to be used autonomously, we expect interested users to contact us for deeper exchanges related to their specific goals. Such exchanges may be the start of fruitful collaborations. Publishing our proposals in top-rated conferences and journals may also have an effective impact on their adoption and on the use of the related software.
Our contributions in analysis and optimization techniques of programs may find interested users in many international companies, from semi-conductor industry actors, like ARM, SiPearl or STMicroelectronics, to big companies developing high performance or deep learning applications. At a national or local level, any company whose innovative developments require compute or data intensive applications, like Nyx, or dedicated support tools, like Atos, may be interested in our work, and potentially collaborate with us for more specific and dedicated research. Since the project-team is hosted by the University of Strasbourg, contacts with many local companies are made easier thanks to the hiring of former students, and to their involvement in teaching duties and supervision of internship students.
5 Highlights of the year
- The third edition of "Modern C" by Jens Gustedt 35 has been published by Manning; overall, it has had about 185,000 downloads on HAL.
6 Latest software developments, platforms, open data
6.1 Latest software developments
6.1.1 TRAHRHE
-
Name:
Trahrhe expressions and applications in loop optimization
-
Keywords:
Polyhedral compilation, Code optimisation, Source-to-source compiler
-
Functional Description:
This software includes a mathematical kernel for computing Trahrhe expressions related to iteration domains, as well as extensions implementing source-to-source transformations of loops for applying optimizations based on Trahrhe expressions.
-
News of the Year:
A more robust way of computing the ranking polynomials has been implemented. A new version of the software, written in C/C++, has been published.
- URL:
- Publications:
-
Contact:
Philippe Clauss
-
Participants:
Clément Rossetti, Philippe Clauss, Marceau Noury
6.1.2 openCARP
-
Name:
Cardiac Electrophysiology Simulator
-
Keyword:
Cardiac Electrophysiology
-
Functional Description:
openCARP is an open cardiac electrophysiology simulator for in-silico experiments. Its source code is public and the software is freely available for academic purposes. openCARP is easy to use and offers single cell as well as multiscale simulations from ion channel to organ level. Additionally, openCARP includes a wide variety of functions for pre- and post-processing of data as well as visualization.
-
News of the Year:
Improvements to the code generation of ionic models (limpetMLIR): generation of CUDA and AMD kernels. Building of all targets embedded in functions for the runtime interface. StarPU interface to the kernels. Benchmarks (execution time, energy consumption).
- URL:
- Publications:
-
Contact:
Vincent Loechner
-
Participants:
Vincent Loechner, Stephane Genaud, Antoine Pierquin, Adilla Susungi, 3 anonymous participants
-
Partner:
Karlsruhe Institute of Technology
6.1.3 SPECX
-
Name:
SPEculative eXecution task-based runtime system
-
Keywords:
HPC, Parallelization, Task-based algorithm
-
Functional Description:
Specx (previously SPETABARU) is a task-based runtime system for multi-core architectures that includes speculative execution models. It is a pure C++11 product without external dependencies. It uses advanced meta-programming and allows for easy customization of the scheduler. It can also generate execution traces in SVG to better understand the behavior of applications.
-
News of the Year:
In 2025, the paper presenting the multi-GPU and MPI version of Specx was published: "Specx: a C++ task-based runtime system for heterogeneous distributed architectures", Paul Cardosi and Bérenger Bramas, PeerJ CS.
- URL:
- Publication:
-
Contact:
Bérenger Bramas
6.1.4 Autovesk
-
Keywords:
HPC, Vectorization, Source-to-source compiler
-
Functional Description:
Autovesk is a tool that produces vectorized implementations from static kernels.
-
News of the Year:
In 2025, Autovesk has been updated to support more complex benchmarks.
- URL:
-
Contact:
Bérenger Bramas
-
Participant:
Bérenger Bramas
6.1.5 PolyLib
-
Name:
The Polyhedral Library
-
Keywords:
Rational polyhedra, Library, Polyhedral compilation
-
Scientific Description:
A C library used in polyhedral compilation, as a basic tool used to analyze, transform, optimize polyhedral loop nests. It has been shipped in the polyhedral tools Cloog and Pluto.
-
Functional Description:
PolyLib is a C library of polyhedral functions, that can manipulate unions of rational polyhedra of any dimension. It was the first to provide an implementation of the computation of parametric vertices of a parametric polyhedron, and the computation of an Ehrhart polynomial (expressing the number of integer points contained in a parametric polytope) based on an interpolation method.
-
Release Contributions:
Functions to manipulate LBLs (linearly bounded lattices) have been added in 2025. MIT Licence.
-
News of the Year:
Maintenance, upgrade of the user interface, upgrade of the build process.
- URL:
- Publication:
-
Contact:
Vincent Loechner
-
Participant:
Vincent Loechner
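The interpolation idea mentioned in the PolyLib functional description can be illustrated with a small sketch in Python, assuming a triangular parametric domain { (i, j) : 0 <= j <= i <= N } whose vertices are integer, so that the count of integer points is a true polynomial of degree 2 in N. The functions `count_points` and `lagrange_eval` are hypothetical names for this illustration and are not part of PolyLib's API; in general, Ehrhart counts are quasi-polynomials with periodic coefficients, which this sketch does not handle.

```python
from fractions import Fraction

def count_points(n):
    """Brute-force count of integer points (i, j) with 0 <= j <= i <= n."""
    return sum(1 for i in range(n + 1) for j in range(i + 1))

def lagrange_eval(samples, x):
    """Evaluate at x the unique polynomial passing through the given samples."""
    total = Fraction(0)
    for k, (xk, yk) in enumerate(samples):
        term = Fraction(yk)
        for m, (xm, _) in enumerate(samples):
            if m != k:
                term *= Fraction(x - xm, xk - xm)
        total += term
    return total

# A degree-2 polynomial is determined by 3 samples: count for N = 0, 1, 2.
samples = [(n, count_points(n)) for n in range(3)]

# The interpolated polynomial then predicts the count for any larger N.
predicted = lagrange_eval(samples, 10)
print(predicted, count_points(10))
```

Exact rational arithmetic (`Fraction`) is used so that the interpolated coefficients are not affected by floating-point rounding.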
6.1.6 APOLLO
-
Name:
Automatic speculative POLyhedral Loop Optimizer
-
Keyword:
Automatic parallelization
-
Scientific Description:
APOLLO - Automatic speculative POLyhedral Loop Optimizer is a compiler framework dedicated to automatic, dynamic and speculative parallelization and optimization of programs' loop nests. This framework allows a user to mark in a C/C++ source code some nested loops of any kind (for, while or do-while loops) in order to be handled by a speculative parallelization process, to take advantage of the underlying multi-core processor architecture. The framework is composed of two main parts: extensions to the CLANG-LLVM compiler and a runtime system.
-
Functional Description:
APOLLO is dedicated to automatic, dynamic and speculative parallelization of loop nests that cannot be handled efficiently at compile-time. It is composed of a static part consisting of specific passes in the LLVM compiler suite, plus a modified Clang frontend, and a dynamic part consisting of a runtime system. It can apply on-the-fly any kind of polyhedral transformation, including tiling, and can handle nonlinear loops, such as while-loops referencing memory through pointers and indirections. Extensions enabling dynamic multi-versioning were implemented in 2020.
-
News of the Year:
APOLLO has been upgraded to LLVM 17 and to Pluto 0.12.
- URL:
- Publications:
-
Contact:
Philippe Clauss
-
Participants:
Aravind Sukumaran Rajam, Erwan Auer, Raphael Colin, Juan Manuel Martinez Caamano, Manuel Selva, Philippe Clauss
6.1.7 OptiTrust
-
Name:
OptiTrust
-
Keywords:
Code optimisation, Verification
-
Functional Description:
The OptiTrust framework provides programmers with means of optimizing their programs via user-guided source-to-source transformations. It leverages Separation Logic for checking that both input and output programs satisfy the desired specification. The transformations maintain separation logic derivations, following the concept of proof-carrying code.
-
News of the Year:
OptiTrust has been extended to support validation of functional correctness on the output code, following the proof-carrying code approach. A new case study on LLM inference has been developed.
- URL:
-
Contact:
Arthur Charguéraud
-
Participants:
Arthur Charguéraud, Thomas Koehler, Guillaume Bertholon
6.1.8 APAC
-
Keywords:
Source-to-source compiler, Automatic parallelization, Parallel programming
-
Scientific Description:
APAC is a compiler for automatic parallelization that transforms C++ source code to make it parallel by inserting tasks. It uses the tasks+dependencies paradigm and relies on OpenMP as runtime system. Internally, it is based on OptiTrust (and Clang-LLVM).
-
Functional Description:
Automatic task-based parallelization compiler
-
News of the Year:
Additional case studies have been developed.
- URL:
-
Contact:
Bérenger Bramas
-
Participants:
Marek Felsoci, Bérenger Bramas, Stephane Genaud
6.1.9 Rise & Shine
-
Keywords:
Programming language, Compilation
-
Functional Description:
Programming language and compiler for array computing. Programs are expressed at a high level in the RISE language. Programs are transformed using a set of rewrite rules that encode implementation and optimization choices. The Shine compiler generates high-performance parallel C or OpenCL code while preserving the optimization choices made during rewriting.
-
News of the Year:
A prototype Rise to C compiler executable that preserves floating-point semantics was added by Thomas Koehler, as part of his collaboration with Eva Darulova (Uppsala University).
- URL:
-
Contact:
Thomas Koehler
-
Participant:
Thomas Koehler
-
Partner:
Technische Universität Berlin
6.1.10 egg-sketches
-
Keyword:
Program rewriting techniques
-
Functional Description:
egg-sketches is a library adding support for program sketches on top of the egg (e-graphs good) library, an e-graph library optimized for equality saturation. Sketches are program patterns that are satisfied by a family of programs. They can also be seen as incomplete or partial programs as they can leave details unspecified. With egg-sketches, it is possible to perform Guided Equality Saturation: a semi-automatic technique that allows programmers to guide rewriting via program sketches.
-
News of the Year:
Upgraded to latest egg dependency, added an extra sketch construct, fixed two bugs.
- URL:
-
Contact:
Thomas Koehler
-
Participant:
Thomas Koehler
-
Partner:
TU Darmstadt
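As an illustration of how a sketch can be satisfied by a family of programs, here is a minimal matcher in Python over plain terms. It assumes three sketch constructs: a wildcard `ANY`, an operator node with sub-sketches, and `contains(s)`, which is satisfied when some sub-term satisfies `s`. This is a simplified, hypothetical model written for this report: the egg-sketches library itself is written against the egg e-graph library and matches sketches against e-graphs rather than single terms.

```python
# Terms are nested tuples ("op", child, ...) or string leaves.
ANY = ("?",)   # wildcard sketch: satisfied by every term

def contains(sketch):
    """Sketch satisfied by any term having a sub-term that satisfies `sketch`."""
    return ("contains", sketch)

def matches(sketch, term):
    """Return True if `term` satisfies `sketch` (hypothetical helper)."""
    if sketch == ANY:
        return True
    if isinstance(sketch, tuple) and sketch[0] == "contains":
        if matches(sketch[1], term):
            return True
        if isinstance(term, tuple):
            return any(matches(sketch, child) for child in term[1:])
        return False
    if isinstance(sketch, str) or isinstance(term, str):
        return sketch == term
    # Both are operator nodes: same operator, same arity, children match.
    return (sketch[0] == term[0] and len(sketch) == len(term)
            and all(matches(s, t) for s, t in zip(sketch[1:], term[1:])))

# Hypothetical program: map (lambda x. x + 1) xs
program = ("map", ("lambda", "x", ("add", "x", "1")), "xs")

print(matches(("map", ANY, "xs"), program))            # structure fixed, body free
print(matches(contains(("add", ANY, "1")), program))   # an addition of 1 somewhere
print(matches(contains(("mul", ANY, ANY)), program))   # no multiplication present
```

The first two sketches leave details unspecified and are therefore satisfied by many programs besides this one, which is what allows a sketch to describe an intermediate rewriting goal without fixing the whole program.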
6.1.11 slotted-egraphs
-
Keyword:
Term Rewriting Systems
-
Functional Description:
Implementation of the slotted e-graph data structure, an extension of e-graphs representing terms that differ only in the names of their variables uniquely. With slotted-egraphs, users of languages with variables can perform equality saturation by: (1) defining the term language, representing variables and binders via slots, (2) defining rewrite rules, without having to worry about naming collisions, and leveraging built-in mechanisms for freshness predicates and substitutions, (3) performing equality saturation by initializing a slotted e-graph, growing it by applying rewrites, and extracting from it.
-
News of the Year:
Development started in February 2024, led by Rudi Schneider from TU Berlin, and was instigated and supervised by Thomas Koehler and Michel Steuwer. A paper was accepted at PLDI 2025.
- URL:
- Publication:
-
Contact:
Thomas Koehler
-
Participant:
Thomas Koehler
-
Partners:
Technische Universität Berlin, TU Darmstadt
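The idea of representing terms that differ only in their variable names uniquely can be sketched in Python as a canonicalization step, assuming variables are renamed to numbered slots in order of first occurrence. The function `canonicalize` and the term encoding are hypothetical and deliberately simplified: the sketch ignores scoping and shadowing, and it is not the data structure or API of the slotted-egraphs implementation.

```python
def canonicalize(term):
    """Split a term into a name-independent shape and the names filling its slots.

    Terms are nested tuples ("op", child, ...); variables are ("var", name).
    Returns (shape, names) where variables in `shape` are numbered slots.
    """
    names = []

    def walk(t):
        if isinstance(t, tuple):
            if t[0] == "var":
                if t[1] not in names:
                    names.append(t[1])
                return ("var", f"${names.index(t[1])}")
            return (t[0],) + tuple(walk(child) for child in t[1:])
        return t

    shape = walk(term)
    return shape, tuple(names)

x_minus_y = ("sub", ("var", "x"), ("var", "y"))
a_minus_b = ("sub", ("var", "a"), ("var", "b"))
x_minus_x = ("sub", ("var", "x"), ("var", "x"))

shape_xy, names_xy = canonicalize(x_minus_y)
shape_ab, names_ab = canonicalize(a_minus_b)
shape_xx, _ = canonicalize(x_minus_x)

print(shape_xy == shape_ab)   # same shape, stored once; only the names differ
print(names_xy, names_ab)
print(shape_xy == shape_xx)   # x - x is a genuinely different term
```

Under this assumption, `x - y` and `a - b` share one stored shape and differ only in the renaming attached to it, while `x - x` keeps a distinct shape because both operands refer to the same slot.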
6.1.12 Pesto
-
Name:
Polyhedral flExible loop-neST Optimizer
-
Keyword:
Optimizing compiler
-
Functional Description:
This tool makes it easy to apply polyhedral transformations to C code, in particular algebraic tiling. It is divided into two parts: the command-line interface and the library.
-
News of the Year:
The first usable version of Pesto is now available.
- URL:
-
Contact:
Clément Rossetti
6.1.13 StrasGPT
-
Keywords:
Polyhedral compilation, LLM, Automatic parallelization, Vectorization
-
Functional Description:
This program is a direct C implementation of the Qwen3 / LLaMa 3.x / Mistral LLM transformer architecture, amongst others, reusing the tokenizer and the sampler of Andrej Karpathy's llama2.c project and of its fork llama3.c by James Delancey. Given an input prompt, StrasGPT can generate a text that continues it. It was initially designed as a parallel programming project for master's students in 2025 (students had to parallelize it with OpenMP + MPI). It is now being continued for (polyhedral) compiler research.
-
News of the Year:
Creation
-
Contact:
Cedric Bastoul
-
Participant:
Cedric Bastoul
7 New results
7.1 Semi-automatic and assisted code optimization
7.1.1 OptiTrust: Producing Trustworthy High-Performance Code via Source-to-Source Transformations
Participants: Arthur Charguéraud, Guillaume Bertholon, Thomas Koehler, Elian Morel, Julien de Castelnau.
In 2025, we pursued the development of the OptiTrust prototype framework for producing high-performance code via source-to-source transformations, with formal guarantees of correctness.
We have extended the framework to support full functional correctness assertions. OptiTrust thereby becomes a modern implementation of Necula's concept of proof-carrying code. We generalized two prior case studies, namely OpenCV's box-blur and TVM's matrix multiply, to full functional correctness. This work is described as part of the PhD thesis of Guillaume Bertholon 57. A 60-page journal article describing OptiTrust has recently been submitted for publication to a top-tier journal. In addition, Guillaume has presented a description of OptiTrust's bidirectional translation between C and its internal lambda-calculus at the JFLA'25 national workshop 22.
The Master internship of Elian Morel contributed a case study on an LLM inference code. Preliminary results show that OptiTrust supports code transformation on such a complex, realistic program. Elian also contributed an important technical addition to the typechecker: an elaboration phase to automatically infer the numerous annotations (known as “ghost operations for focusing”), which are needed to typecheck array-manipulating operations in separation logic.
The ongoing Master internship of Julien de Castelnau aims to extend OptiTrust to support refinement from CPU to GPU code, optimization at the GPU level, and extraction from GPU code to CUDA syntax.
7.1.2 Sketch-Guided Polyhedral Compilation
Participants: Valeran Maytié, Thomas Koehler, Cedric Bastoul.
As part of Valéran Maytié's internship and PhD, we developed a new semi-automatic, sketch-guided compilation approach. It enables users to write sketches that guide the compiler towards key optimizations by describing the desired structure of the optimised code, without worrying about how to get there. We introduce a sketch language that enables expressing the result of imperative loop transformations and a new polyhedral algorithm capable of generating code constrained by both a sketch and a computation specification. This work was presented at the IMPACT'26 workshop 27, and we are working towards a full conference paper. This work has been done in collaboration with Christophe Alias (Inria CASH).
7.1.3 Specx: A C++ Task-Based Runtime System for Heterogeneous Distributed Architectures
Participants: Bérenger Bramas.
Bérenger Bramas and Paul Cardosi completed and submitted the paper presenting Specx several years after the end of Paul Cardosi's contract; the paper was published in 2025 16. Specx is now capable of executing task graphs on heterogeneous distributed architectures. It provides an elegant way to define task graphs and describe objects that the runtime system can move or send.
7.1.4 Using the Discrete Wavelet Transform for Scientific Data Compression
Participants: Atoli Huppé, Clément Flint, Bérenger Bramas, Stéphane Genaud, Philippe Helluy.
As a follow-up to Clément Flint's thesis work 59, in which we worked on compressing simulation data for a Lattice Boltzmann application, Atoli Huppé's PhD work aims to propose a general compressor for scientific data. It addresses the use case in which the data to compress is generated by a scientific application on the GPU, and the compression is then performed directly on the data in GPU memory. The original data, compressed into blocks, can then be decompressed when it is needed for further computations, or saved from the GPU to disk through the CPU.
This GPU implementation based on the Discrete Wavelet Transform is, to the best of our knowledge, the first full-GPU, single-kernel implementation using this compression model. Our performance evaluation, conducted at the end of 2025, shows that our compressor achieves a higher compression ratio than the state-of-the-art compressor cuSZp2 60, with comparable compression and decompression throughputs. A preliminary version of this work was presented at COMPAS 32, and our latest results will be submitted to an international conference.
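The principle of wavelet-based lossy compression can be sketched in a few lines of Python, assuming a single level of the unnormalized Haar transform and a simple threshold on the detail coefficients. This is only an illustration of the general compression model: the wavelet, the quantization and encoding stages, and the GPU kernel design of the compressor described above are not specified here, and the names `haar_forward`, `haar_inverse` and `compress` are hypothetical.

```python
def haar_forward(values):
    """One level of the unnormalized Haar transform on an even-length list."""
    averages = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    details = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return averages, details

def haar_inverse(averages, details):
    """Rebuild the samples from averages and details."""
    out = []
    for a, d in zip(averages, details):
        out.extend((a + d, a - d))
    return out

def compress(values, threshold):
    """Zero out small detail coefficients; zeros are cheap to encode afterwards."""
    averages, details = haar_forward(values)
    kept = [d if abs(d) > threshold else 0.0 for d in details]
    return averages, kept

# Hypothetical block of smooth data with one sharp variation.
block = [10.0, 10.5, 12.0, 12.0, 3.0, 9.0, 4.0, 4.25]
threshold = 0.5

averages, kept = compress(block, threshold)
restored = haar_inverse(averages, kept)
max_error = max(abs(a - b) for a, b in zip(block, restored))

print(kept)        # only the sharp variation keeps a non-zero detail
print(max_error)   # bounded by the threshold
```

With this transform, each dropped detail coefficient changes the two samples it covers by exactly its magnitude, so the pointwise reconstruction error never exceeds the threshold; smooth regions therefore compress well at a controlled loss.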
7.1.5 Exploiting Ray Tracing Technology Through OptiX to Compute Particle Interactions with Cutoff in a 3D Environment on GPUs
Participants: Bérenger Bramas.
Bérenger Bramas and David Algis worked on utilizing OptiX for neighbor finding in n-body simulations. Several methods were implemented, including two novel approaches based on new geometric patterns. A preprint demonstrates that these methods can achieve significant speedups when the grid is sparse (i.e., when particles are not uniformly distributed) 15.
7.1.6 Real-time ocean simulation
Participants: Bérenger Bramas.
Bérenger Bramas collaborated with Emmanuelle Darles, Lilian Aveneau and David Algis on real-time ocean simulation. This work led to the development of the Arc Blanc framework 13, 21, a fully described GPU/CPU real-time pipeline for simulating the free ocean surface and solid–fluid interactions while preserving physical realism at large scale. The framework includes improvements such as real-time computation of fluid velocities at arbitrary depth and enhanced solid-to-fluid coupling. In addition, we supported the integration of these simulations into Unity by developing an open-source interoperability tool between Unity compute shaders and CUDA, enabling access to advanced GPU programming features not available in Unity's native environment 14.
7.2 Fully-automatic code optimization
7.2.1 Algebraic tiling
Participants: Clément Rossetti, Philippe Clauss.
We propose a new loop tiling approach based on the volumes of the tiles, i.e., the number of iterations delimited by the tiles, instead of the sizes of standard (hyper-)rectangular tiles, i.e., the sizes of the edges of the tiles. In the proposed approach, tiles are dynamically generated and have almost equal volumes, even if their shape and edge sizes may differ. The iteration domain is well covered by a minimum number of tiles that are all almost full. Since the bounds of the generated tiles are not linear and defined by algebraic mathematical expressions, we call this loop tiling technique algebraic tiling. It uses the mathematical engine TRAHRHE also developed in the team.
Algebraic tiles are built by successive hierarchical slicing of the initial iteration domain, from the outermost to the innermost depth dimensions of the target loop nest, in a way ensuring that all slices have quasi-equal volumes. The bounds of the loop nests that are handled must be constants, or linear functions of the surrounding loop iterators and of unknown parameters, which are typically related to the data input size. Such loops are also called polyhedral loops since they may be handled using the polyhedral model. Quasi-perfect load balancing is achieved when each parallel loop is sliced using as many slices of quasi-equal volumes as parallel threads, and when most of the iterations have close execution times. Thus, such a dynamic slicing strategy makes the resulting parallel loop scalable regarding the number of threads. Good data locality is reached by slicing profitably the non-parallelized loops, and by slicing the parallel loops into a number of parts equal to a multiple of the number of parallel threads. The number of generated slices for each dimension may remain a parameter at compile time, making algebraic tiling a parameterized loop tiling technique, and allowing the produced code to adapt to the number of parallel threads and data layout. Our experiments show that algebraic tiling significantly outperforms (hyper-)rectangular tiling when parallelizing loops with OpenMP using static scheduling, and mostly provides similar or lower execution times when compared to traditionally tiled loops parallelized using OpenMP dynamic scheduling. Thus, algebraic tiling makes dynamic scheduling largely unnecessary for the handled loop nests.
Algebraic tiling has been implemented in a source-to-source automatic code optimizer called Pesto (6.1.12) by Clément Rossetti, who defended his thesis on the 18th of December 2025.
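To make the idea of volume-based slicing concrete, here is a minimal sketch on a triangular loop nest (`for i in range(n): for j in range(i + 1)`). It is not the code generated by Pesto or TRAHRHE; the function names `rows_volume`, `slice_bounds` and `slice_volumes` are hypothetical and only illustrate why the slice bounds are algebraic (they involve a square root) rather than linear.

```python
from math import isqrt


def rows_volume(i):
    """Number of iterations in rows 0..i-1 of the triangular nest
    `for i in range(n): for j in range(i + 1)` (hypothetical example domain)."""
    return i * (i + 1) // 2


def slice_bounds(n, parts):
    """Cut the outer range [0, n) into `parts` slices of quasi-equal volume.

    Each cut is the largest i such that rows_volume(i) <= k * total / parts,
    obtained by solving i^2 + i - 2 * target = 0 and taking the integer floor.
    """
    total = rows_volume(n)
    bounds = [0]
    for k in range(1, parts):
        target = k * total // parts
        bounds.append((isqrt(8 * target + 1) - 1) // 2)
    bounds.append(n)
    return bounds


def slice_volumes(bounds):
    """Iteration count of each slice [bounds[k], bounds[k + 1])."""
    return [rows_volume(hi) - rows_volume(lo) for lo, hi in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    cuts = slice_bounds(1000, 4)
    print(cuts, slice_volumes(cuts))
```

With equal-width slices of the same domain, the last slice would hold several times more iterations than the first; cutting by volume keeps the imbalance within about one row of iterations per cut.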
7.2.2 Connecting Kokkos with the Polyhedral Model
Participants: Ugo Battiston, Philippe Clauss.
The increasing complexity of HPC hardware forces scientists to shift towards performance portable parallel programming models. Modern C++ libraries, such as Kokkos, have become essential: they allow developers to write a single code that runs efficiently on heterogeneous hardware (CPUs or GPUs).
However, this abstraction comes at a cost. The heavy use of C++ templates and lambda functions inside Kokkos hides the control flow and memory access patterns from the compiler. Consequently, advanced static analyzers, specifically those based on the polyhedral model like LLVM's Polly extension, fail to detect optimization opportunities such as loop tiling or fusion, leaving significant performance on the table.
We propose a novel approach to bridge the gap between high-level C++ abstractions and low-level polyhedral optimizations. We present a co-design strategy involving modifications to both the Kokkos library and Polly. First, we modify and instrument Kokkos to produce a cleaner intermediate representation (IR) structure and expose loop and data structures at the LLVM IR level. Second, we extend Polly to recognize these constructs and apply aggressive loop optimizing and parallelizing transformations on Kokkos kernels.
We show that this pipeline enables the automatic application of polyhedral transformations on standard Kokkos codes. Our evaluation on loop kernels from the Polybench benchmark suite, rewritten using Kokkos, shows significant speedups reaching up to 12.3 times over baseline Kokkos usage.
This work has been presented by Ugo Battiston at the Conférence francophone d'informatique en Parallélisme, Architecture et Système (COMPAS 2025). A paper has been recently submitted to an international conference.
7.2.3 Automatic Multi-Versioning of Computation Kernels
Participants: Raphaël Colin, Erwan Auer, Philippe Clauss.
Compute-intensive scientific applications usually combine multiple compute kernels. They can go through various execution phases, where the kernels may operate with different parameters and in different execution contexts. As such, standard compilers and generalist optimization tools fail to fully optimize compute kernels with respect to the execution contexts that the application goes through.
Multi-versioning, iterative compilation and auto-tuning are optimization techniques that aim to specialize the optimization parameters of the target code. By using feedback obtained from performance measurements at runtime, they are able to choose the best implementation variant, the best compiler optimizations, or the best set of values for some parameters. However, these techniques mostly generate code by relying on static information, or by relying on the user to provide different implementations of the same kernel, or to reference relevant parameters to tune.
We are currently designing a multi-versioning system which generates different efficient versions of compute kernels at runtime, and selects the best performing one for each encountered execution context. The different versions that are generated result from applying automatic loop optimizing and parallelizing transformations that are based on the polyhedral model to the LLVM intermediate representation. The latter is then compiled on-the-fly using the LLVM Just-In-Time compiler. This multi-versioning system is currently being implemented in the Apollo dynamic parallelizer (6.1.6).
The system requires very few annotations from the user, and uses information about the execution context that is gathered at runtime, in order to guide the automatic transformations of the compute kernels.
This work has been presented by Raphaël Colin at the Conférence francophone d'informatique en Parallélisme, Architecture et Système (COMPAS 2025) 31.
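The selection mechanism described above can be sketched as follows. This is only an illustration of per-context version selection driven by runtime measurements, not the Apollo implementation: the versions here are hand-written Python functions rather than code generated by polyhedral transformations and compiled by the LLVM JIT, and the class `MultiVersionKernel` and the helper `size_class` are hypothetical names.

```python
import time


class MultiVersionKernel:
    """Try each version once per execution context, then keep the fastest."""

    def __init__(self, versions, context_of):
        self.versions = versions      # name -> callable
        self.context_of = context_of  # arguments -> hashable context key
        self.timings = {}             # context -> {name: seconds}
        self.best = {}                # context -> name of the fastest version

    def __call__(self, *args):
        ctx = self.context_of(*args)
        if ctx in self.best:
            return self.versions[self.best[ctx]](*args)
        tried = self.timings.setdefault(ctx, {})
        # Exploration phase: run the next version not yet measured in this context.
        name = next(n for n in self.versions if n not in tried)
        start = time.perf_counter()
        result = self.versions[name](*args)
        tried[name] = time.perf_counter() - start
        if len(tried) == len(self.versions):
            self.best[ctx] = min(tried, key=tried.get)
        return result


def sum_squares_loop(values):
    total = 0
    for v in values:
        total += v * v
    return total


def sum_squares_builtin(values):
    return sum(v * v for v in values)


def size_class(values):
    """Hypothetical execution context: the order of magnitude of the input size."""
    return len(values).bit_length()


kernel = MultiVersionKernel(
    {"loop": sum_squares_loop, "builtin": sum_squares_builtin},
    context_of=size_class,
)
```

Every call returns a correct result, including the calls made during exploration; only the choice of version varies with the measured timings.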
7.2.4 Dynamic Task Scheduling with Multiple Priorities on Heterogeneous Computing Systems
Participants: Hayfa Tayeb, Bérenger Bramas.
In the context of Albert d'Aviau de Piolant and Hayfa Tayeb's PhD work, Bérenger Bramas collaborated with Mathieu Faverge, Abdou Guermouche, and Amina Guermouche to optimize energy consumption in StarPU-based applications 29, addressing a central challenge in high-performance computing (HPC), namely improving energy efficiency without sacrificing too much performance. A key lever explored in this work is GPU power capping, a technique that enforces a fixed upper power limit on devices such as CPUs and GPUs, with the objective of reducing energy usage while preserving acceptable throughput. The activity focused on evaluating the impact of static GPU power caps in heterogeneous HPC environments, where multiple accelerators with potentially different performance characteristics are used concurrently, and where the performance/energy trade-off becomes a scheduling and resource allocation problem rather than a simple per-device tuning problem.
The study first conducted an extensive characterization on a compute-intensive reference kernel, GEMM (matrix multiplication), across multiple Nvidia GPU architectures, in order to quantify how reducing the available power budget affects both execution time and energy consumption. The results highlight that compute-bound kernels can become significantly more energy-efficient under moderate power constraints: in particular, setting the GPU power limit in the range of 55–70% of the Thermal Design Power (TDP) can yield up to 30% energy efficiency improvement with only limited performance degradation. Building on these observations, the work then investigated how applying distinct power caps to different GPUs within the same heterogeneous node can improve the global energy efficiency of real HPC workloads, focusing on dense linear algebra task-based computations including matrix multiplication and Cholesky factorization.
Importantly, the study also demonstrated that the runtime scheduler (StarPU) can automatically adapt scheduling decisions to exploit the resulting heterogeneity induced by different GPU power limits, thereby aligning task placement with each device's effective compute capability under capping. Overall, on a platform equipped with four GPUs, applying power capping across all devices led to substantial end-to-end efficiency gains, improving energy efficiency for matrix multiplication by up to 24.3% in double precision and 33.78% in single precision, confirming that power-aware runtime-driven execution is a practical and effective approach to reduce the energy footprint of heterogeneous HPC applications.
7.2.5 Scheduling multiple task-based applications on distributed heterogeneous computing nodes
Participants: Jean-Etienne Ndamlabin, Bérenger Bramas.
The size, complexity and cost of supercomputers continue to grow, making any waste more critical than in the past. Consequently, we need methods to reduce the waste coming from users' choices, badly optimized applications or heterogeneous workloads during executions. In this context, we worked on the scheduling of several task-based applications on given hardware resources. Specifically, we created load balancing heuristics to distribute the task graph over the processing units. We validated our approach by implementing a super-scheduler in StarPU 18.
7.2.6 Automatic task-based parallelization
Participants: Bérenger Bramas, Marek Felosci, Julien Gaupp, Stéphane Genaud.
We extended our approach to automatically parallelize any application using a task-based method. We reimplemented APAC using OptiTrust (developed by our team), which enables source code transformations to be expressed compactly. We addressed several challenges related to explicit synchronizations, dependency specifications for arrays, and code duplication required to maintain both sequential and parallel versions. In addition, we created a purely LLVM-based version, which supports a broader subset of the C++ language, but at the cost of more complex transformation implementations 25, 30.
7.2.7 Ionic Models Code Generation for Heterogeneous Architectures
Participants: Vincent Loechner, Stephane Genaud, Cedric Bastoul, Adilla Susungi, Antoine Pierquin.
We participate in the research and development of a cardiac electrophysiology simulator called openCARP (6.1.2) in the context of the MICROCARD-2 European project (8.3.1). Our team provides its optimizing compiler expertise to build a bridge from a high-level DSL convenient for ionic model experts (EasyML) to code that will run efficiently on exascale supercomputers, using the MLIR compiler framework. We have extended the capabilities of openCARP for generating multiple parallel versions of the ionic currents computation, hence enabling the exploitation of the various parallel computing units that are available in the target architecture nodes (multicore CPUs with vector units, GPUs, etc.). We have collaborated with members of the STORM team (Inria Bordeaux), also involved in the MICROCARD-2 project, to extend the capability of executing simulations simultaneously on multiple CPUs and GPUs.
In 2025, we improved the openCARP software to:
- make the compilation of MLIR-generated code more robust;
- optimize the code to avoid memory transfers between GPUs and main node memory;
- provide new functions to access local variables of the ionic model and permit the implementation of SDC (spectral deferred correction) methods, in collaboration with our European partner from ZIB (Berlin);
- investigate how to replace the fast linear interpolation used to approximate complex formulas with polynomial interpolation, using the Sollya library.
7.2.8 Machine Learning Guided Equality Saturation
Participants: Thomas Koehler.
In joint work led by Nicole Heinimann (PhD at TU Berlin supervised by Michel Steuwer), Thomas Koehler has been exploring the idea of guiding the equality saturation optimization technique through machine learning. Equality saturation has successfully been applied in many domains. Yet, scaling issues hold back its success in even more applications. Thomas' prior work proposed Guided Equality Saturation as a solution that breaks challenging rewrite problems into a sequence of equality saturations. However, this prior work relied on human experts to provide insights in the form of guides that describe when to stop one equality saturation and start the next. The ongoing effort, presented at the EGRAPHS'25 workshop 54, attempts to reduce the reliance on human experts. The goal of Machine Learning Guided Equality Saturation is to automatically generate guides using a machine learning model. The training setup and machine learning model went through multiple design iterations already, and experiments are ongoing to assess how effective this approach is on challenging workloads.
7.2.9 Combining Optimization and Numerical Analysis of Functional Array Programs
Participants: Thomas Koehler.
In joint work with Eva Darulova (Uppsala University, Sweden), Thomas Koehler is working towards combining program optimization with numerical analysis. Eva and Thomas co-supervised two Master students at Uppsala University who finished their theses in 2025. Simon Björklund wrote his thesis on Numerical Analysis of Highly Performant Functional Array Programs 58. Filip von Knorring wrote his thesis on Exploring Accuracy and Performance Trade-offs in Functional Array Programs 64. Eva and Thomas are now working towards an international-level publication for this work.
7.3 Fundamental algorithms & mathematical tools
7.3.1 Trahrhe expressions
Participants: Philippe Clauss, Clément Rossetti, Marceau Noury.
In the mid-1990s, Philippe Clauss and Vincent Loechner introduced the mathematical theory of Ehrhart polynomials in computer science for the quantitative analysis of iterative programs 3, 6. These special mathematical objects give the exact number of integer points contained in a polyhedron depending linearly on parameters. In the context of polyhedral modeling of nested loops, this number can correspond to the total number of iterations, the number of parallel iterations, the number of accessed data, etc.
A particular use of these Ehrhart polynomials is ranking polynomials. Such polynomials give the position, or rank, of an iteration of a loop nest, according to the lexicographic order of execution of the iterations. These polynomials are determined by calculating the number of integer points lexicographically smaller than any given point in the polyhedral domain of the iterations. Philippe Clauss showed a first application of such polynomials to data layout transformation for optimal spatial locality in 2000.
More recently, we have been interested in inverting such ranking polynomials, in order to be able to determine, for a given rank, what are the corresponding loop indices. This unranking problem is particularly challenging from a theoretical and practical point of view. Thanks to the specific properties of ranking polynomials, we have developed a method for inverting such polynomials by solving uni-variate polynomial equations and propagating the integer floors of the roots to lower dimensions 2.
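As a small illustration of ranking and unranking, consider the triangular nest `for i in range(n): for j in range(i, n)`. The sketch below is written by hand for this single domain and is not output of the TRAHRHE software; the function names `rank` and `unrank` are hypothetical. The rank is a polynomial in `i`, `j` and `n`, and its inversion solves a univariate quadratic equation for `i`, takes the integer floor of the root, and then recovers `j` by substitution.

```python
from math import isqrt


def rank(i, j, n):
    """0-based lexicographic rank of iteration (i, j) in
    `for i in range(n): for j in range(i, n)`."""
    # Rows 0..i-1 hold n + (n - 1) + ... + (n - i + 1) iterations.
    return i * n - i * (i - 1) // 2 + (j - i)


def unrank(r, n):
    """Inverse of `rank`: recover (i, j) from a rank r with 0 <= r < n * (n + 1) / 2."""
    # i is the largest integer with i * n - i * (i - 1) / 2 <= r, i.e. the floor
    # of the smaller root of i^2 - (2n + 1) * i + 2r = 0.
    disc = (2 * n + 1) ** 2 - 8 * r   # always >= 9 for a valid rank
    ceil_sqrt = isqrt(disc - 1) + 1   # exact integer ceiling of sqrt(disc)
    i = (2 * n + 1 - ceil_sqrt) // 2
    # Propagate i to the inner dimension.
    j = r - rank(i, i, n) + i
    return i, j


if __name__ == "__main__":
    n = 4
    for r in range(n * (n + 1) // 2):
        print(r, unrank(r, n))
```

Such an inverse lets a single loop over ranks replace the whole nest, which is the basis of non-rectangular loop collapsing.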
Since 2019, the mathematical engine computing Trahrhe expressions has been developed as a software tool (TRAHRHE) (6.1.1) usable for several loop optimization purposes, such as non-rectangular loop collapsing 2 or algebraic loop tiling 66. A completely revised version written in C++ and implementing many improvements has been developed by Marceau Noury, Clément Rossetti and Philippe Clauss. It is now available from the website.
7.3.2 Z-Polyhedra and LBLs in PolyLib
Participants: Vincent Loechner.
Z-polyhedra were first introduced in PolyLib (6.1.5) in 2000, but this implementation suffered from several limitations. Since then, significant advances have been made in defining a solid mathematical foundation and a sound normal form for Z-polyhedra, LBLs (linearly bounded lattices) and their unions. We extended this theoretical work to enable the manipulation of arbitrary unions of LBLs or Z-polyhedra in PolyLib, using efficient algorithms to perform set operations and transformations of unions of LBLs. When implementing the LBLs in PolyLib we took special care to ensure safe and efficient memory allocations, to write efficient and robust functions, and to validate them on a broad range of verified test examples. This work was presented at the IMPACT workshop in January 2026 34.
7.3.3 Polyhedral Scheduling
Participants: Tom Hammer, Vincent Loechner, Stephane Genaud, Alain Ketterlin, Cedric Bastoul, Bérenger Bramas.
Scheduling is the central operation in the polyhedral compilation chain: it finds the best execution order of loop iterations for parallelizing and optimizing the code. Discovering the best polyhedral schedules remains a challenge due to the huge search space. Moreover, current classes of polyhedral schedulers proceed from outer to inner loops, making them impractical for enforcing efficient vectorization in innermost loops. We have shown those limitations in our survey on polyhedral compilers 28 presented at the HiPEAC 2025 conference.
The PhD work of Tom Hammer is currently investigating whether the results produced by an auto-vectorizer can help choose a schedule that enables both thread-parallelism and vectorization. To that end, we have extended Autovesk 67, developed in our team, which implements a Superword Level Parallelism (SLP) algorithm, to track how vectorized instructions relate to the original statement instances. We then use a modified version of the Pluto algorithm to build a parallel schedule under the extra constraints discovered by Autovesk. We finally check whether the schedule taking into account the vectorization dimension is legal before generating a transformed loop nest. We have tested this approach on the Polybench/C suite and our findings so far are that it does increase the number of vector instructions generated when compiling with standard compilers (GCC, Clang). However, the benefits of vectorization can be outweighed by losses in data locality, which is favored by the standard Pluto schedule. This work was presented at the IMPACT'26 Workshop 26.
7.3.4 Integer Polynomials and Polynomial Loops
Participants: Alain Ketterlin.
For some time now we have been working on a specific representation of integer polynomials, which has proved to be well fitted for characterizing polyhedral programs properties (like the counts and ranks of their instructions) 63. The same representation has also been used to extend our previous work on loop recognition in traces 9, which is now able to produce “polynomial loops” thanks to very efficient polynomial interpolation techniques 33. Both of these research lines have introduced the notion of “polynomial loops”, i.e., loops where all bounds and values are multivariate polynomials in the surrounding loop counters, a model that is, in its full generality, too expressive for our current analysis and optimization abilities, but nicely extends the classical polyhedral model.
This year's work has focused on three more aspects of loops involving integer polynomials in their control and/or computations. The first is their ability to be systematically turned into perfect loops (i.e., loops whose bodies are single constructs, either a sub-loop or a single instruction), effectively turning control into computation. Besides the theoretical interest of such loops, we expect this to have an impact on how such loops are executed, especially in the case where dedicated hardware is produced. The second aspect is the efficient execution of polynomial loops, which we have proved is only moderately more costly than their affine, non-perfect counterparts, and may even be as efficient provided enough hardware is available. The third and last aspect of the use of integer polynomials inside loops is their compilation on general purpose processors, where the compiler is in charge of detecting their presence inside the code, and of optimizing their computation. We expect this last aspect to have an impact on many numeric computations, but also on programs manipulating multi-dimensional arrays, where non-linear address computations are pervasive.
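As an illustration of why integer polynomials inside loops lend themselves to optimization, the sketch below uses the classical forward-difference technique, which evaluates a degree-d polynomial at successive loop counter values with d additions per iteration and no multiplication. This is a textbook method shown under our own assumptions, not necessarily the representation or the transformation used in the work described above; the names `forward_differences` and `evaluate_by_additions` are hypothetical.

```python
def forward_differences(poly, degree):
    """First column of the difference table of `poly`: p(0), Δp(0), Δ²p(0), ..."""
    row = [poly(i) for i in range(degree + 1)]
    diffs = []
    while row:
        diffs.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return diffs


def evaluate_by_additions(poly, degree, count):
    """Return [p(0), ..., p(count - 1)] using only additions inside the loop."""
    diffs = forward_differences(poly, degree)
    values = []
    for _ in range(count):
        values.append(diffs[0])
        # Update each difference from the next higher-order one.
        for k in range(degree):
            diffs[k] += diffs[k + 1]
    return values


def triangular_offset(i):
    """Non-linear address computation: start of row i in a packed triangular array."""
    return i * (i + 1) // 2


if __name__ == "__main__":
    print(evaluate_by_additions(triangular_offset, 2, 6))
```

The packed triangular offset is a typical non-linear address computation: here it is updated with two additions per iteration instead of being recomputed from scratch.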
7.3.5 Slotted E-Graphs
Participants: Thomas Koehler.
In joint work led by Rudi Schneider (PhD at TU Berlin supervised by Michel Steuwer), Thomas Koehler and his collaborators have been working on efficiently representing (bound) variables in e-graphs. An e-graph is a data structure at the heart of powerful optimization and reasoning techniques such as equality saturation, that space-efficiently represents equal sub-terms uniquely. In their paper published at PLDI'25 20, they present a novel approach to representing bound variables in e-graphs by making them a first-class built-in feature of the data structure. Their slotted e-graph represents terms that differ only by (bound or free) variable names uniquely. Slotted e-graphs are evaluated on two case studies from compiler optimization and theorem proving to show that performing equality saturation for languages with bound variables is greatly simplified and that it becomes possible to solve practically relevant problems that could not be solved with e-graphs using names or de Bruijn indices.
7.3.6 Improvements of the C programming language
Participants: Jens Gustedt.
The C standards committee ISO/IEC JTC1/SC22/WG14 is now discussing changes for the next version of the C standard, currently named C2y. The discussion on these new features took place in two face-to-face meetings in Graz, Austria, and Brno, Czech Republic.
In 2025 we contributed with several papers to the future revision. We contributed to the following subjects:
- improvement of syntax and semantics for arrays 55;
- type-safe minimum and maximum 39;
- improvement of the preprocessor 52;
- improvement of some problem spots concerning undefined behavior 53, 42, 43, 45, 48;
- revision of the thread and atomics features 38, 40, 49, 50, 46;
- continued work on function attributes 44, 51;
- the new defer feature 47;
- C semantics for contracts 41.
In addition to the C standard 62, the technical specification TS 6010 for a sound and verifiable memory model that is based on provenance 61 has now been published. Jens Gustedt was an editor of and major contributor to this specification.
To promote the new C standard, we also published a C23 edition of the book Modern C 35, which finally appeared in print in 2025. By keeping the rights for this edition as well, we were able to maintain a free online version on HAL, which has again been a great success, with more than 185,000 downloads in total as of January 2026.
7.3.7 Towards Pen-and-Paper-Style Equational Reasoning in Interactive Theorem Provers by Equality Saturation
Participants: Thomas Koehler.
Equations are ubiquitous in mathematical reasoning. Often, however, they only hold under certain conditions. As these conditions are usually clear from context, mathematicians regularly omit them when performing equational reasoning on paper. In contrast, interactive theorem provers pedantically insist on every detail to be convinced that a theorem holds, hindering equational reasoning at the more abstract level of pen-and-paper mathematics.
In joint work led by Marcus Rossel (PhD at TU Darmstadt supervised by Andrés Goens), we address this issue by raising the level of equational reasoning to enable pen-and-paper style in interactive theorem provers. We achieve this by interpreting theorems as conditional rewrite rules, and use equality saturation to automatically derive equational proofs. Conditions that cannot be automatically proven may be surfaced as proof obligations. Concretely, we present how to interpret theorems as conditional rewrite rules for a significant class of theorems. Handling these theorems goes beyond simple syntactic rewriting, and deals with aspects like propositional conditions and type classes. We evaluate our approach by implementing it as a tactic in Lean, using the egg library for equality saturation with e-graphs. We show four use cases demonstrating the efficacy of this higher level of abstraction for equational reasoning. This work is published at POPL'26 19.
7.3.8 Formal Proof of Space Bounds for Concurrent, Garbage-Collected Programs
Participants: Arthur Charguéraud, Alexandre Moine.
Alexandre Moine, co-advised by Arthur Charguéraud and François Pottier (Inria Paris), has presented a novel, high-level program logic for establishing space bounds in Separation Logic, for programs that execute with a garbage collector. A key challenge is to design sound, modular, lightweight mechanisms for establishing the unreachability of a block. In the setting of a high-level, ML-style language, a key problem is to identify and reason about the memory locations that the garbage collector considers as roots. Our recent work has focused on generalizing our previous results to handle concurrent programs. A key challenge is to handle the fact that if an allocation lacks free space, then it is blocked until all other threads exit their critical section. Only at that point may a GC execute and free the requested space. To handle this challenge, we propose to combine two language constructs: protected sections (during which the GC cannot be triggered) and polling points (where a thread pauses if other threads request a GC execution). Our article describing the results has appeared in the premier journal TOPLAS 17.
7.3.9 Typechecking of Overloading
Participants: Arthur Charguéraud.
In joint work with Martin Bodin from Inria Grenoble and Jana Dunfield from Queen's School of Computing (Canada), Arthur Charguéraud has been working on a typechecking algorithm for resolving overloaded symbols. Overloading consists of using the same symbol to refer to several functions, or the same name to refer to several constants. Overloading is ubiquitous in mathematics. It also appears in numerous programming languages that resolve overloading statically, as opposed to languages that rely on dynamic dispatch during program execution. A key question is how to determine, for every occurrence of an overloaded symbol, which function it refers to. Static resolution of overloading is intrinsically intertwined with typechecking: overloading resolution depends on types, but the types of the overloaded symbols depend on how they are resolved.
We present the first overloading resolution algorithm accompanied with a polynomial complexity bound. The bound is expressed in terms of the size of the description of the instances, as well as of the size of the typed tree to which the program resolves. In our algorithm, resolution is guided not only by the type of function arguments, but also by the type expected by the context. We allow candidate instances to have dependencies (assumptions). As in certain previously proposed algorithms, we take a non-backtracking approach, which avoids exponential search.
Our implementation parses OCaml-style syntax where functions, constants, constructors and record fields can be overloaded. We assume explicit quantification of polymorphic type variables. If all overloaded symbols can be unambiguously resolved, our tool produces standard OCaml code, in which every overloaded symbol is replaced with the value or name that it resolves to. Preliminary results have been presented at the JFLA'25 French workshop 23. An article submission is under preparation.
7.3.10 Binding Boolean Expressions and Extended Pattern Matching
Participants: Arthur Charguéraud, Yanni Lefki.
Functional programming languages include various pattern matching features, such as guarded patterns, matching by custom predicate, active patterns, synonymous patterns, etc. Besides, several languages include mechanisms for binding names as part of a boolean expression that appears in either an if-statement, a while-loop condition, or a pattern guard. These names may be bound either with a simple let-binding or via a test performed using pattern-matching. All these features are useful in practice, yet it appears that no mainstream language supports them all at once. In this work, we present a core language that consists of a small number of constructs that suffice to encode and combine all the desired features of pattern matching and binding boolean expressions. Thereby, we hope to consolidate existing knowledge on the topics of pattern matching and generalized forms of boolean expressions, through a streamlined presentation. We expect it to be useful not only for pedagogical purposes, but also potentially for simplifying the work of compiler developers. This work has been presented at the ML family workshop (ML'25) colocated with ICFP. An article submission is under preparation.
8 Partnerships and cooperations
8.1 International initiatives
8.1.1 Participation in other International Programs
CrOptAI (Sophie-Germain Program)
Participants: Thomas Koehler, Valeran Maytie, Cedric Bastoul.
- Title: Cross-Stack Optimisation for AI
- Partner Institutions: LIB UR 7534, Université Bourgogne Europe, France; University of Edinburgh, United Kingdom.
- Date/Duration: from November 1, 2025 to October 31, 2026 (1 year).
- Principal Investigators: Thomas Koehler, Annabelle Gillet (LIB), Eric Leclercq (LIB)
- Funding Impact: This funding will enable collaboration and synchronisation between the partners and their PhD researchers: research visits, conference trips, hardware acquisition. We will lay the foundations for further collaboration and funding.
- Research Project: Improving the efficiency of artificial intelligence computing is critical. Further, best performance is achieved through optimisation decisions that cut through the entire software stack, from high-level algorithmic choices down to hardware execution choices. Our project is to explore novel approaches to cross-stack optimisation, in order to improve artificial intelligence performance while lowering engineering costs.
8.2 International research visitors
8.2.1 Visits of international scientists
- Bastian Köpcke (postdoc at TU Berlin, Germany) visited CAMUS to collaborate with Julien De Castelnau, Thomas Koehler and Arthur Charguéraud on verified code optimization for GPUs (1 week research stay).
- Reuben Carolan (PhD at University of Edinburgh, UK) visited CAMUS to collaborate with Valéran Maytie, Thomas Koehler and Cedric Bastoul on sketch-guided polyhedral compilation (1 week research stay).
8.2.2 Visits to international teams
- Thomas Koehler visited Eva Darulova and her Datalogi team at Uppsala University, Sweden, for one week in September 2025. Eva and Thomas are working towards a publication on combining program optimization with numerical analysis. This comes after co-supervising two Master students, Simon Björklund and Filip von Knorring, who finished their theses in 2025.
8.3 European initiatives
8.3.1 Horizon Europe
MICROCARD-2 Centre of Excellence (EuroHPC and ANR)
Participants: Vincent Loechner, Stéphane Genaud, Cedric Bastoul, Adilla Susungi, Antoine Pierquin.
- Title: MICROCARD-2: numerical modeling of cardiac electrophysiology at the cellular scale
- Duration: from November 1, 2024 to April 30, 2027
- Partners: Inria, France; Karlsruher Institut Für Technologie, Germany; Megware, Germany; Simula Research Laboratory (Simula), Norway; Technical University München (TUM), Germany; Università degli Studi di Pavia, Italy; Università di Trento (UTrento), Italy; Université de Bordeaux, France; Université de Strasbourg, France.
- Coordinator: Mark Potse, Université de Bordeaux
- WP4 leader: Vincent Loechner
- Summary: The MICROCARD-2 project is coordinated by Université de Bordeaux and involves the Inria teams Carmen, Cardamom, Storm and TADaaM in Bordeaux, and CAMUS in Strasbourg, among a total of ten partner institutions in France, Germany, Italy, and Norway. This Centre of Excellence for numerical modeling of cardiac electrophysiology at the cellular scale builds on the MICROCARD(-1) project (2021–2024), and has the same website.
The modelling of cardiac electrophysiology at the cellular scale requires thousands of model elements per cell, of which there are billions in a human heart. Even for small tissue samples, such models require at least exascale supercomputers. In addition, the production of meshes of the complex tissue structure is extremely challenging, even more so at this scale. MICROCARD-2 works, in concert, on every aspect of this problem: tailored numerical schemes, linear-system solvers, and preconditioners; dedicated compilers to produce efficient system code for different CPU and GPU architectures (including the EPI and other ARM architectures); mitigation of energy usage; mesh production and partitioning; simulation workflows; and benchmarking.
The contribution of the CAMUS team concerns code optimization of the ionic models, and involves the MLIR compiler frontend and SIMD code generation for CPUs, as well as GPU (Nvidia and AMD) code generation. An engineer and a junior researcher were hired starting in Jan./Feb. 2025.
8.4 National initiatives
8.4.1 ANR OptiTrust
Participants: Arthur Charguéraud, Thomas Koehler, Guillaume Bertholon, Elian Morel, Julien François de Castelnau, Jens Gustedt.
Turning a high-level, unoptimized algorithm into high-performance code can take weeks, if not months, for an expert programmer. The challenge is to take full advantage of vectorized instructions, of all the cores and all the servers available, as well as to optimize the data layout, maximize data locality, and avoid saturating the memory bandwidth. In general, annotating the code with "pragmas" is insufficient, and domain-specific languages are too restrictive. Thus, in most cases, the programmer needs to write, by hand, low-level code that combines dozens of optimizations. This approach is not only tedious and time-consuming, it also degrades code readability, harms code maintenance, and can result in the introduction of bugs. A promising approach consists of deriving an HPC code via a series of source-to-source transformations guided by the programmer. This approach has been successfully applied in niche domains, such as image processing and machine learning. We aim to generalize this approach to optimize arbitrary code. Furthermore, the OptiTrust project aims at obtaining formal guarantees on the output code. A number of these transformations are correct only under specific hypotheses. We will formalize these hypotheses, and investigate which of them can be verified by means of static analysis. To handle the more complex hypotheses, we will transform not just code but also formal invariants attached to the code. Doing so will allow exploiting invariants expressed on the original code for justifying transformations performed at the n-th step of the transformation chain.
- Funding: ANR
- Start: October 2022
- End: September 2028
- Coordinator: Arthur Charguéraud (Inria)
- Partners: Inria team CAMUS (Strasbourg), Inria team MACARON (formerly TONUS) (Strasbourg), Inria team Cambium (Paris), Inria team CASH (Lyon), CEA team LIST
8.4.2 ANR AUTOSPEC
Participants: Bérenger Bramas, Philippe Clauss, Stéphane Genaud, Marek Felosci, Anastasios Souris.
The AUTOSPEC project aims to create methods for automatic task-based parallelization and to improve this paradigm by increasing the degree of parallelism using speculative execution. The project will focus on source-to-source transformations for automatic parallelization, speculative execution models, DAG scheduling, and the activation mechanisms for speculative execution. With this aim, the project will rely on a source-to-source compiler that targets the C++ language, a runtime system with speculative execution capabilities, and an editor (IDE) to enable compiler-guided development. The outcomes from the project will be open-source with the objective of developing a user community. The results will be of interest both to developers who want to use an automatic parallelization method and to high-performance programming experts, who will benefit from improvements to task-based programming. The results of this project will be validated in various applications, such as protein-complex simulation software and widely used open-source software. The aim will be to cover a wide range of applications to demonstrate the potential of the methods derived from this project while trying to establish their limitations to open up new research perspectives.
- Funding: ANR (JCJC)
- Start: October 2021
- End: September 2025
- Coordinator: Bérenger Bramas
8.4.3 Exa-SofT project, PEPR NumPEx
Participants: Bérenger Bramas, Philippe Clauss, Raphael Colin, Ugo Battiston, Erwan Auer.
Though significant efforts have been devoted to the implementation and optimization of several crucial parts of a typical HPC software stack, most HPC experts agree that exascale supercomputers will raise new challenges, mostly because the trend in exascale compute-node hardware is toward heterogeneity and scalability. Compute nodes of future systems will have a combination of regular CPUs and accelerators (typically GPUs), along with a diversity of GPU architectures. Meeting the needs of complex parallel applications and the requirements of exascale architectures raises numerous challenges which are still left unaddressed. As a result, several parts of the software stack must evolve to better support these architectures. More importantly, the links between these parts must be strengthened to form a coherent, tightly integrated software suite. The Exa-SofT project aims at consolidating the exascale software ecosystem by providing a coherent, exascale-ready software stack featuring breakthrough research advances enabled by multidisciplinary collaborations between researchers. The main scientific challenges we intend to address are: productivity, performance portability, heterogeneity, scalability and resilience, performance, and energy efficiency.
Philippe Clauss manages work package 2, "Just-in-Time code optimization with continuous feedback loop", of this project. He is also involved in two major tasks of this package, devoted (1) to the integration of polyhedral optimization techniques in the Kokkos framework and (2) to the development of a dynamic multi-versioning system.
- Funding: PEPR NumPEx
- Start: September 2023
- End: August 2028
- Coordinator: Raymond Namyst (Inria STORM)
- WP2 co-leader: Philippe Clauss
8.4.4 PEPR CAMELIA
Participants: Thomas Koehler, Cedric Bastoul, Arthur Charguéraud.
- Funding: PEPR (3rd type)
- Title: Composants pour l'Accélération Matérielle et Logicielle de l'IA
- Duration: from March 2026 to 2032 (6 years).
- Coordinators: Cédric Auliac (CEA), Olivier Sentieys (Inria TARAN)
- WP4 Coordinators: Fabrice Rastello (Inria CORSE), H.P. Charles (CEA)
- WP4.2 Coordinator: Thomas Koehler
- Summary: The French government requires sovereign access to the key components needed for AI and its acceleration. In this context, the ASIC and Numeric program agencies backed by CEA and Inria were entrusted with proposing a research and attractiveness strategy. This research program complements other national initiatives, with a focus on developing modular hardware acceleration components and their software stack. WP4 focuses on the software aspect of the project. WP4.2 tackles program representation and compilation challenges: from high-level domain-specific languages down to low-level hardware targets, their runtime and ISAs.
The contribution of the CAMUS team is to coordinate WP4.2 and to develop new compilation techniques that facilitate prototyping AI code optimizations at all abstraction levels, from tensor expressions down to hardware ISAs. Ideally, these techniques enable producing highly optimized AI code for new accelerators without having to rewrite hand-optimized libraries or to redesign optimizing compilers.
9 Dissemination
9.1 Promoting scientific activities
9.1.1 Scientific events: organisation
General chair, scientific chair
- Thomas Koehler : Séminaire Pile Logicielle et Compilation pour l'IA, Aussois
9.1.2 Scientific events: selection
Member of the conference program committees
- Arthur Charguéraud : SPAA'25 (ACM Symposium on Parallelism in Algorithms and Architectures)
- Arthur Charguéraud : ML'25 (ML family workshop, colocated with ICFP)
- Arthur Charguéraud : PLDI'25 (ACM Conference on Programming Language Design and Implementation)
- Thomas Koehler : PLDI'25 (ACM Conference on Programming Language Design and Implementation)
- Cedric Bastoul : CC'26 (Intl Conference on Compiler Construction)
Reviewer
- Thomas Koehler : EGRAPHS'25 workshop at PLDI'25
9.1.3 Journal
Reviewer - reviewing activities
- Thomas Koehler : TACO (ACM)
- Vincent Loechner : TACO (ACM), Journal of Symbolic Computation (Elsevier)
- Philippe Clauss : Journal of Supercomputing
- Arthur Charguéraud : Journal of Functional Programming
9.1.4 Invited talks
- Philippe Clauss was invited as keynote speaker at the 15th International Workshop on Polyhedral Compilation Techniques (IMPACT 2025), January 22, 2025, Barcelona, Spain: Counting-based Loop Optimization.
- Thomas Koehler : Guided Equality Saturation, AST Lab, ETH Zürich, Switzerland
- Thomas Koehler : A Case For Interactive Optimization Assistants, User-Schedulable Languages Workshop, ASPLOS, Rotterdam, Netherlands
- Arthur Charguéraud : Binding Boolean Expressions and Extended Pattern Matching, Inria Cambium team, Paris, France.
9.1.5 Scientific expertise
- Arthur Charguéraud served as a reviewer for 2 ANR projects.
- Jens Gustedt is a member of the ISO/IEC working groups ISO/IEC JTC1/SC22/WG14 and WG21 for the standardization of the C and C++ programming languages, respectively.
9.1.6 Research administration
- Stéphane Genaud is the head of the ICPS team for the ICube lab. Arthur Charguéraud is vice-head.
- Jens Gustedt is deputy director of the ICube lab, responsible for the IT and CS policy and for the coordination between the lab and the Inria center. In that function, he also represents ICube on the board of the project committee of the Inria Centre at Université de Lorraine.
- Jens Gustedt is a member of the steering committee of the interdisciplinary institute IRMIA++ of Strasbourg University.
- Jens Gustedt is (together with Philippe Helluy of the IRMA lab) responsible for the Inria PIQ program for the Strasbourg site.
- Arthur Charguéraud is a member of the COMIPERS jury for PhD and postdoc grants at Inria Nancy Grand-Est.
- Arthur Charguéraud represents Inria at the meetings of the MSII doctoral school (Mathématiques, Sciences de l'Information et de l'Ingénieur, ED269) in Strasbourg.
- Bérenger Bramas is a member of the CDT and IES committee at Inria Nancy Grand-Est.
9.2 Teaching - Supervision - Juries - Educational and pedagogical outreach
9.2.1 Teaching
- Licence:
- Philippe Clauss , Computer architecture, 18h, L2, Université de Strasbourg, France
- Vincent Loechner , Algorithmics and programming, 82h, L1, Université de Strasbourg, France
- Vincent Loechner , System administration, 40h, Licence Pro, Université de Strasbourg, France
- Vincent Loechner , Parallel programming, 18h, M1, Université de Strasbourg, France
- Bérenger Bramas , System programming, 24h, L2, UFAZ, France-Azerbaijan
- Alain Ketterlin , Computer science culture and practice, L1 Math-Info, 48h, Université de Strasbourg, France
- Alain Ketterlin , System programming, L2 Math-Info, 40h, Université de Strasbourg, France
- Alain Ketterlin , Algorithmics and programming, L1 Math-Info, 66h, Université de Strasbourg, France
- Alain Ketterlin , Software Engineering (in English), L2 Math-Info, 64h, Université de Strasbourg, France
- Stéphane Genaud , Algorithmics and programming, 82h, L1, Université de Strasbourg, France
- Stéphane Genaud , Data Structures & Algorithms 2, 25h, L2, UFAZ, France-Azerbaijan
- Stéphane Genaud , Parallel programming, 30h, M1, Université de Strasbourg, France
- Master:
- Philippe Clauss , Compilation, 132h, M1, Université de Strasbourg, France
- Philippe Clauss , Real-time programming and system, 37h, M1, Université de Strasbourg, France
- Philippe Clauss , Code optimization and transformation, 31h, M1, Université de Strasbourg, France
- Vincent Loechner , Real-time systems, 12h, M1, Université de Strasbourg, France
- Bérenger Bramas , Compilation and Performance, 24h, M2, Université de Strasbourg, France
- Bérenger Bramas , Compilation, 24h, M1, Université de Strasbourg, France
- Cedric Bastoul , Parallel programming, 10h, M1, Université de Strasbourg, France
- Cedric Bastoul , Compilation, 36h, M1, Université de Strasbourg, France
- Cedric Bastoul , Research & Development Project, 20h, M2, Université de Strasbourg, France
- Stéphane Genaud , Cloud and Virtualization, 12h, M1, Université de Strasbourg, France
- Stéphane Genaud , Large-Scale Data Processing, 15h, M1, Université de Strasbourg, France
- Stéphane Genaud , Distributed Storage and Processing, 15h, M2, Université de Strasbourg, France
- Eng. School:
- Vincent Loechner , Parallel programming, 20h, Telecom Physique Strasbourg - 3rd year, Université de Strasbourg, France
- Stéphane Genaud , Introduction to Operating Systems, 16h, Telecom Physique Strasbourg - 1st year, Université de Strasbourg, France
- Stéphane Genaud , Object-Oriented Programming, 60h, Telecom Physique Strasbourg - 1st year, Université de Strasbourg, France
- Free online course: Arthur Charguéraud has made publicly available the solutions to the 125+ exercises of his all-in-Rocq course on the Foundations of Separation Logic.
- DU IRMIA++ interdisciplinary seminar, as well as seminar of the doctoral school ED269 : Arthur Charguéraud , Introductory course to Interactive Program Verification (3h), Université de Strasbourg, France
- Corps des mines: Arthur Charguéraud , Design and Implementation of Educational Software (10h), Paris, France
Teaching tracks:
- Philippe Clauss is in charge of the master's degree in Computer Science of the University of Strasbourg, since Sept. 2020.
- Stéphane Genaud is in charge of the Bachelor in Computer Science and co-head of the Master in Data Science and Artificial Intelligence at UFAZ (Baku, Azerbaijan), which delivers Unistra diplomas, since Aug. 2023 and Aug. 2024, respectively.
- Cedric Bastoul is in charge of the Software Science and Engineering track of the Master's degree in Computer Science of the University of Strasbourg, since Sept. 2025.
9.2.2 Supervision
PhD completed:
- PhD defended in 2025: Guillaume Bertholon, Interactive Compilation via Trustworthy Source-to-Source Transformations, advised by Arthur Charguéraud , since Sept 2022.
- PhD defended in 2025: Clément Rossetti, Algebraic Tiling: Volume-guided Tiling of Parallel Loops for Near-Perfect Load Balancing, advised by Philippe Clauss , since Oct 2022.
- PhD defended in 2025: Hayfa Tayeb, Efficient scheduling strategies for the task-based parallelization, advised by Bérenger Bramas , Abdou Guermouche (Inria project-team TOPAL), Mathieu Faverge (Inria project-team TOPAL), since Nov 2021.
- PhD defended in 2025: David Algis, Hybridization of the Tessendorf method and Smoothed Particle Hydrodynamics for real-time ocean simulation, advised by Bérenger Bramas , Emmanuelle Darles (XLim), Lilian Aveneau (XLim lab), since Oct 2022.
PhD in progress:
- PhD in progress: Yanni Lefki, Foundational Verification of Interactively Optimized Programs, advised by Arthur Charguéraud , since Sept 2025.
- PhD in progress: Raphaël Colin, Runtime multi-versioning of parallel tasks, advised by Philippe Clauss and Thierry Gautier (Inria project-team Avalon), since Oct. 2023.
- PhD in progress: Ugo Battiston, C++ complexity disambiguation for advanced optimizing and parallelizing code transformations, advised by Philippe Clauss and Marc Pérache (CEA), since Oct. 2023.
- PhD in progress: Tom Hammer, Synergie entre ordonnancement et optimisation des accès mémoire dans le modèle polyédrique, advised by Stéphane Genaud and Vincent Loechner , since Sept 2023.
- PhD in progress: Valéran Maytie , Optimizing LLMs with Sketch-Guided Polyhedral Compilation, advised by Thomas Koehler and Cedric Bastoul , since Oct 2025.
9.2.3 Juries
- Cedric Bastoul was a reviewer and member of the jury for the PhD thesis of Vincent Alba at the University of Bordeaux
- Cedric Bastoul was president of the jury for the PhD thesis of Lana Scravaglieri at the University of Bordeaux
- Cedric Bastoul was president of the jury for the Habilitation thesis of Quentin Bramas at the University of Strasbourg
- Arthur Charguéraud was a reviewer and member of the jury for the PhD thesis of Josué Moreau at the University Paris–Saclay
- Arthur Charguéraud was guarantor (garant) and member of the jury for the Habilitation thesis of Bérenger Bramas at the University of Strasbourg
- Jens Gustedt was a reviewer and member of the jury for the thesis of Sébastien Michelland at the Université Grenoble Alpes
9.3 Popularization
9.3.1 Specific official responsibilities in science outreach structures
- Arthur Charguéraud is co-founder and vice-president of the non-profit organization France-ioi. This organization is in charge of the French participation in international olympiads in informatics. It also organizes numerous contests, such as the Concours Castor, Concours Algorea, Concours Alkindi, and the French Olympiads in Informatics.
- Arthur Charguéraud is a co-organizer of the Concours Castor informatique. The purpose of the Concours Castor is to introduce pupils, from CM1 to Terminale, to computer science. 650,000 teenagers played with the interactive exercises in November and December 2025.
10 Scientific production
10.1 Major publications
- [1] Provably and Practically Efficient Granularity Control. PPoPP 2019 - Principles and Practice of Parallel Programming, Washington DC, United States, February 2019.
- [2] Automatic Collapsing of Non-Rectangular Loops. Parallel and Distributed Processing Symposium (IPDPS), 2017, Orlando, United States, IEEE International, May 2017, 778-787.
- [3] Counting Solutions to Linear and Nonlinear Constraints Through Ehrhart Polynomials: Applications to Analyze and Transform Scientific Programs. ICS, International Conference on Supercomputing, ACM International Conference on Supercomputing 25th Anniversary Volume, Munich, Germany, 2014.
- [4] Symbolic polynomial maximization over convex sets and its application to memory requirement estimation. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 17(8), August 2009, 983-996.
- [5] Iterative Computations with Ordered Read-Write Locks. Journal of Parallel and Distributed Computing 70(5), 2010, 496-504.
- [6] Parametric Analysis of Polyhedral Iteration Spaces. Journal of Signal Processing Systems 19(2), July 1998, 179-194.
- [7] Modern C. Manning, November 2019.
- [8] Dynamic and Speculative Polyhedral Parallelization Using Compiler-Generated Skeletons. International Journal of Parallel Programming 42(4), August 2014, 529-545.
- [9] Prediction and trace compression of data access addresses through nested loop recognition. Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization, Boston, United States, ACM, April 2008, 94-103.
- [10] Profiling Data-Dependence to Assist Parallelization: Framework, Scope, and Optimization. MICRO-45, The 45th Annual IEEE/ACM International Symposium on Microarchitecture, Vancouver, Canada, December 2012.
- [11] Polyhedral parallelization of binary code. ACM Transactions on Architecture and Code Optimization 8(4), January 2012, 39:1-39:21.
- [12] The Polyhedral Model of Nonlinear Loops. ACM Transactions on Architecture and Code Optimization 12(4), January 2016.
10.2 Publications of the year
International journals
Invited conferences
International peer-reviewed conferences
Conferences without proceedings
Scientific books
Doctoral dissertations and habilitation theses
Reports & preprints
Other scientific publications
10.3 Cited publications
- [57] Interactive compilation via trustworthy source-to-source transformations. PhD thesis, Université de Strasbourg, September 2025.
- [58] Numerical Analysis of Highly Performant Functional Array Programs. MA thesis, Uppsala University, Department of Information Technology, 2025, 37.
- [59] Efficient data compression for high-performance PDE solvers. PhD thesis, Université de Strasbourg, October 2024.
- [60] cuSZp2: A GPU Lossy Compressor with Extreme Throughput and Optimized Compression Ratio. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '24), Atlanta, GA, USA, IEEE Press, 2024. URL: https://doi.org/10.1109/SC41406.2024.00021
- [61] ISO/IEC IS 9899:2024: Programming languages - C - A provenance-aware memory object model for C. ISO, May 2025, 23.
- [62] ISO/IEC IS 9899:2024: Programming languages - C. ISO, October 2024, 758.
- [63] Easy Counting and Ranking for Simple Loops. IMPACT 2024 - 14th International Workshop on Polyhedral Compilation Techniques, Munich, Germany, January 2024.
- [64] Exploring Accuracy and Performance Trade-offs in Functional Array Programs. Uppsala University, Computing Science, 2025.
- [65] Towards Optimising Programs with Sketch-Guided Polyhedral Compilation. IMPACT 2026 - International Workshop on Polyhedral Compilation Techniques, Kraków, Poland, January 2026.
- [66] Algebraic Tiling. IMPACT 2023, 13th International Workshop on Polyhedral Compilation Techniques, Toulouse, France, January 2023.
- [67] Autovesk: Automatic Vectorized Code Generation from Unstructured Static Kernels Using Graph Transformations. ACM Trans. Archit. Code Optim. 21(1), December 2023. URL: https://doi.org/10.1145/3631709