MADMAX

2025 Activity report, Project-Team MADMAX

RNSR: 202524725W‌

Research center Inria Centre‌‌ at Université Grenoble Alpes
In partnership with: CNRS, Institut polytechnique de Grenoble, Université Grenoble Alpes
Team name: Moore, Amdahl, Dennard to their MAXimum‌
In collaboration with: Techniques de l'Informatique et de la Microélectronique pour l'Architecture des systèmes intégrés

Creation‌ of the Project-Team: 2025‌ August 01

Keywords‌

Computer Science and Digital Science

A1.1. Architectures
A1.1.1.‌ Multicore, Manycore
A1.1.2. Hardware accelerators (GPGPU, FPGA, etc.)‌
A1.1.10. Reconfigurable architectures
A1.6. Green Computing
A2.3. Embedded‌ and cyber-physical systems
A4.5. Formal method for verification,‌ reliability, certification
A9.7. AI algorithmics

1 Team‌ members, visitors, external collaborators

Research Scientists

Cesar Fuguet‌ Tortolero [INRIA, Researcher, from Aug‌ 2025]
Arthur Pérais [CNRS, Researcher‌, from Aug 2025]

Faculty Members

Frédéric‌ Pétrot [Team leader, Grenoble INP -‌ UGA, Professor, from Aug 2025,‌ HDR]
Liliana Andrade [Grenoble INP -‌ UGA, Associate Professor, from Aug 2025‌, TIMA T423 / Polytech B217]
Julie‌ Dumas [Grenoble INP - UGA, Associate‌ Professor, from Aug 2025]
Olivier Muller‌ [Grenoble INP - UGA, Associate Professor‌, from Aug 2025]
Laurence Pierre [‌UGA, Professor, from Aug 2025,‌ HDR]
Frédéric Rousseau [Grenoble INP -‌ UGA, Professor, from Aug 2025,‌ HDR]

PhD Students

Andrei Ilin [UGA‌, from Dec 2025]
Kilian Mc Govern‌ [UGA, from Aug 2025]
Abdallah‌ Meebed [UGA, from Aug 2025]‌
Olivier Romane [UGA, from Aug 2025‌]
Johan Soderstrom [INRIA, from Nov‌ 2025]

Administrative Assistant

Myriam Etienne [INRIA‌]

2 Overall objectives

Our motto is "Making‌ the most of transistors in times of dwindling‌ semiconductor technology".

With Moore's law slowing down and Dennard scaling no longer applying, the design and implementation of computing machines face many challenges. Among those, our objectives are (1) to contribute to more power-efficient mid- to high-performance processor architectures, needed for sequential performance, (2) to propose dedicated hardware support for Artificial Intelligence workloads, to support the execution of relatively large neural networks at an acceptable cost for edge applications, and (3) to define new design and verification methods for software-centric hardware systems, and to develop tools to put them into practice.

3 Research program

Micro-architecture:
Discovering yet more execution parallelism is challenging, but also rewarding. Parallelism can be found at the instruction level by leveraging speculation where possible, and at the thread level by letting users provide synchronization or data-sharing hints, either within or among processors. For the latter case, the efficiency of memory accesses remains key, and we therefore also work on the details of cache coherency protocols: the high-level state machines are now well established, but a safe path to implementation remains challenging.
AI‌ acceleration:
Hardware acceleration of AI workloads is inevitable. For example, according to the GAFAM, a 100x improvement in power efficiency compared to CPUs is necessary to sustain current AI deployment. Although developing hardware at the register-transfer level (RTL) is hard, we believe that it is a necessary evil to reach the best-in-class performance required by these ubiquitous devices. We work on extreme quantization for weights and activations, on the tiling and scheduling of memory accesses for hardware-level folded networks, and on weight compression.
Computer-aided-design:
Although it‌ does not provide any‌‌ formal guarantees, simulation is the technology used to‌ validate, in particular, software‌ centric systems. Speed and‌‌ accuracy are key metrics of a simulation infrastructure,‌ but as we know,‌ architects cannot have their‌‌ cake and eat it too, and fast simulation‌ is inaccurate, while accurate‌ simulation is slow. We‌‌ work with QEMU, a fast simulator we contributed‌ to, for modeling cache‌ coherence protocols, 128-bit architectures,‌‌ etc. Semi-formal verification (runtime verification, not exhaustive but‌ fully automatable) or formal‌ verification methods (less automatic‌‌ but exhaustive) can provide additional guarantees about functional‌ or non-functional properties. We‌ in particular look into‌‌ SAIL, a framework for formal ISA description, to‌ validate dynamic binary translation.‌
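
As an illustration of what extreme quantization of weights means in practice, the sketch below maps real-valued weights to the ternary set {-1, 0, +1} with a single scale factor. It is a generic threshold-based scheme and not necessarily the method used by the team; the function names and the `threshold_ratio` parameter are assumptions introduced for this example.

```python
def ternarize(weights, threshold_ratio=0.7):
    """Map real weights to codes in {-1, 0, +1} plus one shared scale factor.

    threshold_ratio is a hypothetical tuning knob: weights whose magnitude is
    at most threshold_ratio * mean(|w|) are quantized to zero.
    """
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = threshold_ratio * mean_abs
    codes = [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]
    # The scale is the mean magnitude of the weights that were not zeroed.
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale


def dequantize(codes, scale):
    """Rebuild approximate weights from ternary codes and the scale factor."""
    return [c * scale for c in codes]


weights = [0.9, -0.05, 0.4, -0.8, 0.02, -0.3]
codes, scale = ternarize(weights)
approx = dequantize(codes, scale)
```

Each weight then needs only two bits of storage, and multiplications reduce to sign selection, which is what makes such schemes attractive for hardware.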

4 Application domains

The team develops core technologies that are meant to be usable in many different contexts, and does not focus on one or several specific applications.

5 Social and environmental‌ responsibility

MADMAX work is‌‌ at the boundary between hardware and software, and‌ semiconductor manufacturing, used for‌ the fabrication of the‌‌ devices we design, consumes significant energy and water‌ and emits toxic chemicals‌ and greenhouse gases. But‌‌ without this very sophisticated industry and its capability‌ to shrink the size‌ of the devices, there‌‌ will be no software and no software evolution‌ as we know it.‌

5.1 Footprint of research‌‌ activities

In the absence of monitoring on our premises, we can hardly give quantitative information. We nevertheless host two large-scale servers (96 and 128 cores respectively) and a server with a Blackwell GPU, consuming around 700 W each.

5.2 Impact of research‌ results

The team focuses on better power efficiency for computations, be it for general-purpose computing or hardware acceleration of AI. In that sense, it has both a positive and a negative environmental impact. Indeed, by lowering the amount of power needed for some computing task, it makes this task feasible at a larger scale. This vicious circle is called the Jevons paradox, and is typical of our modern societies. LED lighting is quite interesting in that regard (see https://hal.science/hal-05396402/).

6‌ Highlights of the year‌‌

6.1 Awards

Best paper award for our paper "HPC Workload Analysis Using Distributed Cross-ISA Binary Instrumentation", presented at the 40th Conference on Design of Circuits and Integrated Systems, 2025 [8]
Best paper award for the paper to which we contributed "FetchFlare: An Open-Source Strided Data Prefetcher for High-Performance Cache Hierarchies", presented at the 28th Euromicro Conference on Digital System Design (DSD), 2025 [6]
Best paper award for the paper to which we contributed "Ramping Up Open-Source RISC-V Cores: Assessing the Energy Efficiency of Superscalar, Out-of-Order Execution", presented at the 22nd ACM International Conference on Computing Frontiers, 2025 [5]

7 Latest‌ software developments, platforms, open data

7.1 Latest software‌ developments

7.1.1 cv-hpdcache

Name:
OpenHW Core-V High-Performance L1‌ Dcache (CV-HPDcache)
Keywords:
Cache, Memory hierarchy
Scientific Description:‌
L1 instruction and data cache supporting several out-of-order‌ transactions and hit under miss.
Functional Description:
This cache holds the data and instructions close to the processor to minimize the latency of memory accesses. Its microarchitecture supports multiple simultaneous accesses, with responses returned out of order so that pending instructions can be released as quickly as possible. It uses separate memory banks so that only the necessary parts are activated, with a view to saving energy.
Release Contributions:

This release includes some bug fixes and optimizations.

Added

Support responses for CMO operations when need_rsp in the request is set
Support not-power-of-two number of entries in the Flush controller

Fixed

Fix implementation of the data merge logic in the write buffer to improve the area
Fix assertions syntax
Fix CMO flushes shall unset the dirty bit in the cache directory
Fix handling of bus errors on write misses
Fix prefetch requests shall update PLRU bits
Fix initialization of the CMO handler flush request valid register
URL:
https://github.com/openhwgroup/cv-hpdcache
Publications:
hal-05452261v1, cea-04110679v1
Contact:
Cesar‌ Fuguet Tortolero

7.1.2 qemu-riscv128

Name:
Quick emulation of‌ the rv128 instruction set architecture
Keywords:
Full system‌ simulation, Emulation
Functional Description:
This program is a QEMU fork that supports elf128. Emulation of 128-bit RISC-V instructions has been available in QEMU (https://github.com/qemu/qemu) since January 2022. This is the first (and currently only) simulator of a 128-bit processor.
Release‌ Contributions:
Support for reading executables using the elf128‌ format.
URL:
https://github.com/fpetrot/qemu-riscv128/
Contact:
Frederic Petrot

7.1.3 rv128-toolchain‌

Name:
Cross-compilation toolchain for 128-bit riscv
Keyword:
Compilers‌
Functional Description:
This toolchain makes it possible to cross-compile, assemble, link, and debug C programs targeting the rv128 RISC-V instruction set architecture. This cross-compilation environment is based on gcc, the binutils, and gdb. It also contains the definition of the elf128 data format.
Release Contributions:
Bug corrections‌
URL:
https://github.com/fpetrot/riscv-gnu-toolchain.git
Contact:
Frederic Petrot

8 New results‌

CVA6S+: A Superscalar RISC-V Core with High-Throughput Memory‌ Architecture

Participants: Riccardo Tedeschi (University of Bologna),‌ Gianmarco Ottavi (University of Bologna), Côme Allart‌ (Thales and Mines Saint-Etienne, CEA, Leti, Centre CMP)‌, Nils Wistoff (ETH Zurich), Zexin Fu‌ (ETH Zurich), Filippo Grillotti (STMicroelectronics, Agrate Brianza)‌, Fabio de Ambroggi (STMicroelectronics, Agrate Brianza),‌ Elio Guidetti (STMicroelectronics, Agrate Brianza), Jean-Baptiste Rigaud‌ (Mines Saint-Etienne, CEA, Leti, Centre CMP), Olivier Potin (Mines Saint-Etienne, CEA,‌ Leti, Centre CMP),‌ Jean-Roch Coulon (Thales),‌‌ César Fuguet, Luca Benini (ETH Zurich and‌ University of Bologna),‌ Davide Rossi (University of‌‌ Bologna).

Open-source RISC-V cores are increasingly adopted‌ in high-end embedded domains‌ such as automotive, where‌‌ maximizing instructions per cycle (IPC) is becoming critical.‌ Building on the industry-supported‌ open-source CVA6 core and‌‌ its superscalar variant, CVA6S, we introduce CVA6S+, an‌ enhanced version incorporating improved‌ branch prediction, register renaming‌‌ and enhanced operand forwarding. These optimizations enable CVA6S+‌ to achieve a 43.5%‌ performance improvement over the‌‌ scalar configuration and 10.9% over CVA6S, with an‌ area overhead of just‌ 9.30% over the scalar‌‌ core (CVA6). Furthermore, we integrate CVA6S+ with the‌ OpenHW Core-V High-Performance L1‌ Dcache (HPDCache) and report‌‌ a 74.1% bandwidth improvement over the legacy CVA6‌ cache subsystem.

Published as‌ "CVA6S+: A superscalar RISC-V‌‌ core with high-throughput memory architecture." arXiv preprint arXiv:2505.03762‌ (2025).

Depth-first: A deterministic‌ and scalable NoC routing‌‌ protocol for 3.5D packaged architectures

Participants: Davy Million (CEA List), César Fuguet, Adrian Evans (CEA List), Rim El Cheikh (Université Clermont Auvergne), Alireza Monemi (Barcelona Supercomputing Center), Jonathan Balkind (University of California, Santa Barbara), Frédéric Pétrot.

[13] New high-volume commercial products combine 2.5D silicon-interposer based assemblies with 3D monolithic stacks of chiplets. This combination is called 3.5D packaging and makes it possible to assemble dense compute solutions. Components communicate via a Network-On-Chip, but current solutions do not support 3.5D Network-On-Chip topologies. To this end, this work proposes Depth-First, the first Deterministic, Virtual Channel based, Network-On-Chip routing protocol supporting 3.5D network topologies. The protocol prevents deadlocks using additional Virtual Channels only in the upper chiplets, while imposing no VC constraints on the base interposer. Depth-First also features an efficient node naming scheme, enabling highly compact routing tables. Since vertical links must be assigned to routers, we present a Mixed-Integer Linear Programming formulation that greatly speeds up execution time compared to a reference implementation from prior work, which was based on an exhaustive search. We formally prove that the protocol is deadlock-free, study its performance using an open-source cycle-accurate simulator, and compare it with other protocols (on a comparable topology). A partial implementation of Depth-First in an open-source router results in a small 4.9% area impact (7 nm process) compared to an implementation without our routing algorithm.

Published as "Depth-first: A deterministic and scalable NoC routing protocol for 3.5D packaged architectures." IEEE Journal on Emerging and Selected Topics in Circuits and Systems (2025) [4].

Hardware-software co-design for supporting shared distributed‌ virtual memory

Participants: Eduardo‌ Tomasi Ribeiro (CEA List)‌‌, César Fuguet, Christian Fabre (CEA List)‌, Frédéric Pétrot.‌

With network technologies now‌‌ offering latencies approaching that of memory, sharing distributed‌ memory becomes an increasingly‌ feasible approach. However, memory‌‌ virtualization remains largely unexplored on distributed supercomputers, and‌ developers rely on complex‌ programming models to achieve‌‌ high performance. We propose‌ a hardware-software co-design approach to enable a single‌ virtual global address space. Our method introduces the‌ concept of virtual nodes, which are identified by‌ the most significant bits of the virtual address,‌ thereby defining address space ranges that correspond to‌ different nodes within the distributed system. To support‌ this approach, we provide the necessary hardware support‌ and validate its efficacy using two open-source simulators:‌ gem5 and SST. Experimental results demonstrate that programming‌ this system closely resembles the shared memory programming‌ model, and we present its scalability using a‌ benchmark. These results demonstrate the viability of implementing‌ a shared virtual address space in distributed systems,‌ simplifying the development of high-performance computing applications.

Published as "Hardware-software co-design for supporting shared distributed virtual memory." Proceedings of the 22nd ACM International Conference on Computing Frontiers, 2025 [9].
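
The virtual-node idea described in this result, where the most significant bits of a virtual address name the node that owns it, can be sketched as follows. The field widths (`ADDR_BITS`, `NODE_BITS`) and the helper names are assumptions chosen for illustration; the source does not specify them.

```python
ADDR_BITS = 64   # assumed virtual address width
NODE_BITS = 8    # assumed number of most significant bits naming the virtual node
OFFSET_BITS = ADDR_BITS - NODE_BITS
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def make_address(node, offset):
    """Build a global virtual address from a virtual node id and a local offset."""
    if not 0 <= node < (1 << NODE_BITS):
        raise ValueError("node id does not fit in the node field")
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset does not fit in the offset field")
    return (node << OFFSET_BITS) | offset


def split_address(vaddr):
    """Return (virtual node id, offset within that node's address range)."""
    return vaddr >> OFFSET_BITS, vaddr & OFFSET_MASK


def is_remote(vaddr, local_node):
    """True when the address belongs to the range of another node."""
    return split_address(vaddr)[0] != local_node


addr = make_address(3, 0x1000)
node, offset = split_address(addr)
```

With such a layout, every node sees the same global address space, and deciding whether an access is local or must go over the network only requires comparing the upper bits.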

Variable and‌ extended precision (VRP) accelerator implemented in a 22‌ nm SoC

Participants: Eric Guthmuller (CEA), César Fuguet, Andrea Bocco (CEA), Jérôme Fereyre (CEA), Adrian Evans (CEA), Yves Durand (CEA).

Linear solvers‌ and eigensolvers are the heart of HPC scientific‌ applications. Among them, iterative projection methods are preferred‌ to direct algorithms for large problems because of‌ their lower memory usage, but they are prone‌ to roundoff errors. Using an enhanced working precision‌ inside the linear computing kernels mitigates this issue‌ and accelerates convergence. However, only software libraries support‌ variable and extended precision Floating Point (FP) computations‌ beyond 80 bits. We introduce the VaRiable and‌ extended Precision Accelerator (VRP), a RISC-V accelerator implemented‌ on a System-on-Chip (SoC) using GF22FDX technology. The‌ VRP supports FP computations with a range of‌ significand bits from 2 to 512. This accelerator‌ delivers an average 19.25x application speedup compared to‌ the well-known MPFR software library running on a‌ 2400+ MHz Intel Xeon processor. Additionally, extended precision‌ facilitates the convergence of linear solvers for problems‌ that would otherwise fail to converge and reduces‌ energy-to-solution.

Published as "Variable and extended precision (VRP) accelerator implemented in a 22 nm SoC." Electronics Letters 61.1 (2025): e70255 [12].
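
The roundoff issue that motivates extended working precision can be reproduced with a three-term sum. The sketch below uses Python's software `decimal` module as a stand-in for a wider significand; it is decimal rather than binary arithmetic, so it only illustrates the principle and is not a model of the VRP.

```python
from decimal import Decimal, localcontext


def sum_float64(values):
    """Accumulate in IEEE 754 double precision (53-bit significand)."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_extended(values, digits=50):
    """Accumulate with a 50-digit significand (digits is an illustrative choice)."""
    with localcontext() as ctx:
        ctx.prec = digits
        total = Decimal(0)
        for v in values:
            total += Decimal(v)  # conversion from float is exact
        return total


values = [1e16, 1.0, -1e16]
naive = sum_float64(values)      # the 1.0 is absorbed when added to 1e16
extended = sum_extended(values)  # the 1.0 survives
```

In double precision the small term is lost before the large terms cancel, whereas the wider accumulator keeps it. Iterative solvers accumulate many such operations, which is why working precision affects their convergence.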

On Benefits‌ of Modeling the HPDcache in LNT

Participants: Zachary‌ Assoumani, César Fuguet, Radu Mateescu,‌ Wendelin Serwe.

Stepping from natural language towards modern formal languages such as LNT is beneficial for specifying hardware architectures. We illustrate this on the HPDcache, the informal specification of which contains numerous fragments in pseudo-code. Due to the syntactical similarities between the latter and LNT, modeling the HPDcache's informal specification in LNT was greatly facilitated. The CADP tools supporting LNT enabled us to spot an error in the informal specification of the HPDcache, which might have led to a violation of the RISC-V memory consistency rules.

Published as "On Benefits of Modeling the HPDcache in LNT." RISC-V Summit Europe 2025 [11].

HPC Workload Analysis Using Distributed Cross-ISA‌ Binary Instrumentation

Participants: Eduardo Tomasi Ribeiro (CEA), César Fuguet, Christian‌ Fabre (CEA), Frédéric‌ Pétrot.

Developing distributed High Performance Computing (HPC) applications is challenging, with complex interactions between application, runtime environment, processing cores, and network to obtain the highest performance that a given distributed computing system can provide. HPC systems are evolving at a fast pace, so applications must often be ported. Generally, developers natively run their applications on current machines and extrapolate the performance on future ones. However, modern and future HPC machines contain multiple nodes, each with multiple general-purpose processor cores, possibly with an Instruction Set Architecture (ISA) different from the previous generations, as well as new domain-specific accelerators, so simple extrapolations may not be accurate. Instead, we propose an automated approach to execute and non-intrusively characterize distributed HPC applications on a QEMU-based, cross-ISA, distributed simulation platform. As part of this automated approach, we propose a QEMU plugin to extract metrics at runtime during the execution of distributed applications. The approach is demonstrated on a RISC-V-based distributed multinode architecture. It achieves an average speedup of almost 3.5× on a single host machine with 16 virtual nodes in comparison with a single node. Using QEMU plugins for collecting Message Passing Interface (MPI) runtime metrics slows the simulation by 1.62× on average, but overall, our approach remains much faster than other simulation platforms.

Published as "HPC Workload Analysis Using Distributed Cross-ISA Binary Instrumentation." 2025 40th Conference on Design of Circuits and Integrated Systems (DCIS). IEEE, 2025 [8]. Best Paper Award.

On the‌ Hardware Implementation of Lala's‌‌ 64-bit SEcDED Codes

Participants: Frédéric Pétrot, César‌ Fuguet.

Protecting memories, and particularly caches, is necessary for devices running in harsh environments. The standard approach, and for good reason, is to use single-error correction, double-error detection (SECDED) codes. Among the solutions, the codes introduced by Lala have not received a lot of attention, because they cost one more bit in memory. However, for the 64-bit word granularity, they feature the lowest number of '1's in their parity check matrices, which translates into fewer logic operations for encoding and correcting. This work covers the design space of Lala's codes from a practical point of view, through synthesis on a mature silicon technology. In particular, we show that the circuit timing, area, and power characteristics vary greatly with the actual matrix, which makes it possible to determine the most appropriate matrix for a given set of system-level constraints.

Published as "On the Hardware Implementation of Lala's 64-bit SEcDED Codes." 2025 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT). IEEE, 2025 [7].
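
The link between the number of '1's in a parity check matrix and the amount of logic can be made concrete with a first-order estimate: a check bit that is the XOR of n data bits needs n - 1 two-input XOR gates, arranged in a tree of depth ceil(log2(n)). The sketch below applies this estimate to the data part of a small (7,4) Hamming matrix, used here only as a stand-in; it is not one of Lala's 64-bit matrices, and a synthesis tool may share logic between rows and find a different result.

```python
from math import ceil, log2


def xor_cost(rows):
    """Estimate encoder cost from the rows of a parity check matrix.

    Each row is an integer whose set bits select the data bits XORed into
    one check bit. Returns (total ones, two-input XOR gates, worst tree depth),
    assuming no sharing of sub-expressions between rows.
    """
    ones = [bin(r).count("1") for r in rows]
    gates = sum(max(n - 1, 0) for n in ones)
    depth = max((ceil(log2(n)) if n > 1 else 0) for n in ones)
    return sum(ones), gates, depth


# Data part of a (7,4) Hamming parity check matrix (illustrative stand-in).
hamming_rows = [0b1101, 0b1011, 0b0111]
total_ones, gates, depth = xor_cost(hamming_rows)
```

Under this estimate, fewer ones directly means fewer gates, while the distribution of ones across rows drives the depth, which is one reason timing, area, and power can differ between matrices with the same error-correcting capability.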

Ramping Up Open-Source RISC-V Cores:‌ Assessing the Energy Efficiency‌ of Superscalar Out-of-Order Execution‌‌

Participants: Zexin Fu (ETH Zurich), Riccardo Tedeschi (University of Bologna), Gianmarco Ottavi (University of Bologna), Nils Wistoff (ETH Zurich), César Fuguet, Davide Rossi (University of Bologna), Luca Benini (ETH Zurich and University of Bologna).

Open-source RISC-V cores‌ are increasingly demanded in domains like automotive and‌ space, where achieving high instructions per cycle (IPC)‌ through superscalar and out-of-order (OoO) execution is crucial.‌ However, high-performance open-source RISC-V cores face adoption challenges:‌ some (e.g. BOOM, Xiangshan) are developed in Chisel‌ with limited support from industrial electronic design automation‌ (EDA) tools. Others, like the XuanTie C910 core,‌ use proprietary interfaces and protocols, including non-standard AXI‌ protocol extensions, interrupts, and debug support. In this‌ work, we present a modified version of the‌ OoO C910 core to achieve full RISC-V standard‌ compliance in its debug, interrupt, and memory interfaces.‌ We also introduce CVA6S+, an enhanced version of‌ the dual-issue, industry-supported open-source CVA6 core. CVA6S+ achieves‌ 34.4% performance improvement compared to the scalar configuration.‌ We conduct a detailed performance, area, power, and‌ energy analysis on the superscalar out-of-order C910, superscalar‌ in-order CVA6S+ and vanilla, single-issue in-order CVA6, all‌ implemented in GF22FDX technology and integrated into Cheshire,‌ an open-source modular SoC platform. We examine the‌ performance and efficiency of different microarchitectures using the‌ same ISA, SoC, and implementation with identical technology,‌ tools, and methodologies. The area and performance rankings‌ of CVA6, CVA6S+, and C910 follow expected trends:‌ compared to the scalar CVA6, CVA6S+ shows an‌ area increase of 6% and an IPC improvement‌ of 34.4%, while C910 exhibits a 75% increase‌ in area and a 119.5% improvement in IPC.‌ However, efficiency analysis reveals that CVA6S+ leads in‌ area efficiency (GOPS/mm²), while the C910 is highly‌ competitive in energy efficiency (GOPS/W). This challenges the‌ common belief that high performance in superscalar and‌ out-of-order cores inherently comes at a significant cost‌ in terms of area and energy efficiency.

Published as "Ramping up open-source RISC-V cores: Assessing the energy efficiency of superscalar, out-of-order execution." Proceedings of the 22nd ACM International Conference on Computing Frontiers, 2025 [5].
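
The area and IPC figures quoted in this result can be combined into a rough IPC-per-area ratio relative to the scalar CVA6. This is only a back-of-the-envelope check: the reported metric is GOPS/mm², which also depends on clock frequency, and the frequencies are not given here, so the sketch assumes equal clocks.

```python
# Relative gains over scalar CVA6, taken from the figures quoted above.
cores = {
    "CVA6":   {"ipc_gain": 0.0,   "area_increase": 0.0},
    "CVA6S+": {"ipc_gain": 0.344, "area_increase": 0.06},
    "C910":   {"ipc_gain": 1.195, "area_increase": 0.75},
}


def relative_ipc_per_area(ipc_gain, area_increase):
    """IPC per unit area, normalized to the scalar baseline (equal clocks assumed)."""
    return (1.0 + ipc_gain) / (1.0 + area_increase)


ratios = {name: relative_ipc_per_area(**figures) for name, figures in cores.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}x IPC per area vs. CVA6")
```

Under the equal-clock assumption, both superscalar cores improve on the baseline by roughly a quarter, with CVA6S+ slightly ahead of the C910, which is in line with the stated area-efficiency ranking.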

FetchFlare: An Open-Source Strided Data‌ Prefetcher for High-Performance Cache Hierarchies

Participants: Golnaz Korkian‌ (Barcelona Supercomputing Center), Neiel Leyva (Barcelona Supercomputing‌ Center and Universitat Politècnica de Catalunya), Arnau‌ Bigas (Barcelona Supercomputing Center), Noelia Oliete-Escuín (Barcelona‌ Supercomputing Center and Universitat Politècnica de Catalunya),‌ Abbas Haghi (Barcelona Supercomputing Center), Alireza Monemi‌ (Barcelona Supercomputing Center), César Fuguet, Lluc‌ Alvarez (Barcelona Supercomputing Center).

In recent years, the rise of open-source hardware has transformed the landscape of technology development. In particular, RISC-V has offered hardware designers the possibility of designing processors in a much cheaper way by leveraging a rich ecosystem of open-source designs that can be easily reused, extended, and customized. Although the RISC-V ecosystem is rapidly growing and open-source processors are becoming increasingly sophisticated, some advanced architectural techniques typically employed in commercial high-performance processors are still not prevalent in RISC-V open-source architectures. Among them, hardware prefetchers have been ubiquitous in high-end processors for many years, but they are not as commonly found in open-source RISC-V processors. To bridge this gap, this work presents FetchFlare, a stride prefetcher for high-performance cache hierarchies. FetchFlare is able to capture the memory access patterns of applications, predict future memory accesses, and issue prefetch requests for them. We provide an open-source RTL implementation of FetchFlare and integrate it into a complete open-source setup formed by the OpenPiton framework, the Sargantana core, and the High-Performance Data Cache (HPDCache). Compared to a baseline system without prefetching, FetchFlare achieves an average speedup of 63%, avoids cache misses in the L1D and the L2 caches, and presents an average accuracy, coverage, and timeliness of 86%, 39%, and 99%, respectively.

Published as "FetchFlare: An Open-Source Strided Data Prefetcher for High-Performance Cache Hierarchies." 2025 28th Euromicro Conference on Digital System Design (DSD). IEEE, 2025 [6].
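
For readers unfamiliar with stride prefetching, the general principle (capture an access pattern, predict the next address, issue a prefetch) can be sketched with a toy table indexed by the program counter. This is a textbook-style model and not FetchFlare's actual microarchitecture; the class name, the `degree` and `confidence_threshold` parameters, and the table organization are all assumptions.

```python
class StridePrefetcher:
    """Toy PC-indexed stride prefetcher (illustrative model only)."""

    def __init__(self, degree=1, confidence_threshold=2):
        self.table = {}                      # pc -> [last_addr, stride, confidence]
        self.degree = degree                 # number of prefetches per trigger
        self.threshold = confidence_threshold

    def access(self, pc, addr):
        """Record a demand access and return the addresses to prefetch."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = [addr, 0, 0]
            return []
        last_addr, stride, confidence = entry
        new_stride = addr - last_addr
        if new_stride != 0 and new_stride == stride:
            # Same stride seen again: gain confidence, saturating at the threshold.
            confidence = min(confidence + 1, self.threshold)
        else:
            confidence = 0
        self.table[pc] = [addr, new_stride, confidence]
        if confidence >= self.threshold:
            return [addr + new_stride * i for i in range(1, self.degree + 1)]
        return []


prefetcher = StridePrefetcher()
# One load instruction walking an array with a 64-byte stride.
issued = [prefetcher.access(0x400, a) for a in range(0x1000, 0x1140, 0x40)]
```

In this model the first accesses only train the table; once the same stride has been confirmed twice, each new access triggers a prefetch for the next expected address.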

Address/Data Instruction Steering in Clustered General‌ Purpose Processors

Participants: Chandana‌ S. Deshpande, Arthur‌‌ Perais, Frédéric Pétrot.

Although they differentiate between integer and floating-point data, modern Instruction Set Architectures and their implementations do not differentiate integer data used to address memory from integer data used in purely arithmetic and logical computations. This is a perfectly reasonable choice as addresses are, in fact, integral quantities. However, in many cases, there is already a fundamental difference between addresses and integer data: their width. As computer systems moved from 16 to 32, then to 64-bit pointers, with a potential future where 128-bit might be used for specific systems, the data width required to compute a given output with a given algorithm has remained the same, e.g., an ASCII character is still represented on a byte. This work aims to leverage this dichotomy to revisit hardware clustering, a well-known microarchitectural technique used to mitigate the cost of scaling processor backend structures by dividing the backend into several mostly independent execution clusters. We show that by treating instructions as manipulating addresses or data and steering them to a "data" or an "address" cluster accordingly, reasonable cluster load balancing can be achieved without the need for complex steering policies, leading to performance on par with the baseline with limited hardware overhead. Moreover, we highlight two possible optimizations stemming from this distribution. First, the registers of the "address" cluster can easily be compressed thanks to address spatial and temporal locality. Second, if a processor requires a large address space but only processes narrow data (e.g., 32-bit data with 64-bit pointers or 64-bit data with 128-bit pointers), the "data" cluster datapath can be kept narrower than the "address" cluster datapath.

Published as "Address/data instruction steering in clustered general purpose processors". ACM Transactions on Architecture and Code Optimization, 22(3), 1-24, 2025 [3].

9‌ Bilateral contracts and grants‌ with industry

9.1 Bilateral‌‌ contracts with industry

HPDCache Hardening

Participants: César Fuguet‌, Frédéric Pétrot.‌

Contract with Thales, 130‌‌ k€ (Floralis), 1/11/2025-29/2/2026

One of the major concerns in aerospace applications, such as the ones targeted by the SCAF project, is that processor-based devices are subject to radiation, which may affect their functioning. A common consequence of radiation is bit-flips in the embedded (on-chip) memories: cache or scratchpad memories. Indeed, memories occupy more than 50% of modern processor-based chips to reduce the memory traffic to external DRAMs, which suffer from low bandwidth and high latency. This large footprint makes memories more vulnerable to radiation-induced soft errors. The goal of this project, titled “Hardening of the HPDcache for RISC-V processors”, is to introduce fault-tolerant mechanisms in the HPDcache integrated into the CVA6 processor. The CVA6 is an open-source RISC-V processor, whose development is driven by Thales under the umbrella of the OpenHW Group, whilst the HPDcache is an open-source high-performance L1 Data Cache for RISC-V processors, initially developed by CEA and now part of the OpenHW Group IP portfolio. Specifically, this project has three objectives: (1) to introduce error correcting codes (ECCs) in the embedded SRAMs (Static Random Access Memories) of the HPDcache, for which we plan to use SECDED (Single Error Correction, Double Error Detection) codes; (2) to introduce a periodic memory scrubbing mechanism to limit the probability of multi-bit errors in the HPDcache SRAMs; (3) to introduce the necessary modifications in the interface from the CVA6 towards the next level of cache or the main memory to support error detection and correction in memory transactions.
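
To make objectives (1) and (2) concrete, the sketch below shows SECDED behavior and scrubbing on 8-bit words, using an extended Hamming (13,8) code. The code choice, the word size, and all names are assumptions made for brevity; the project description does not say which SECDED construction or granularity is used in the HPDcache.

```python
DATA_POSITIONS = [3, 5, 6, 7, 9, 10, 11, 12]  # codeword slots holding data bits
PARITY_POSITIONS = [1, 2, 4, 8]               # Hamming check bits
N = 13                                        # slot 0 holds the overall parity


def encode(byte):
    """Encode 8 data bits into a 13-bit extended Hamming codeword (list of bits)."""
    word = [0] * N
    for i, pos in enumerate(DATA_POSITIONS):
        word[pos] = (byte >> i) & 1
    for p in PARITY_POSITIONS:
        word[p] = sum(word[k] for k in range(1, N) if k & p) % 2
    word[0] = sum(word[1:]) % 2
    return word


def decode(word):
    """Return (byte, status): 'ok', 'corrected', or (None, 'uncorrectable')."""
    word = list(word)
    syndrome = 0
    for p in PARITY_POSITIONS:
        if sum(word[k] for k in range(1, N) if k & p) % 2:
            syndrome |= p
    odd_flips = sum(word) % 2 == 1
    if syndrome == 0 and not odd_flips:
        status = "ok"
    elif odd_flips and syndrome < N:
        word[syndrome] ^= 1  # syndrome 0 designates the overall parity bit itself
        status = "corrected"
    else:
        return None, "uncorrectable"  # e.g. a double error: detected, not corrected
    byte = sum(word[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return byte, status


def scrub(memory):
    """Rewrite every correctable word so that single flips do not accumulate."""
    repaired = 0
    for i, word in enumerate(memory):
        byte, status = decode(word)
        if status == "corrected":
            memory[i] = encode(byte)
            repaired += 1
    return repaired
```

A single flipped bit is located by the syndrome and repaired, while two flips in the same word are only detected. Periodic scrubbing rewrites repaired words before a second flip can land in them, which is what keeps the multi-bit error probability low.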

9.2 Bilateral Grants with Industry‌

“Back to the Future” – Predicting Hard to‌ Predict Branches

Participants: Arthur Perais, Yiannakis Sazeides‌ (University of Cyprus), Ioannis Constantinou (University of‌ Cyprus).

Intel Grant, 10 k€ (UGA) and 150 k€ (University of Cyprus), 1/9/2025-31/7/2026

Work on branch prediction for high-performance processors, with a focus on hard-to-predict branches, in collaboration with the University of Cyprus.

10 Partnerships and cooperations‌

10.1 International initiatives

Cooperation with University of California, Santa Barbara

Participants: Davy Million (CEA), César Fuguet, Frédéric Pétrot, Adrian Evans (CEA), Jonathan Balkind (University of California, Santa Barbara).

1/9/2022 -‌ 31/12/2026

Cooperation between UCSB, CEA, UGA, and Inria around contributions to the OpenPiton framework developed at UCSB. The work covers Network-on-Chip and heterogeneous computing for chiplet-based SoCs, including routing algorithms, CPU/GPU tiles, etc., as well as actual implementation on large FPGA fabrics hosted at CEA. The goal is to deliver an open-source hardware/software platform. The PhD grant of Davy Million is funded by CEA.

Cooperation with Barcelona SuperComputing‌ Center

Participants: Andrei Ilin, César Fuguet,‌ Frédéric Pétrot, Lluc Alvarez (Barcelona SuperComputing Center)‌.

1/9/2025 - 31/8/2028

Cosupervision of Andrei Ilin‌ PhD with BSC. PhD grant through UGA Idex‌ call. The work is focused on chiplet based‌ cache coherency, including optimization for CPU/GPU workloads. It‌ takes place at the

Cooperation with University of‌ Cyprus

Participants: Arthur Perais, Yiannakis Sazeides (University‌ of Cyprus), Ioannis Constantinou (University of Cyprus)‌.

1/3/2025 - 31/8/2025

Cosupervision of a Master's student (Ioannis Constantinou) on branch prediction [10]. Intel grant.

Cooperation with ETH Zurich‌ and University of Bologna‌

Participants: César Fuguet,‌‌ Riccardo Tedeschi (University of Bologna), Davide Rossi‌ (University of Bologna),‌ Luca Benini (ETH Zurich‌‌ and University of Bologna).

Since 2024

Informal research cooperation around the evolution of the CVA6 memory interface and memory hierarchy.

Cooperation with University of‌ Murcia

Participants: Arthur Perais‌, Alberto Ros (Universidad‌‌ de Murcia), Alexandra Jimborean (Universidad de Murcia)‌, Sawan Singh (Universidad‌ de Murcia), Ravikiran‌‌ Ravindranath Reddy (Universidad de Murcia).

Informal research cooperation on high-performance processor microarchitecture, ongoing since 2021 within the framework of Alberto Ros's ERC grant. The research addresses speculation on branches, memory accesses, addresses and values, among others.

10.2 European initiatives

10.2.1 Horizon‌ Europe

EdgeAI

Participants: Ana‌ Pinzari, Frédéric Pétrot‌‌.

Large European project (35 M€) with 48 partners, including 200 k€ of funding for Grenoble INP - UGA. Our contributions concern power-efficient neural networks for grapevine disease detection, using a mix of hardware and software solutions. The French partners are STMicroelectronics, Pommery, and Université de Reims Champagne-Ardenne.

10.3 National initiatives

PEPR IA Holigrail Project

Participants:‌ Abdallah Meebed, Van‌ Quan Pham, Olivier‌‌ Romane, Adrien Prost-Boucle, Olivier Muller,‌ Frédéric Pétrot.

Grant:‌ 900 k€, Grenoble INP‌‌ - UGA

We contribute to the Holigrail project with work on the design and implementation of highly quantized neural networks, including pruning and compression issues.

PEPR Cloud Archi-CESAM Project‌

Participants: Arthur Perais,‌‌ Louka Yerly.

Grant: 150 k€, Grenoble INP‌ - UGA

We contribute‌ to the Archi-CESAM project‌‌ by characterizing the differences between user and kernel‌ code at the microarchitectural‌ level.

Défi Inria Cocorisco‌‌

Participants: Julie Dumas, Arthur Perais, Johan‌ Söderström, Nevena Vasilevska‌.

Grant: 300 k€,‌‌ Inria

We contribute both to HW/SW interaction for improving the performance of distributed garbage collection systems and to improving the performance of multithreaded programs through software-hinted management of the hardware coherency mechanism. Arthur Perais is co-PI with O. Sentieys (TARAN team at IRISA).

11‌ Dissemination

Participants: Liliana Andrade‌‌, Julie Dumas, César Fuguet, Olivier‌ Muller, Arthur Perais‌, Frédéric Pétrot,‌‌ Laurence Pierre, Frédéric Rousseau.

11.1 Promoting‌ scientific activities

11.1.1 Scientific‌ events: organisation

Chair of‌‌ conference program committees

Frédéric Pétrot, program chair of‌ the 40th Conference on‌ Design of Circuits and‌‌ Integrated Systems

Member of the conference program committees‌

Arthur Perais, member of the ISCA, MICRO and HPCA program committees
César Fuguet, member of the CF, ICCD and ISCA program committees
Laurence Pierre, member of the DATE technical program committee
Frédéric Pétrot, member of the DSD technical program committee
Liliana Andrade, member of the DSD, MovVe4SPS workshop (CPS-IoT week) and COMPAS program committees

Member of steering committees‌

Arthur Perais, steering committee‌‌ chair for the ARCHI winter school
Liliana Andrade,‌ steering committee for the‌ FETCH winter school

11.1.2‌‌ Journal

Member of the editorial boards

Frédéric Pétrot,‌ co-guest-editor for a special‌ issue of Elsevier Microprocessors‌‌ and Microsystems.

Reviewer -‌ reviewing activities

Frédéric Pétrot, reviewer for IEEE Transactions‌ on Computer Aided Design of Circuits and Systems.‌
César Fuguet, reviewer for IEEE Transactions on Computer‌ Aided Design of Circuits and Systems and Elsevier‌ Microprocessors and Microsystems.
Liliana Andrade, reviewer for IEEE‌ Transactions on Computer Aided Design of Circuits and‌ Systems.
Arthur Perais, reviewer for ACM Transactions on Architecture and Code Optimization and IEEE Computer Architecture Letters.
Frédéric Rousseau, reviewer for Elsevier Microprocessors and‌ Microsystems.

11.1.3 Leadership within the scientific community

César‌ Fuguet, co-chair of the Interconnection Task Group within‌ the OpenHW Foundation.
Arthur Perais, co-coordinator of the‌ "High-Performance Embedded Computing" track of GDR SOC²
Frédéric‌ Pétrot, member of the steering committee of GDR‌ SOC²

11.1.4 Research administration

Laurence Pierre is a member of the UGA IM2AG Research Commission.
Laurence Pierre is a member of the IM2AG Council on education and research (UFR)
Frédéric Rousseau is head of the EEATS Doctoral School HdR council (ED220)
Frédéric Pétrot is a member of the MSTII Doctoral School HdR council (ED217)
Liliana Andrade is a member of the Polytech Grenoble School council
Julie Dumas is a member of the Ensimag School council and restricted council
Frédéric Pétrot is a member of the Ensimag School council and restricted council
Frédéric Rousseau is a member of the Polytech restricted council
Liliana Andrade is a member of the scientific council of the TIMA laboratory
Frédéric Rousseau is a member of the scientific council of the TIMA laboratory
Olivier Muller is a member of the TIMA laboratory council and scientific council

11.2 Teaching - Supervision‌ - Juries - Educational and pedagogical outreach

11.2.1‌ Heavy teaching duties

Frédéric Rousseau, head of admission‌ at Polytech Grenoble
Liliana Andrade, head of 5th‌ year of the apprenticeship programme at Polytech Grenoble‌
Liliana Andrade, co-head of admission in the apprenticeship‌ programme at Polytech Grenoble
Julie Dumas, co-head of‌ "Ingénierie des systèmes d'information" programme at Ensimag
Liliana Andrade, steering committee for the CFA FormaSup Auvergne-Rhône-Alpes and Polytech Grenoble - E2i Engineering Degree
Frédéric Rousseau, steering committee for the CFA FormaSup Auvergne-Rhône-Alpes and Polytech Grenoble - E2i Engineering Degree
Olivier‌ Muller, responsible for student-entrepreneurship within Ensimag

Note that 6 of the 8 members of the MADMAX team hold teaching positions and are responsible for course modules, some with large numbers of students, which also represents a substantial workload.

11.2.2 Supervision

Student Name	Year	State	School	Supervisors
Ravenel Pierre	4A	(s)	MSTII	Pétrot Frédéric - Perais Arthur
Deshpande Chandana	4A	(s)	MSTII	Pétrot Frédéric - Perais Arthur
Isaac–Chassande Valentin	4A	(c)	EEATS	Rousseau Frédéric - Durand Yves
Tomasi Ribeiro Eduardo	3A	(s)	MSTII	Pétrot Frédéric - Fuguet César - Fabre Christian
Million Davy	3A	(c)	MSTII	Pétrot Frédéric - Balkind Jonathan - Evans Adrian - Fuguet César
Romane Olivier	3A	(c)	MSTII	Pétrot Frédéric - Muller Olivier - Prost-Boucle Adrien
Yerly Louka	2A	(c)	MSTII	Pétrot Frédéric - Perais Arthur
Pham Van Quan	2A	(c)	MSTII	Pétrot Frédéric - Prost-Boucle Adrien - Muller Olivier
McGovern Killian	2A	(c)	EEATS	Rousseau Frédéric - Charles Henri-Pierre
Vasilevska Nevena	2A	(c)	IP Paris	Thomas Gaël - Dumas Julie - Derumigny Nicolas
Dubois Jules	1A	(c)	EEATS	Rousseau Frédéric - Evans Adrian - Guthmuller Eric
Söderström Johan	1A	(c)	MSTII	Pétrot Frédéric - Dumas Julie - Perais Arthur
Meebed Abdallah	1A	(c)	MSTII	Pétrot Frédéric - Muller Olivier
Ilin Andrei	1A	(c)	MSTII	Pétrot Frédéric - Fuguet César
Sartori Dorian	1A	(c)	EEATS	Rousseau Frédéric - Fuguet César

(s) means PhD defended during the‌ period, (c) means currently‌ pursuing the PhD

11.2.3‌‌ Juries

Frédéric Pétrot, reviewer of Aurélie Saulquin's PhD thesis, Cristal, Lille, France, November 2025.

12 Scientific‌‌ production

12.1 Major publications

[1] Chandana S. Deshpande, Arthur Perais and Frédéric Pétrot. Address/Data Instruction Steering in Clustered General Purpose Processors. ACM Transactions on Architecture and Code Optimization 22(3), September 2025, 1-24.
[2] Davy Million, César Fuguet, Adrian Evans, Rim El Cheikh, Alireza Monemi, Jonathan Balkind and Frédéric Pétrot. Depth-first: A deterministic and scalable NoC routing protocol for 3.5D packaged architectures. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, July 2025.

12.2‌ Publications of the year‌

International journals

[3] Chandana S. Deshpande, Arthur Perais and Frédéric Pétrot. Address/Data Instruction Steering in Clustered General Purpose Processors. ACM Transactions on Architecture and Code Optimization 22(3), September 2025, 1-24.
[4] Davy Million, César Fuguet, Adrian Evans, Rim El Cheikh, Alireza Monemi, Jonathan Balkind and Frédéric Pétrot. Depth-first: A deterministic and scalable NoC routing protocol for 3.5D packaged architectures. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, July 2025.

International‌ peer-reviewed conferences

[5] Zexin Fu, Riccardo Tedeschi, Gianmarco Ottavi, Nils Wistoff, César Fuguet, Davide Rossi and Luca Benini. Ramping Up Open-Source RISC-V Cores: Assessing the Energy Efficiency of Superscalar, Out-of-Order Execution. In CF '25: Proceedings of the 22nd ACM International Conference on Computing Frontiers, Cagliari, Sardinia, Italy, ACM, July 2025, 12-20.
[6] Golnaz Korkian, Neiel Leyva, Arnau Bigas, Noelia Oliete-Escuín, Abbas Haghi, Alireza Monemi, César Fuguet and Lluc Alvarez. FetchFlare: An Open-Source Strided Data Prefetcher for High-Performance Cache Hierarchies. In 2025 28th Euromicro Conference on Digital System Design (DSD), Salerno, Italy, IEEE, December 2025, 276-284.
[7] Frédéric Pétrot and César Fuguet. On the Hardware Implementation of Lala's 64-bit SEcDED Codes. In 2025 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), Barcelona, Spain, IEEE, November 2025, 1-4.
[8] Eduardo Tomasi, César Fuguet, Christian Fabre and Frédéric Pétrot. HPC Workload Analysis Using Distributed Cross-ISA Binary Instrumentation. In 2025 40th Conference on Design of Circuits and Integrated Systems (DCIS), Santander, Spain, IEEE, December 2025, 144-149.
[9] Eduardo Tomasi Ribeiro, César Fuguet, Christian Fabre and Frédéric Pétrot. Hardware-software co-design for supporting shared distributed virtual memory. In CF '25 Companion: Proceedings of the 22nd ACM International Conference on Computing Frontiers: Workshops and Special Sessions, Cagliari, Italy, ACM (Association for Computing Machinery), 2025, 58-61.

Reports & preprints

[10] Ioannis Constantinou, Arthur Perais and Yiannakis Sazeides. The Non-Predictability of Mispredicted Branches using Timing Information. Report, University of Cyprus, January 2026.

Other‌ scientific publications

[11] Zachary Assoumani, César Fuguet, Radu Mateescu and Wendelin Serwe. On Benefits of Modeling the HPDcache in LNT. In RISC-V 2025 - RISC-V Summit Europe, Paris, France, 2025, 1-1.
[12] Eric Guthmuller, César Fuguet, Andrea Bocco, Jérome Fereyre, Adrian Evans, Yves Durand and Jérôme Fereyre. Variable and extended precision (VRP) accelerator implemented in a 22 nm SoC. Electronics Letters 61, 2025, ell2.70255.
[13] Riccardo Tedeschi, Gianmarco Ottavi, Côme Allart, Nils Wistoff, Zexin Fu, Filippo Grillotti, Fabio de Ambroggi, Elio Guidetti, Jean-Baptiste Rigaud, Olivier Potin, Jean Roch Coulon, César Fuguet, Luca Benini and Davide Rossi. CVA6S+: A Superscalar RISC-V Core with High-Throughput Memory Architecture. In RISC-V Summit Europe 2025, Paris, France, arXiv, 2025.
