IROKO

IROKO - 2025

2025 Activity Report: Project-Team IROKO

RNSR: 202424577P

Research center: Inria Branch at the University of Montpellier
In partnership with: Université de Montpellier
Team name: Data Driven Environmental Sciences
In collaboration with: Laboratoire d'informatique, de robotique et de microélectronique de Montpellier (LIRMM), Institut Montpelliérain Alexander Grothendieck (IMAG)

Creation of the Project-Team:‌ 2024 October 01

Keywords

Computer Science and Digital Science

A3.1. Data‌
A3.1.2. Data management, querying and storage
A3.1.3. Distributed‌ data
A3.1.4. Uncertain data
A3.1.5. Control access, privacy‌
A3.1.7. Open data
A3.1.8. Big data (production, storage,‌ transfer)
A3.1.9. Database
A3.1.10. Heterogeneous data
A3.1.11. Structured‌ data
A3.2. Knowledge
A3.2.2. Knowledge extraction, cleaning
A3.3.‌ Data and knowledge analysis
A3.3.2. Data mining
A3.3.3.‌ Big data analysis
A3.4. Machine learning and statistics‌
A5.2. Data visualization
A5.3. Image processing and analysis‌
A5.3.3. Pattern recognition
A9. Artificial intelligence
A9.2. Machine‌ learning
A9.2.1. Supervised learning
A9.2.3. Reinforcement learning
A9.2.6.‌ Neural networks
A9.2.8. Deep learning
A9.12.3. Content retrieval‌

1 Team members, visitors, external collaborators

Research Scientists‌

Florent Masseglia [Team leader, INRIA,‌ Senior Researcher, HDR]
Reza Akbarinia [INRIA, Researcher,‌ HDR]
Christophe Botella‌ [INRIA, ISFP‌‌]
Benjamin Bourel [INRIA, Researcher]‌
Hervé Goëau [CIRAD‌, Researcher]
Alexis‌‌ Joly [INRIA, Senior Researcher, HDR‌]
Fabio Andre Machado‌ Porto [INRIA,‌‌ Senior Researcher, from Apr 2025 until Jun‌ 2025]
Christophe Pradal‌ [CIRAD, Researcher‌‌, from Mar 2025]
Maxime Ryckewaert [‌INRIA, Starting Research‌ Position, until Mar‌‌ 2025]
Joseph Salmon [UNIV MONTPELLIER,‌ Professor Detachement, HDR‌]
Patrick Valduriez [‌‌INRIA, Emeritus, HDR]

Faculty Members‌

Esther De Castro Pacitti‌ [UNIV MONTPELLIER,‌‌ Professor, from Sep 2025, HDR]‌
François Munoz [UGA‌, Associate Professor,‌‌ from Mar 2025]
Maximilien Servajean [LIRMM‌, Associate Professor Delegation‌, from Feb 2025‌‌ until Aug 2025]

Post-Doctoral Fellows

Aimie Berger‌ Dauxere [INRAE]‌
Jean-Baptiste Fermanian [UNIV‌‌ MONTPELLIER, Post-Doctoral Fellow, until Sep 2025‌]
Ilyass Moummad [‌INRIA, Post-Doctoral Fellow‌‌, from Mar 2025]
Lukas Picek [‌INRIA, Post-Doctoral Fellow‌, until Oct 2025‌‌]
Rebecca Pontes Salles [INRIA, Post-Doctoral‌ Fellow, until Nov‌ 2025]

PhD Students‌‌

Raphael Benerradi [UNIV MONTPELLIER]
Matteo Contini‌ [IFREMER, until‌ Oct 2025]
Guillaume‌‌ Coulaud [UNIV MONTPELLIER]
Lo'Ai Gandeel [‌INRIA, from Jul‌ 2025]
Sebastien Gigot-Leandri‌‌ [CNRS]
Théo Larcher [UNIV MONTPELLIER‌]
Cesar Leblanc [‌INRIA, until Oct‌‌ 2025]
Alex Maleknia [UNIV MONTPELLIER,‌ from Nov 2025]‌
Giulio Martellucci [INRAE‌‌, from Jun 2025]
Kawtar Zaher [‌INA, CIFRE]‌

Technical Staff

Antoine Affouard‌‌ [INRIA, Engineer]
Mathias Chouet [‌CIRAD]
Hugo Gresse‌ [INRIA, Engineer‌‌, until May 2025]
Benoit Lange [‌INRIA, Engineer]‌
Pierre Leroy [INRIA‌‌, Engineer]
Thomas Paillot [INRIA,‌ Engineer, from Sep‌ 2025]
Thomas Paillot‌‌ [INRIA, until Aug 2025]
Remi‌ Palard [CIRAD,‌ Engineer, from Nov‌‌ 2025]
Remi Palard [CIRAD, until‌ Oct 2025]
Lukas‌ Picek [INRIA,‌‌ Engineer, from Nov 2025]
Rebecca Pontes‌ Salles [INRIA,‌ Engineer, from Dec‌‌ 2025]
Julien Thomazo [LIRMM, Engineer‌, from Jun 2025‌ until Sep 2025]‌‌
Jozef Ba Tran [INRIA, Engineer,‌ from Nov 2025]‌
Axel Vaillant [INRIA‌‌, Engineer, until Feb 2025]

Interns‌ and Apprentices

Bronislav Abadie‌ [UNIV MONTPELLIER,‌‌ Intern, until Jun 2025]
Marion Cann‌ [INRIA, Intern‌, from Sep 2025‌‌ until Oct 2025]
Marion Cann [INRIA‌, Intern, from‌ Jun 2025 until Aug‌‌ 2025]
Raphael Lemarie [INRIA, Intern‌, from Jun 2025‌ until Jul 2025]‌‌
Massilya Raked [INRIA‌, Intern, from Mar 2025 until Jul‌ 2025]

Administrative Assistant

Anouk Renaud [INRIA‌, from Dec 2025]

Visiting Scientist

Diletta‌ Santovito [UNIV BOLOGNE, until Apr 2025‌]

External Collaborators

Fabio Andre Machado Porto [‌LNCC-PETROPOLIS, from Aug 2025]
Fabio Andre‌ Machado Porto [LNCC-PETROPOLIS, until Mar 2025‌]
Jean Marc Sadaillan [INRAE]
Jozef‌ Ba Tran [UNIV MONTPELLIER, from Mar‌ 2025 until Sep 2025]

2 Overall objectives‌

Environmental sciences combine various scientific disciplines to understand and address critical environmental issues such as climate change, pollution and biodiversity loss, and to develop sustainable solutions to preserve the planet's ecosystems and resources. Today, the increasing production of observation and experimentation data in environmental sciences requires advanced data science skills and tools to manage, analyze, and interpret large-scale, complex datasets and make sense of them. Data science focuses on extracting insights from data through pattern identification, outcome prediction, and process optimization. It is an interdisciplinary science that relies on well-established research fields such as machine learning, statistics, data mining, and data management, which need to work in synergy.

Iroko advocates an interdisciplinary scientific approach to address the challenges of environmental sciences by using and improving data science. This approach should have a high impact on both data science, by proposing new solutions and new systems, and on environmental sciences, by contributing to findings applied to real use cases in biodiversity, agriculture and One Health.

The team’s research‌ focuses on the intersection of data science and‌ environmental sciences.

Data science is an interdisciplinary field‌ that utilizes scientific methods, processes, algorithms, and systems‌ to extract knowledge and insights from structured and‌ unstructured data. It combines various disciplines, including statistics,‌ computer science, and domain expertise, to tackle complex‌ problems and make data-driven decisions. The ultimate goal‌ of data science is to discover patterns and‌ trends, predict future outcomes, and optimize processes through‌ the analysis of vast amounts of data.

Deep‌ learning, a kind of machine learning, plays a‌ crucial role in data science. It employs artificial‌ neural networks, specifically deep neural networks, which imitate‌ the human brain's structure and functionality. Deep learning‌ algorithms extract knowledge from data using multiple layers‌ of abstraction, allowing them to identify patterns and‌ generate highly accurate predictions. These methods have been‌ effectively utilized in various applications such as image‌ and speech recognition, natural language processing, and recommendation‌ systems.

Data mining is another crucial aspect of‌ data science. It is the process of discovering‌ previously unknown, valid, and potentially useful patterns in‌ large datasets. Data mining techniques include clustering, classification,‌ association rule learning, and anomaly detection, among others.‌ These methods enable data scientists to gain insights‌ and identify trends, relationships, and dependencies within the‌ data, which can be used to inform decision-making‌ initiatives.

Time series analysis is a valuable method‌ in data science, focusing on ordered, often time-stamped data points. Key aspects‌ include comparing different time‌ series using techniques like‌‌ cross-correlation and dynamic time warping, and detecting anomalies‌ with statistical tests or‌ machine learning algorithms. Pattern‌‌ recognition in time series analysis aims to find‌ recurring motifs or sub-sequences,‌ helping to discover underlying‌‌ structures in the data. By identifying patterns and‌ anomalies, data scientists can‌ better understand system dynamics,‌‌ predict future behavior, and make informed decisions across‌ various domains.
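To make the comparison step concrete, here is a minimal sketch of dynamic time warping (DTW) in plain Python. It is an illustration of the textbook algorithm, not the team's implementation. The function name `dtw_distance` and the toy series are assumptions introduced for this example.

```python
import math

def dtw_distance(a, b):
    """Textbook dynamic time warping with an absolute-difference local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessor alignments
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Toy data (hypothetical): same shape delayed by one step, and a flat series.
reference = [0, 1, 2, 3, 2, 1, 0]
shifted = [0, 0, 1, 2, 3, 2, 1, 0]
flat = [3, 3, 3, 3, 3, 3, 3]

print(dtw_distance(reference, shifted))  # 0.0: the delay is absorbed by warping
print(dtw_distance(reference, flat))     # 12.0: the shapes genuinely differ
```

Unlike a point-by-point Euclidean distance, DTW tolerates local shifts in time, which is why the delayed copy is at distance zero from the reference.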

Ultimately, models‌ are central not only‌‌ in data science but also in environmental sciences.‌ While machine learning models‌ are central in data‌‌ science, mechanistic models (mathematical, physical and process-based models)‌ allow us to capture‌ at different scales the‌‌ scientific knowledge of different disciplines (e.g.,‌ soil, plant, atmosphere, disease)‌ in order to simulate‌‌ the behavior of complex systems and to predict‌ their behavior under different‌ scenarios.

Environmental sciences encompass‌‌ a diverse range of disciplines that focus on‌ understanding the complex relationships‌ between humans and the‌‌ natural world. By studying the Earth's ecosystems, climate,‌ and resources, environmental scientists‌ address critical issues such‌‌ as climate change, pollution, habitat loss, and biodiversity‌ conservation. This multidisciplinary field‌ combines knowledge from areas‌‌ such as biology, chemistry, geology, meteorology, physics, and‌ agronomy to provide a‌ comprehensive understanding of the‌‌ environment and the challenges it faces. In addition‌ to understanding the Earth's‌ physical processes, environmental scientists‌‌ also investigate the ecological and social dimensions of‌ environmental problems, recognizing that‌ human well-being is intricately‌‌ linked to the health of ecosystems.

The primary objective of environmental sciences is to develop sustainable solutions to preserve and protect the planet's ecosystems and resources for present and future generations. This includes the conservation of biodiversity, which is essential for maintaining ecosystem stability, resilience, and the provision of valuable ecosystem services. Agronomy, the study of agricultural production and soil management, is another key component of environmental sciences. By optimizing agricultural practices and promoting sustainable land use, agronomists help ensure global food security while minimizing negative environmental impacts.

One Health is an emerging approach in environmental sciences that emphasizes the interconnectedness of human, animal, and environmental health. It recognizes that the health of people, animals, and ecosystems is interdependent, and that collaborative, interdisciplinary efforts are needed to address complex challenges such as zoonotic diseases, antimicrobial resistance, and climate change. Environmental scientists engaging in One Health research collaborate with public health experts, veterinarians, ecologists, and social scientists to develop integrated solutions that promote health and well-being across species and ecosystems.

To achieve their‌ objectives, environmental scientists engage‌‌ in research, data analysis, and policy development to‌ inform decision-making processes. They‌ collaborate with industries, governments,‌‌ and communities to promote environmentally responsible practices and‌ policies. This involves conducting‌ environmental impact assessments, developing‌‌ strategies for climate change adaptation and mitigation, and‌ designing programs for habitat‌ restoration and species conservation.‌‌ Environmental sciences require interdisciplinary collaboration, critical thinking, and‌ ethical commitment to the‌ well-being of the environment‌‌ and future generations.

Unsurprisingly, and as envisioned by the fourth paradigm of discovery [84], data production and analysis have become fundamental activities of environmental sciences. The generation of data has increased exponentially, from remote sensing satellites that monitor climate patterns and land use changes, to biodiversity databases that track species distribution and abundance. These vast datasets provide unprecedented opportunities for understanding and responding to environmental changes. However, this influx of data also presents significant challenges. It requires advanced tools and methodologies to store, manage, and analyze data. Environmental scientists must therefore develop data literacy skills, and there is a growing need for specialists in environmental data science. This calls for combining at least computer science, statistics, and environmental sciences to derive meaningful insights from complex, large-scale datasets.

Objectives:

The objectives of‌ the team are improving data science and contributing‌ to new findings in environmental sciences.

We‌ expect our impact to be measured by three‌ main aspects:

Academic recognition of our contributions. This‌ aspect should be assessed as usual.
The interdisciplinary‌ extent of our results. This may involve results‌ in ecology, biology, climatology, or any other science‌ in which we collaborate with those scientists. The‌ results obtained through this collaboration, which might not‌ have otherwise been obtained, are significant to us.‌ For example, it may be measuring a change‌ in biodiversity in a region, selecting and improving‌ plant varieties adapted to specific environmental conditions, or‌ identifying climate anomalies in global measurement history.
Our impact in the real world. We hope that our work will help humanity reduce its environmental footprint and eventually slow down the course of global warming. This could be done by, for example, preserving biodiversity in a particular area, replacing one type of crop with another, or avoiding the overuse of antibiotics in animal agriculture, to name a few of the directions we are currently working on.

3 Research program

Iroko‌ develops data science methods and systems to support‌ data-driven environmental sciences. Our research program is‌ organized around three tightly connected themes: (i) Big‌ Data and Scalability, (ii) Machine Learning with‌ Humans in the Loop, and (iii) Multiscale‌ & Multimodal Data Analytics. Across these themes,‌ we pursue three cross-cutting objectives: (1) make analyses‌ reusable and reproducible through well-engineered data/model/workflow services, (2)‌ make learning reliable and trustworthy by explicitly handling‌ bias and uncertainty, and (3) maximize impact through‌ open science (software, models, and FAIR data whenever‌ possible).

3.1 Big Data and Scalability

Unified data–model–workflow‌ services.

Environmental science pipelines increasingly combine heterogeneous data (e.g., images, omics, epidemiology, climate) with heterogeneous models (statistical, machine learning, mechanistic) and complex workflows. Yet current solutions remain largely domain-specific and ad hoc, making it difficult to connect artifacts, reproduce results, and reuse components across projects. Our goal is to provide integrated data and model management together with workflow services that can interoperate with established environments such as Galaxy [71] and OpenAlea [95], as well as distributed execution engines. Building on our LifeSWS initiative [72], we aim to treat all scientific artifacts (datasets, models, metadata, workflow components, intermediate results) as first-class citizens, searchable via catalogs and executable through standardized interfaces. Compared to existing platforms (the closest being CyVerse [83]), our focus is on tighter integration of model life-cycle management, provenance, and caching to support end-to-end scientific investigations.

Scalable‌ time-series analytics at climate‌ scale.

Many environmental questions rely on large collections of time series and call for motif discovery, clustering, and anomaly detection. At scale, naive distributed adaptations can be inefficient due to communication and synchronization costs [100]. We will therefore design distribution-aware algorithms for time-series analytics, with a particular focus on anomaly detection in large climate datasets using Matrix Profile ideas [101], and on implementations that can leverage modern distributed infrastructures (e.g., Spark) without sacrificing usability for domain partners.
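As an illustration of the Matrix Profile idea mentioned above, the following sketch computes a brute-force matrix profile on a single machine and reports the top discord (the window farthest from its nearest neighbour). It is deliberately naive and quadratic in the series length. It is not the distribution-aware algorithm the team plans to design, and the function names, the exclusion-zone width of half a window, and the toy series are assumptions made for this example.

```python
import math

def znorm(window):
    """Z-normalize a window; constant windows map to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]

def matrix_profile(series, m):
    """For each length-m window, distance to its nearest non-trivial neighbour."""
    windows = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    exclusion = m // 2  # skip overlapping windows to avoid trivial matches
    profile = []
    for i, w in enumerate(windows):
        best = math.inf
        for j, v in enumerate(windows):
            if abs(i - j) <= exclusion:
                continue
            dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(w, v)))
            best = min(best, dist)
        profile.append(best)
    return profile

def top_discord(series, m):
    """Start index of the window with the largest matrix profile value."""
    profile = matrix_profile(series, m)
    return max(range(len(profile)), key=profile.__getitem__)

# Toy periodic signal (hypothetical) with one injected spike at index 21.
series = [0.0, 1.0, 0.0, -1.0] * 12
series[21] = 5.0
start = top_discord(series, 4)
print(start, series[start:start + 4])  # a window that covers index 21
```

Regular windows have an exact twin one period away, so their profile value is zero, while every window covering the spike has no close match and stands out.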

Mid-term:‌ a first operational version‌‌ of LifeSWS enabling integrated artifact search and workflow‌ execution across heterogeneous environments;‌ a distributed Matrix-Profile-based anomaly‌‌ detection prototype validated on large environmental time series.‌
Long-term: a production-grade, scalable‌ service stack (catalog, provenance,‌‌ caching, scheduling) enabling reproducible analyses across data modalities‌ and models; a toolbox‌ of distributed time-series operators‌‌ usable in multiple environmental and One Health contexts.‌

3.2 Machine Learning with‌ Humans in the Loop‌‌

Cooperative learning in citizen science.

Platforms such as Pl@ntNet and iNaturalist continuously improve their identification models from community-produced observations and revisions. This cooperative learning loop is powerful but raises new issues: sparse and opportunistic revisions, strong imbalance across taxa/regions/users, and extreme scale (tens of millions of observations; tens of thousands of classes). Standard crowdsourcing inference tools (e.g., Bayesian aggregation) are not directly applicable at this scale [86]. We will develop end-to-end human-in-the-loop models that represent user behavior and its impact on training dynamics, building on our initial contributions [88] and leveraging modern approaches to detect label issues [94]. A key aim is to prevent negative feedback loops and to learn principled user weighting strategies that remain robust under sparsity and imbalance.

Bias-aware‌ species distribution models from‌‌ opportunistic data.

To monitor biodiversity under rapid global change [85], species distribution models (SDMs) increasingly rely on citizen science data, whose scale is unmatched but whose biases are substantial [73, 79]. We will continue to develop statistically grounded bias-correction methods [78, 77] and extend them to modern AI-based SDMs [76] and to Bayesian dynamic SDMs [75]. We will also address the fundamental limitation of presence-only data by inferring absences from visit histories and multi-species information [97], and by modeling observer profiles (persistent users, learners, taxonomic preferences) to better disentangle detectability from ecological signals.

Uncertainty, trust, and interpretability.

Reliable downstream decisions require explicit management of predictive uncertainty. We will build on our work on set-valued classification and abstention mechanisms [91, 82] and investigate uncertainty quantification and propagation for structured biodiversity predictions (assemblages, indicators, abundance maps). This includes generic tools such as conformal prediction [96, 81] and scalable Bayesian approximations [70]. Finally, we will strengthen user trust through transparency and interpretability mechanisms, extending prior work on user-facing uncertainty and interactive features in Pl@ntNet [90].
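For readers unfamiliar with conformal prediction, the sketch below shows the standard split-conformal recipe for set-valued classification: calibrate a threshold on held-out data, then return every class whose score passes it. It is a generic textbook version, not the methods of the cited works. The function names, the score (one minus the probability of the true class), and the toy calibration data are assumptions for illustration.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal quantile of the scores 1 - p(true class)."""
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if rank > n:
        return 1.0  # too few calibration points: every class is kept
    return scores[rank - 1]

def prediction_set(probs, threshold):
    """All classes whose nonconformity score does not exceed the threshold."""
    return {k for k, p in enumerate(probs) if 1.0 - p <= threshold}

# Hypothetical calibration set: class probabilities from some classifier, true labels.
cal_probs = [
    [0.875, 0.125, 0.0], [0.25, 0.75, 0.0], [0.125, 0.125, 0.75],
    [0.625, 0.25, 0.125], [0.5, 0.5, 0.0], [0.5, 0.125, 0.375],
    [0.25, 0.5, 0.25],
]
cal_labels = [0, 1, 2, 0, 1, 2, 0]

threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
print(threshold)                                        # 0.625
print(prediction_set([0.75, 0.125, 0.125], threshold))  # {0}
print(prediction_set([0.5, 0.375, 0.125], threshold))   # {0, 1}
```

A confident prediction yields a single class, while an ambiguous one yields a larger set. Under exchangeability of calibration and test data, such sets contain the true class with probability at least 1 - alpha.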

Mid-term: new scalable HITL (Human In‌ The Loop) models evaluated on real Pl@ntNet-style revision‌ streams; open-source SDM training components with improved bias‌ handling; first results on uncertainty propagation for biodiversity‌ indicators.
Long-term: integration of HITL and uncertainty-aware decision‌ modules into Pl@ntNet/GeoPl@ntNet-like services; robust bias-aware dynamic SDM‌ pipelines for long-term monitoring and scenario analysis.

3.3‌ Multiscale & Multimodal Data Analytics

Multimodal foundation models‌ for biodiversity and agro-ecology monitoring.

High-resolution monitoring of biotic components remains challenging despite major advances in sensors and AI [74]. We will study learning strategies that combine smartphone observations, scientific imaging workflows, drones, remote sensing, and environmental covariates (e.g., bioclimatic layers [80]) into multimodal pipelines. Because fully end-to-end multimodal training can be costly, we will emphasize self-supervised and foundation-model approaches, then reuse learned representations for downstream ecological tasks. We will also prioritize interpretability, combining transparency principles with post-hoc explanations (e.g., Shapley-based methods [98]).

Multivariate time‌ series and scalable similarity.

Environmental and One Health applications increasingly involve multivariate time series, where variables interact across time and scales. We will develop parallel anomaly detection and similarity search methods (including kNN Matrix Profile variants) building on our prior distributed indexing and analytics experience [100, 92]. We will also investigate visualization and representation learning for complex multivariate temporal data to support interactive exploration by domain experts.

Biodiversity trajectories and‌ community structure.

Predicting biodiversity trajectories requires models that capture long-term dependencies and integrate heterogeneous historical evidence. We will investigate deep architectures (including transformer-based models) for integrative forecasting [89], and contrast them with interpretable dynamic models grounded in ecological mechanisms (e.g., Bayesian DSDMs [75]). At the community level, we will leverage graph-based approaches to analyze and predict species assemblages via co-occurrence networks, relating network structure to dispersal, filtering, and interactions [93], and validating on vegetation survey datasets [99].
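To illustrate what a co-occurrence network is at its simplest, the sketch below counts how often pairs of species appear in the same survey plot and keeps the frequent pairs as weighted edges. It only shows the data structure. Actual analyses would typically compare counts against a null model before interpreting an edge. The function name, the `min_count` parameter, and the placeholder species labels are assumptions for this example.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(plots, min_count=2):
    """Map each species pair to the number of plots they share (>= min_count)."""
    pair_counts = Counter()
    for species in plots:
        # sorted + set gives each unordered pair one canonical key per plot
        for a, b in combinations(sorted(set(species)), 2):
            pair_counts[(a, b)] += 1
    return {pair: c for pair, c in pair_counts.items() if c >= min_count}

def degrees(edges):
    """Number of retained co-occurrence links per species."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)

# Hypothetical vegetation plots, each listing the species recorded in it.
plots = [["A", "B", "C"], ["A", "B"], ["B", "C"], ["A", "D"]]
edges = cooccurrence_edges(plots)
print(edges)           # {('A', 'B'): 2, ('B', 'C'): 2}
print(degrees(edges))  # {'A': 1, 'B': 2, 'C': 1}
```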

Mid-term: prototypes for multi-species monitoring from complex‌ imagery and environmental covariates; multivariate time-series analytics components;‌ first case studies on forecasting short-term biodiversity trends.‌
Long-term: reusable multimodal foundation models shared openly; operational‌ toolchains for multiscale forecasting and scenario analysis; new‌ graph-based methods to predict and interpret species assemblages.‌

4 Application domains

The application domains covered by‌ Iroko focus on the environment, with the specific‌ needs of data-intensive scientific applications, i.e., management‌ and analytics of large amounts of (streaming) data.‌ Since the interaction with scientists is critical to‌ identify and tackle data management problems, we are‌ dealing primarily with application domains for which Montpellier‌ has an excellent track record, i.e., agronomy,‌ botany and life sciences, with our scientific partners‌ CIRAD, INRAE and IRD.

Let us briefly illustrate some representative examples of‌ scientific applications on which‌ we will work.

Monitoring and preservation of plant biodiversity. In the continuity of Zenith, Iroko is the host team for the Pl@ntNet citizen science platform. This initiative, piloted by a consortium of four research organizations (Inria, CIRAD, INRAE and IRD), began in 2011 and has become one of the largest citizen science platforms in the world. Its mobile front-end, which lets users identify and share plant observations, is used by more than 20 million users worldwide, 15% of whom are professionals in the fields of land management, biodiversity management, education, agriculture, trade and tourism. Pl@ntNet is one of the official publishers of the Global Biodiversity Information Facility (GBIF), the world's largest government-funded biodiversity data infrastructure. More than 13 million Pl@ntNet observations have been published and have been used in hundreds of scientific publications on themes ranging from conservation to agro-ecology and the impact of climate change.
Biological data integration and analysis. Biology and its applications, from medicine to agronomy and ecology, are now producing massive data, which is revolutionizing the way life scientists work. For instance, using plant phenotyping platforms such as HIRROS and PhenoArch at INRAE Montpellier, quantitative genetic methods make it possible to identify genes involved in phenotypic variation in response to environmental conditions. These methods produce large amounts of data at different time intervals (minutes to months), at different sites and at different scales, ranging from small tissue samples to the entire plant and up to whole plant populations. Analyzing such big data creates new challenges for data management and data integration, but also for plant modeling. We will address this application in the context of the French initiative OpenAlea, with CIRAD and INRAE.
One Health approach to fight antimicrobial resistance (AMR). AMR refers to the ability of microorganisms, such as bacteria, to resist the effects of antimicrobial drugs that were previously effective in treating infections. It is a growing public health threat that can make infections more difficult and costly to treat, leading to longer hospital stays and increased mortality rates. A promising way to fight AMR is the One Health approach, which recognizes that the health of humans, animals, and the environment is interconnected. However, our ongoing PROMISE project with experts from different health and environmental sectors has revealed that addressing AMR through the One Health approach is a complex and multifaceted issue, which poses significant challenges from the data science point of view, including the following: 1) Heterogeneous data collection and standardization; 2) Multivariate data analysis; 3) Predictive modeling; and 4) Data sharing and access. This application will eventually bring together 21 professional networks and 42 academic partners. Iroko will be central to interdisciplinarity at the interface with data analytics in this application through the PROMISE PPR project led by INSERM.

5‌ Social and environmental responsibility‌

5.1 Footprint of research‌‌ activities

The footprint of‌ IROKO’s research activities mainly stems from (i) large-scale‌ computation and storage (e.g., deep learning training on‌ GPUs, large-scale analytics, and data management) and (ii)‌ travel for collaboration and dissemination. In continuity with‌ practices established in the predecessor team, we take‌ several measures to mitigate this footprint.

We promote‌ computing frugality by prioritizing model reuse (transfer learning,‌ warm starts, reuse of pretrained models) and by‌ improving experimental pipelines to avoid unnecessary retraining. When‌ models are deployed, we explore compression and efficiency-oriented‌ architectures to reduce memory and computational requirements.

For‌ widely deployed services such as Pl@ntNet, GeoPl@ntNet, and‌ PROMISE, we adopt an eco-design approach by focusing‌ on purposeful, non-addictive functionalities and by optimizing workflows‌ to limit unnecessary computation, storage, and data transfers.‌

We also favor open and reusable software, models, and FAIR data (Findable, Accessible, Interoperable, and Reusable), which encourages reuse and reproducibility and reduces redundant data collection and re-computation across projects. Finally, we limit long-distance travel when possible, favor trains for domestic trips, and increasingly rely on hybrid or remote meetings.

5.2 Impact of research results

The‌ team aims to produce data science results with‌ direct impact on environmental sciences, One Health, and‌ sustainable practices. In 2025, this impact is materialized‌ through operational platforms and openly shared resources that‌ are already reused beyond the team.

GeoPl@ntNet provides‌ high-resolution (50 $\times$ 50 m) plant species distribution‌ maps for more than 15,000 species across Europe,‌ with freely downloadable outputs and biodiversity indicators supporting‌ research, conservation planning, and territorial management.

Pl@ntNet continues‌ to act as a large-scale citizen observatory used‌ by more than 20 million users worldwide, including‌ a significant share of professionals. Its open data‌ published on GBIF were used in 292 scientific‌ publications in 2025.

In marine ecology, the Seatizen‌ Atlas dataset (more than 1.6 million underwater and‌ aerial images) supports large-scale training and reuse of‌ AI models for cost-effective coral reef and habitat‌ monitoring.

For agriculture and agroecology, the Deep-Plant-Disease dataset‌ (about 250K images covering 55 crops and 175‌ diseases) provides a large and diverse benchmark to‌ improve plant disease identification and generalization.

In public‌ health, the PROMISE multi-cloud platform supports One Health‌ surveillance and research on antimicrobial resistance by integrating‌ aggregated data from human, animal, and environmental sectors.‌

As a longer-term strategy, the team also explores‌ the transfer of its scalable data management and‌ learning techniques to other domains. This is a‌ key motivation behind our participation in initiatives such‌ as the OMICFINDER challenge, which aims to unlock‌ the potential of vast public genomic databases to‌ enable new advances in medicine, ecology, and agriculture.‌

6 Highlights of the year

6.1 Awards

Prix science ouverte des données de la recherche (Open Science Award for Research Data) - Seatizen Atlas: a collaborative dataset of underwater and aerial marine imagery [18]. First author: Matteo Contini (IROKO PhD student). Last author: Alexis Joly (PhD director).

6.2 Other key achievements

Publication in the journal Nature Plants: Learning the syntax of plant assemblages [23]. First author: César Leblanc (IROKO PhD student). Last author: Alexis Joly (PhD director). Most cited Nature Plants article for several weeks. Highlighted by Nature Plants as a "Crystal Ball Time" paper.
GeoPl@ntNet: a new application in the Pl@ntNet family, dedicated to the high-resolution mapping of plant biodiversity, has been released. It is already used by more than 1K users per month.

7 Latest‌ software developments, platforms, open‌ data

7.1 New Features‌‌ in the Pl@ntNet Platform

Participants: Antoine Affouard,‌ Hugo Gresse, Jean-Christophe‌ Lombardo, Thomas Paillot‌‌, Joseph Salmon, Alexis Joly, Józef‌ Tran.

Pl@ntNet is‌ a large-scale citizen observatory‌‌ relying on AI technologies to support plant identification‌ and biodiversity monitoring through‌ mobile and web applications.‌‌ In 2025, platform developments focused on strengthening Pl@ntNet’s‌ role as an operational‌ biodiversity data infrastructure, with‌‌ particular emphasis on community-level identification, interoperability, and integration‌ into decision-support workflows, notably‌ within the GUARDEN European‌‌ project.

A major development effort concerned the consolidation‌ and deployment of community-level‌ plant identification services, extending‌‌ Pl@ntNet beyond individual plant observations. In particular, the‌ platform’s workflow for vegetation‌ survey and plot images‌‌ was strengthened and operationalized, enabling the identification of‌ multiple co-occurring plant species‌ from complex imagery such‌‌ as quadrats, drone acquisitions, and roadside surveys. These‌ services were deployed and‌ validated in several real-world‌‌ GUARDEN case studies and integrated into downstream biodiversity‌ monitoring and mapping pipelines.‌

Significant progress was also‌‌ made on scalability and interoperability. Pl@ntNet services were‌ further integrated with external‌ platforms and decision-support tools‌‌ through improved APIs, facilitating their use within broader‌ analytical chains combining citizen‌ science data, remote sensing,‌‌ and predictive modeling. In particular, Pl@ntNet identification services‌ were connected to GeoPl@ntNet‌ and the GUARDEN Decision‌‌ Support Applications, enabling the seamless flow from raw‌ observations to high-resolution biodiversity‌ indicators and maps served‌‌ through standard web services (e.g. WMS).

In parallel,‌ developments targeted data ingestion‌ and management workflows. New‌‌ mechanisms were implemented to support the batch import‌ of large collections of‌ plant observations, addressing the‌‌ needs of institutions and organizations willing to contribute‌ existing datasets to the‌ platform. These features reduce‌‌ barriers to data sharing and strengthen Pl@ntNet’s capacity‌ to act as a‌ hub for heterogeneous biodiversity‌‌ observations.

7.2 New platforms

7.2.1 GeoPl@ntNet

Participants: Lukáš Picek, César Leblanc, Benjamin Deneu, Rémi Palard, Thomas Paillot, Christophe Botella, Alexis Joly.

GeoPl@ntNet is a new,‌‌ large-scale web application developed in the context of‌ the Pl@ntNet platform for‌ the exploration, analysis, and‌‌ dissemination of plant biodiversity information, offering an unprecedented‌ combination of taxonomic coverage,‌ spatial extent, and spatial‌‌ resolution. The application provides high-resolution distribution maps (50‌ $\times$ 50 m) for‌ more than 15,000 plant‌‌ species across the entire European continent, making it‌ one of the most‌ comprehensive operational systems currently‌‌ available for plant biodiversity mapping at this scale.‌ GeoPl@ntNet relies on state-of-the-art‌ deep learning–based species distribution‌‌ models that integrate heterogeneous‌ environmental data—such as satellite imagery, climatic variables, land-use‌ information, and topography—with millions of in situ plant‌ observations collected through the Pl@ntNet platform. Beyond interactive‌ visualization, the application allows users to explore regions‌ of interest, compute biodiversity indicators (including protected, invasive,‌ and endemic species), and access detailed, spatially explicit‌ reports to support research, conservation planning, and territorial‌ management. A key feature of GeoPl@ntNet is the‌ open availability of its outputs: all species distribution‌ maps are made freely downloadable, fostering transparency, reuse,‌ and integration into external scientific studies, public policies,‌ and operational workflows. By combining continental-scale coverage, fine‌ spatial resolution, and open data dissemination within a‌ single platform, GeoPl@ntNet represents a unique operational contribution‌ to large-scale plant biodiversity monitoring and decision support.‌ The application is already used by more than‌ 1K users per month.

7.2.2 PROMISE

Participants: Reza‌ Akbarinia, Benoit Lange, Florent Masseglia.‌

The objective of the PROMISE (PROfessional coMmunIty network on antimicrobial reSistancE) project 87 is to build a large data warehouse for managing and analyzing antimicrobial resistance (AMR) data. The platform of the same name is a multi-cloud data management and analytics platform developed within the project to support One Health surveillance and research on antimicrobial resistance. It integrates data from the human, animal and environmental sectors. At present, the data handled by the platform are aggregated (no personal data) and largely derived from public sources. PROMISE relies on a modular architecture organized into five independent "bubbles" (diffusion, query, storage, administration and processing) that can be deployed on any cloud. Services are containerized (Docker) and orchestrated with Kubernetes; inter-service communication is performed through REST APIs, and WebSockets are used for notifications.

The diffusion bubble provides both the web user‌ interface and the API entry point, with a‌ React-based viewer and a Quarkus (Java) gateway that‌ routes requests to the relevant services. The administration‌ bubble manages authentication and observability (monitoring and metrics‌ collection). The query bubble normalizes user requests and‌ aggregates results, while the storage bubble isolates raw‌ data on the providers’ infrastructures, translates normalized queries‌ into database-specific queries, and returns aggregated outputs. Current‌ storage connectors support PostgreSQL, InfluxDB and MongoDB. The‌ processing bubble orchestrates analytics over aggregated time series‌ and supports correlation modules implemented in Python and‌ connected to an event bus. Finally, an HDS-oriented‌ deployment option is being investigated to enable the‌ use of more sensitive health data while preserving‌ a strict separation between raw data and aggregated‌ outputs.
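
As an illustration of the storage bubble's role, the sketch below translates one normalized aggregate request into a PostgreSQL query and a MongoDB aggregation pipeline. The `NormalizedQuery` structure, its field names, and the `year` column are hypothetical and are not taken from the PROMISE code base; only the SQL and MongoDB pipeline syntax is standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedQuery:
    """Hypothetical normalized request: average one measure per group."""
    collection: str  # table or collection name
    measure: str     # numeric field to aggregate
    group_by: str    # field used for grouping
    min_year: int    # keep records with year >= min_year


def to_postgresql(q: NormalizedQuery) -> tuple[str, tuple]:
    # Identifiers are interpolated for brevity; a real connector would
    # validate them against a list of known tables and columns.
    sql = (
        f"SELECT {q.group_by}, AVG({q.measure}) AS value "
        f"FROM {q.collection} WHERE year >= %s GROUP BY {q.group_by}"
    )
    return sql, (q.min_year,)


def to_mongodb(q: NormalizedQuery) -> list[dict]:
    # Same request expressed as an aggregation pipeline.
    return [
        {"$match": {"year": {"$gte": q.min_year}}},
        {"$group": {"_id": f"${q.group_by}",
                    "value": {"$avg": f"${q.measure}"}}},
    ]


query = NormalizedQuery("amr_rates", "resistance_rate", "region", 2020)
sql, params = to_postgresql(query)
pipeline = to_mongodb(query)
```

In both cases only the aggregated value per group leaves the connector, which matches the separation between raw data and aggregated outputs described above.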

Contact: Reza Akbarinia

7.3 Open data

Prix science ouverte des données de la recherche - Seatizen Atlas: a collaborative dataset of underwater and aerial marine imagery 18. Seatizen Atlas is a citizen science dataset made of more than 1.6 M underwater and aerial images collected in shallow tropical coastal areas using various low-cost platforms operated by either citizens or researchers. Data discovery and access rely on DOI assignment, while data interoperability and reuse are ensured by complying with widely used community standards. The open-source data workflow is provided to ease contributions from anyone collecting pictures.

Pl@ntNet‌‌ GBIF data - A new release of Pl@ntNet‌ open data has been‌ published on GBIF (the‌‌ world's largest open data infrastructure for biodiversity). In‌ 2025, this data has‌ been used in 292‌‌ scientific publications.

Deep-Plant-Disease Dataset - We aggregated and published the largest and most diverse dataset ever built for plant disease identification 38. It comprises about 250K images across 55 crop species, 175 disease classes, and 333 unique crop-disease compositions, as well as novel text data designed to enhance model generalization in multi-crop disease identification.

8 New results‌

8.1 Distributed Data and‌‌ Model Management

8.1.1 A Logic-Based Approach for Knowledge‌ Graph Data Integration

Participants:‌ Fabio Porto, Patrick‌‌ Valduriez.

In the context of the Dinizia‌ Inria associated team with‌ Brazil, we started a‌‌ collaboration with the Boreal Inria team to study‌ the combination of a‌ knowledge graph with rule-based‌‌ reasoning. In particular, we are interested in leveraging‌ the InteGraal framework developed‌ within the Boreal team,‌‌ which enables semantic integration and reasoning over heterogeneous‌ data sources. In this‌ context, we proposed Gypscie-KG‌‌ 55, an ML (Machine Learning) system that‌ combines data integration, rule-based‌ reasoning, and prediction services‌‌ to provide semantic access to domain knowledge using‌ a knowledge graph. In‌ addition to providing integration‌‌ of heterogeneous ML data within a knowledge graph,‌ we explore the use‌ of logic-based declarative techniques‌‌ to enable reasoning and semantic querying over ML‌ data.

8.1.2 Federated Learning‌

Participants: Patrick Valduriez.‌‌

Federated Learning (FL) is a promising distributed machine‌ learning approach that enables‌ collaborative training of a‌‌ global model using multiple edge devices. The data‌ distributed among the edge‌ devices is highly heterogeneous.‌‌ Thus, FL faces the challenge of data distribution‌ and heterogeneity, where non-independent‌ and identically distributed (non-IID)‌‌ data across edge devices may result in a‌ significant accuracy drop. Furthermore,‌ the limited computation and‌‌ communication capabilities of edge devices increase the likelihood‌ of stragglers, thus leading‌ to slow model convergence.‌‌ To address this problem, we proposed the FedDHAD‌ FL framework 26,‌ which comes with two‌‌ novel methods: Dynamic Heterogeneous model aggregation (FedDH) and‌ Adaptive Dropout (FedAD). The‌ combination of these two‌‌ methods makes FedDHAD significantly outperform state-of-the-art solutions in‌ terms of accuracy (up‌ to 6.7% higher), efficiency‌‌ (up to 2.02 times faster), and computation cost‌ (up to 15.0% smaller).‌
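
For readers unfamiliar with model aggregation in FL, the sketch below shows the standard sample-weighted averaging step (FedAvg) that aggregation methods typically refine. It is not the FedDH or FedAD method of 26; the function name and the flat-list representation of model parameters are assumptions made for illustration.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_weights = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            # A client holding more data contributes proportionally more.
            global_weights[i] += (size / total) * w
    return global_weights


# Two clients with 1 and 3 local samples respectively.
merged = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
# merged == [2.5, 3.5]
```

With non-IID data, fixed weights of this kind can bias the global model toward large clients, which is one motivation for adjusting the aggregation dynamically.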

8.1.3 Distributed Web Infrastructure‌‌ for Integrated Pest Management

Participants: Christophe Pradal.‌

Crop protection and pest‌ management are major economic‌‌ and environmental concerns throughout Europe. The consultation of‌ decision support systems (DSS)‌ to guide decisions relating‌‌ to Integrated Pest Management (IPM) is one of‌ the key principles of‌ IPM, reducing the ambiguity‌‌ around potential risks to crop health. Pests in‌ this context include invertebrate‌ pests, weeds and pathogens.‌‌

In 63, to facilitate the use of these models, two Application Programming Interfaces (APIs) were designed to access a catalog of DSS models and European online weather data sources. While these APIs are integrated into the IPM Decisions Platform, they are also open source, allowing other crop protection and farm management software to inspect, download, modify, install, run, and use them.

The scientific platform OpenAlea provides a new service, the IPM Decision Factory, that enables DSS researchers and developers to advance, combine and create DSS interactively within its scientific workflow management system. These workflows are then automatically transformed into web services to be readily integrated into the IPM Decisions Platform. This ensures that new DSS have access to the required weather data and can be made readily accessible across Europe for validation and use. OpenAlea.EpyMix 25 is a model describing canopy growth and epidemic dynamics in species mixtures. It has been integrated into the IPM Decisions Platform to assess, using the weather data provided by the platform, wheat-based crop mixtures as a promising strategy to improve disease management.

8.2 Data Analytics

8.2.1 Event Detection‌ in Time Series

Participants: Esther Pacitti, Fabio‌ Porto, Rebecca Salles.

Event detection in‌ time series is a basic function in surveillance‌ and monitoring systems and has been extensively explored‌ over the years.

The new book 56, published by Springer and authored by Eduardo Ogasawara (CEFET-RJ, Brazil), Rebecca Salles (Iroko), Fabio Porto (LNCC, Brazil) and Esther Pacitti (Iroko), reflects our productive collaboration with Brazil in the context of the Dinizia associated team. It provides a general taxonomy for event detection according to the specific event types: anomaly detection, change-point detection, and motif discovery. It discusses state-of-the-art evaluation metrics for event detection methods as well as online event detection, including the challenges of incremental and adaptive learning.

Anomaly detection methods implicitly‌ define detection criteria, such as deviation measures, filter‌ thresholds, and candidate anomaly selection strategies. Choosing inappropriate‌ criteria results in inaccurate outputs, generating spurious alerts‌ or missing events. Adjusting these criteria is essential‌ for monitoring systems. To address this challenge, we‌ explored the fine-tuning of deviation measures, filter thresholds,‌ and candidate selection strategies 52. Experimental results‌ show that the proper choice of criteria significantly‌ improves anomaly detection performance, often with greater impact‌ than changing the detection methods.
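
The effect of these criteria can be seen on a toy example. The sketch below applies the same threshold with two deviation measures, a z-score and a median absolute deviation (MAD) score. The series and function names are made up for illustration and do not come from 52.

```python
import statistics


def zscore_deviation(series):
    """Distance to the mean, in units of population standard deviation."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    return [abs(x - mean) / std if std > 0 else 0.0 for x in series]


def mad_deviation(series):
    """Distance to the median, in units of median absolute deviation."""
    med = statistics.median(series)
    mad = statistics.median(abs(x - med) for x in series)
    return [abs(x - med) / mad if mad > 0 else 0.0 for x in series]


def detect(series, deviation, threshold):
    """Return the indices whose deviation exceeds the filter threshold."""
    return [i for i, d in enumerate(deviation(series)) if d > threshold]


series = [10, 11, 10, 12, 11, 10, 50, 11, 10, 12]
print(detect(series, zscore_deviation, 3.0))  # prints []
print(detect(series, mad_deviation, 3.0))     # prints [6]
```

The spike at index 6 inflates the mean and standard deviation enough to hide itself from the z-score criterion, while the robust MAD criterion flags it with the same threshold and the same data.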

Concept drift detection (CDD) is the general problem of identifying significant changes in streaming data distribution over time. Current CDD methods face challenges in large-scale, multivariate datasets, where single drift detectors (DD) often fail to capture variable interdependencies. While ensemble drift detectors (EDD) are usually adopted to mitigate the limitations of a single DD, EDD may suffer when detections do not converge. This misalignment can cause voting mechanisms to neglect critical intervals with high detection rates. To address this issue, we proposed a fuzzy ensemble drift detector (FEDD) 44 that integrates unsupervised threshold voting with fuzzy logic to provide time tolerance and reconcile minor temporal misalignments in drift detection. Our evaluation shows that FEDD outperforms existing approaches by improving detection robustness and coverage.
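
The idea of time tolerance in ensemble voting can be sketched as follows. This is not the FEDD algorithm of 44: it is a simplified illustration that assumes a triangular fuzzy membership function around each detection, and the function names, tolerance and support threshold are hypothetical.

```python
def triangular_membership(t, center, tolerance):
    """1.0 at the detection time, decaying linearly to 0.0 at +/- tolerance."""
    return max(0.0, 1.0 - abs(t - center) / tolerance)


def tolerant_vote(detections, length, tolerance, min_support):
    """Return time indices whose fuzzy support reaches min_support.

    detections holds one list of detection indices per drift detector;
    tolerance must be strictly positive.
    """
    drift_points = []
    for t in range(length):
        support = 0.0
        for detector_hits in detections:
            # Each detector contributes its best-matching detection near t.
            support += max(
                (triangular_membership(t, c, tolerance) for c in detector_hits),
                default=0.0,
            )
        if support >= min_support:
            drift_points.append(t)
    return drift_points


# Three detectors fire at slightly different times around the same drift.
print(tolerant_vote([[50], [52], [54]], 100, 5, 2.1))  # prints [52]
```

A strict vote that requires detectors to agree on the exact time index would report nothing here, since no two detections coincide.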

8.2.2 Scalable Multivariate Anomaly‌‌ Detection

Participants: Reza Akbarinia, Benoit Lange, Florent Masseglia, Esther Pacitti, Rebecca Salles.

The continuous monitoring of dynamic processes generates‌ vast amounts of streaming‌ multivariate time series data.‌‌ Detecting anomalies within them is crucial for real-time‌ identification of significant events,‌ such as environmental phenomena,‌‌ security breaches, or system failures, which can critically‌ impact sensitive applications. Despite‌ significant advances in univariate‌‌ time series anomaly detection, scalable and efficient solutions‌ for online detection in‌ multivariate streams remain underexplored.‌‌ This challenge becomes increasingly prominent with the growing‌ volume and complexity of‌ multivariate time series data‌‌ in streaming scenarios.

In 33, we provide the first structured survey primarily focused on scalable and online anomaly detection techniques for multivariate time series, offering a comprehensive taxonomy. Additionally, we introduce the Online Distributed Outlier Detection (2OD) methodology, a novel, well-defined and repeatable process designed to benchmark the online and distributed execution of anomaly detection methods. Experimental results with both synthetic and real-world datasets, covering up to hundreds of millions of observations, demonstrate that a distributed approach can enable centralized algorithms to achieve significant computational efficiency gains, with speedups averaging in the tens and reaching the hundreds, without compromising detection accuracy.

8.2.3 Detecting Anomalies‌ with Any Duration in‌ Climate Time Series

Participants: Reza Akbarinia, Guillaume Coulaud, Florent Masseglia.

Detecting abnormal climate events across temporal and spatial scales is crucial to the understanding of local and regional climate trends. Existing methods often depend on prior knowledge about the timing, location, or duration of such events, limiting their versatility. In 15, we propose ClimBurst, an approach to detect climate bursts (unusually high or low values of climate variables) without prior assumptions about their temporal duration. ClimBurst offers the ability to: (a) identify climate bursts of any duration within the time series of single locations, (b) link climate bursts across neighboring locations, and (c) analyze the spatio-temporal propagation of these anomalies. Applying ClimBurst to sea surface temperature data from the Mediterranean Sea (1960–2021) shows that some detected hot bursts and anomalies coincide in time with known severe marine heatwaves. ClimBurst also shows how detected hot (cold) bursts are spatio-temporally connected, and that these connected bursts have historically increased (decreased) in duration, intensity, spatial extent and frequency.

In 40, we propose a demonstration of ClimBurst that lets users interact directly with our system to see a summary showing the presence or absence of bursts over a user-specified year and spatial range. The demonstration also allows users to perform time-travel queries to see how bursts propagate over space and time.

8.2.4 Energy Efficient Time‌ Series Anomaly Detection

Participants: Reza Akbarinia, Benoit Lange, Florent Masseglia, Esther Pacitti, Rebecca Salles.

Traditionally,‌‌ choosing an anomaly detection‌ method for a given application is mainly driven‌ by detection accuracy and runtime. However, with the‌ rapid evolution of hardware and connected devices, massive‌ amounts of time series data are produced, and‌ the real-time analysis of such time series brings‌ new demands not only for accurate and scalable‌ solutions, but also for energy consumption management. In‌ this scenario, any improvement in energy efficiency can‌ have a considerable impact on both the environmental‌ footprint and the monetary expenses. In 53,‌ we address the problem of benchmarking time series‌ anomaly detection methods based on the trade-off between‌ accuracy, runtime, and energy consumption. We introduce a‌ new metric for evaluating relative energy efficiency performance,‌ called saveUp, and provide a novel methodology, inspired‌ by skyline queries, for benchmarking methods based on‌ a more comprehensive set of metrics, including peak‌ power usage and total energy consumption. Experimental results‌ based on large datasets show that our methodology‌ is useful for selecting the methods that provide‌ the best performance with the lowest energy impacts.‌ Moreover, results indicate that speedup and saveUp are‌ not always directly correlated as believed a priori,‌ and sometimes it is best to "take it‌ slow" in favor of green applications.
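
A skyline query keeps the candidates that are not dominated on every criterion by another candidate. The sketch below applies this idea to accuracy, runtime and energy. The method names and measurements are invented for illustration, and the saveUp metric of 53 is not reproduced here since its definition is not given in this report.

```python
from typing import NamedTuple


class Result(NamedTuple):
    method: str
    accuracy: float   # higher is better
    runtime_s: float  # lower is better
    energy_j: float   # lower is better


def dominates(a: Result, b: Result) -> bool:
    """True if a is at least as good as b everywhere and better somewhere."""
    no_worse = (a.accuracy >= b.accuracy and a.runtime_s <= b.runtime_s
                and a.energy_j <= b.energy_j)
    strictly_better = (a.accuracy > b.accuracy or a.runtime_s < b.runtime_s
                       or a.energy_j < b.energy_j)
    return no_worse and strictly_better


def skyline(results):
    """Keep the results that no other result dominates."""
    return [r for r in results
            if not any(dominates(other, r) for other in results)]


results = [
    Result("A", 0.90, 10.0, 500.0),
    Result("B", 0.85, 4.0, 650.0),   # fastest, yet more energy than A
    Result("C", 0.80, 12.0, 700.0),  # dominated by A
    Result("D", 0.88, 20.0, 300.0),  # slowest, yet lowest energy
]
print([r.method for r in skyline(results)])  # prints ['A', 'B', 'D']
```

In this toy data the fastest method is not the most energy-efficient one, so a ranking on runtime alone would hide the trade-off that the skyline keeps visible.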

8.2.5 Extending‌ Matrix Profile for Seasonal Anomaly Detection

Participants: Reza‌ Akbarinia, Guillaume Coulaud, Florent Masseglia.‌

Seasonal time series analysis is fundamental in domains‌ such as climate science, where detecting and understanding‌ anomalies, patterns, and data changes are essential. The‌ classical Matrix Profile approach does not consider the‌ data’s seasonality, failing to detect seasonal anomalies and‌ patterns. In 60, we propose the Interval‌ Matrix Profile (IMP), a novel extension of the‌ Matrix Profile specifically designed for analyzing periodic and‌ seasonal time series data. The Interval Matrix Profile‌ enables flexible interval-based comparisons across seasons, allowing the‌ detection of anomalies that conventional approaches miss. We‌ further propose the constrained k Nearest Neighbor Interval‌ Matrix Profile, designed to identify anomalies that may‌ appear across multiple periods, a common characteristic of‌ abnormal climate events and extreme weather phenomena. Our‌ approach leverages a scalable block-based algorithm that achieves‌ significant performance gains through caching, vectorization, and parallelism.‌ Additionally, we introduce a novel methodology to detect‌ the first or last occurrence of a pattern,‌ enabling the discovery of pattern emergence or disappearance‌ within seasonal time series. The algorithms are demonstrated‌ in case studies on temperature climate time series.‌ They effectively capture seasonal anomalies and find pattern‌ disappearance. Our results illustrate that the IMP consistently‌ outperforms the classical Matrix Profile both in the‌ accuracy of seasonal anomaly detection and in computational‌ efficiency.
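
For context, the classical Matrix Profile stores, for each subsequence, the z-normalized Euclidean distance to its nearest non-overlapping neighbor; the subsequence with the largest value is the top discord. The brute-force sketch below computes this baseline only. It does not implement the Interval Matrix Profile or the block-based algorithm of 60, and the exclusion zone of half a window is a common convention assumed here.

```python
import math


def znorm(window):
    """Z-normalize a window; constant windows map to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def matrix_profile(series, m):
    """Brute-force matrix profile for subsequence length m (O(n^2 m))."""
    n = len(series) - m + 1
    windows = [znorm(series[i:i + m]) for i in range(n)]
    exclusion = m // 2  # skip trivial matches with overlapping neighbors
    profile = []
    for i in range(n):
        best = math.inf
        for j in range(n):
            if abs(i - j) <= exclusion:
                continue
            best = min(best, math.dist(windows[i], windows[j]))
        profile.append(best)
    return profile


# A periodic signal with one spike injected at index 14.
series = [0, 1, 0, -1] * 8
series[14] = 5
profile = matrix_profile(series, 4)
discord = max(range(len(profile)), key=profile.__getitem__)
```

Here `discord` falls on one of the windows covering the spike (start indices 11 to 14), while unaffected windows have a distance of zero to their periodic repeats. Because every subsequence is compared with all others regardless of its position in the cycle, a pattern that is normal in one season and abnormal in another can still find a close match, which is the limitation the interval-based comparison targets.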

8.3 Machine Learning for Biodiversity and Agroecology‌

8.3.1 Learning Ecological Structure with Large Language Models‌

Participants: César Leblanc, Hervé Goëau, Maximilien‌ Servajean, Alexis Joly, Diego Marcos,‌ Pierre Bonnet.

This research axis explores how‌ large language models (LLMs) can be adapted to‌ capture and exploit structured ecological knowledge, with a focus on plant communities‌ and functional traits. By‌ transferring ideas from natural‌‌ language processing to ecology, these works investigate how‌ latent structure in species‌ assemblages and unstructured textual‌‌ resources can be leveraged to improve biodiversity understanding‌ and modeling.

In 23‌, the team introduces‌‌ an approach inspired by language modeling to learn‌ the “syntax” of plant‌ assemblages, treating abundance-ordered species‌‌ lists as ecological sequences. Trained on more than‌ 10,000 European plant species,‌ the model captures latent‌‌ associations shaped by environmental constraints, dispersal processes, and‌ species interactions. The learned‌ representations can be fine-tuned‌‌ for multiple downstream tasks, including predicting missing species‌ in assemblages and classifying‌ habitat types, where the‌‌ method consistently outperforms co-occurrence-based models, expert systems, and‌ standard neural networks. This‌ work demonstrates how sequence-based‌‌ modeling provides a powerful and flexible framework for‌ representing plant community structure.‌

Complementing this community-level perspective,‌‌ 27 focuses on species-level functional information and addresses‌ the challenge of assembling‌ large trait databases. Leveraging‌‌ the information extraction capabilities of large language models,‌ this work proposes a‌ fully automatic pipeline to‌‌ extract plant morphological traits from unstructured online textual‌ descriptions. The approach successfully‌ reconstructs expert-curated species–trait matrices‌‌ with high accuracy, showing that LLMs can transform‌ heterogeneous textual resources into‌ structured ecological knowledge at‌‌ scale, albeit with current limitations linked to data‌ availability.

Together, these contributions‌ illustrate the potential of‌‌ large language models to bridge different levels of‌ ecological organization, from individual‌ traits to species assemblages,‌‌ and to open new avenues for scalable, data-driven‌ biodiversity modeling, mapping, and‌ conservation science.

8.3.2 Scalable‌‌ Plant Vision Models for Operational Monitoring

Participants: Hervé Goëau, Vincent Espitalier, Alexis Joly, Pierre Bonnet.

This research axis investigates how‌ large-scale plant vision models‌ can be designed and‌‌ adapted for operational monitoring tasks, with a strong‌ emphasis on scalability, robustness,‌ and reduced annotation requirements‌‌ in real-world conditions.

In 20, the team‌ addresses the early detection‌ of invasive alien plant‌‌ species along roadsides, a major vector for biological‌ invasions. Rather than relying‌ on object detection or‌‌ segmentation pipelines that require extensive manual annotation, this‌ work evaluates the reuse‌ of a global plant‌‌ identification model trained on citizen science data. Using‌ a vision transformer from‌ the Pl@ntNet platform, the‌‌ study compares multi-label classification and tiling-based strategies applied‌ to high-resolution roadside imagery.‌ The results show that‌‌ the tiling approach achieves strong detection performance even‌ without task-specific fine-tuning, demonstrating‌ the potential of large‌‌ pretrained models for large-scale invasive species monitoring at‌ low cost.
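
A tiling strategy of this kind can be sketched in two steps: cut the high-resolution image into fixed-size tiles, then merge per-tile species scores. The sketch below assumes the image is at least as large as one tile and uses max-pooling of scores as the merge rule; the tile size, stride, merge rule and species labels are illustrative assumptions, not the settings of 20, and the classifier call itself is left out.

```python
def tile_boxes(width, height, tile, stride):
    """Return (x0, y0, x1, y1) boxes that together cover the whole image."""
    def starts(size):
        positions = list(range(0, max(size - tile, 0) + 1, stride))
        if positions[-1] + tile < size:
            positions.append(size - tile)  # extra tile flush with the border
        return positions

    return [(x, y, x + tile, y + tile)
            for y in starts(height) for x in starts(width)]


def aggregate_max(tile_scores):
    """Keep, for each species, its best score over all tiles."""
    best = {}
    for scores in tile_scores:
        for species, score in scores.items():
            best[species] = max(score, best.get(species, 0.0))
    return best


boxes = tile_boxes(1200, 500, 500, 500)
# Scores as a hypothetical classifier might return them, one dict per tile.
merged = aggregate_max([{"species_a": 0.2},
                        {"species_a": 0.9, "species_b": 0.1},
                        {}])
```

Keeping the maximum score per species lets a small plant that fills a single tile be detected even when it covers a negligible fraction of the full image.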

From a‌ methodological perspective, 16 contributes‌‌ to this axis by proposing PlantAIM, a hybrid‌ vision architecture that combines‌ global attention mechanisms with‌‌ local feature extraction. By fusing transformer-based and convolutional‌ representations, the model improves‌ robustness and generalization in‌‌ challenging plant visual recognition settings, including limited training‌ data and heterogeneous environments.‌ These architectural insights directly‌‌ support the development of scalable plant vision systems‌ capable of reliable deployment‌ in operational monitoring scenarios.‌‌

8.3.3 Conformal Prediction for Uncertainty Quantification

Participants: Joseph Salmon, Jean-Baptiste Fermanian‌.

Deep neural networks in computer vision produce‌ overconfident predictions without statistical guarantees, making uncertainty calibration‌ essential. Conformal prediction provides distribution-free guarantees but struggles‌ in the long-tailed, highly unbalanced settings typical of‌ large citizen science platforms, where many classes are‌ rare. Recent work highlights both theoretical limitations and‌ possible adaptations, including transductive and grouped conformal approaches‌ 42, 61.
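
As background, the standard split conformal procedure for classification can be sketched as follows. It uses the score 1 - p(true class) on a held-out calibration set and gives marginal coverage of at least 1 - alpha under exchangeability. This is the textbook baseline, not the transductive or grouped variants of 42 and 61, and the calibration scores and labels below are made up.

```python
import math


def conformal_quantile(cal_scores, alpha):
    """Finite-sample corrected quantile of the calibration scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: keep every class
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, qhat):
    """All classes whose nonconformity score 1 - p is within the quantile."""
    return {label for label, p in probs.items() if 1 - p <= qhat}


# Calibration scores: 1 - predicted probability of the true class.
cal_scores = [0.1, 0.3, 0.2, 0.6, 0.4, 0.5, 0.7]
qhat = conformal_quantile(cal_scores, alpha=0.25)
labels = prediction_set({"a": 0.5, "b": 0.45, "c": 0.05}, qhat)
print(qhat, sorted(labels))  # prints 0.6 ['a', 'b']
```

With a single threshold shared by all classes, rare classes have few calibration points and can end up under-covered, which is the long-tail difficulty discussed above.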

Handling ambiguity further requires‌ integrating domain knowledge and leveraging multiple observations of‌ the same instance to better separate aleatoric from‌ epistemic uncertainty. Recent conformal approaches extend classification to‌ multi-input settings by aggregating conformal p-values across observations,‌ reducing prediction set size while preserving class-conditional coverage.‌ Such aggregation frameworks are particularly well suited to‌ citizen science applications, where multiple images per instance‌ are available, and naturally support refined decision rules‌ and rejection for uncertain predictions 41.

8.3.4‌ AI-Based Species Distribution Modeling and Mapping

Participants: Christophe Botella, Alexis Joly, Théo Larcher, César Leblanc, Diego Marcos, François Munoz, Rémi Palard, Lukáš Picek, Maximilien Servajean, Dennis Shasha, Benjamin Bourel.

This research axis focuses on advancing species distribution models (SDMs) through deep learning and multimodal data integration, with the goal of overcoming key limitations of classical approaches, including limited training data, the absence of biotic interactions, and insufficient spatial resolution for biodiversity mapping.

A first line of work‌ investigates how deep learning can extend SDMs beyond‌ presence-only prediction. In 14, the team demonstrates‌ that convolutional neural network–based SDMs can effectively model‌ species abundance by exploiting transfer learning from large‌ presence-only datasets. This strategy significantly improves abundance predictions,‌ particularly for rare species and locally rare occurrences,‌ and leads to clear performance gains over classical‌ SDMs.

A complementary direction explores the explicit integration‌ of biotic structure into SDMs. In 35,‌ a cascading prediction framework is proposed in which‌ common and dominant plant species are first predicted‌ from environmental variables, and these predictions are then‌ used to inform the distribution of less common‌ species. By leveraging species co-occurrence patterns and competitive‌ hierarchies, this approach improves prediction accuracy at fine‌ spatial resolutions, especially in species-rich environments.

In parallel,‌ 46 presents a large-scale, multimodal deep-SDM pipeline for‌ very-high-resolution biodiversity mapping across Europe. Based on the‌ integration of remote sensing data, climate time series,‌ and species occurrence records at 50 $\times$ 50‌ m resolution, this work produces continental-scale species distribution‌ maps, biodiversity indicators, and habitat maps. The approach‌ enables joint modeling of interspecies dependencies and large-scale‌ inference from heterogeneous data sources, supporting operational biodiversity‌ monitoring at unprecedented spatial detail.

8.3.5 Coral Reef‌ Monitoring

Participants: Matteo Contini, Sylvain Bonhommeau, Victor Illien, Sylvain Poulain, Serge Bernard, Julien Barde, Alexis Joly.

This‌ research axis, conducted in close collaboration with Ifremer‌ and IRD, develops scalable AI-based methods for coral‌ reef monitoring by combining citizen-driven data collection, multi-scale imaging, and deep learning.‌ The overarching objective is‌ to enable accurate, fine-grained‌‌ ecological assessment over large reef areas while relying‌ on low-cost and operational‌ data acquisition.

The Seatizen‌‌ Atlas 18 provides the data backbone of this‌ effort, bringing together more‌ than 1.6 million underwater‌‌ and aerial images collected in shallow tropical environments‌ by citizens and researchers‌ using diverse platforms. The‌‌ dataset captures the strong variability inherent to real-world‌ marine imagery and is‌ distributed through an open,‌‌ standards-compliant workflow, enabling large-scale training and reuse of‌ AI models for marine‌ biodiversity mapping.

Building on‌‌ this resource, the collaboration explores how fine-scale ecological‌ information extracted from underwater‌ imagery can be transferred‌‌ to broader spatial scales. In 17, a‌ multi-scale learning framework propagates‌ detailed coral and habitat‌‌ classifications from underwater images to drone-based aerial imagery‌ through knowledge distillation. This‌ approach is further extended‌‌ in 39, which introduces a weakly supervised‌ semantic segmentation method that‌ combines underwater-derived supervision, spatial‌‌ interpolation, and self-distillation to minimize annotation effort. Together,‌ these contributions demonstrate how‌ multi-scale deep learning and‌‌ weak supervision can support cost-effective, high-resolution coral reef‌ monitoring at scale.

8.3.6‌ Evaluation of Species Identification‌‌ and Prediction Algorithms

Participants: Alexis Joly, Lukáš Picek, Hervé Goëau, Christophe Botella, Diego Marcos, César Leblanc, Théo Larcher.

This research axis‌ focuses on the large-scale,‌‌ rigorous evaluation of species identification and prediction algorithms,‌ with the objective of‌ characterizing state-of-the-art performance under‌‌ realistic conditions and identifying key methodological challenges for‌ biodiversity-oriented AI systems.

A‌ central activity in this‌‌ area is the organization of the LifeCLEF evaluation‌ campaign 45, 57‌, which continues to‌‌ attract hundreds of research teams and data scientists‌ worldwide. The 2025 edition‌ featured five complementary, data-driven‌‌ tasks covering a wide range of ecological modalities‌ and problem settings: AnimalCLEF‌ for open-set individual animal‌‌ re-identification, BirdCLEF+ for species recognition in complex acoustic‌ soundscapes, FungiCLEF for few-shot‌ classification of rare species,‌‌ GeoLifeCLEF 51 for plant species distribution prediction from‌ multimodal environmental data, and‌ PlantCLEF 49 for identifying‌‌ multiple co-occurring plant species in vegetation-plot imagery. Together,‌ these benchmarks provide a‌ unique and controlled view‌‌ of current capabilities and limitations in species-level AI.‌

A key insight emerging‌ across tasks is the‌‌ persistent impact of domain shift, particularly when training‌ and test data differ‌ in geography, sensing modality,‌‌ or species composition. While baseline models offered strong‌ starting points, the most‌ effective solutions relied on‌‌ large-scale pretraining, self-supervised and semi-supervised learning, and multimodal‌ data fusion. The results‌ of BirdCLEF+ highlighted the‌‌ potential of unlabeled audio data through contrastive learning,‌ whereas GeoLifeCLEF exposed the‌ difficulty of generalizing even‌‌ with high-resolution, multimodal inputs. Similarly, FungiCLEF and PlantCLEF‌ confirmed that few-shot and‌ weakly supervised scenarios remain‌‌ challenging, despite progress enabled by vision transformers, prototype-based‌ methods, and metadata-aware pipelines.‌ Overall, multimodality consistently emerged‌‌ as a key driver of robustness and performance,‌ alongside growing interest in‌ efficient and deployable architectures.‌‌

Complementing these benchmarking efforts,‌ 29 explores species identification in a heritage biodiversity‌ context, focusing on herbarium specimens. This study compares‌ hyperspectral leaf reflectance measurements with RGB image-based identification‌ using Pl@ntNet, showing that spectral approaches can achieve‌ high species-level accuracy from relatively small datasets, even‌ in the absence of reproductive structures. The results‌ highlight the complementarity of spectral and vision-based methods‌ and point to practical solutions for reducing taxonomic‌ knowledge gaps in large digitized collections.

8.3.7 Importance of Fossil Pollen Data for Vegetation Species Distribution Modeling

Participants: Benjamin Bourel, Christophe Botella.‌

Given the current acceleration of climate change, anticipating‌ future responses in plant biodiversity is a major‌ scientific and societal challenge. We propose a resolutely‌ innovative approach to improve the predictability of European‌ vegetation dynamics, based on a long-term perspective covering‌ the last 20,000 years. By combining, for the‌ first time on a European scale, more than‌ 72,000 harmonized fossil pollen records, high-resolution paleoclimate simulations‌ and indicators of anthropogenic pressure, we aim to‌ unravel the respective roles of climate and human‌ activities in past ecosystem transformations. We are tackling‌ a major conceptual barrier in ecology and paleoecology:‌ the validity of the principle of actualism and‌ the underestimation of plant species' climatic niches due‌ to niche truncation.

The preliminary results obtained in‌ 2025 during a Master's internship carried out by‌ Marion Cann and supervised by Benjamin Bourel and‌ Christophe Botella, researchers who will supervise this postdoctoral‌ project, are very encouraging. The coupling of LegacyPollen‌ 1.0 data with the paleoclimate simulations highlighted that‌ the climatic hypervolume occupied by Olea in Europe‌ increased by 25% when taking into account past‌ data (data in Europe since the Last Glacial‌ Maximum), in addition to present data. The integration‌ of fossil data therefore makes it possible to‌ identify plant communities with no modern analogues and‌ to reconstruct more realistic fundamental niches for key‌ taxa in European ecosystems. These methodological advances pave‌ the way for more robust species distribution models,‌ capable of improving biodiversity projections in the face‌ of future climate change.
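To make the notion of an expanded climatic hypervolume concrete, the following minimal sketch estimates the climate space occupied by a taxon as the number of occupied cells in a gridded two-variable climate space, and measures how much that space grows when fossil records are added. It is not the method used in the internship: the grid-occupancy estimator, the function names, the two climate variables and all values are assumptions chosen for illustration.

```python
from typing import Iterable, Set, Tuple

# Hypothetical climate point: (mean annual temperature in °C, annual precipitation in mm)
Point = Tuple[float, float]


def occupied_cells(points: Iterable[Point],
                   temp_step: float = 1.0,
                   precip_step: float = 100.0) -> Set[Tuple[int, int]]:
    """Map each climate observation to the grid cell that contains it."""
    return {(int(t // temp_step), int(p // precip_step)) for t, p in points}


def relative_expansion(present: Iterable[Point],
                       fossil: Iterable[Point]) -> float:
    """Fractional growth of the occupied climate space when fossil records are added."""
    base = occupied_cells(present)
    combined = base | occupied_cells(fossil)
    return (len(combined) - len(base)) / len(base)


# Invented toy values, not LegacyPollen data.
present = [(14.2, 520.0), (15.8, 610.0), (16.5, 480.0), (17.1, 700.0)]
fossil = [(11.3, 450.0), (12.7, 380.0), (15.9, 640.0)]

expansion = relative_expansion(present, fossil)
print(f"Occupied climate space grows by {expansion:.0%}")
```

A cell count is a crude proxy for a hypervolume, but it shows the principle: fossil records that fall in climate conditions no longer occupied today enlarge the estimated niche, whereas records inside the modern envelope leave it unchanged.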

9 Bilateral contracts and‌ grants with industry

Participants: Antoine Affouard, Jean-Christophe‌ Lombardo, Hugo Gresse, Alexis Joly.‌

CIFRE contract with INA (Institut National de l'Audiovisuel):‌ PhD of Kawtar Zaher.
Pl@ntNet API for developers‌: 32 companies have signed up for paid‌ use of the service (110K euros in revenue‌ in 2025).

10 Partnerships and cooperations

10.1 International‌ initiatives

10.1.1 Associate Teams in the framework of‌ an Inria International Lab or in the framework‌ of an Inria International Program

Dinizia

Title:
Data‌ Science for the Natural Environment
Duration:
2025-2027
Coordinator:‌
Esther Pacitti (Iroko) and Eduardo Ogasawara (CEFET-RJ, Rio‌ de Janeiro, Brazil)
Partners:
- CEFET-RJ, Rio de Janeiro,‌ RJ
- Fiocruz, Rio de Janeiro, RJ
- LNCC, Petropolis,‌ RJ
- UFF, Rio de Janeiro, RJ
- UFRJ, Rio‌ de Janeiro, RJ
Inria contact:
Esther Pacitti
Summary:‌
The overall objective of Dinizia is to develop new data science solutions that will eventually contribute to findings in environmental and related sciences. These solutions will take the form of methods and real systems. Our technical objective within data science is to help manage complex dataflows by organizing massive and heterogeneous data in connection with models, and by making related artifacts (datasets, time series, models, metadata, dataflow components, etc.) easy to search, debug, and parallelize. A technical goal of this project is to make dataflows work as seamlessly with data as queries do in business processing. The work program includes three major research topics: detecting events in large time series, model life-cycle management, and scalable execution of heterogeneous dataflows. To validate our solutions, we capitalize on our previous experience in developing major systems for scientific applications: Pl@ntNet and OpenAlea from Inria; Savime and Harbinger from Brazil. With our main application partners (Cirad and INRAE in France, Fiocruz and Centro de Operações Rio in Brazil), we will validate our results using real datasets and models. The main applications are in agronomy, biodiversity informatics and meteorology.

10.1.2‌ Participation in other International‌‌ Programs

IVADO-Inria Program: IROKO and the University of Montreal have been selected as one of the 8 projects funded within the 2025 edition of the IVADO-Inria Program. Alexis Joly visited the IRBV lab in Montreal for two weeks in October and Etienne Laliberté visited IROKO in Montpellier for two weeks in November. These exchanges have helped consolidate and structure the scientific collaboration already underway around the interface between artificial intelligence and plant ecology, particularly with regard to the challenges of rapid monitoring of plant biodiversity using drones. They have also strengthened exchanges between teams at the University of Montreal (Department of Biological Sciences, IRBV, Mila) and the research teams leading the Pl@ntNet platform (Inria IROKO, UMR AMAP).

10.2 International research visitors

10.2.1‌ Visits of international scientists‌

Inria International Chair

Participants:‌‌ Reza Akbarinia, Alexis Joly, Patrick Valduriez‌.

Fabio Porto, Laboratório Nacional de Computação Científica (LNCC, Brazil), holds an Inria International Chair for a cumulative duration of 12 months, spread over the period from January 2024 to December 2028.

Other international visits to‌ the team

Dennis Shasha‌‌

Status
Researcher
Institution of origin:
New York University
Country:
USA
Dates:
April‌ 7 - June 7‌‌
Context of the visit:
DeepPEP contract
Mobility program/type‌ of mobility:
research stay,‌ lecture

Tiffany Ding

Status‌‌
PhD Student
Institution of origin:
University of California, Berkeley
Country:
USA
Dates:‌
March 1 - June‌‌ 30
Context of the visit:
Chaire ANR CAMELOT‌
Mobility program/type of mobility:‌
research stay

10.3 European‌‌ initiatives

10.3.1 Horizon Europe

B3

B3 project on‌ cordis.europa.eu

Title:
Biodiversity Building‌ Blocks for policy
Duration:‌‌
From March 1, 2023 to August 31, 2026‌
Partners:
- INSTITUT NATIONAL DE‌ RECHERCHE EN INFORMATIQUE ET‌‌ AUTOMATIQUE (INRIA), France
- UNIVERSITATEA OVIDIUS DIN CONSTANTA (OVIDIUS‌ UNIVERSITY OF CONSTANTA), Romania‌
- MARTIN-LUTHER-UNIVERSITAT HALLE-WITTENBERG (MLU), Germany‌‌
- Global Biodiversity Information Facility (GBIF), Denmark
- EIGEN VERMOGEN‌ VAN HET INSTITUUT VOOR‌ NATUUR- EN BOSONDERZOEK (EV‌‌ INBO), Belgium
- LA TROBE‌ UNIVERSITY (LTU), Australia
- JUSTUS-LIEBIG-UNIVERSITAET GIESSEN (JLU), Germany
- UNIVERSIDADE‌ DE AVEIRO (UAveiro), Portugal
- SOUTH AFRICAN NATIONAL BIODIVERSITY‌ INSTITUTE (SANBI), South Africa
- AGENTSCHAP PLANTENTUIN MEISE (AGENCE‌ JARDIN BOTANIQUE DE MEISE), Belgium
- ALMA MATER STUDIORUM‌ - UNIVERSITA DI BOLOGNA (UNIBO), Italy
- PENSOFT PUBLISHERS‌ (PENSOFT), Bulgaria
- STELLENBOSCH UNIVERSITY (SU UNIVERSITY OF STELLENBOSCH),‌ South Africa
Inria contact:
Alexis Joly
Summary:

The‌ world is changing rapidly; climate change, land use‌ change, pollution and natural resource exploitation are creating‌ a global crisis for biodiversity whose magnitude and‌ dynamics are hard to quantify. Decision makers at‌ all levels need up-to-date information from which to‌ evaluate policy options. For this reason rapid, reliable,‌ repeatable monitoring of biodiversity data is needed at‌ all scales from local to global. Only by‌ leveraging large volumes of data, advanced modeling techniques‌ and powerful computing tools can we hope to‌ synthesize these data within timescales that are relevant‌ to policy.

Data on biodiversity come from a‌ diverse range of sources, citizen scientists, museums, herbaria‌ and researchers are all major contributors, but increasingly‌ new technologies are being deployed, such as automatic‌ sensors, camera traps, eDNA and satellite tracking. Integrating‌ these data is a major challenge, but is‌ necessary if we are to create dependable information‌ on biodiversity change. B3 will use the concept‌ of data cubes to simplify and standardize access‌ to biodiversity data using the Essential Biodiversity Variables‌ framework. These cubes will be used, in conjunction‌ with other environmental data and scenarios, as the‌ basis for models and indicators of past, current‌ and future biodiversity.

The overarching goal of the‌ project is to provide easy access to tools‌ in a cloud computing environment, in real-time and‌ on-demand, with state-of-the-art prediction models of biodiversity, that‌ will output models and indicators of biodiversity status‌ and change. The project envisages a future where‌ primary biodiversity data are seamlessly integrated into monitoring‌ and forecasting such that policy and management can‌ proactively respond to problems while at the same‌ time reduce the costs of monitoring and management,‌ and the negative impacts of biodiversity change.
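As an illustration of the data cube concept mentioned in the summary above, the following minimal sketch aggregates raw occurrence records into counts indexed by species, grid cell and year. It is not the B3 implementation: the record fields, the one-degree grid, the function name and the sample records are all assumptions made for this example.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Occurrence(NamedTuple):
    """Hypothetical minimal occurrence record."""
    species: str
    lat: float
    lon: float
    year: int


def build_cube(records: Iterable[Occurrence], cell_deg: float = 1.0) -> Counter:
    """Count occurrences per (species, grid cell, year)."""
    cube: Counter = Counter()
    for r in records:
        # Snap coordinates to the lower-left corner index of a regular grid.
        cell = (int(r.lat // cell_deg), int(r.lon // cell_deg))
        cube[(r.species, cell, r.year)] += 1
    return cube


# Invented sample records, for illustration only.
records = [
    Occurrence("Quercus ilex", 43.61, 3.87, 2023),
    Occurrence("Quercus ilex", 43.95, 3.12, 2023),
    Occurrence("Quercus ilex", 44.10, 3.90, 2024),
    Occurrence("Olea europaea", 43.30, 5.40, 2023),
]
cube = build_cube(records)
for key, count in sorted(cube.items()):
    print(key, count)
```

Once records from heterogeneous sources are reduced to such a taxon-by-space-by-time structure, downstream indicators can be computed without knowing how each record was originally collected.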

GUARDEN‌

GUARDEN project on cordis.europa.eu

Title:
safeGUARDing biodivErsity aNd‌ critical ecosystem services across sectors and scales
Duration:‌
From November 1, 2022 to October 31, 2025‌
Partners:
- INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET‌ AUTOMATIQUE (INRIA), France
- PARC NATIONAL DE PORT-CROS (CONSERVATOIRE‌ BOTANIQUE NATIONAL MEDITERRANEEN DE PORQUEROLLES), France
- STICHTING NATURALIS‌ BIODIVERSITY CENTER (NATURALIS), Netherlands
- YPOURGEIO GEORGIAS, AGROTIKIS ANAPTYXIS‌ KAI PERIVALLONTOS (MINISTRY OF AGRICULTURE, RURAL DEVELOPMENT AND‌ ENVIRONMENT OF CYPRUS), Cyprus
- DREVEN SRL, Belgium
- PLYMOUTH‌ MARINE LABORATORY LIMITED (PML), United Kingdom
- UNIVERSITY OF‌ ANTANANARIVO, Madagascar
- CHAROKOPEIO PANEPISTIMIO (HAROKOPIO UNIVERSITY OF ATHENS‌ (HUA)), Greece
- INSTITUT METROPOLI (BARCELONA INSTITUTE OF REGIONAL‌ AND METROPOLITAN STUDIES), Spain
- AGENCIA ESTATAL CONSEJO SUPERIOR‌ DE INVESTIGACIONES CIENTIFICAS (CSIC), Spain
- DRAXIS ENVIRONMENTAL SA‌ (DRAXIS), Greece
- EBOS TECHNOLOGIES LIMITED (eBOS), Cyprus
- CENTRE‌ DE COOPERATION INTERNATIONALE EN RECHERCHE AGRONOMIQUE POUR LE‌ DEVELOPPEMENT - C.I.R.A.D. EPIC (CIRAD), France
- AGENTSCHAP PLANTENTUIN‌ MEISE (AGENCE JARDIN BOTANIQUE DE MEISE), Belgium
- ENVECO ANONYMI ETAIRIA PROSTASIAS KAI‌ DIAHIRISIS PERIVALLONTOS A.E. (ENVECO‌ S.A. ENVIRONMENTAL PROTECTION AND‌‌ MANAGEMENT), Greece
- AREA METROPOLITANA DE BARCELONA (AMB), Spain‌
- FREDERICK UNIVERSITY FU (FREDERICK‌ UNIVERSITY FU), Cyprus
- EREVNITIKO‌‌ PANEPISTIMIAKO INSTITOUTO SYSTIMATON EPIKOINONION KAI YPOLOGISTON (RESEARCH UNIVERSITY‌ INSTITUTE OF COMMUNICATION AND‌ COMPUTER SYSTEMS), Greece
Inria‌‌ contact:
Alexis Joly
Summary:
GUARDEN’s main mission is to safeguard biodiversity and its contributions to people by bringing them to the forefront of policy and decision-making. This will be achieved through the development of user-oriented Decision Support Applications (DSAs) and by leveraging Multi-Stakeholder Partnerships (MSPs). They will take into account policy and management objectives and priorities across sectors and scales, build consensus to tackle data gaps, analytical uncertainties or conflicting objectives, and assess options to implement adaptive transformative change. To do so, GUARDEN will make use of a suite of methods and tools using Deep Learning, Earth Observation, and hybrid modeling to augment the amount of standardized and geo-localized biodiversity data, build up a new generation of predictive models of biodiversity and ecosystem status indicators under multiple pressures (human and climate), and propose a set of complementary ecological indicators likely to be incorporated into local management and policy. The GUARDEN approach will be applied to sectoral case studies involving end users and stakeholders through Multi-Stakeholder Partnerships, and addressing critical cross-sectoral challenges (at the nexus of biodiversity and deployment of energy/transport infrastructure, agriculture, and coastal urban development). Thus, the GUARDEN DSAs shall help stakeholders engaged in the challenge to improve their holistic understanding of ecosystem functioning, biodiversity loss and its drivers, and to explore the potential ecological and societal impacts of alternative decisions. Upon the acquisition of this new knowledge and evidence, the DSAs will help end-users not only navigate but also (re-)shape the policy landscape to make informed all-encompassing decisions through cross-sectoral integration.

MAMBO‌

MAMBO project on cordis.europa.eu‌‌

Title:
Modern Approaches to the Monitoring of BiOdiversity‌
Duration:
From September 1,‌ 2022 to August 31,‌‌ 2026
Partners:
- INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE‌ ET AUTOMATIQUE (INRIA), France‌
- AARHUS UNIVERSITET (AU), Denmark‌‌
- STICHTING NATURALIS BIODIVERSITY CENTER (NATURALIS), Netherlands
- THE UNIVERSITY‌ OF READING, United Kingdom‌
- HELMHOLTZ-ZENTRUM FUR UMWELTFORSCHUNG GMBH‌‌ - UFZ, Germany
- ECOSTACK INNOVATIONS LIMITED, Malta
- UK‌ CENTRE FOR ECOLOGY AND‌ HYDROLOGY, United Kingdom
- CENTRE‌‌ DE COOPERATION INTERNATIONALE EN RECHERCHE AGRONOMIQUE POUR LE‌ DEVELOPPEMENT - C.I.R.A.D. EPIC‌ (CIRAD), France
- PENSOFT PUBLISHERS‌‌ (PENSOFT), Bulgaria
- UNIVERSITEIT VAN AMSTERDAM (UvA), Netherlands
Inria‌ contact:
Alexis Joly
Summary:‌
EU policies, such as‌‌ the EU biodiversity strategy 2030 and the Birds‌ and Habitats Directives, demand‌ unbiased, integrated and regularly‌‌ updated biodiversity and ecosystem service data. However, efforts‌ to monitor wildlife and‌ other species groups are‌‌ spatially and temporally fragmented, taxonomically biased, and lack‌ integration in Europe. To‌ bridge this gap, the‌‌ MAMBO project will develop, test and implement enabling‌ tools for monitoring conservation‌ status and ecological requirements‌‌ of species and habitats for which knowledge gaps‌ still exist. MAMBO brings‌ together the technical expertise‌‌ of computer science, remote‌ sensing, social science expertise on human-technology interactions, environmental‌ economy, and citizen science, with the biological expertise‌ on species, ecology, and conservation biology. MAMBO is‌ built around stakeholder engagement and knowledge exchange (WP1)‌ and the integration of new technology with existing‌ research infrastructures (WP2). MAMBO will develop, test, and‌ demonstrate new tools for monitoring species (WP3) and‌ habitats (WP4) in a co-design process to create‌ novel standards for species and habitat monitoring across‌ the EU and beyond. MAMBO will work with‌ stakeholders to identify user and policy needs for‌ biodiversity monitoring and investigate the requirements for setting‌ up a virtual lab to automate workflow deployment‌ and efficient computing of the vast data streams‌ (from on-the-ground sensors, and remote sensing) required to‌ improve monitoring activities across Europe (WP4). Together with‌ stakeholders, MAMBO will assess these new tools at‌ demonstration sites distributed across Europe (WP5) to identify‌ bottlenecks, analyze the cost-effectiveness of different tools, integrate‌ data streams and upscale results (WP6). 
This will‌ feed into the co-design of future, improved and‌ more cost-effective monitoring schemes for species and habitats‌ using novel technologies (WP7), and thus lead to‌ a better management of protected sites and species.‌

JAMRAI 2

Participants: Reza Akbarinia, Benoit Lange‌, Florent Masseglia.

Title:
Joint Action Antimicrobial‌ Resistance and Healthcare Associated Infections 2
Duration:
2024‌ to 2027
Partners:
- Institut National de la Santé‌ et de la Recherche Médicale (INSERM), France
- Agence‌ Nationale de Sécurité du Médicament et des Produits‌ de Santé (ANSM), France
- Agence Nationale de Sécurité‌ Sanitaire (Anses), France
- Centre hospitalier universitaire de Nantes (CHUN), France
- Service public fédéral Santé publique, Sécurité‌ de la Chaîne alimentaire, Belgium
- GESUNDHEIT ÖSTERREICH GMBH,‌ Austria
- Ministry of Health of the Republic of‌ Cyprus, Cyprus
- STATNI ZDRAVOTNI USTAV, Czech Republic
- STATENS‌ SERUM INSTITUT, Denmark
- DANMARKS TEKNISKE UNIVERSITET, Denmark
- and more than 100 other institutions from different European countries.
Inria contact:
Reza Akbarinia
Summary:
EU-JAMRAI2, in which more than 120 institutions from different European countries participate, is a European collaborative initiative aimed at analyzing and understanding antimicrobial resistance and the infectious diseases associated with it. One of the objectives of the project is to gain a deeper understanding of the mechanisms underlying antimicrobial resistance and its transmission across populations. In this project, we plan to analyze data from the human, animal, and environmental sectors to support public health policymakers in making informed decisions. Because each country has its own health management system, the initial focus of the project is to evaluate and identify key features that can be applied across European countries. We plan to harmonize analyses among participating nations. To facilitate data analytics, a selection of standardized metrics from diverse domains is essential. The project also aims to consolidate data from multiple countries into a single platform, enabling researchers from different fields to perform integrated analyses.

10.4‌ National initiatives

PARAD (PARSADA), (2025-2030), 7.7 MEuros.

Participants:‌ Benjamin Bourel, Alexis Joly, Thomas Paillot.

The cross-disciplinary PARAD project aims to anticipate, innovate and support the agroecological transition in weed management by overcoming the obstacles created by the reduction in herbicides and the withdrawal of molecules through (i) a better understanding of the biological characteristics (traits) of weeds that are identified as being responsible for the failure of practices or the conditions that allow species to circumvent practices (WP1), (ii) quantifying/optimizing existing agroecological levers (WP2), (iii) promoting technical and technological innovation in order to detect, identify and manage weeds using alternative methods to herbicides (WP3), (iv) quantitative analyses through simulations and field trials of weed management effectiveness from a multi-criteria assessment (crop yield, greenhouse gas (GHG) emissions, impact on biodiversity, etc.) (WP4), (v) support for the collective design of systemic solutions, through case studies involving farmers and other local stakeholders (WP5), (vi) a renewed interest in the recognition and biology of weeds in order to take the right action through initial and ongoing training (WP6).

Led by INRAE, PARAD brings together 19 funded partners and 146 permanent staff from these organizations. For practical reasons, Iroko did not wish to participate directly in this project as a partner. However, Iroko is fully involved in WP3, for which Benjamin Bourel is the Pl@ntNet advisor. Iroko is involved in defining standardized protocols for data acquisition via smartphone and equivalent devices, on a 1m² plot scale. It is also involved in defining annotation formats and metadata. The aim is to ensure the effective integration, storage and reuse of this data within Inria's Pl@ntNet ecosystem. This will enable the project to take full advantage of existing infrastructure, tools and communities. To this end, the Pl@ntNet API and batch import tools are being developed to explicitly support plot-related data. This is a mutually beneficial relationship for Iroko and PARAD. PARAD benefits from Pl@ntNet's experience and infrastructure for plant identification. For its part, Pl@ntNet benefits from the project's numerous partners, infrastructure and resources to acquire large quantities of plant images labeled by professionals and to test these APIs and the new Pl@ntNet features (notably the one for multi-species identification).

Past2ECO‌‌ (PEPR Agroécologie et Numérique), (2026-2031), 3 MEuros.

Participants:‌ Benjamin Bourel, Alexis‌ Joly.

Agroecology relies on the development of new genetic diversity (including diversity that has been lost) and practices (including varietal mixtures) that ensure ecosystem services (beyond yield stability) with limited to zero inputs (fertilizers and pesticides) while facing climatic variability and extreme events in the context of ongoing climate change. Past2ECO proposes to investigate both genetic diversity and agricultural practices relevant to agroecology by combining between- and within-crop past (exploiting herbarium specimens) and contemporaneous (leveraging seed banks) genetic diversity of wheat and sorghum, to provide practical and knowledge-driven solutions for climate-resilient agriculture and support the agroecological transition.

Past2ECO brings together complementary expertise in botany, genomics, biology, agronomy, AI and computer vision from eight institutes to integrate historical genetic knowledge, cutting-edge genomics, and AI-based phenotyping tools to guide agroecological crop transitions. Past2ECO aims to decipher historic diversity (WP1) and unveil adaptive genomic footprints (WP2) evaluated in the field (WP3), all using AI-based image analysis technologies (WP4). WP4 is led by Iroko (Benjamin Bourel).

The available Triticeae and sorghum herbarium collections are estimated to contain 18,101 specimens from 114 countries spanning 321 years, from 1700 to 2021. After curation and classification using AI-based morphometry, the analysis of ancient DNA, compared to that of worldwide accessions hosted in seed banks, will allow us to document the genetic diversity and the evolutionary trajectory of the use of varietal mixtures from the past to the present day, and assess changes in their soil-root metagenome associations. Exploiting diversity from the past to the present using innovative genomic offset statistics, we will be able to predict the optimal genotypes and varietal mixtures for current and future climate. We will validate how such predicted adaptive diversity manifests phenotypically in the field, how it can be used to guide the development of agroecological practices such as varietal mixtures in a climatic gradient, and what its benefits and evolutionary dynamics are under realistic on-farm diversity management practices.

Past2ECO builds a strong bridge between computer science‌ and agroecology in unveiling specimen classification from AI-based‌ geometric morphometrics, the development of new machine learning‌ methods using neural networks on hyperspectral images to‌ discriminate between varieties, and digital leaf phenotypes from‌ herbarium specimens as an indicator of adaptation to‌ global climate change.

Overall, Past2ECO will contribute to‌ PEPR ‘Agroecology and ICT’ by delivering a cutting-edge‌ proof-of-concept project aiming to decipher and exploit past‌ (so-called lost or underused) adaptive diversity of current‌ and future major crop species, wheat and sorghum,‌ to design varieties for agroecological transitions in the‌ context of climate change.

Pl@ntAgroEco (PEPR Agroécologie et‌ Numérique), (2023-2027), 1.6 MEuros.

Participants: Antoine Affouard,‌ Christophe Botella, Hervé Goëau, Hugo Gresse‌, Alexis Joly, Thomas Paillot.

Agroecology‌ necessarily involves crop diversification, but also the early‌ detection of diseases, deficiencies and stresses (hydric, etc.),‌ as well as better management of biodiversity. The‌ main stumbling block is that this paradigm shift‌ in agricultural practices requires expert skills in botany,‌ plant pathology and ecology that are not generally‌ available to those working in the field, such‌ as farmers or agri-food technicians. Digital technologies, and‌ artificial intelligence in particular, can play a crucial‌ role in removing this barrier to access to‌ knowledge.

The aim of the Pl@ntAgroEco project is to design, experiment with and develop new high-impact agroecology services within the Pl@ntNet platform. This includes: AI and plant science research; agile development of new components within the platform; organizing participatory science programs and animating the Pl@ntNet user community. The project is led by Iroko (Alexis Joly).

FishPredict (ANR), (2022-2025), 500 KEuros.

Participants: Benjamin‌ Bourel, Alexis Joly, Maximilien Servajean,‌ Julien Thomazo.

FishPredict is an ANR project funded in the context of the IA-Biodiv challenge. The project aims at predicting the biodiversity of reef fishes using AI technologies. Alexis Joly co-leads the whole project jointly with David Mouillot, marine ecologist at the MARBEC lab.

DeepPEP (ANR), (2025-2027),‌ 25 KEuros.

Participants: Reza‌‌ Akbarinia, Dennis Shasha, Patrick Valduriez.‌

The DeepPEP project, between CNRS, INRAE and Inria Iroko, aims to enhance the fundamental understanding of nutrient homeostasis in plants and to develop new biotechnological resources in the form of signaling peptides used as biostimulants. The project seeks to create new AI algorithms for designing peptides that interact with any protein, and to develop potential biostimulants that enhance nitrogen (N) and phosphorus (P) efficiency in agriculture. In this project, Iroko provides its expertise in time series query processing.

PPR‌ Antibiorésistance: structuring tool "PROMISE"‌‌ (2021-2024), 240 KEuros.

Participants: Reza Akbarinia, Florent‌ Masseglia.

The objective of the PROMISE (PROfessional coMmunIty network on antimicrobial reSistancE) project is to build a large data warehouse for managing and analyzing antimicrobial resistance (AMR) data. It gathers 21 existing professional networks and 42 academic partners from three sectors: human, animal, and environment. The project is based on the following transdisciplinary and cross-sectoral pillars: i) fostering synergies to improve the One Health surveillance of antibiotic consumption and AMR, ii) data sharing for improving the knowledge of professionals, iii) improving clinical research by analyzing the shared data.

PNR "Beerisk" (2022-2025), 200 KEuros.

Participants: Reza‌ Akbarinia, Florent Masseglia‌.

The objective of this project is to analyze honeybee daily mortality rates, represented as time series, in order to detect anomalies and study the lethal effects of bees' exposure to pesticides.
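As a simple illustration of anomaly detection on such a series, the sketch below flags days whose mortality count deviates strongly from the series median, using the commonly used modified z-score based on the median absolute deviation (MAD). This is a generic baseline, not the technique developed in the project; the function name, the 3.5 threshold and the daily counts are assumptions chosen for the example.

```python
from statistics import median
from typing import List, Sequence


def mad_anomalies(series: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Return the indices whose modified z-score exceeds the threshold."""
    med = median(series)
    mad = median(abs(x - med) for x in series)
    if mad == 0:
        # Degenerate case: at least half of the values equal the median.
        return []
    # 0.6745 rescales the MAD so that the score is comparable to a
    # standard z-score under a normal distribution.
    return [i for i, x in enumerate(series)
            if 0.6745 * abs(x - med) / mad > threshold]


# Invented daily counts of dead bees, for illustration only.
daily_deaths = [12, 15, 11, 14, 13, 90, 12, 16, 13, 14]
anomalous_days = mad_anomalies(daily_deaths)
print(anomalous_days)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme day would otherwise inflate the spread estimate and mask itself.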

Plan national Ecoantibio "INTERSECTION" (2024-2028), 175 KEuros.

Participants: Reza Akbarinia, Florent‌ Masseglia.

The objective of the INTERSECTION project is to produce intersectoral and territorial indicators for monitoring resistance and use of antibiotics in France, and to facilitate the use and analysis of these indicators, in a One Health approach.

PEPR Agroécologie et Numérique "RootSystemTracker" (2024-2027), 144 KEuros.

Participants:‌ Reza Akbarinia, Christophe‌ Pradal, Lo'Ai Gandeel‌‌.

Roots play a crucial role in nutrient‌ and water uptake, atmospheric‌ carbon fixation, and soil‌‌ interactions, significantly influencing resource use efficiency and crop‌ resilience to environmental stresses.‌ The objective of the‌‌ RootSystemTracker project is to develop efficient methods for‌ the spatio-temporal phenotyping of‌ plant root architectures using‌‌ heterogeneous data. This involves automatically capturing their topology‌ and geometry over time,‌ despite challenges such as‌‌ root occlusions and variability in observation conditions.

Inria‌ Challenge OMICFINDER (2023-2027), 1‌ Engineer - 24 months‌‌

Participants: Reza Akbarinia, Rebecca Pontes Salles,‌ Florent Masseglia.

While genomic sequencing is enabling crucial advances in medicine, ecology, and agriculture, the exponential growth of public databases (48 petabytes by 2023) remains largely untapped due to the lack of efficient querying methods. OMICFINDER proposes an innovative global search engine that makes it possible to query nucleotide sequences against the vast amount of publicly available genomic data. Combining novel algorithms, semantic web technologies, and distributed indexing with a focus on environmental sustainability, it aims to unlock this treasure trove of information, bringing the equivalent of a search engine to genomics at last. The project is led by Pierre Peterlongo (GenScale team, Inria Rennes).
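To give an idea of what querying a sequence against indexed datasets involves, here is a toy sketch based on an inverted index of k-mers (substrings of length k). It is a textbook technique shown at miniature scale and is not a description of OMICFINDER's algorithms; the function names, the dataset names, the sequences and the 0.8 match threshold are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(datasets: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Inverted index: k-mer -> names of the datasets that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in datasets.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return index


def query(index: Dict[str, Set[str]], seq: str, k: int,
          min_fraction: float = 0.8) -> List[str]:
    """Return datasets sharing at least min_fraction of the query's k-mers."""
    q = kmers(seq, k)
    if not q:
        return []
    hits: Dict[str, int] = defaultdict(int)
    for km in q:
        for name in index.get(km, ()):
            hits[name] += 1
    return sorted(n for n, c in hits.items() if c / len(q) >= min_fraction)


# Invented toy sequences, for illustration only.
datasets = {
    "sample_A": "ACGTACGTGACCTGA",
    "sample_B": "TTGACGTCATTCAGG",
}
index = build_index(datasets, k=4)
matches = query(index, "GACCTGA", k=4)
print(matches)
```

At the scale of public sequence archives, an in-memory dictionary like this one is not viable, which is why compact and distributed index structures are needed in practice.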

10.4.1‌ Others

Participants: Alexis Joly, Jean-Christophe Lombardo,‌ Hervé Goëau, Hugo Gresse, Mathias Chouet‌, Antoine Affouard, David Margery.

Pl@ntNet consortium: In 2025, CNRS joined the Pl@ntNet consortium as a new member. This contract, initially signed by four founding research organizations (Inria, CIRAD, IRD, INRAE), aims at sustaining the Pl@ntNet platform in the long term. It was initiated in November 2019 in the context of the InriaSOFT national program of Inria. Each partner pays a yearly subscription (10-20K euros per year) to cover engineering costs for maintenance and technological developments. Depending on the membership status, each partner has one vote in the steering committee and/or the technical committee of the platform. Each partner can also use the platform in its own projects and benefit from a certain number of service days within the platform. The consortium is not fixed and is intended to be extended to other members in the coming years.

10.5 Regional initiatives‌

Regional project "DACLIM" (2023-2026), 70 KEuros.

Participants: Reza‌ Akbarinia, Florent Masseglia, Guillaume Coulaud.‌

The objective of this project is to develop‌ scalable techniques based on massive data distribution to‌ enable the efficient detection of anomalies in large‌ climate databases. The detection of anomalies in climate‌ data can provide climatologists with insights into the‌ behavior of various climatological variables, understanding of extreme‌ events such as heatwaves and cold snaps, as‌ well as the prediction of these types of‌ events.

10.6 Public policy support

CESE consultation on‌ the impact of AI on the environment

The CESE (Conseil Economique, Social et Environnemental) is one of the three assemblies provided for by the French Constitution, made up of elected representatives of civil society (unions, associations, companies, students, etc.). Its role is to provide advice on economic, social and environmental policies to guide public decision-making (governmental in particular). Alexis Joly took part in the consultation entitled “Impacts of artificial intelligence: risks and opportunities for the environment”. He was consulted and interviewed on several occasions and was one of the three experts invited to the final plenary session that voted on the recommendations.

OECD report on the‌ advancement of the productivity of science with citizen‌ science and artificial intelligence

Alexis Joly is a‌ co-author of the chapter “Advancing the productivity of‌ science with citizen science and artificial intelligence” in‌ the OECD report Artificial Intelligence in Science: Challenges,‌ Opportunities and the Future of Research (PDF). He participated in‌ all preparatory meetings and‌ contributed approximately 15% of‌‌ the chapter, based on Iroko’s expertise in citizen‌ science, large-scale ecological data,‌ and AI-driven biodiversity monitoring.‌‌

11 Dissemination

11.1 Promoting scientific activities

11.1.1 Scientific‌ events: organization

General chair,‌ scientific chair

Alexis Joly‌‌ : Main organizer of DeepSDM 2025, first conference‌ on Deep Species Distribution‌ Models (100 attendees).
Joseph‌‌ Salmon : Main organizer of MLMTP (Machine Learning‌ seminar)

11.1.2 Scientific events:‌ selection

Member of the‌‌ conference program committees

Reza Akbarinia : ICDM 2025,‌ ECML-PKDD 2025, IEEE BigData‌ 2025.
Florent Masseglia :‌‌ ICDM 2025, ECML-PKDD 2025, DS 2025, PAKDD 2025,‌ SAC 2025.

Reviewer

Alexis‌ Joly : CLEF 2025,‌‌ ACM MM 2025

11.1.3 Journal

Editor, Associate editor‌

Reza Akbarinia : associate‌ editor of IEEE Transactions‌‌ on Knowledge and Data Engineering (TKDE).
Joseph Salmon : Associate Editor of IEEE Transactions on Image Processing
Joseph Salmon : Action Editor of the‌ Journal of Machine Learning‌ Research
Joseph Salmon :‌‌ Associate Editor of the Electronic Journal of Statistics‌

Member of the editorial‌ boards

Reza Akbarinia :‌‌ Transactions on Large Scale Data and Knowledge Centered‌ Systems (TLDKS).
Patrick Valduriez‌ : Distributed and Parallel‌‌ Databases.

Reviewer - reviewing activities

Florent Masseglia :‌ Data & Knowledge Engineering.‌

11.1.4 Invited talks

Patrick‌‌ Valduriez gave an invited talk on AI and‌ Scientific Research on:
- July‌ 2, 2025 at Cirad,‌‌ Montpellier;
- October 16, 2025 at Inria, Montpellier;
- July‌ 17, 2025 at the‌ Dinizia workshop, CEFET-RJ, Rio‌‌ de Janeiro, Brazil;
- November 6, 2025 at Workshop‌ of the Artificial Intelligence‌ Institute, LNCC, Petropolis,‌‌ Brazil.
Joseph Salmon
- invited talk for IA Connect‌ (launch of IA Montpellier‌ Méditerranée)
- invited talk at‌‌ OSCII 2025
Alexis Joly
- MILA Tea talk -‌ November 17, 2025 -‌ Montreal
- IRBV seminar -‌‌ October 23, 2025 - Montreal
- Congrès annuel du‌ Collège des Sociétés savantes‌ académiques de France -‌‌ February 4, 2025 - Montpellier
- EU JRC workshop‌ on trees and habitat‌ mapping - November 18-19,‌‌ 2025 - Ispra

11.1.5 Leadership within the scientific‌ community

Esther Pacitti :‌ Member of the Steering‌‌ Committee of the BDA conference.
Reza Akbarinia :‌ Member of the Steering‌ Committee of the BDA‌‌ conference.
Alexis Joly
- Scientific and Technical director of‌ Pl@ntNet platform
- Coordinator of‌ the LifeCLEF international virtual‌‌ lab
Joseph Salmon : head of the Statistics and Data Science specialty, doctoral school EDI2S

11.1.6 Scientific‌‌ expertise

Christophe Pradal : member of the INRAE‌ evaluation committee CSS (Scientific‌ Specialist Commission) in Plant‌‌ Integrated Biology
Reza Akbarinia : member of the‌ evaluation committee (section 27),‌ University of Montpellier.
Alexis‌‌ Joly :
- GENCI expert committee (AI thematic‌)
- Scientific Advisory Board‌ of the chaire "Angèle‌‌ St-Pierre / Hugo Larochelle" related to AI applied‌ to environment
- Expert for‌ SNSF (Swiss National Science‌‌ Foundation) - project scientific evaluation
Joseph Salmon :‌
- elected member of the‌ "Commission de Section 26",‌‌ Univ. Montpellier.
- head of the jury for hiring an assistant professor (Univ. Montpellier, Faculté des Sciences)
- Member‌‌ of the Steering Committee‌ for MathPhDInFrance
Patrick Valduriez : consultant on big‌ data for the Software Heritage project

11.1.7 Research‌ administration

Florent Masseglia : deputy scientific director of‌ Inria for the domain "Perception, Cognition and Interaction",‌ 50% of his time until September 2025.
Reza‌ Akbarinia : Scientific referent for research data at‌ Inria branch of Montpellier; Member of Inria national‌ commission for research data.
Esther Pacitti : manager‌ of Polytech' Montpellier's International Relations for the computer‌ science department (100 students).
Patrick Valduriez : scientific‌ manager for the Latin America zone at Inria's‌ Direction of Foreign Relationships (DRI) and scientific director‌ of the Inria-Brasil strategic partnership.
Christophe Pradal :‌ Team leader with C. Granier of the PhenoMEn‌ team of the AGAP Institute.
Alexis Joly :‌ co-manager of a Collaborative Doctoral Partnership between the‌ EU Joint Research Centre of Ispra and the‌ University of Montpellier

11.2 Teaching - Supervision -‌ Juries - Educational and pedagogical outreach

11.2.1 Teaching‌

Esther Pacitti :

IG3: Database design, physical organization, 54h, level L3, 50 students.
IG4: Distributed Databases and NoSQL, 80h, level M1, 50 students.
Large Scale Information Management (IoT, Recommendation Systems, Graph Databases), 27h, level M2, 20 students.
Supervision of‌ industrial projects
Supervision of master internships.
Supervision of‌ computer science discovery projects.

Joseph Salmon :

HAX603X:‌ Stochastic Modeling, 20h, level L3, 50 students.
Supervision‌ of master internships.
Supervision of data science discovery‌ projects.

11.2.2 Supervision

PhD & HDR:

PhD (defended):‌ Cesar Leblanc, Predicting biodiversity future trajectories through deep‌ learning. Advisors: Alexis Joly , Maximilien Servajean, Pierre‌ Bonnet.
PhD (defended [58]): Matteo Contini, Multi-scale monitoring of coastal marine biodiversity. Advisors: Sylvain Bonhommeau, Alexis Joly .
PhD in progress: Kawtar Zaher,‌ Novel class retrieval through interactive learning. Advisors: Olivier‌ Buisson, Alexis Joly .
PhD in progress: Guillaume‌ Coulaud, Anomaly Detection in Big Climate Data. Advisors:‌ Reza Akbarinia , Audrey Brouillet, Florent Masseglia .‌
PhD in progress: Loaï Gandeel, Automatic methods for‌ spatio-temporal reconstruction of root architecture. Advisors: Reza Akbarinia‌ , Romain Fernandez, Christophe Pradal .
PhD in‌ progress: Raphaël Benerradi, species trends estimation from citizen‌ science data. Advisors: Christophe Botella , Alexis Joly‌ , Maximilien Servajean.
PhD in progress: Théo Larcher,‌ multi-scale species prediction. Advisors: Alexis Joly , Joseph‌ Salmon , Pierre Bonnet, Marijn Van der Velde.‌
PhD in progress: Sébastien Gigot-Leandri, decision-oriented site occupancy‌ models. Advisors: Alexis Joly , Maximilien Servajean, David‌ Mouillot.
PhD in progress: Alex Maleknia , Influence‌ functions and their applications to machine learning. Advisors:‌ Joseph Salmon , E. Chzhen.

11.2.3 Juries

Members‌ of the team participated in the following PhD‌ or HDR committees:

Reza Akbarinia :
- Sara Jarrad,‌ Sorbonne University (PhD reviewer)
- Omar Ghannou, Aix-Marseille University‌ (PhD reviewer)
Joseph Salmon :
- Benjamin Charlier (chair of the HDR jury, Univ. Montpellier)
- Grégoire Pacreau (PhD reviewer, École Polytechnique)
Alexis Joly :‌
- Cesar Leblanc PhD defense, Univ. of Montpellier (as PhD director)
- Matteo Contini PhD defense, Univ. of Montpellier (as PhD director)

11.2.4 Educational and pedagogical outreach

Patrick Valduriez , Chiche‌ (2 actions): Lycée Mermoz,‌ Lycée Clémenceau, Montpellier.
Florent‌‌ Masseglia , Chiche (4 actions): lycée Philippe de‌ Girard, Avignon.

11.3 Popularization‌

11.3.1 Specific official responsibilities‌‌ in science outreach structures

Alexis Joly :
- Member‌ of the steering committee‌ of Pl@ntNet citizen science‌‌ platform
- Scientific Advisory Board of project Le Féral‌

11.3.2 Productions (articles, videos,‌ podcasts, serious games, ...)‌‌

Joseph Salmon : Scientific blogging
Pl@ntNet team:
- Pl@ntNet‌ website, in particular‌ the news section
- Pl@ntNet‌‌ documentation (new in 2025)
- Videos: Participation in Xprize‌, Identify a plant‌ with Pl@ntNet, PlantNet‌‌ for dummies
- Articles: Telabotanica news, CIRAD blog‌, Frandroid app of‌ the week, Gabon‌‌ Actu.

11.3.3 Participation in Live events

Joseph Salmon (2 actions): Mois des mathématiques appliquées et industrielles (Lycée Joffre, Montpellier), IAUM (Univ. Montpellier, high-school audience)

11.3.4 Other relevant science outreach activities

Joseph Salmon : Fête des sciences (carasciences), Montpellier‌

12 Scientific production

12.1‌ Major publications

1 article Christophe Botella, Alexis Joly, Pierre Bonnet, François Munoz and Pascal Monestiez. Jointly estimating spatial sampling effort and habitat suitability for multiple species from opportunistic presence-only data. Methods in Ecology and Evolution 12(5), February 2021, 933-945. HAL DOI
2 article Benjamin Bourel, Alexis Joly, Maximilien Servajean, Simon Bettinger, José Antonio Sanabria-Fernández and David Mouillot. From Presence-Only to Abundance Species Distribution Models Using Transfer Learning. Ecology Letters 28(7), July 2025, e70177. HAL DOI
3 article Benjamin Deneu, Maximilien Servajean, Pierre Bonnet, Christophe Botella, François Munoz and Alexis Joly. Convolutional neural networks improve species distribution modelling by capturing the spatial structure of the environment. PLoS Computational Biology 17(4), April 2021, e1008856. HAL DOI
4 inproceedings Camille Garcin, Maximilien Servajean, Alexis Joly and Joseph Salmon. Stochastic smoothing of the top-K calibrated hinge loss for deep imbalanced classification. Proceedings of ICML 2022. ICML 2022 - 39th International Conference on Machine Learning, volume 162, Baltimore, United States, PMLR, 2022, 7208-7222. HAL
5 article Gaëtan Heidsieck, Daniel de Oliveira, Esther Pacitti, Christophe Pradal, Francois Tardieu and Patrick Valduriez. Cache-aware scheduling of scientific workflows in a multisite cloud. Future Generation Computer Systems 122, 2021, 172-186. HAL DOI
6 article César Leblanc, Pierre Bonnet, Maximilien Servajean, Wilfried Thuiller, Milan Chytrý, Svetlana Aćić, Olivier Argagnon, Idoia Biurrun, Gianmaria Bonari, Helge Bruelheide, Juan Antonio Campos, Andraž Čarni, Renata Ćušterevska, Michele de Sanctis, Jürgen Dengler, Tetiana Dziuba, Emmanuel Garbolino, Ute Jandt, Florian Jansen, Jonathan Lenoir, Jesper Erenskjold Moeslund, Aaron Pérez-Haase, Remigiusz Pielech, Jozef Sibik, Zvjezdana Stančić, Domas Uogintas, Thomas Wohlgemuth and Alexis Joly. Learning the syntax of plant assemblages. Nature Plants 11, October 2025, 2026-2040. HAL DOI
7 article Tanguy Lefort, Antoine Affouard, Benjamin Charlier, Jean-Christophe Lombardo, Mathias Chouet, Hervé Goëau, Joseph Salmon, Pierre Bonnet and Alexis Joly. Cooperative learning of Pl@ntNet's Artificial Intelligence algorithm: how does it work and how can we improve it? Methods in Ecology and Evolution, February 2025. In press. HAL DOI
8 article Maxime Metz, Matthieu Lesnoff, Florent Abdelghafour, Reza Akbarinia, Florent Masseglia and Jean-Michel Roger. A “big-data” algorithm for KNN-PLS. Chemometrics and Intelligent Laboratory Systems 203, August 2020, 104076. HAL DOI
9 article Tanmoy Mondal, Reza Akbarinia and Florent Masseglia. kNN matrix profile for knowledge discovery from time series. Data Mining and Knowledge Discovery 37(3), May 2023, 1055-1089. HAL DOI
10 book Daniel Oliveira, Ji Liu and Esther Pacitti. Data-Intensive Workflow Management: For Clouds and Data-Intensive and Scalable Computing Environments. Synthesis Lectures on Data Management 14(4), Morgan&Claypool Publishers, May 2019, 1-179. HAL DOI
11 book Tamer Özsu and Patrick Valduriez. Principles of Distributed Database Systems - Fourth Edition. Springer, 2020, 1-674. HAL DOI
12 article Djamel-Edine Edine Yagoubi, Reza Akbarinia, Florent Masseglia and Themis Palpanas. Massively Distributed Time Series Indexing and Querying. IEEE Transactions on Knowledge and Data Engineering 32(1), 2020, 108-120. HAL DOI
13 inproceedings Chao Zhang, Reza Akbarinia and Farouk Toumani. Efficient Incremental Computation of Aggregations over Sliding Windows. KDD 2021 - 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Singapore (Virtual), Singapore, 2021, 2136-2144. HAL DOI

12.2 Publications of the year

International‌ journals

14 article Benjamin Bourel, Alexis Joly, Maximilien Servajean, Simon Bettinger, José Antonio Sanabria-Fernández and David Mouillot. From Presence-Only to Abundance Species Distribution Models Using Transfer Learning. Ecology Letters 28(7), July 2025, e70177. HAL DOI
15 article Audrey Brouillet, Guillaume Coulaud, Dennis Shasha, Reza Akbarinia and Florent Masseglia. ClimBurst: A Novel Method to Detect Climatological Anomalies Over Time and Space. Geophysical Research Letters 52(19), October 2025, e2025GL117095. HAL DOI
16 article Abel Yu Hao Chai, Sue Han Lee, Fei Siang Tay, Hervé Goëau, Pierre Bonnet and Alexis Joly. PlantAIM: A new baseline model integrating global attention and local features for enhanced plant disease identification. Smart Agricultural Technology 10, March 2025, 100813. HAL DOI
17 article Matteo Contini, Victor Illien, Julien Barde, Sylvain Poulain, Serge Bernard, Alexis Joly and Sylvain Bonhommeau. From underwater to drone: A novel multi-scale knowledge distillation approach for coral reef monitoring. Ecological Informatics 89, November 2025, 103149 (16p.). HAL DOI
18 article Matteo Contini, Victor Illien, Mohan Julien, Mervyn Ravitchandirane, Victor Russias, Arthur Lazennec, Thomas Chevrier, Cam Ly Rintz, Léanne Carpentier, Pierre Gogendeau, César Leblanc, Serge Bernard, Alexandre Boyer, Justine Talpaert Daudon, Sylvain Poulain, Julien Barde, Alexis Joly, Sylvain Bonhommeau and Thomas Chevrier. Seatizen Atlas: a collaborative dataset of underwater and aerial marine imagery. Scientific Data 12(1), 2025, 67. HAL DOI
19 article Sabine Demotes-Mainard, Hervé Autret, Christophe Pradal, Julien Le Gall, Vincent Guérin, Nathalie Leduc, Didier Combes, Christophe Renaud, Michaël Chelle and Jessica Bertheloot. Simulating light quantity and quality over plant organs using a ray-tracing method to investigate plant responses in growth chambers. Biosystems Engineering 258, September 2025, 104256. HAL DOI
20 article Vincent Espitalier, Jean-Christophe Lombardo, Hervé Goëau, Christophe Botella, Toke Thomas Høye, Mads Dyrmann, Pierre Bonnet and Alexis Joly. Adapting a global plant identification model to detect invasive alien plant species in high-resolution road side images. Ecological Informatics 89, May 2025, 103129. HAL DOI
21 article Sabrina Kumschick, Lysandre Journiac, Océane Boulesnane-Genguant, Christophe Botella, Robin Pouteau and Mathieu Rouget. Mapping potential environmental impacts of alien species in the face of climate change. Biological Invasions 27(1), January 2025, 43. HAL DOI
22 article Lorraine Latchoumane, Martin Ecarnot, Ryad Bendoula, Jean-Michel Roger, Sílvia Mas Garcia, Héloïse Villesseche, Flora Tavernier, Maxime Ryckewaert, Nathalie J. B. Gorretta, Pierre Roumet and Elsa Ballini. Early detection of Zymoseptoria tritici infection on wheat leaves using hyperspectral imaging data. Data in Brief 59, April 2025, 111404. HAL DOI
23 article César Leblanc, Pierre Bonnet, Maximilien Servajean, Wilfried Thuiller, Milan Chytrý, Svetlana Aćić, Olivier Argagnon, Idoia Biurrun, Gianmaria Bonari, Helge Bruelheide, Juan Antonio Campos, Andraž Čarni, Renata Ćušterevska, Michele de Sanctis, Jürgen Dengler, Tetiana Dziuba, Emmanuel Garbolino, Ute Jandt, Florian Jansen, Jonathan Lenoir, Jesper Erenskjold Moeslund, Aaron Pérez-Haase, Remigiusz Pielech, Jozef Sibik, Zvjezdana Stančić, Domas Uogintas, Thomas Wohlgemuth and Alexis Joly. Learning the syntax of plant assemblages. Nature Plants 11, October 2025, 2026-2040. HAL DOI
24 article Tanguy Lefort, Antoine Affouard, Benjamin Charlier, Jean-Christophe Lombardo, Mathias Chouet, Hervé Goëau, Joseph Salmon, Pierre Bonnet and Alexis Joly. Cooperative learning of Pl@ntNet's Artificial Intelligence algorithm: how does it work and how can we improve it? Methods in Ecology and Evolution, February 2025. In press. HAL DOI
25 article Sébastien Levionnois, Noémie Gaudio, Rémi Mahmoud, Christophe Pradal and Corinne Robert. Impact of wheat-legume mix intercrops on wheat epidemics by modelling. Field Crops Research 336, February 2026, 110212. HAL DOI
26 article Ji Liu, Beichen Ma, Qiaolin Yu, Yang Zhou, Jingbo Zhou, Ruoming Jin, Dejing Dou, Huaiyu Dai, Haixun Wang and Patrick Valduriez. Efficient Federated Learning with Heterogeneous Data and Adaptive Dropout. ACM Transactions on Knowledge Discovery from Data (TKDD) 19(8), September 2025, 1-31. HAL DOI
27 article Diego Marcos, Robert van de Vlasakker, Ioannis N. Athanasiadis, Pierre Bonnet, Hervé Goëau, Alexis Joly, W. Daniel Kissling, César Leblanc, André S.J. van Proosdij and Konstantinos P. Panousis. Fully automatic extraction of morphological traits from the web: Utopia or reality? Applications in Plant Sciences 13(3), June 2025. HAL DOI
28 article Giulio Martellucci, Hervé Goëau, Pierre Bonnet, Fabrice Vinatier and Alexis Joly. PlantCLEF 2025: Advancing AI-based Multi-Species Plant Identification in Vegetation Quadrats for Supporting Environmental Law and Biodiversity Monitoring. Biodiversity Information Science and Standards 9, December 2025, e181733. HAL DOI
29 article Barbara M. Neto-Bradley, Pierre Bonnet, Hervé Goëau, Alexis Joly, Jeannine Cavender-Bares and David A. Coomes. Using reflectance spectra and Pl@ntNet to identify herbarium specimens: a case study with Lithocarpus. New Phytologist, June 2025. HAL DOI
30 article Ellen Paixão, Helga Balbi, Esther Pacitti, Fabio Porto, Joel Santos and Eduardo Ogasawara. FFT-Based Anomaly Detectors: Cutoff Frequency Adjustment and SMA-Based Approach. Journal of Information and Data Management, 2025. In press. HAL
31 article Lukas Rauch, René Heinrich, Ilyass Moummad, Alexis Joly, Bernhard Sick and Christoph Scholz. Can Masked Autoencoders Also Listen to Birds? Transactions on Machine Learning Research Journal, August 2025. HAL
32 article Francesco Reyes, Benjamin Pitchers, Christophe Pradal and Pierre-Éric Lauri. Young apple tree development under agroforestry radiative conditions: a multi-scale morphological and architectural dataset. AoB Plants 17, August 2025, plaf029. HAL DOI
33 article Rebecca Salles, Benoit Lange, Reza Akbarinia, Florent Masseglia, Eduardo Ogasawara and Esther Pacitti. Scalable and accurate online multivariate anomaly detection. Information Systems 131, June 2025, 102524. HAL DOI
34 article Rodrigo A. P. Silva, Wesley Ferreira, Esther Pacitti, Yuri Y. Frota and Daniel de Oliveira. A Greedy Constructive Heuristic for Executing Cloud-based Workflows with Data Confidentiality Restrictions. SN Computer Science 7, January 2026, 1-52/92. HAL DOI
35 article Jiexun Xu, Kimberly Low, Christophe Botella, Benjamin Deneu, Dennis Shasha and Alexis Joly. Cascading predictions from common to uncommon species improves species distribution models for plants. Ecological Informatics 93, February 2026, 103424. HAL DOI

International peer-reviewed conferences

36 inproceedings Ananthu Aniraj, Cassio F. Dantas, Dino Ienco and Diego Marcos. PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers. Proceedings of the European Conference on Computer Vision (ECCV). ECCV 2024 - 18th European Conference on Computer Vision, Lecture Notes in Computer Science 15143, Milano, Italy, Springer Nature Switzerland, November 2025, 256-272. HAL DOI
37 inproceedings Juan Sebastián Cañas, Stefan Kahl, Tom Denton, María Paula Toro-Gómez, Susana Rodriguez-Buritica, Jose Luis Benavides-Lopez, Juan Sebastián Ulloa, Paula Caycedo-Rosales, Holger Klinck, Hervé Goëau, Robert Planqué, Willem-Pier Vellinga and Alexis Joly. Overview of BirdCLEF+ 2025: Multi-Taxonomic Sound Identification in the Middle Magdalena, Colombia. CEUR workshop. CLEF 2025 - Working Notes of the Conference and Labs of the Evaluation Forum, CEUR-4038, CEUR Workshop Proceedings, 232, Madrid, Spain, CEUR, 2025, 2909-2919. HAL
38 inproceedings Abel Yu Hao Chai, Kelly Li Zhen Jee, Sue Han Lee, Fei Siang Tay, Jules Vandeputte, Hervé Goëau, Pierre Bonnet and Alexis Joly. Deep-Plant-Disease Dataset Is All You Need for Plant Disease Identification. ACM digital library. ACM MM 2025 - 33rd ACM International Conference on Multimedia, MM '25: Proceedings of the 33rd ACM International Conference on Multimedia, Dublin, Ireland, ACM, October 2025, 12578-12584. HAL DOI
39 inproceedings Matteo Contini, Victor Illien, Sylvain Poulain, Serge Bernard, Julien Barde, Sylvain Bonhommeau and Alexis Joly. The point is the mask: scaling coral reef segmentation with weak supervision. ICCV 2025 - Joint Workshop on Marine Vision, Honolulu, Hawaii, United States, October 2025. HAL
40 inproceedings Guillaume Coulaud, Benoit Lange, Dennis Shasha, Audrey Brouillet, Reza Akbarinia and Florent Masseglia. ClimBurst: A Dynamic Visualization Tool to Display Climatological Anomalies over Time and Space. Proceedings of the 34th ACM International Conference on Information and Knowledge Management. CIKM 2025 - 34th ACM International Conference on Information and Knowledge Management, Seoul, South Korea, ACM, November 2025, 6629-6633. HAL DOI
41 inproceedings Jean-Baptiste Fermanian, Mohamed Hebiri and Joseph Salmon. Class conditional conformal prediction for multiple inputs by p-value aggregation. NeurIPS 2025, The Thirty-Ninth Annual Conference on Neural Information Processing Systems, San Diego (CA), United States, December 2025. HAL
42 inproceedings Jean-Baptiste Fermanian, Pierre Humbert and Gilles Blanchard. Transductive Conformal Inference for Full Ranking. NeurIPS 2025, The Thirty-Ninth Annual Conference on Neural Information Processing Systems, San Diego (CA), United States, December 2025. HAL
43 inproceedings Camille Garcin, Maximilien Servajean, Alexis Joly and Joseph Salmon. A two-head loss function for deep Average-K classification. WACV 2025 - IEEE/CVF Winter Conference on Applications of Computer Vision, Tucson, United States, IEEE, 2025, 7358-7367. HAL DOI
44 inproceedings Lucas Giusti Tavares, Janio Lima, Matheus Melo, Chao Chen, Jonathan M. Garibaldi, Gabriel dos Santos Scatena, Anna Helena Reali Costa, Edson Satoshi Gomi, Rebecca Salles, Esther Pacitti, Ismael Santos, Isabela Guimarães Siqueira, Fabio Porto, Diego Carvalho, Rafaelli Coutinho and Eduardo Ogasawara. Fuzzy-Based Ensemble Method for Robust Concept Drift Detection in Multivariate Time Series. IEEE Xplore. IJCNN 2025 - International Joint Conference on Neural Networks, Rome, Italy, IEEE, 2025, 1-12. HAL DOI
45 inproceedings Alexis Joly, Lukáš Picek, Stefan Kahl, Hervé Goëau, Lukáš Adam, Christophe Botella, Maximilien Servajean, Diego Marcos, Cesar Leblanc, Théo Larcher, Jiří Matas, Klára Janoušková, Vojtěch Čermák, Kostas Papafitsoros, Robert Planqué, Willem-Pier Vellinga, Holger Klinck, Tom Denton, Pierre Bonnet and Henning Müller. LifeCLEF 2025 Teaser: Challenges on Species Presence Prediction and Identification, and Individual Animal Identification. Lecture notes in computer science. 47th European Conference on Information Retrieval - ECIR 2025, Advances in Information Retrieval. ECIR 2025 : part V, LNCS-15576, Lucca, Italy, Springer Nature Switzerland, April 2025, 373-381. HAL DOI
46 inproceedings César Leblanc, Lukáš Picek, Benjamin Deneu, Pierre Bonnet, Maximilien Servajean, Rémi Palard and Alexis Joly. Mapping Biodiversity at Very-High Resolution in Europe. CVPR 2025 - IEEE/CVF Computer Vision and Pattern Recognition Conference, Nashville, TN, United States, IEEE, June 2025, 2340-2349. HAL DOI
47 inproceedings Jerad Zherui Liaw, Abel Yu Hao Chai, Sue Han Lee, Pierre Bonnet and Alexis Joly. Can Language Improve Visual Features For Distinguishing Unseen Plant Diseases? ICPR 2024 - 27th International Conference on Pattern Recognition, Lecture Notes in Computer Science 15330, Kolkata, India, Springer Nature Switzerland, January 2025, 296-311. HAL DOI
48 inproceedings Janio Lima, Hélio Castro, Luiz Oliveira, Ellen Paixão, Lais Baroni, Rebecca Salles, Ricardo Vargas and Eduardo Ogasawara. UniTED: A Unified Time Series Event Detection Repository. Brazilian e-Science Workshop - SBCOPENLIB. BreSci 2025 - XIX Brazilian e-Science Workshop, Anais do XIX Brazilian e-Science Workshop 2025, Brasil, Brazil, September 2025, 1-8. HAL DOI
49 inproceedings Giulio Martellucci, Hervé Goëau, Pierre Bonnet, Fabrice Vinatier and Alexis Joly. Overview of PlantCLEF 2025: Multi-Species Plant Identification in Vegetation Quadrat Images ⋆ Notebook for the LifeCLEF Lab at CLEF 2025. CLEF 2025 - Working Notes of the Conference and Labs of the Evaluation Forum, CEUR - Workshop Proceedings 4038, Madrid, Spain, CEUR, 2025, 2942-2954. HAL
50 inproceedings Rodrigo Parracho, Fernando Alexandrino, Matheus de Souza Figueiredo, Lucas Pereira da Silva, Bruno Dutra de Macedo, Arthur Lamblet Vaz, Davi Louback, Victor Coculilo Desouzart, Rebecca Salles, Fabio Porto, Diego Carvalho and Eduardo Ogasawara. Leveraging Large Language Models for Time Series Prediction on Low-Frequency Data. SBBD 2025 - 40º Simpósio Brasileiro de Banco de Dados, Fortaleza, Brazil, Sociedade Brasileira de Computação - SBC, 2025, 196-208. HAL DOI
51 inproceedings Lukáš Picek, César Leblanc, Théo Larcher, Maximilien Servajean, Pierre Bonnet and Alexis Joly. Overview of GeoLifeCLEF 2025: Plant Species Presence Prediction with Environmental and High-resolution Remote Sensing Data. CLEF 2025 - Working Notes of the Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings 4038, Madrid, Spain, CEUR, 2025, 2932-2941. HAL
52 inproceedings Edson Pinto Sobrinho, Jéssica Souza, Janio Lima, Lucas Giusti, Eduardo Bezerra, Rafaelli Coutinho, Lais Baroni, Esther Pacitti, Fabio Porto, Kele Belloze and Eduardo Ogasawara. Fine-Tuning Detection Criteria for Enhancing Anomaly Detection in Time Series. SBBD 2025 - 40º Simpósio Brasileiro de Banco de Dados, Fortaleza, Brazil, Sociedade Brasileira de Computação - SBC, 2025, 209-221. HAL DOI
53 inproceedings Rebecca Salles, Benoit Lange, Reza Akbarinia, Florent Masseglia and Esther Pacitti. Energy Efficient Time Series Anomaly Detection. ICECET 2025 - International Conference on Electrical, Computer and Energy Technologies, Paris, France, July 2025. HAL

Conferences without proceedings

54 inproceedings Meije Gawinowski, Laurène Perthame, Frédéric Rees, Céline Richard-Molard, Christophe Pradal, Pierre Barbillon and Alexandra Jullien. Méta-modélisation des interactions plante-plante en 3D : application à l'association colza-féverole. Journées scientifiques 2025 du PEPR Agroécologie et Numérique, Dijon, France, January 2025. HAL
55 inproceedings Gabriela Moraes, Fabio Porto, Federico Ulliana, Jean-François Baget, Michel Leclère, Pierre Bisquert, Bernardo Gonçalves and Patrick Valduriez. Gypscie-KG: Building a Logic-Based Approach for Knowledge Graph Data Integration View in ML Systems. LAGO 2025 – LLMs, Análise de Grafos e Ontologias (SBBD), Fortaleza, Brazil, September 2025. HAL

Scientific books‌

56 book Eduardo S. Ogasawara, Rebecca Salles, Fabio Porto and Esther Pacitti. Event Detection in Time Series. Synthesis Lectures on Data Management (SLDM), Springer, February 2025, 1-178. HAL

Scientific book chapters

57 inbook Lukáš Picek, Stefan Kahl, Hervé Goëau, Lukáš Adam, Théo Larcher, Cesar Leblanc, Maximilien Servajean, Klára Janoušková, Jiří Matas, Vojtěch Čermák, Kostas Papafitsoros, Robert Planqué, Willem-Pier Vellinga, Holger Klinck, Tom Denton, Juan Sebastián Cañas, Giulio Martellucci, Fabrice Vinatier, Pierre Bonnet and Alexis Joly. Overview of LifeCLEF 2025: Challenges on Species Presence Prediction and Identification, and Individual Animal Identification. Experimental IR Meets Multilinguality, Multimodality, and Interaction: 16th International Conference of the CLEF Association, CLEF 2025, Lecture Notes in Computer Science 16089, Springer Nature Switzerland, January 2026, 338-362. HAL DOI

Doctoral dissertations and‌ habilitation theses

58 thesis Matteo Contini. Multi-scale mapping of changes in tropical reefs. Université de Montpellier, November 2025. HAL

Reports & preprints

59 misc Eva Coindre, Romain Boulord, Laurine Chir, Virgilio Freitas, Maxime Ryckewaert, Thomas Laisné, Virginie Bouckenooghe, Maëlle Lis, Llorenç Cabrera-Bosquet, Agnès Doligez, Thierry Simonneau, Benoît Pallas, Aude Coupel-Ledru and Vincent Segura. Robustness of high-throughput prediction of leaf ecophysiological traits using near infrared spectroscopy and poro-fluorometry. February 2025. HAL DOI
60 report Guillaume Coulaud, Reza Akbarinia, Audrey Brouillet and Florent Masseglia. Leveraging Data Seasonality and Matrix Profile for Anomaly Detection: Application to Climate Time Series. RR-9572, Inria, January 2025. HAL
61 misc Tiffany Ding, Jean-Baptiste Fermanian and Joseph Salmon. Conformal Prediction for Long-Tailed Classification. July 2025. HAL
62 misc Kian Faizi, Preyanka Mehta, Amy Maida, Taylor Humphreys, Elizabeth Berrigan, Leo Mc Kee-Reid, Robbie Mc Corkell, Arush Tagade, Jessica Rumbelow, Julia Showalter, Lukas Brent, Clementine Coroenne, Audrey Rigaud, Arjun Chandrasekhar, Saket Navlakha, Antoine Martin, Christophe Pradal, Sanghwa Lee, Wolfgang Busch and Matthieu Pierre Platre. Growth Cost and Transport Efficiency Tradeoffs Define Root System Optimization Across Varying Developmental Stages and Environments in Arabidopsis. July 2025, 1-25. HAL DOI
63 misc Hanna Huitu, Tor-Einar Skog, Christophe Pradal, Antonio Calatayud, Tor Skaslien, Brita Linnestad, Ari Ronkainen, Christian Fournier, Marc Labadie, Dave Skirvin, Matti Pastell, David Melchior, Johannes Tobiassen Langvatn and Berit Nordskog. Developing a generic DSS model metadata catalogue and APIs for crop protection. August 2025. HAL DOI
64 misc Maxime Ryckewaert, Diego Marcos, Christophe Botella, Maximilien Servajean, Pierre Bonnet and Alexis Joly. Applying the maximum entropy principle to neural networks enhances multi-species distribution models. January 2026. HAL

Other scientific publications‌‌

65 inproceedings Romane Dubois, Lydia Bousset, Melen Leclerc, Nicolas Parisey and Alexis Joly. Weakly supervised segmentation of leaf symptoms in field conditions. Workshop Franco-Britannique organisé par le réseau « Modélisation et statistique pour la santé des animaux et des plantes », Paris, France, October 2025. HAL
66 inproceedings Tristan Gérault, Romain Barillot, Christophe Pradal, Marion Gauthier, Céline Richard-Molard, Bruno Andrieu, Alexandra Jullien and Frédéric Rees. Modelling C and N root-soil exchanges with 3D root and shoot growth based on plant ecophysiology: the Wheat-BRIDGES approach. Rhizosphere 6 - Rooting for Earth, Edinburgh, United Kingdom, 2025. HAL
67 misc Ilyass Moummad. Audiocarnet - Deep Representation Learning from Unlabeled Bioacoustic Data. November 2025. HAL

Scientific popularization

68 inbook Ioana Manolescu and Patrick Valduriez. De nouvelles architectures pour les Big Data. Le calcul à découvert, CNRS Editions, January 2025. HAL

Software

69 software Frédéric Rees, Christophe Pradal, Marion Gauthier and Tristan Gérault. RhizoDep: a Functional-Structural Root Model to simulate rhizodeposition. Version 1.0.0, April 2025, lic: CeCILL-C. HAL Software Heritage VCS

12.3 Cited publications

70 article Moloud Abdar, Farhad Pourpanah, Sadiq Hussain, Dana Rezazadegan, Li Liu, Mohammad Ghavamzadeh, Paul Fieguth, Xiaochun Cao, Abbas Khosravi, U Rajendra Acharya and others. A review of uncertainty quantification in deep learning: Techniques, applications and challenges. Information Fusion 76, 2021, 243--297.
71 incollection Enis Afgan, Jeremy Goecks, Dannon Baker, Nate Coraor, Anton Nekrutenko and James Taylor. Galaxy: A Gateway to Tools in e-Science. Guide to e-Science, Next Generation Scientific Research and Discovery. Computer Communications and Networks. Springer, 2011, 145--177.
72 article Reza Akbarinia, Christophe Botella, Alexis Joly, Florent Masseglia, Marta Mattoso, Eduardo Ogasawara, Daniel de Oliveira, Esther Pacitti, Fabio Porto, Christophe Pradal, Dennis Shasha and Patrick Valduriez. Life Science Workflow Services (LifeSWS): motivations and architecture. Transactions on Large-Scale Data- and Knowledge-Centered Systems. URL: https://hal-lirmm.ccsd.cnrs.fr/lirmm-04173545
73 article Tatsuya Amano, James DL Lamming and William J Sutherland. Spatial gaps in global biodiversity information and the role of citizen science. Bioscience 66(5), 2016, 393--400.
74 article Marc Besson, Jamie Alison, Kim Bjerge, Thomas E Gorochowski, Toke T Høye, Tommaso Jucker, Hjalte MR Mann and Christopher F Clements. Towards the fully automated monitoring of ecological communities. Ecology Letters 25(12), 2022, 2753--2775.
75 article Christophe Botella, Pierre Bonnet, Cang Hui, Alexis Joly and David M Richardson. Dynamic species distribution modeling reveals the pivotal role of human-mediated long-distance dispersal in plant invasion. Biology 11(9), 2022, 1293.
76 article Christophe Botella, Alexis Joly, Pierre Bonnet, Pascal Monestiez and François Munoz. A deep learning approach to species distribution modelling. Multimedia Tools and Applications for Environmental & Biodiversity Informatics, 2018, 169--199.
77 article Christophe Botella, Alexis Joly, Pierre Bonnet, François Munoz and Pascal Monestiez. Jointly estimating spatial sampling effort and habitat suitability for multiple species from opportunistic presence-only data. Methods in Ecology and Evolution 12(5), 2021, 933--945.
78 article Christophe Botella, Alexis Joly, Pascal Monestiez, Pierre Bonnet and François Munoz. Bias in presence-only niche models related to sampling effort and species niches: Lessons for background point selection. PLoS One 15(5), 2020, e0232078.
79 article Mark Chandler, Linda See, Kyle Copas, Astrid MZ Bonde, Bernat Claramunt López, Finn Danielsen, Jan Kristoffer Legind, Siro Masinde, Abraham J Miller-Rushing, Greg Newman and others. Contribution of citizen science towards international biodiversity monitoring. Biological conservation 213, 2017, 280--294.
80 article Stephen E Fick and Robert J Hijmans. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International journal of climatology 37(12), 2017, 4302--4315.
81 article Matteo Fontana, Gianluca Zeni and Simone Vantini. Conformal prediction: a unified review of theory and new challenges. Bernoulli 29(1), 2023, 1--23.
82 article Camille Garcin, Maximilien Servajean, Alexis Joly and Joseph Salmon. A two-head loss function for deep Average-K classification. arXiv preprint arXiv:2303.18118, 2023.
83 article Stephen Goff and others. The iPlant Collaborative: Cyberinfrastructure for Plant Biology. Frontiers in Plant Science 2, 2011.
84 book Tony Hey, Stewart Tansley, Kristin Tolle and Jim Gray. The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research, October 2009.
85 article Walter Jetz, Melodie A McGeoch, Robert Guralnick, Simon Ferrier, Jan Beck, Mark J Costello, Miguel Fernandez, Gary N Geller, Petr Keil, Cory Merow and others. Essential biodiversity variables for mapping and monitoring species populations. Nature ecology & evolution 3(4), 2019, 539--551.
86 inproceedings Hyun-Chul Kim and Zoubin Ghahramani. Bayesian classifier combination. Artificial Intelligence and Statistics, PMLR, 2012, 619--627.
87 inproceedings Benoit Lange, Reza Akbarinia and Florent Masseglia. A One-Health Platform for Antimicrobial Resistance Data Analytics. CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Boise, United States, October 2024, 5230--5233.
88 techreport T. Lefort, B. Charlier, A. Joly and J. Salmon. Identify ambiguous tasks combining crowdsourced labels by weighting Areas Under the Margin. 2022, arXiv:2209.15380.
89 article Yuanyuan Liu, Shaoqiang Wang, Jinghua Chen, Bin Chen, Xiaobo Wang, Dongze Hao and Leigang Sun. Rice Yield Prediction and Model Interpretation Based on Satellite and Climatic Indicators Using a Transformer Method. Remote Sensing 14(19), 2022, 5045.
90 phdthesis Titouan Lorieul. Uncertainty in predictions of Deep Learning models for fine-grained classification. Université Montpellier, December 2020.
91 phdthesis Titouan Lorieul. Uncertainty in predictions of deep learning models for fine-grained classification. Université Montpellier, 2020.
92 article Tanmoy Mondal, Reza Akbarinia and Florent Masseglia. kNN Matrix Profile for Knowledge Discovery from Time Series. Data Mining and Knowledge Discovery (DMKD), 2023.
93 article Marc Ohlmann, François Munoz, François Massol and Wilfried Thuiller. Assessing mutualistic metacommunity capacity by integrating spatial and interaction networks. arXiv preprint arXiv:2206.11029, 2022.
94 inproceedings G. Pleiss, T. Zhang, E. R. Elenberg and K. Q. Weinberger. Identifying mislabeled data using the area under the margin ranking. NeurIPS, 2020.
95 inproceedings Christophe Pradal, Christian Fournier, Patrick Valduriez and Sarah Cohen Boulakia. OpenAlea: scientific workflows combining data analysis and simulation. International Conference on Scientific and Statistical Database Management (SSDBM), 2015, 11:1--11:6.
96 article Glenn Shafer and Vladimir Vovk. A Tutorial on Conformal Prediction. Journal of Machine Learning Research 9(3), 2008.
97 article Arco J van Strien, Tim Termaat, Dick Groenendijk, Victor Mensing and Marc Kéry. Site-occupancy models may offer new opportunities for dragonfly monitoring based on daily species lists. Basic and Applied Ecology 11(6), 2010, 495--503.
98 inproceedings Mukund Sundararajan and Amir Najmi. The many Shapley values for model explanation. International conference on machine learning, PMLR, 2020, 9269--9278.
99 article Cyrille Violle, Philippe Choler, Benjamin Borgy, Eric Garnier, Bernard Amiaud, Guilhem Debarros, Sylvain Diquelou, Sophie Gachet, Claudy Jolivet, Jens Kattge and others. Vegetation ecology meets ecosystem science: Permanent grasslands as a functional biogeography case study. Science of the Total Environment 534, 2015, 43--51.
100 article Djamel-Edine Yagoubi, Reza Akbarinia, Florent Masseglia and Themis Palpanas. Massively Distributed Time Series Indexing and Querying. IEEE Transactions on Knowledge and Data Engineering 32(1), 2020, 108--120.
101 inproceedings Chin-Chia Michael Yeh, Yan Zhu, Liudmila Ulanova, Nurjahan Begum, Yifei Ding, Hoang Anh Dau, Diego Furtado Silva, Abdullah Mueen and Eamonn J. Keogh. Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View That Includes Motifs, Discords and Shapelets. IEEE 16th International Conference on Data Mining, ICDM 2016, December 12-15, 2016, Barcelona, Spain. IEEE Computer Society, 2016, 1317--1322. URL: https://doi.org/10.1109/ICDM.2016.0179

Keywords

Computer Science and Digital Science

Other Research Topics and Application Domains

1 Team members, visitors, external collaborators

Research Scientists

Faculty Members

Post-Doctoral Fellows

PhD Students

Technical Staff

Interns and Apprentices

Administrative Assistant

Visiting Scientist

External Collaborators

2 Overall objectives

3 Research program

3.1 Big Data and Scalability

Unified data–model–workflow services.

Scalable time-series analytics at climate scale.

3.2 Machine Learning with Humans in the Loop

Cooperative learning in citizen science.

Bias-aware species distribution models from opportunistic data.

Uncertainty, trust, and interpretability.

3.3 Multiscale & Multimodal Data Analytics

Multimodal foundation models for biodiversity and agro-ecology monitoring.

Multivariate time series and scalable similarity.

Biodiversity trajectories and community structure.

4 Application domains

5 Social and environmental responsibility

5.1 Footprint of research activities

5.2 Impact of research results

6 Highlights of the year

6.1 Awards

6.2 Other key achievements

7 Latest software developments, platforms, open data

7.1 New Features in the Pl@ntNet Platform

7.2 New platforms

7.2.1 GeoPl@ntNet

7.2.2 PROMISE

7.3 Open data

8 New results

8.1 Distributed Data and Model Management

8.1.1 A Logic-Based Approach for Knowledge Graph Data Integration

8.1.2 Federated Learning

8.1.3 Distributed Web Infrastructure for Integrated Pest Management

8.2 Data Analytics

8.2.1 Event Detection in Time Series

8.2.2 Scalable Multivariate Anomaly Detection

8.2.3 Detecting Anomalies with Any Duration in Climate Time Series

8.2.4 Energy Efficient Time Series Anomaly Detection

8.2.5 Extending Matrix Profile for Seasonal Anomaly Detection

8.3 Machine Learning for Biodiversity and Agroecology

8.3.1 Learning Ecological Structure with Large Language Models

8.3.2 Scalable Plant Vision Models for Operational Monitoring

8.3.3 Conformal Prediction for uncertainty quantification

8.3.4 AI-Based Species Distribution Modeling and Mapping

8.3.5 Coral Reef Monitoring

8.3.6 Evaluation of Species Identification and Prediction Algorithms

8.3.7 Importance of fossil pollen data for vegetation species distribution modeling

9 Bilateral contracts and grants with industry

10 Partnerships and cooperations

10.1 International initiatives

10.1.1 Associate Teams in the framework of an Inria International Lab or in the framework of an Inria International Program

Dinizia

10.1.2 Participation in other International Programs

10.2 International research visitors

10.2.1 Visits of international scientists

Inria International Chair

Other international visits to the team

Dennis Shasha

Tiffany Ding

10.3 European initiatives

10.3.1 Horizon Europe

B3

GUARDEN

MAMBO

JAMRAI 2

10.4 National initiatives

PARAD (PARSADA), (2025-2030), 7.7 MEuros.

Past2ECO (PEPR Agroécologie et Numérique), (2026-2031), 3 MEuros.

Pl@ntAgroEco (PEPR Agroécologie et Numérique), (2023-2027), 1.6 MEuros.

FishPredict (ANR), (2022-2025), 500 KEuros.

DeepPEP (ANR), (2025-2027), 25 KEuros.

PPR Antibiorésistance: structuring tool "PROMISE" (2021-2024), 240 KEuros.

PNR "Beerisk" (2022-2025), 200 KEuros.

Plan national Ecoantibio "INTERSECTION" (2024-2028), 175 KEuros.

PEPR Agroécologie et Numérique "RootSystemTracker" (2024-2027), 144 KEuros.

Inria Challenge OMICFINDER (2023-2027), 1 Engineer - 24 months.

10.4.1 Others

10.5 Regional initiatives

Regional project "DACLIM" (2023-2026), 70 KEuros.

10.6 Public policy support

CESE consultation on the impact of AI on the environment

OECD report on the advancement of the productivity of science with citizen science and artificial intelligence

11.1 Promoting scientific activities

11.1.1 Scientific events: organization