EVERGREEN

EVERGREEN - 2024

2024 Activity Report, Project-Team EVERGREEN

RNSR: 202424505L

Research center Inria Branch at the University of Montpellier
In partnership with: INRAE, CIRAD
Team name: Earth obserVation and machine lEarning foR aGRo-Environmental challENges
In collaboration with: Territoires, Environnement, Télédétection et Information Spatiale
Domain: Digital Health, Biology and Earth
Theme: Earth, Environmental and Energy Sciences

Keywords

Computer Science and Digital Science

A3.1.10. Heterogeneous data
A3.4.1. Supervised learning
A3.4.2. Unsupervised learning
A3.4.6. Neural networks
A3.4.8. Deep learning
A5.3.2. Sparse modeling and image representation
A5.3.3. Pattern recognition
A5.4.1. Object recognition
A9.2. Machine learning
A9.3. Signal analysis

1 Team members, visitors, external collaborators

Research Scientists

Dino Ienco [Team leader, INRAE, Senior Researcher]
Cassio Fraga Dantas [INRAE, Researcher]
Raffaele Gaetano [CIRAD, Researcher]
Roberto Interdonato [CIRAD, Researcher]
Diego Marcos Gonzalez [INRIA, Advanced Research Position]

Post-Doctoral Fellows

Pallavi Jain [CIHEAM, Post-Doctoral Fellow]
Konstantinos-Panagiotis Panousis [INRIA, Post-Doctoral Fellow, until Sep 2024]

PhD Students

Azza Abidi [INRAE, until May 2024]
Ananthu Aniraj [INRIA]
Bruno Bio Nikki Sare [CIRAD]
Juan Li [INRAE]
Hugo Riffaud–De Turckheim [INRIA, from Nov 2024]
Isabelle Rocamora [UNIV MONTPELLIER, until Sep 2024]
Quentin Yeche [ATOS, CIFRE]

Technical Staff

Rémi Cresson [INRAE, Engineer]
Christopher Jabea [INRAE, Engineer, from Mar 2024]

Administrative Assistant

Claire-Marine Parodi [INRIA]

Visiting Scientists

Roger Ferrod [Univ Torino, from Jun 2024]
Giuseppe Guarino [UNIV NAPLES, from Nov 2024]
Francisco Mena Toro [DFKI, from Nov 2024]
Timothée Stassin [German Research Centre for Geosciences, from Nov 2024]
Valérie Zermatten [EPFL LAUSANNE, from Sep 2024]

2 Overall objectives

Modern space missions continuously collect information about the Earth's surface, generating massive amounts of data. The multitude of Earth Observation (EO) systems makes it possible to acquire data via different sensors (e.g., optical, radar, LiDAR - Light Detection And Ranging) at different spatial and temporal resolutions and with diverse spectral characteristics. On the one hand, this huge volume of collected information opens up new opportunities to better understand and monitor natural, agricultural and other anthropized spaces at different scales. On the other hand, the quantity and diversity of the collected information pose new challenges to the remote sensing community. To make the most of the digital revolution that is reshaping the domain, current and future analysis tasks require a paradigm shift towards data-intensive methodologies. The main objective of the EVERGREEN project-team is to develop machine learning models and tools for the exploitation and analysis of EO data, in accordance with the constraints of operational settings and in constant interaction with potential users and targeted stakeholders. Examples of possible applications range from land use/land cover mapping to natural resources monitoring, including territorial planning as well as biodiversity mapping. More broadly, the scope may include all EO-based applications that support the modern agro-environmental transition.

3 Research program

EVERGREEN is an interdisciplinary team working on the design and application of machine learning techniques to deal with the analysis of Earth observation data to support modern agro-environmental challenges. Our research is organized along the three following research axes:

Tailored Machine Learning methods for EO data (Section 3.1) is the first fundamental research axis. It focuses on advancing the methodologies related to Satellite Image Time Series (SITS) management and on tackling the multi-source exploitation of EO data.
Adoption and development of advanced learning paradigms to support Earth Observation data analysis (Section 3.2) is the second fundamental research axis of the team. The research objectives of this axis are to go a step further in the exploitation of multiple EO data sources, to deal with ground truth paucity by leveraging semi-supervised and self-supervised learning settings, and to advance the spatio-temporal transferability of machine learning models for EO data.
Interaction between Domain expert and Machine Learning models (Section 3.3) is the last research axis. Here the goals are to introduce a priori knowledge to guide the learning process and to design explainable/interpretable neural network models.

3.1 Tailored Machine Learning methods for EO data analysis

The research objectives for this topic are to: i) advance the methodologies related to Satellite Image Time Series (SITS) management and ii) tackle the multi-source exploitation of EO data. Improving the management of SITS data requires coping directly with signals that are non-stationary (the spectral and temporal responses of land elements sharing the same spatio-temporal dynamics may vary across space and time), temporally discontinuous (sudden events, e.g. human intervention, generally alter the signal responses), and/or affected by missing observations (e.g. due to cloud coverage), but that can exhibit a strong spatial correlation (“close” observations of the same land element are likely to be similar). To address these points, we aim to develop new approaches capable of coping with missing values in SITS data, and to integrate external or background knowledge to guide the learning process and explicitly account for the dependency of a SITS signal on its spatial context. Concerning the multi-source exploitation of Earth Observation data, ad hoc solutions exist, but there is still no general methodological framework to leverage the complementarity of different sources according to the downstream task considered. This is especially the case when one of the sources involved is SITS data. Our goal is to provide multi-source EO data fusion paradigms tied to a specific downstream task. For instance, if the downstream task is classification or regression, our framework should reduce as much as possible the intermediate steps between the raw data and their use for the task at hand (i.e. avoid resampling the data sources to the same spatial and/or temporal scale, and avoid separating feature extraction from model calibration). Reducing such intermediate steps limits both the bias that can affect intermediate products and the volume of data to manage.

3.2 Advanced learning paradigms to support EO data analysis

The research objectives for this topic are: i) going further with the complementary exploitation of multiple EO data sources, ii) dealing with ground truth paucity by leveraging semi-supervised machine learning settings and iii) advancing the spatio-temporal transferability of machine learning models for EO data. Improving the analysis of EO data for particular applications requires managing heterogeneous and complementary information intelligently, making the most of the combination of the different sensors. To this end, we aim to conceive and design new methodological frameworks for multi-source and cross-modal EO data analysis. In this direction, we will investigate settings related to the general domains of Knowledge Distillation and Adversarial Training to tackle the scenario in which some modalities are missing at inference time. While such methodological settings are widely investigated in standard computer vision applications, they are still unexplored in the remote sensing field. Despite the huge amount of sensor data available over a study area, the time-consuming and costly acquisition of ground truth to calibrate machine and deep learning models can hinder the deployment of such strategies in an operational context. Here, we will provide research studies contributing to the general domains of self-supervised learning, partially labeled and semi-supervised scenarios (e.g. positive-unlabeled learning), spatial active learning strategies and weakly supervised settings.
Last but not least, this axis also targets the investigation of the spatio-temporal transferability of machine learning models for EO data analysis, with a particular focus on how to adapt a classification model learnt on one study area so that it generalizes to another study area characterized by different climate/environmental conditions, as well as how to transfer a model learnt on data from one time period to data from the same, or a similar, study area acquired at a different period.
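As an illustration of the missing-modality scenario mentioned above, the following minimal sketch shows a generic knowledge distillation objective in which a single-modality "student" is trained to mimic a multi-modality "teacher". It is not the team's method: the function names, the temperature, and the weighting `alpha` are assumptions chosen for illustration, and the loss follows the commonly used combination of a hard-label cross-entropy with a temperature-scaled KL divergence.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and KL(teacher || student).

    The teacher is assumed to have seen all modalities (e.g. optical + radar),
    the student only the modality available at inference time.
    """
    # Cross-entropy of the student against the ground-truth label.
    hard = -math.log(softmax(student_logits)[label])
    # KL divergence between the softened teacher and student distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0.0)
    # The temperature**2 factor keeps the soft term on a comparable gradient scale.
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft

# Hypothetical logits for one pixel and three land cover classes.
loss_same = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0], label=0)
loss_diff = distillation_loss([0.5, 2.0, -1.0], [2.0, 0.5, -1.0], label=0)
```

When the student reproduces the teacher exactly, the soft term vanishes and only the supervised term remains, so `loss_same` is smaller than `loss_diff`.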

3.3 Interaction between Domain expert and Machine Learning models

The research objectives associated with this topic are to: i) integrate a priori knowledge (expert or biophysical) in the learning process of machine learning models, ii) design learning models that explicitly allow the decision process to be interpreted along different dimensions (e.g. temporal, spatial, ...) and iii) move towards multi-modal exploitation of EO data. Concerning the first point, both domain expertise and physical modeling can be exploited to guide the exploration of model parameters, with the aim of reducing the search space and avoiding implausible solutions. The second point involves the design of learning models that explicitly make it possible to interpret and explain the decision process. This research direction responds to the current need for insight into how machine learning models make their decisions, with the aim of supplying additional information to the end user and raising the level of transparency and trustworthiness associated with the decision process. The third direction, related to the multi-modal exploitation of EO data, covers the integration of EO data with non-EO data such as text or audio modalities. Such multi-modal analysis raises new challenges regarding how to integrate data from a remote sensing modality with non-spatially-explicit information, going further in the analysis of heterogeneous data sources and opening novel research questions about multi-modal data integration and mining.

4 Application domains

The application scope of the team is mainly guided by the application domains of the INRAE and CIRAD partners, with applications related to agricultural and environmental monitoring and assessment. In addition, the application domains of the team are constantly growing and changing, with the aim of answering societal questions related to the modern agro-environmental transition and the challenges it raises.

4.1 Food Security

The role of remote sensing in the assessment of food security indicators, especially those concerning food supply (one of the four pillars of food security along with accessibility, quality and stability) through the monitoring of agricultural activities, has been well established over the last decades. Products typically targeted by these activities range, across scales, from cropland and crop type mapping to the detection of anomalies in vegetation growth as well as crop yield estimation and forecasting. Leveraging remote sensing for the design of novel spatio-temporal indicators related to agricultural production and food security is of paramount importance to support policy makers and social actors in their decision processes. Additionally, information derived from remote sensing can be kept up to date in order to assess the underlying sustainability of agricultural production. This is even more important in the context of tropical agricultural systems.

This application domain is at the core of the EVERGREEN activity, with multiple research efforts devoted to land use and land cover mapping, which is of fundamental importance for subsequently extracting spatio-temporal indicators that characterize agricultural production. For instance, actions related to this application domain are conducted in the context of the CIFRE PhD thesis of Quentin Yeche (INRAE/ATOS) on parcel identification/extraction and characterization, and the PhD theses of Azza Abidi (University of Montpellier, University of Manouba/Tunisia) and Bruno Bio Nikki Sare (CIRAD) on the analysis of multi-temporal/multi-source remote sensing data for land cover mapping in conventional and tropical agricultural systems.

4.2 Forest monitoring

New challenges are arising regarding the quantity and quality of the information about forest cover that can effectively support decision making processes at global, national and local scales. A need is emerging for a deeper description of forest cover, by means of a larger set of biophysical or structural indicators carrying information about its diversity in terms of species and their “role” in the landscape. Notable examples are i) the possibility of discriminating between proper forests and tree cover related to agricultural exploitation, which do not provide the same carbon sequestration potential, and ii) the precise identification of the spatial extent of a forest cover, also with respect to changes in its density and “greenness”, especially in transition areas between different ecosystems and/or climatic zones. Many of these indicators have been shown to be derivable from EO data. The EVERGREEN team is increasing its activities related to the analysis and monitoring of forest areas due to the paramount importance of this natural resource. For instance, members of the EVERGREEN team are involved in both European (HORIZON Eco2Adapt) and national (ANR PREDISPOSE) projects related to the analysis of forest disturbances (e.g. forest fires), and they collaborate with international partners (e.g. DLR/Germany, Wageningen University/Netherlands) on the analysis of forest cover and its properties in Southern countries (e.g. in Africa).

4.3 Biodiversity mapping and monitoring

Biodiversity loss is now considered to be an existential threat to humankind on par with climate change. In order to advance the understanding of the mechanisms behind this phenomenon, we first need a clear picture of the current ranges and population densities of species globally. This is a crucial challenge that will require vast amounts of data in the form of species observations coupled with Earth observation-based habitat suitability. To this end, our objective is to link ground-level data to remote sensing imagery in order to map the fundamental niches of species and monitor their spatial shifts under climate change and other anthropogenic pressures. Actions related to this application domain include: the collaboration with the Iroko team, via the co-supervision of a post-doctoral researcher on species distribution modeling and spatial biases; an international collaboration with EPFL (Switzerland) on the development of vision-language models for ecological mapping, through the visiting research stay of Valérie Zermatten; and the national project IMPACT, funded by the OFB (Office français de la biodiversité), on the detection of possible plant disease outbreaks via multi-temporal remote sensing data.

5 Social and environmental responsibility

5.1 Footprint of research activities

Work trips. While the COVID-19 health crisis drastically cut the number of work trips of the team, recent years have seen an increase in physical participation in conferences and various committees. However, compared to the pre-COVID period, the majority of trips are national or at most European, with very few trips outside of Europe, and trains are preferred to planes whenever possible.
Utilization of computing resources. In 2024, EVERGREEN contributed a new cluster node to the NEF computing platform. As the team specializes in computer vision and machine learning for remote sensing data, a recurrent task in EVERGREEN is to run CPU/GPU-intensive algorithms on large data collections. Our strategy is therefore to increase the use of regional and national infrastructures (e.g. Jean Zay) in order to leverage sustainable computing platforms instead of local servers and workstations, with a generally positive effect on energy consumption.

5.2 Impact of research results

We estimate that our research work can have several impacts on society, since EVERGREEN works on challenges related to modern agro-environmental analysis. We give below two examples of the impact of our research results:

Most of the research work is conducted in collaboration with scientists from environmental and agricultural sciences, considering both applied research and operational scenarios. Such interdisciplinary work paves the way to the deployment of our research contributions in projects related to a more sustainable and reasoned management of natural (e.g. forest, water, ...) and agricultural resources.
A part of our research work is conducted in partnership with companies, through CIFRE PhDs and collaboration actions. Hence, the research problems addressed correspond to important challenges for these companies, and the proposed solutions are evaluated on their relevance to tackling these challenges.

6 Highlights of the year

The official start of the EVERGREEN project-team on January 8th, 2024. After more than 3 years of preparation, EVERGREEN officially started its activity this year with around 20 people involved in the team, of whom 5 hold permanent positions and 2 hold an HDR.
Among the numerous publications of the year, we can highlight major conference publications: 1 paper at ECCV (European Conference on Computer Vision), 1 paper at NeurIPS (Conference on Neural Information Processing Systems), 1 paper at ECML/PKDD (European Conference on Machine Learning) and 1 paper at BMVC (British Machine Vision Conference). We can also underline several journal publications in both the machine learning (e.g. Neurocomputing) and remote sensing (e.g. ISPRS Journal of Photogrammetry and Remote Sensing, Remote Sensing of Environment) fields.
Organization of the 6th edition of the MACLEAN (Machine Learning for Earth Observation) workshop, co-located with the ECML/PKDD 2024 conference, and co-organization of the MVEO (Machine Vision for Earth Observation and Environment Monitoring) workshop, co-located with the BMVC 2024 conference.

6.1 Awards

Best paper award at the British Machine Vision Conference (BMVC) 2024.

7 New software, platforms, open data

For the EVERGREEN research team, the main objective of the software implementations is to experimentally validate the results obtained and ease the transfer of the developed methodologies to the community.

7.1 Reproducibility efforts

As a common practice in the team, we make a point of always making the software associated with our papers freely available and easily accessible in a public repository (e.g., on GitHub or GitLab). To enhance the visibility of these individual efforts, we maintain on our official website a centralized list of the software repositories associated with the team's research activities. In addition, we contribute to reproducibility efforts related to the release of open data in partnership with our institute colleagues, ensuring community access to ready-to-use data for subsequent machine learning analysis 38.

7.2 New software

7.2.1 PDiscoFormer

Name:
PDiscoFormer
Keywords:
Deep learning, Computer vision, Artificial intelligence
Functional Description:
Computer vision methods that explicitly detect object parts and reason on them are a step towards inherently interpretable models. Existing approaches that perform part discovery driven by a fine-grained classification task make very restrictive assumptions on the geometric properties of the discovered parts: they should be small and compact. Although this prior is useful in some cases, in this paper we show that pre-trained transformer-based vision models, such as self-supervised DINOv2 ViT, enable the relaxation of these constraints. In particular, we find that a total variation (TV) prior, which allows for multiple connected components of any size, substantially outperforms previous work. We test our approach on three fine-grained classification benchmarks (CUB, PartImageNet and Oxford Flowers) and compare our results to previously published methods as well as to a re-implementation of the state-of-the-art method PDiscoNet with a transformer-based backbone. We consistently obtain substantial improvements across the board, both on part discovery metrics and on the downstream classification task, showing that the strong inductive biases in self-supervised ViT models require rethinking the geometric priors that can be used for unsupervised part discovery.
Release Contributions:
Developed the library from scratch
URL:
https://github.com/ananthu-aniraj/pdiscoformer
Contact:
Ananthu Aniraj
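
To give an intuition for the total variation (TV) prior mentioned in the functional description, the sketch below computes an anisotropic TV score on a small 2D part-assignment map. It is only an illustration under assumptions: the function name and the toy maps are hypothetical, and the actual loss used in PDiscoFormer (normalization, soft assignments, per-part handling) is defined in the repository listed above and may differ.

```python
def total_variation(part_map):
    """Anisotropic TV: sum of absolute differences between 4-neighbours."""
    height, width = len(part_map), len(part_map[0])
    tv = 0.0
    for i in range(height):
        for j in range(width):
            if i + 1 < height:  # vertical neighbour
                tv += abs(part_map[i + 1][j] - part_map[i][j])
            if j + 1 < width:   # horizontal neighbour
                tv += abs(part_map[i][j + 1] - part_map[i][j])
    return tv

# A part made of two disconnected blobs of different sizes (toy example).
two_blobs = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
# A highly fragmented assignment covering the same number of cells.
checkerboard = [[(i + j) % 2 for j in range(4)] for i in range(4)]

tv_blobs = total_variation(two_blobs)
tv_checker = total_variation(checkerboard)
```

The two-blob map has a much lower score than the checkerboard, which shows why such a prior tolerates several connected components of arbitrary size while still penalizing fragmented, noisy part maps.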

7.3 New platforms

7.3.1 MORINGA: an open-source platform for automatic land cover classification from multi-sensor imagery

Participants: Raffaele Gaetano.

Started in 2015 within the activities of the Land Cover Scientific Expertise Center, as part of the French Land Surface Data and Services Hub - THEIA, the development of the MORINGA 1 processing chain was initially aimed at providing a “turnkey” solution, intended for thematic specialists with relatively limited programming skills, for automatic land cover classification from multi-sensor, multi-resolution and multi-temporal satellite imagery. It particularly targets the need for accurate land cover mapping in tropical agricultural systems, where several specificities (landscape heterogeneity and fragmentation, small field sizes, high cloud coverage during cropping seasons) call for the combination of different resolutions and acquisition modes to capture both spatial details and temporal profiles. Leveraging an object-based approach and a suitable supervised classification framework based on legacy machine learning techniques, the MORINGA processing chain handles imagery provided by different satellite missions, both at high (Sentinel-1 and -2, Landsat 8/9, etc.) and very high (Pléiades, SPOT6/7) spatial resolution, and automatically manages their pre-processing and ingestion into the object-based machine learning framework, with limited user interaction. Reference data are also automatically processed to provide the best possible validation even in cases of data paucity, which are rather common in the targeted applications. To date, MORINGA has become a feature-rich, modular platform for remote sensing image analysis, which can also be used as a lower-level API to ease common image processing tasks. Thanks to the support of thematic experts and cartography specialists, it has since been used for the production of high-quality land cover maps in many different scientific and dissemination contexts (see 38 for a notable example in 2024).
The software package is expected to evolve into a larger remote sensing-based land cover workbench, including novel deep learning techniques for both image pre-processing/enhancement and multi-sensor classification.

7.3.2 Benchopt: an open optimization benchmarking platform

Participants: Cássio Fraga Dantas.

Participants: Dino Ienco, Diego Marcos, Raffaele Gaetano, Cássio Fraga Dantas, Roberto Interdonato.

A noteworthy open-source initiative in which team member Cassio F. Dantas participates is the Benchopt platform 37 2. It is a larger-scale collective effort involving researchers from various institutions at the national level, including École Normale Supérieure Paris and Institut Mines-Télécom as well as other Inria teams such as MIND (Paris) and DANTE (Lyon). Benchopt provides a collaborative framework to automate, reproduce and publish benchmarks of optimization algorithms across programming languages and hardware architectures. The benchmarks initiated so far include several inverse problems and machine learning tasks that often form the backbone of many application domains, including remote sensing: from ordinary and non-negative least squares to several denoising and regression problems, and even image classification with residual neural networks. More importantly, the idea is for the research community to contribute by adding new competing algorithms or even new benchmarks on their problems of interest. This may benefit the entire community by providing a reliable, transparent and reproducible comparison between existing approaches, allowing practitioners to easily choose a suitable off-the-shelf approach for their application scenario and relieving researchers of the burden of reimplementation when proposing new algorithms. Since its official release, marked by the NeurIPS publication in 2022 37, the platform has been continuously maintained by an expanding community that remains in direct contact via a Discord channel and organizes yearly in-person meetups (Benchopt sprints). The purpose of these events is to harness collective efforts to drive key enhancements to the platform, identify new working groups and kick-start projects to be continued remotely after the event.

8 New results

In this section, we briefly summarize and reference the major research results published in 2024. The research works are organized into three subsections: i) Tailored Machine Learning Methods for EO Data Analysis; ii) Advanced Learning Paradigms to Support EO Data Analysis; and iii) Interaction Between Domain Experts and Machine Learning Models.

8.1 Tailored Machine Learning methods for EO data analysis

8.1.1 DIAMANTE: A data-centric semantic segmentation approach to map tree dieback induced by bark beetle infestations via satellite images

Participants: Dino Ienco.

Collaborators: Giuseppina Andresini (University of Bari, Italy), Annalisa Appice (University of Bari, Italy), Vito Recchia (University of Bari, Italy).

Keywords: Forest mapping, Forest Disturbances, deep learning, multi-sensor fusion, semantic segmentation.

Forest tree dieback inventory has a crucial role in improving forest management strategies.

Figure 1: Late fusion approach to merge together information from Sentinel-1 and Sentinel-2 remote sensing data for the downstream task of semantic segmentation of forest tree dieback. See Section 8.1.1 for more details.

This inventory is traditionally performed in forests through laborious and time-consuming human assessment of individual trees. On the other hand, the large amount of Earth satellite data made publicly available by the Copernicus program, which can be processed through advanced deep learning techniques, has recently been established as an alternative to field surveys for forest tree dieback tasks. However, to realize its full potential, deep learning requires a fine understanding of satellite data, since the data collection and preparation steps are as essential as the model development step. Here, we have explored the performance of a data-centric semantic segmentation approach to detect forest tree dieback events due to bark beetle infestation in satellite images. The proposed approach prepares a multi-sensor dataset collected using both the SAR Sentinel-1 sensor and the optical Sentinel-2 sensor and uses this dataset to train a multi-sensor semantic segmentation model (Figure 1). The evaluation shows the effectiveness of the proposed approach in a real inventory case study covering non-overlapping forest scenes from the Northeast of France acquired in October 2018. The selected scenes host bark beetle infestation hotspots of different sizes, which originate from the mass reproduction of the bark beetle in the 2018 infestation.
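
As a rough illustration of what late fusion means in a two-sensor setting, the sketch below blends per-pixel class probabilities produced by two sensor-specific branches and then takes the most likely class. This is one common form of late fusion and is given only as an assumption-laden example: the published model may combine the Sentinel-1 and Sentinel-2 branches at a different stage or with learned layers, and the function names, the fixed weight and the toy probabilities are all hypothetical.

```python
def late_fusion(prob_s1, prob_s2, weight_s1=0.5):
    """Blend per-pixel class probabilities from two sensor-specific branches."""
    fused = []
    for p1, p2 in zip(prob_s1, prob_s2):
        fused.append([weight_s1 * a + (1.0 - weight_s1) * b for a, b in zip(p1, p2)])
    return fused

def argmax_labels(probs):
    """Return the index of the most probable class for every pixel."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]

# Three pixels, two classes (0 = healthy, 1 = dieback); values are made up.
prob_s1 = [[0.6, 0.4], [0.3, 0.7], [0.55, 0.45]]   # radar branch
prob_s2 = [[0.9, 0.1], [0.2, 0.8], [0.25, 0.75]]   # optical branch

labels = argmax_labels(late_fusion(prob_s1, prob_s2))
```

For the third pixel the radar branch alone would predict "healthy", while the fused prediction follows the more confident optical branch.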

This work has been published in the Journal of Intelligent Information Systems (Springer) 7.

8.1.2 Generation of country-scale canopy height maps over Gabon using deep learning and TanDEM-X InSAR data

Participants: Dino Ienco.

Collaborators: Daniel Carcereria (German Aerospace Center (DLR), Germany), Paola Rizzoli (German Aerospace Center (DLR), Germany), Luca Dell’Amore (German Aerospace Center (DLR), Germany), José-Luis Bueso-Bello (German Aerospace Center (DLR), Germany), Lorenzo Bruzzone (Università degli Studi di Trento, Italy)

Keywords: Forest height, forest parameter regression, deep learning, bistatic SAR, interferometric coherence, InSAR, TanDEM-X, LVIS.

Operational canopy height mapping at high resolution remains a challenging task at the country level. Most of the existing state-of-the-art inversion methods propose physically-based schemes which are specifically tuned for local scales.

Figure 2: Country-scale mosaic of Gabon representing the CHM, generated using TanDEM-X acquisitions from the first global coverage of the mission (Dec. 2010 - end of 2011). See Section 8.1.2 for more details.

Only a few approaches in the literature have attempted to produce country- or global-scale estimates, mostly by means of data-driven approaches and multi-spectral data sources. In this work, we propose a robust deep learning approach that exploits single-pass interferometric TanDEM-X data to generate accurate forest height estimates from a single interferometric bistatic acquisition. The model development is driven by considerations on both the final performance and the trustworthiness of the model for large-scale deployment in the context of tropical forests. We train and test our model over the five tropical sites of the AfriSAR 2016 campaign, situated in the West Central state of Gabon, performing spatial cross-validation experiments to test its generalization capability. We define a specific training dataset and input predictors to develop a robust model for country-scale inference, by finding an optimal trade-off between model performance and large-scale reliability. The proposed model achieves an overall estimation bias of 0.12 m, a mean absolute error of 3.90 m, a root mean squared error of 5.08 m and a coefficient of determination of 0.77. Finally, we generate a time-tagged country-scale canopy height map of Gabon at 25 m resolution (Figure 2), discussing the potential and challenges of these kinds of products for their application in different scenarios and for the monitoring of forest changes.
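
For readers less familiar with the four scores reported above, the following sketch computes bias, mean absolute error, root mean squared error and the coefficient of determination with their usual definitions. The sign convention for the bias (prediction minus reference) is an assumption, as are the function name and the toy height values; the figures quoted in the text come from the publication, not from this code.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return bias, MAE, RMSE and R^2 for paired reference/predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]  # prediction minus reference
    bias = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"bias": bias, "mae": mae, "rmse": rmse, "r2": r2}

# Toy canopy heights in metres (hypothetical values).
reference_heights = [10.0, 20.0, 30.0, 40.0]
predicted_heights = [12.0, 18.0, 33.0, 39.0]
metrics = regression_metrics(reference_heights, predicted_heights)
```

A small bias together with a much larger MAE, as in the reported results, indicates errors that largely cancel out on average rather than a systematic over- or under-estimation.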

This work has been published in the Remote Sensing of Environment (Elsevier) journal 9.

8.1.3 Early Season Forecasting of Corn Yield at Field Level from Multi-Source Satellite Time Series Data

Participants: Dino Ienco.

Collaborators: Johann Desloires (Airbus DS), Antoine Botrel (Syngenta Seeds, France)

Keywords: Yield forecasting, machine learning, thermal time, Sentinel-2, land surface temperature, early season forecasting.

Crop yield forecasting during an ongoing season is crucial for food security and for the stability of commodity markets. For this reason, we propose a scalable approach to forecast corn yields at the field level using machine learning and satellite imagery from the Sentinel-2 and Landsat missions. The model, evaluated on 1319 corn fields in the U.S. Corn Belt from 2017 to 2022, integrates biophysical parameters from Sentinel-2, Land Surface Temperature (LST) from Landsat, and agroclimatic data from the ERA5 reanalysis dataset. Resampling the time series over thermal time significantly enhances predictive performance. The addition of LST to our model further improves in-season yield forecasting, through its capacity to detect early drought, which is not immediately visible to optical sensors such as Sentinel-2. We also propose a new two-stage machine learning strategy (see Figure 3) to cope with the partially available data early in the season. It consists in extending the current time series on the basis of complete historical data and adapting the model inference according to the crop progress.
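
The idea of resampling a time series over thermal time can be sketched as follows: daily temperatures are converted to cumulative growing degree days (GDD), and the satellite-derived variable is then interpolated on a regular GDD grid instead of a calendar grid. This is a simplified illustration, not the published pipeline: the base temperature of 10 °C, the plain averaging formula, the linear interpolation and all names and values are assumptions.

```python
def cumulative_gdd(tmin, tmax, base=10.0):
    """Cumulative growing degree days from daily min/max temperatures (deg C)."""
    total, out = 0.0, []
    for low, high in zip(tmin, tmax):
        total += max(0.0, (low + high) / 2.0 - base)
        out.append(total)
    return out

def interpolate(x, xs, ys):
    """Piecewise-linear interpolation, clamped at both ends (xs non-decreasing)."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            weight = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + weight * (ys[i] - ys[i - 1])

def resample_over_thermal_time(gdd, values, grid):
    """Evaluate the series on a regular thermal-time grid instead of calendar dates."""
    return [interpolate(g, gdd, values) for g in grid]

# Four observation dates with made-up temperatures and leaf area index values.
gdd = cumulative_gdd(tmin=[8.0, 10.0, 12.0, 14.0], tmax=[18.0, 20.0, 24.0, 26.0])
lai = [0.5, 1.0, 2.0, 3.0]
resampled = resample_over_thermal_time(gdd, lai, grid=[3.0, 8.0, 12.0, 26.0])
```

Indexing by accumulated heat rather than by date aligns fields that were sown at different times or that grow under different temperatures, which is one plausible reason for the improvement reported above.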

Figure 3: Deep neural network architecture with LSTM and dense layers. Concatenation of dynamic and static paths. Node dimensions indicated. Each Dense hidden layer is followed by batch normalization and ReLU activation function with a dropout rate of 0.5. See Section 8.1.3 for more details.

This work has been published in the Remote Sensing (MDPI) journal 11.

8.1.4 Machine Learning-Based Summer Crops Mapping Using Sentinel-1 and Sentinel-2 Images

Participants: Cássio Fraga Dantas, Dino Ienco.

Collaborators: Saeideh Maleki (UMR TETIS, INRAE, France), Nicolas Baghdadi (UMR TETIS, INRAE, France), Hassan Bazzi (UMR TETIS, AgroParisTech, France), Yasser Nasrallah (UMR TETIS, INRAE, France), Sami Najem (UMR TETIS, INRAE, France)

Keywords: Rapeseed mapping, Sentinel-1, Sentinel-2, machine learning, neural networks.

Accurate crop type mapping using satellite imagery is crucial for food security, yet accurately distinguishing between crops with similar spectral signatures is challenging. This study assessed the performance of Sentinel-2 (S2) time series (spectral bands and vegetation indices), Sentinel-1 (S1) time series (backscattering coefficients and polarimetric parameters), alongside phenological features derived from both S1 and S2 time series (harmonic coefficients and median features), for classifying sunflower, soybean, and maize. Random Forest (RF), Multi-Layer Perceptron (MLP), and XGBoost classifiers were applied across various dataset configurations and train-test splits over two study sites and years in France. Additionally, the InceptionTime classifier, specifically designed for time series data, was tested exclusively with time series datasets to compare its performance against the three general machine learning algorithms (RF, XGBoost, and MLP). The results showed that XGBoost outperformed RF and MLP in classifying the three crops. The optimal dataset for mapping all three crops combined S1 backscattering coefficients with S2 vegetation indices, with comparable results between phenological features and time series data (mean F1 scores of 89.9% for sunflower, 76.6% for soybean, and 91.1% for maize). However, when using individual satellite sensors, S1 phenological features and time series outperformed S2 for sunflower, while S2 was superior for soybean and maize. Both phenological features and time series data produced close mean F1 scores across spatial, temporal, and spatiotemporal transfer scenarios, though the median features dataset was the best choice for spatiotemporal transfer. Polarimetric S1 data did not yield effective results. The InceptionTime classifier further improved classification accuracy over XGBoost for all crops, with the degree of improvement varying by crop and dataset (the highest mean F1 scores of 90.6% for sunflower, 86.0% for soybean, and 93.5% for maize).
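As an illustration of the harmonic coefficients mentioned among the phenological features, the sketch below fits a low-order harmonic model to a time series by ordinary least squares and returns its coefficients. The number of harmonics, the 365-day period, the function name and the synthetic series are assumptions; the study's exact feature definition is not given here.

```python
import numpy as np

def harmonic_features(t, y, n_harmonics=2, period=365.0):
    """Least-squares fit of y(t) = a0 + sum_k [a_k cos(2*pi*k*t/T) + b_k sin(2*pi*k*t/T)].

    Returns the coefficients ordered as [a0, a1, b1, a2, b2, ...].
    """
    t = np.asarray(t, dtype=float)
    columns = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * k * t / period
        columns += [np.cos(w), np.sin(w)]
    design = np.column_stack(columns)
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coeffs

# Synthetic vegetation-index series sampled every 5 days (hypothetical values).
days = np.arange(0, 365, 5)
index = 0.5 + 0.3 * np.cos(2 * np.pi * days / 365.0) + 0.1 * np.sin(4 * np.pi * days / 365.0)
features = harmonic_features(days, index)
```

The resulting short coefficient vector can then be fed to a tabular classifier in place of the full time series.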

This work has been published in the Remote Sensing (MDPI) journal 15.

8.1.5 Determining Effective Temporal Windows for Rapeseed Detection Using Sentinel-1 Time Series and Machine Learning Algorithms

Participants: Cássio Fraga Dantas, Dino Ienco.

Collaborators: Saeideh Maleki (UMR TETIS, INRAE, France), Nicolas Baghdadi (UMR TETIS, INRAE, France), Sami Najem (UMR TETIS, INRAE, France), Hassan Bazzi (Atos France, France)

Keywords: Rapeseed mapping, Sentinel-1, machine learning, neural networks.

This study investigates the potential of Sentinel-1 (S1) multi-temporal data for the early-season mapping of the rapeseed crop. Additionally, we explore the effectiveness of limiting the portion of a considered time series to map rapeseed fields. To this end, we conducted a quantitative analysis to assess several temporal windows (periods) spanning different phases of the rapeseed phenological cycle in the following two scenarios relating to the availability or constraints of providing ground samples for different years: i) involving the same year for both training and testing, assuming the availability of ground samples for each year; and ii) evaluating the temporal transferability of the classifier, considering the constraints of ground sampling. We employed two different classification methods that are renowned for their high performance in land cover mapping: the widely adopted random forest (RF) approach and a deep learning-based convolutional neural network, specifically the InceptionTime method. To assess the classification outcomes, four evaluation metrics (recall, precision, F1 score, and Kappa) were employed. Using S1 time series data covering the entire rapeseed growth cycle, the tested algorithms achieved F1 scores close to 95% on same-year training and testing, and 92.0% when different years were used; both algorithms thus demonstrated effective performance. Our findings underscore the importance of a concise S1 time series for effective rapeseed mapping, offering advantages in data storage and processing time. Overall, the study establishes the robustness of RF and InceptionTime in rapeseed detection scenarios, providing valuable insights for agricultural applications.
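The four evaluation metrics named above can be derived from a binary confusion matrix (rapeseed versus other), as in this minimal sketch. The function name and the toy labels are hypothetical and do not come from the study.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Recall, precision, F1 score and Cohen's Kappa for a binary classification."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    tn = np.sum(~y_true & ~y_pred)
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    p_obs = (tp + tn) / n                                   # observed agreement
    p_exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2  # chance agreement
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return {"recall": float(recall), "precision": float(precision),
            "f1": float(f1), "kappa": float(kappa)}

# Hypothetical toy labels: 1 = rapeseed, 0 = other.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

Kappa is lower than the F1 score in this toy case because it discounts the agreement expected by chance.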

This work has been published in the Remote Sensing (MDPI) journal 16.

8.1.6 Integrating Predictive Process Monitoring Techniques in Smart Agriculture

Participants: Roberto Interdonato, Dino Ienco.

Collaborators: Simona Fioretto (University of Naples Federico II, Italy), Elio Masciari (University of Naples Federico II, Italy)

Keywords: Predictive Process Monitoring, crop rotation, machine learning.

Environmental problems are increasingly well known, and technology is adapting accordingly to find suitable solutions. The ancestral technique of crop rotation has been identified as a way to address the pollution caused by intensive food production (i.e., the use of fertilizers and pesticides). To ensure that this technique can actually improve food production, it is necessary to understand how modern technologies can support it. In particular, the analysis of crop rotation can support farmers in their decision-making process and in the optimization of farm management practices. The aim of this paper is to investigate how predictive process monitoring techniques can enhance crop rotation strategies by leveraging Agriculture 4.0 through real-time monitoring, resulting in more accurate and adaptive strategies. It is a position paper that proposes research questions for further study, which may help to develop the research area.

This work has been published at the International Symposium on Methodologies for Intelligent Systems (ISMIS) 2024 25.

8.1.7 Multi-source deep-learning approach for automatic geomorphological mapping: the case of glacial moraines

Participants: Isabelle Rocamora, Dino Ienco.

Collaborators: Matthieu Ferry (Univ. Montpellier, France)

Keywords: Deep learning, data fusion, geomorphology, moraines, mapping.

Figure 4: (a) Flowchart of our framework. (b) MorNet’s architecture. See Section 8.1.7 for more details.

Landform mapping is the initial step of many geomorphological analyses (e.g., assessment of natural hazards and natural resources) and requires vast resources to be applied to wide areas at high resolution. Among geomorphological objects, we focus on glacial moraine mapping, since it is a task relevant to many fields (e.g., paleoclimate and glacial geomorphology). Here we exploit the potential of Deep Learning-based approaches to map moraine landforms from multi-source remote sensing imagery. To this end, we propose the first Deep Learning model to map glacial moraines, namely MorNet (Figure 4). As multi-source remote sensing information, we combine three different sources: Topographic (Pleiades-derived Digital Surface Model), Multispectral (Sentinel-2), and SAR (Sentinel-1) data. To cope with such heterogeneous information, the proposed model has a dedicated branch for each input source, and a late fusion mechanism is leveraged to combine them and provide the final mapping. The performance of the MorNet model is evaluated on several glacier valleys in China in the Himalayan range. This area contains minimally eroded moraines, so they are well-defined and of varied morphology. The behavior of the proposed method is compared to that of mono-source models in order to highlight the benefit of simultaneously leveraging multi-source information. The use of multi-source data allows MorNet to exploit the complementarity of the three input sources and improve its performance from an F1-score of about 41.6 using a single source to 52.8 using three sources. MorNet provides a first-order moraine map through its ability to identify well-defined moraines. Consequently, MorNet can identify areas likely to contain moraines and is intended to be used as a tool by experts to facilitate and support large-scale mapping.
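The idea of one branch per source followed by late fusion can be sketched as follows. This is a deliberately simplified illustration and not MorNet's architecture: each branch is reduced to a single linear layer with ReLU, the channel counts (1 for the surface model, 10 for Sentinel-2, 2 for Sentinel-1), the feature size and the random weights are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def branch(x, w):
    """One per-source encoder, reduced here to a linear layer followed by ReLU."""
    return np.maximum(x @ w, 0.0)

def late_fusion_predict(inputs, branch_weights, head_weights):
    """Encode each source separately, concatenate the features, then classify."""
    feats = [branch(x, w) for x, w in zip(inputs, branch_weights)]
    fused = np.concatenate(feats, axis=1)        # late fusion by concatenation
    logits = fused @ head_weights
    return 1.0 / (1.0 + np.exp(-logits))         # per-sample probability of the positive class

n_samples, n_feat = 4, 8
# Hypothetical per-pixel inputs: topography, multispectral, SAR.
inputs = [rng.normal(size=(n_samples, 1)),
          rng.normal(size=(n_samples, 10)),
          rng.normal(size=(n_samples, 2))]
branch_weights = [rng.normal(size=(x.shape[1], n_feat)) for x in inputs]
head_weights = rng.normal(size=(3 * n_feat, 1))
probs = late_fusion_predict(inputs, branch_weights, head_weights)
```

Keeping the branches separate until the fusion step lets each encoder be tailored to the statistics of its own sensor.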

This work has been published in the Geo-spatial Information Science (Taylor & Francis) journal 22.

8.1.8 GeoPlant: Spatial Plant Species Prediction Dataset

Participants: Diego Marcos.

Collaborators: Lukas Picek (Inria, Zenith, France), Christophe Botella (Inria, Zenith, France), Maximilien Servajean (LIRMM, Université de Montpellier, France), César Leblanc (Inria, Zenith, France), Rémi Palard (Inria, Zenith, France), Théo Larcher (Inria, Zenith, France), Benjamin Deneu (Inria, Zenith, France), Pierre Bonnet (Amap, Cirad, France), Alexis Joly (Inria, Zenith, France)

Keywords: Deep learning, data fusion, species distribution models.

The difficulty of monitoring biodiversity at fine scales and over large areas limits ecological knowledge and conservation efforts. To fill this gap, Species Distribution Models (SDMs) predict species across space from spatially explicit features. Yet, they face the challenge of integrating the rich but heterogeneous data made available over the past decade, notably millions of opportunistic species observations and standardized surveys, as well as multi-modal remote sensing data. In light of that, we have designed and developed a new European-scale dataset for SDMs at high spatial resolution (10-50 m), including more than 10k species (i.e., most of the European flora). The dataset comprises 5M heterogeneous Presence-Only records and 90k exhaustive Presence-Absence survey records, all accompanied by diverse environmental rasters (e.g., elevation, human footprint, and soil) that are traditionally used in SDMs. In addition, it provides Sentinel-2 RGB and NIR satellite images with 10 m resolution, a 20-year time-series of climatic variables, and satellite time-series from the Landsat program. In addition to the data, we provide an openly accessible SDM benchmark (hosted on Kaggle), which has already attracted an active community and a set of strong baselines for single predictor/modality and multimodal approaches. All resources, e.g., the dataset, pre-trained models, and baseline methods (in the form of notebooks), are available on Kaggle, allowing one to start with our dataset literally with two mouse clicks.

This work has been accepted to NeurIPS 2024 Datasets and Benchmarks track 32, 31.

8.1.9 Mapping the diversity of land uses following deforestation across Africa

Participants: Diego Marcos.

Collaborators: Robert N. Masolele (Wageningen University and Research, The Netherlands), Veronique De Sy (Wageningen University and Research, The Netherlands), Itohan‑Osa Abu (Julius-Maximilians-University, Germany), Jan Verbesselt (Wageningen University and Research, The Netherlands), Johannes Reiche (Wageningen University and Research, The Netherlands), Martin Herold (German GeoResearch Center, Germany)

Keywords: Deforestation drivers, continental scale, land-use following deforestation, high-resolution satellite imagery.

African forests are increasingly in decline as a result of land-use conversion due to human activities. However, a consistent and detailed characterization and mapping of land-use change that results in forest loss are not available at the spatial-temporal resolution and thematic levels that are suitable for decision-making at the local and regional scales; so far they have only been provided on coarser scales and restricted to humid forests. Here we present the first high-resolution (5 m) and continental-scale mapping of land use following deforestation in Africa, which covers an estimated 13.85% of the global forest area, including humid and dry forests. We use reference data for 15 different land-use types from 30 countries and implement an active learning framework to train a deep learning model for predicting land-use following deforestation with an F1-score of 84 $\pm$ 0.7 for the whole of Africa. Our results show that the causes of forest loss vary by region. In general, small-scale cropland is the dominant driver of forest loss in Africa, with hotspots in Madagascar and DRC. In addition, commodity crops such as cacao, oil palm, and rubber are the dominant drivers of forest loss in the humid forests of western and central Africa, forming an “arc of commodity crops” in that region. At the same time, the hotspots for cashew are found to increasingly dominate in the dry forests of both western and south-eastern Africa, while larger hotspots for large-scale croplands were found in Nigeria and Zambia. The increased expansion of cacao, cashew, oil palm, rubber, and large-scale croplands observed in humid and dry forests of western and south-eastern Africa suggests they are vulnerable to future land-use changes by commodity crops, thus creating challenges for achieving zero-deforestation supply chains, supporting REDD+ initiatives, and progressing towards sustainable development goals.

This work has been published as a research article in the journal Scientific Reports 18.

8.2 Advanced learning paradigms to support EO data analysis

8.2.1 DisCoM-KD: Cross-Modal Knowledge Distillation via Disentanglement Representation and Adversarial Learning

Participants: Cássio Fraga Dantas, Dino Ienco.

Keywords: Multi-modal learning, Cross-Modal Knowledge Distillation, Classification, Disentanglement learning.

Cross-modal knowledge distillation (CMKD) refers to the scenario in which a learning framework must handle training and test data that exhibit a modality mismatch; more precisely, training and test data do not cover the same set of data modalities. Traditional approaches for CMKD are based on a teacher/student paradigm, where a teacher is trained on multi-modal data with the aim of subsequently distilling its knowledge to a single-modal student. Despite the widespread adoption of this paradigm, recent research has highlighted its inherent limitations in the context of cross-modal knowledge transfer. Taking a step beyond the teacher/student paradigm, in this work, we introduce a new framework for cross-modal knowledge distillation, named DisCoM-KD (Disentanglement-learning based Cross-Modal Knowledge Distillation), that explicitly models different types of per-modality information with the aim of transferring knowledge from multi-modal data to a single-modal classifier (Figure 5). To this end, DisCoM-KD effectively combines disentanglement representation learning with adversarial domain adaptation to simultaneously extract, for each modality, domain-invariant, domain-informative and domain-irrelevant features according to a specific downstream task. Unlike the traditional teacher/student paradigm, our framework simultaneously learns all single-modal classifiers, eliminating the need to learn each student model separately as well as the teacher classifier. We evaluated DisCoM-KD on three standard multi-modal benchmarks and compared its behavior with recent state-of-the-art knowledge distillation frameworks. The findings clearly demonstrate the effectiveness of DisCoM-KD over competitors considering mismatch scenarios involving both overlapping and non-overlapping modalities. These results offer insights to reconsider the traditional paradigm for distilling information from multi-modal data to single-modal neural networks.

Figure 5: Schematic overview of DisCoM-KD: On the left, there are two per-modality branch extractors for modalities $M_1$ and $M_2$, along with two per-modality task classifiers to obtain the final prediction. On the right, several auxiliary classifiers, acting on intermediate representations, help disentangle per-modality information and make representations task informative. The training of the two parallel architectures is performed jointly, but at inference time, each model is deployed independently. See Section 8.2.1 for more details.

This work has been published at the British Machine Vision Conference (BMVC) 2024 26.

8.2.2 Semi Supervised Heterogeneous Domain Adaptation via Disentanglement and Pseudo-Labelling

Participants: Cássio Fraga Dantas, Raffaele Gaetano, Dino Ienco.

Keywords: Domain adaptation, heterogeneous data, Feature disentanglement, Pseudo-labeling, consistency regularization.

Semi-supervised domain adaptation methods leverage information from a source labelled domain with the goal of generalizing over a scarcely labelled target domain. While this setting already poses challenges due to potential distribution shifts between domains, an even more complex scenario arises when source and target data differ in modality representation (e.g. they are acquired by sensors with different characteristics). For instance, in remote sensing, images may be collected via various acquisition modes (e.g. optical or radar), with different spectral characteristics (e.g. RGB or multi-spectral) and spatial resolutions. Such a setting is denoted as Semi-Supervised Heterogeneous Domain Adaptation (SSHDA) and it exhibits an even more severe distribution shift due to modality heterogeneity across domains. To cope with the challenging SSHDA setting, here we introduce SHeDD (Semi-supervised Heterogeneous Domain Adaptation via Disentanglement), an end-to-end neural framework tailored to learning a target domain classifier by leveraging both labelled and unlabelled data from heterogeneous data sources (Figure 6). SHeDD is designed to effectively disentangle domain-invariant representations, relevant for the downstream task, from domain-specific information, which can hinder the cross-modality transfer. Additionally, SHeDD adopts an augmentation-based consistency regularization mechanism that takes advantage of reliable pseudo-labels on the unlabelled target samples to further boost its generalization ability on the target domain. Empirical evaluations on two remote sensing benchmarks, encompassing heterogeneous data in terms of acquisition modes and spectral/spatial resolutions, demonstrate the quality of SHeDD compared to both baseline and state-of-the-art competing approaches.
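The combination of reliable pseudo-labels and augmentation-based consistency regularization can be illustrated with the generic sketch below, in which predictions on one view of an unlabelled sample provide a pseudo-label that supervises the prediction on an augmented view. The confidence threshold of 0.95, the function names and the toy probabilities are assumptions; this is a generic formulation and not necessarily the one implemented in SHeDD.

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.95):
    """Keep unlabelled samples whose top class probability reaches the threshold."""
    confidence = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    mask = confidence >= threshold          # True where the pseudo-label is deemed reliable
    return labels, mask

def consistency_loss(probs_augmented, labels, mask, eps=1e-12):
    """Cross-entropy between augmented-view predictions and retained pseudo-labels."""
    if not mask.any():
        return 0.0
    picked = probs_augmented[np.arange(len(labels)), labels]
    return float(-(np.log(picked + eps) * mask).sum() / mask.sum())

# Hypothetical class probabilities for three unlabelled target samples.
probs_original = np.array([[0.98, 0.02], [0.60, 0.40], [0.03, 0.97]])
probs_augmented = np.array([[0.90, 0.10], [0.50, 0.50], [0.20, 0.80]])

labels, mask = select_pseudo_labels(probs_original)
loss = consistency_loss(probs_augmented, labels, mask)
```

Only the first and third samples contribute to the loss here, since the second one falls below the confidence threshold.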

Figure 6: Schematic view of the proposed method architecture with a separate encoder for each of the data modalities (source and target). Feature disentanglement enables domain-specific and domain-invariant information to be encoded separately into each half of the generated embedding vectors (depicted in orange and green respectively). The domain-invariant information ($z^{inv}$) is used by the task classifier, while the domain classifier receives the domain-specific portion of the embedding vector ($z^{spe}$). At inference time, only the bottom part of the architecture is used, the top part being instrumental in the training stage to enable the feature disentanglement procedure. See Section 8.2.2 for more details.

This work has been published at the European Conference on Machine Learning (ECML/PKDD) 2024 24.

8.2.3 Orthrus: multi-scale land cover mapping from satellite image time series via 2D encoding and convolutional neural network

Participants: Dino Ienco.

Collaborators: Ali Ben Abbes (Univ. Manouba, Tunisia), Imed Riadh Farah (Univ. Manouba, Tunisia)

Keywords: Deep Learning, pixel-object classification, convolutional neural networks (CNN), multivariate time-series, classification, 2D encoding representation, land use land cover.

With the advent of modern Earth observation (EO) systems, the opportunity of collecting satellite image time series (SITS) provides valuable insights to monitor spatiotemporal dynamics. Within this context, accurate land use/land cover (LULC) mapping plays a pivotal role in supporting territorial management and facilitating informed decision-making processes. However, traditional pixel-based and object-based classification methods often struggle to effectively exploit spectral and spatial information. In this study, we propose Orthrus, a novel approach that fuses multi-scale information for enhanced LULC mapping. The proposed approach exploits several 2D encoding techniques to encode time series information into imagery. The resulting image is leveraged as input to a standard convolutional neural network (CNN) image classifier to cope with the downstream classification task. The evaluations on two real-world benchmarks, namely Dordogne and Reunion-Island, demonstrated the quality of Orthrus over state-of-the-art techniques from the field of land cover mapping based on SITS data. More precisely, Orthrus exhibits an enhancement of more than 3.5 accuracy points compared to the best competing approach on the Dordogne benchmark, and surpasses the best competing approach on the Reunion-Island dataset by over 3 accuracy points.
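To give a concrete idea of what encoding a time series into an image can look like, the sketch below implements the Gramian Angular Summation Field, one widely used 2D encoding. Whether Orthrus relies on this particular encoding is an assumption made here for illustration; the function name and the toy series are hypothetical.

```python
import numpy as np

def gramian_angular_field(series):
    """Gramian Angular Summation Field of a 1D series.

    The series is rescaled to [-1, 1], mapped to angles phi = arccos(x),
    and the image entry (i, j) is cos(phi_i + phi_j).
    A constant series would cause a division by zero and is not handled here.
    """
    x = np.asarray(series, dtype=float)
    scaled = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    scaled = np.clip(scaled, -1.0, 1.0)      # guard against rounding outside [-1, 1]
    phi = np.arccos(scaled)
    return np.cos(phi[:, None] + phi[None, :])

# Hypothetical four-date vegetation-index series for one pixel or object.
series = np.array([0.2, 0.4, 0.8, 0.6])
image = gramian_angular_field(series)        # shape (4, 4), usable as a CNN input channel
```

For a multivariate series, one such image per band can be stacked along the channel axis before being passed to an image classifier.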

This work has been published in the Neural Computing and Applications (Springer) journal 6.

8.2.4 A contrastive semi-supervised deep learning framework for land cover classification of satellite time series with limited labels

Participants: Roberto Interdonato, Raffaele Gaetano, Dino Ienco.

Keywords: Satellite image time series, land cover mapping, semi-supervised learning, deep learning.

In this work, we present a new semi-supervised learning framework to cope with satellite image time series (SITS) classification in a data paucity scenario, considering extremely low levels of supervision. The proposed methodology, referred to as S$^{3}$ITS (Semi-Supervised Satellite Image Time Series classification method), is based on temporal convolutional neural networks and it takes advantage of both labelled and unlabelled information. S$^{3}$ITS enforces the data to be projected in a discriminative manifold via contrastive learning, in order to produce a data representation where samples belonging to the same category are closer than the ones belonging to different ones. Pseudo-labelling is employed on unlabelled samples to take the most out of the available unlabelled information. Experiments on two study sites described by satellite image time series of Sentinel-2 images highlight the quality of the proposed method with respect to common classification methods and recent machine learning approaches especially tailored for the semi-supervised classification of multi-variate time series data.

This work has been published in the Neurocomputing (Elsevier) journal 13.

8.2.5 Reuse out-of-year data to enhance land cover mapping via feature disentanglement and contrastive learning

Participants: Cássio Fraga Dantas, Raffaele Gaetano, Dino Ienco.

Collaborators: Claudia Paris (University of Twente, Netherlands)

Keywords: Satellite image time series (SITS), land cover mapping, domain adaptation, contrastive learning, data-centric artificial intelligence.

Given the systematic acquisition of satellite data, it is possible to generate up-to-date land cover (LC) maps, essential for effective agricultural territory management, environmental monitoring, and informed decision-making. Typically, creating an LC map requires collecting high-quality labeled data, a process that is both costly and time-consuming. To mitigate the need to collect large volumes of labeled data, we propose a deep learning framework called REFeD (data Reuse with Effective Feature Disentanglement for land cover mapping), which leverages already available out-of-year reference data to enhance the production of up-to-date LC maps (Figure 7). To this end, REFeD integrates remote sensing and reference data from different domains (e.g., historical and recent data) utilizing a disentanglement strategy based on contrastive learning. By separating domain-invariant and domain-specific features, REFeD isolates useful information associated with the downstream LC mapping task and mitigates distribution shifts between domains. Moreover, REFeD incorporates an effective supervision scheme to reinforce feature disentanglement through multiple levels of supervision at different granularities. Experimental evaluation on study areas characterized by diverse landscapes, including Koumbia (West Africa, Burkina Faso) and Centre-Val de Loire (central Europe, France), demonstrates the effectiveness of the proposed approach.

Figure 7: Architecture of the proposed pseudo-siamese network used in the training stage and composed of two independent branches which disentangle the domain-invariant information (top branch) from domain-specific information (bottom branch). Class ($\mathcal{L}_{cl}$) and domain ($\mathcal{L}_{dom}$) discrimination losses are used on the top and bottom branches, respectively, while a multi-level contrastive loss ($\mathcal{L}_{con}$) is applied to intermediate features at different depths from both branches. At inference time, only the domain-invariant encoder is used for classifying the target domain. See Section 8.2.5 for more details.

This work is currently available as a preprint 10 and under submission to a remote sensing journal.

8.2.6 Prompt-guided and multimodal landscape scenicness assessments with vision-language models

Participants: Diego Marcos.

Collaborators: Alex Levering (Vrije Universiteit Amsterdam, Netherlands), Nathan Jacobs (Washington University in St. Louis, USA), Devis Tuia (EPFL, Switzerland)

Keywords: Vision-language models, landscape scenicness.

Recent advances in deep learning and Vision-Language Models (VLM) have enabled efficient transfer to downstream tasks even when limited labelled training data is available, and have made it possible to compare text directly with image content. These properties of VLMs enable new opportunities for the annotation and analysis of images. We test the potential of VLMs for landscape scenicness prediction, i.e., the aesthetic quality of a landscape, using zero- and few-shot methods. We experiment with few-shot learning by fine-tuning a single linear layer on a pre-trained VLM representation. We find that a model fitted to just a few hundred samples performs favorably compared to a model trained on hundreds of thousands of examples in a fully supervised way. We also explore the zero-shot prediction potential of contrastive prompting using positive and negative landscape aesthetic concepts. Our results show that this method outperforms a linear probe with few-shot learning when using a small number of samples to tune the prompt configuration. We introduce Landscape Prompt Ensembling (LPE), which is an annotation method for acquiring landscape scenicness ratings through rated text descriptions without needing an image dataset during annotation. We demonstrate that LPE can provide landscape scenicness assessments that are concordant with a dataset of image ratings. The success of zero- and few-shot methods combined with their ability to use text-based annotations highlights the potential for VLMs to provide efficient landscape scenicness assessments with greater flexibility.
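One possible reading of contrastive prompting is sketched below: an image embedding is compared with the averaged embeddings of positive and negative prompts, and a softmax over the two similarities yields a score between 0 and 1. The embeddings are hand-made toy vectors standing in for the outputs of a VLM image and text encoder, and the temperature value, the averaging of prompts and the function names are assumptions rather than the published procedure.

```python
import numpy as np

def l2_normalize(x):
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def contrastive_prompt_score(image_emb, pos_embs, neg_embs, temperature=0.01):
    """Softmax weight of the positive prompts against the negative prompts."""
    img = l2_normalize(image_emb)
    pos = l2_normalize(pos_embs).mean(axis=0)    # ensemble of positive prompts
    neg = l2_normalize(neg_embs).mean(axis=0)    # ensemble of negative prompts
    logits = np.array([img @ pos, img @ neg]) / temperature
    logits = logits - logits.max()               # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(probs[0])

# Toy 3D embeddings (hypothetical; a real VLM would provide these vectors).
image_emb = np.array([1.0, 0.2, 0.0])
pos_embs = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]])   # e.g. prompts for scenic concepts
neg_embs = np.array([[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]])   # e.g. prompts for unscenic concepts

score = contrastive_prompt_score(image_emb, pos_embs, neg_embs)
flipped = contrastive_prompt_score(image_emb, neg_embs, pos_embs)
```

No image labels are needed to compute such a score, which is what makes the approach zero-shot.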

This work has been published as a research article in the journal PLOS ONE 14.

8.3 Interaction between Domain expert and Machine Learning models

8.3.1 Coarse-to-Fine Concept Bottleneck Models

Participants: Konstantinos Panousis, Dino Ienco, Diego Marcos.

Keywords: Interpretability, explainability, multi-modal, bayesian deep learning, sparsification.

Deep learning algorithms have recently gained significant attention due to their impressive performance. However, their high complexity and un-interpretable mode of operation hinder their confident deployment in real-world safety-critical tasks. This work targets ante hoc interpretability, and specifically Concept Bottleneck Models (CBMs). Our goal is to design a framework that admits a highly interpretable decision making process with respect to human understandable concepts, on two levels of granularity. To this end, we propose a novel two-level concept discovery formulation leveraging: i) recent advances in vision-language models, and ii) an innovative formulation for coarse-to-fine concept selection via data-driven and sparsity-inducing Bayesian arguments. Figure 8 depicts the general schema of the proposed approach. Within this framework, concept information does not solely rely on the similarity between the whole image and general unstructured concepts; instead, we introduce the notion of concept hierarchy to uncover and exploit more granular concept information residing in patch-specific regions of the image scene. As we experimentally show, the proposed construction not only outperforms recent CBM approaches, but also yields a principled framework towards interpretability.

Figure 8: (Left) The Concept Discovery Block (CDB). Given a set of concepts and an image, we compute their similarity via a VLM; we consider a data-driven mechanism for concept discovery, sampling from an amortized Bernoulli posterior. (Right) A schematic of the envisioned CF-CBMs. We consider a set of high level concepts, each described by a number of attributes; this forms the pool of low-level concepts. Our objective is to discover concepts that describe the whole image, while exploiting information residing in patch-specific regions (P = 9 in this case) by matching low-level concepts to each patch and aggregating the information to obtain a single representation. Each level comprises CDBs, while the levels are linked together via the binary indicators $Z_{H}$ and $Z_{L}$.
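The core of a concept bottleneck with sparse concept selection can be sketched as follows for a single level: image-to-concept similarities are masked by binary indicators before a linear classifier produces the class scores. This is a simplified, single-level illustration; thresholding the selection probabilities instead of sampling them, as well as all names and toy values, are assumptions and not the method of the paper.

```python
import numpy as np

def concept_bottleneck_logits(image_emb, concept_embs, keep_prob, class_weights, threshold=0.5):
    """Class logits computed only from the concepts retained by a binary mask."""
    sims = concept_embs @ image_emb                 # one similarity per concept
    z = (keep_prob >= threshold).astype(float)      # deterministic stand-in for Bernoulli sampling
    bottleneck = z * sims                           # discarded concepts contribute nothing
    return class_weights @ bottleneck, z

# Toy 2D embeddings (hypothetical; a VLM would provide them in practice).
image_emb = np.array([0.8, 0.6])
concept_embs = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
keep_prob = np.array([0.9, 0.2, 0.7])               # assumed per-concept selection probabilities
class_weights = np.array([[1.0, 0.0, 0.0],
                          [0.0, 1.0, 1.0]])         # 2 classes x 3 concepts

logits, z = concept_bottleneck_logits(image_emb, concept_embs, keep_prob, class_weights)
```

Because every logit is a weighted sum of a few retained concept similarities, the contribution of each concept to a decision can be read off directly.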

This work has been published at the Neural Information Processing Systems (NeurIPS) 2024 conference 30.

8.3.2 PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers

Participants: Ananthu Aniraj, Cássio Fraga Dantas, Dino Ienco, Diego Marcos.

Keywords: Part Detection, transformers, interpretability.

Computer vision methods that explicitly detect object parts and reason on them are a step towards inherently interpretable models. Existing approaches that perform part discovery driven by a fine-grained classification task make very restrictive assumptions on the geometric properties of the discovered parts; they should be small and compact. Although this prior is useful in some cases, in this paper we show that pre-trained transformer-based vision models, such as self-supervised DINOv2 ViT, enable the relaxation of these constraints. In particular, we find that a total variation (TV) prior, which allows for multiple connected components of any size, substantially outperforms previous work. We test our approach (the architecture is depicted in Figure 9) on three fine-grained classification benchmarks: CUB, PartImageNet and Oxford Flowers, and compare our results to previously published methods as well as a re-implementation of the state-of-the-art method PDiscoNet with a transformer-based backbone. We consistently obtain substantial improvements across the board, both on part discovery metrics and the downstream classification task, showing that the strong inductive biases in self-supervised ViT models call for rethinking the geometric priors that can be used for unsupervised part discovery.
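The effect of a total variation prior on part maps can be illustrated with the sketch below, which computes an anisotropic TV penalty on a stack of part attention maps. The normalization by the number of entries, the function name and the toy maps are assumptions; the exact loss used in PDiscoFormer may be defined differently.

```python
import numpy as np

def total_variation(maps):
    """Anisotropic total variation of part maps with shape (K, H, W), averaged over entries."""
    dh = np.abs(maps[:, 1:, :] - maps[:, :-1, :]).sum()   # vertical neighbour differences
    dw = np.abs(maps[:, :, 1:] - maps[:, :, :-1]).sum()   # horizontal neighbour differences
    return float((dh + dw) / maps.size)

# One smooth part map: a single 2x2 blob in a 4x4 grid.
blob = np.zeros((1, 4, 4))
blob[0, :2, :2] = 1.0

# One highly fragmented part map: a checkerboard.
checker = (np.indices((4, 4)).sum(axis=0) % 2).astype(float)[None]

tv_blob = total_variation(blob)
tv_checker = total_variation(checker)
```

The penalty grows with the length of the region boundaries, so it discourages noisy, fragmented maps without forcing a part to be a single small compact region.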

Figure 9: The PDiscoFormer architecture for part discovery.

This work has been published at the European Conference on Computer Vision (ECCV) 2024 23.

8.3.3 Explaining the decisions and the functioning of a convolutional spatiotemporal land cover classifier with channel attention and redescription mining

Participants: Dino Ienco.

Collaborators: Enzo Pelous (Université Savoie Mont Blanc, France), Nicolas Méger (Université Savoie Mont Blanc, France), Abdourrahmane Atto (Université Savoie Mont Blanc, France), Hermann Courteille (University of Rennes, IRISA, France), Christophe Lin-Kwong-Chon (Université Savoie Mont Blanc, France)

Keywords: Explainable AI, convolutional neural networks, land cover classification, satellite image time series, attention, redescription mining, grouped frequent sequential patterns.

Convolutional neural networks trained with satellite image time series have demonstrated their potential in land cover classification in recent years. Nevertheless, the rationale leading to their decisions remains obscure by nature. Methods for providing relevant and simplified explanations of their decisions as well as methods for understanding their inner functioning have thus emerged. However, both kinds of methods generally work separately and no explicit connection between their findings is made available. This work presents an innovative method for refining the explanations provided by channel-based attention mechanisms. It consists in identifying correspondence rules between neuronal activation levels and the presence of spatiotemporal patterns in the input data for each channel and target class. These rules provide both class-level and instance-level explanations, as well as an explicit understanding of the network operations. They are extracted using a state-of-the-art redescription mining algorithm. Experiments on the Reunion Island Sentinel-2 dataset show that both correct and incorrect decisions can be explained using convenient spatiotemporal visualizations.

This work has been published in the ISPRS Journal of Photogrammetry and Remote Sensing (Elsevier) 20.
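
As a rough illustration of what a correspondence rule between activation levels and input patterns can look like, the sketch below scores one candidate rule with the Jaccard coefficient, a measure commonly used in redescription mining to compare the supports of two descriptions. It is not the algorithm used in the paper: the function name `jaccard`, the activation threshold and the toy data are all assumptions.

```python
import numpy as np


def jaccard(support_a, support_b) -> float:
    """Jaccard coefficient between two boolean support vectors."""
    a = np.asarray(support_a, dtype=bool)
    b = np.asarray(support_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # both supports empty: no evidence for the rule
    return float(np.logical_and(a, b).sum() / union)


# Hypothetical toy data: activations of one channel on six pixels, and
# whether a given spatiotemporal pattern occurs in each pixel's time series.
activations = np.array([0.9, 0.8, 0.1, 0.7, 0.2, 0.05])
pattern_present = np.array([True, True, False, True, True, False])

# Assumed threshold separating "high" from "low" activation.
high_activation = activations > 0.5

# Score of the candidate rule "high activation <-> pattern present".
score = jaccard(high_activation, pattern_present)
```

A score close to 1 means that the two descriptions select almost the same pixels, so the activation level of that channel can be read as a proxy for the presence of the pattern.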

8.3.4 Understanding the sentiment associated with cultural ecosystem services using images and text from social media

Participants: Diego Marcos.

Collaborators: Ilan Havinga (Wageningen University, Netherlands), Patrick Bogaart (National Accounts Department, Statistics Netherlands), Devis Tuia (EPFL, Switzerland), Lars Hein (Wageningen University, Netherlands)

Keywords: Ecosystem services, Cultural ecosystem services, Natural language processing, Machine learning, Social media, Big data.

Social media is increasingly being employed to develop Cultural Ecosystem Services (CES) indicators. The image-sharing platform Flickr has been one of the most popular sources of data. Most large-scale studies, however, tend to use only the number of images as a proxy for CES, due to the challenges associated with processing large amounts of this data. This does not fully represent the benefit generated by ecosystems in terms of the positive experiences expressed by users in the associated text. To address this gap, we have applied several computer vision (CV) and natural language processing (NLP) models to link CES estimates for Great Britain, based on the content of images, to sentiment measures derived from the accompanying text, and we compare our results to a national, geo-referenced survey of recreational well-being in England. We have found that the aesthetic quality of the landscape and the presence of particular wildlife result in more positive sentiment. However, we have also found that different physical settings correlate with this sentiment and that sentiment is sometimes more strongly related to social activities than to many natural factors. Still, we found significant associations between these CES measures, sentiment and survey data. These findings illustrate that integrating sentiment analysis with CES measurement can capture some of the positive benefits associated with CES using social media. The additional detail provided by these novel techniques can help to develop more meaningful CES indicators for recreational land use management.

This work has been published in the Ecosystem Services (Elsevier) journal 12.

9 Bilateral contracts and grants with industry

9.1 Bilateral contracts with industry

ATOS - Cifre thesis

Participants: Raffaele Gaetano, Diego Marcos, Dino Ienco.

This Cifre Ph.D. thesis project, entitled “Multi-source satellite image segmentation for the extraction of geometric landscape objects with an application to the extraction of agricultural land parcels”, started in September 2023, for a total duration of 3 years.

Context: Delineating agricultural field plots accurately and efficiently is important not only for declaration-based subsidy systems such as the European Common Agricultural Policy, but also for monitoring agricultural activities at several scales (environmental impact, territorial development, crop monitoring and precision farming, etc.) and for obtaining useful information on the status of agricultural production. To this end, precise and timely spatialized products, such as land use and land cover maps and estimates of agricultural yields at field level, are essential. These tools are part of a process of developing value-added services linked to digital agriculture. The accuracy and freshness of these products could prove to be a key factor in supporting decision-making by a wide range of stakeholders, including farmers, land managers and political decision-makers.

Objectives: Initial work on the extraction of agricultural land parcels from satellite imagery using deep learning techniques has recently been proposed. However, these studies mainly deploy state-of-the-art computer vision techniques directly in this field of application, with limited adaptation to satellite imagery, particularly with regard to the multi-source, multi-temporal and multi-scale information accessible via modern Earth observation missions. In this context, this CIFRE thesis aims to tackle the automatic extraction of agricultural fields from remotely sensed data over a territory, together with their characterization in terms of land use and land cover. To this end, the thesis project plans to leverage deep learning techniques such as semantic segmentation and instance segmentation to propose new approaches tailored to the analysis of satellite data, both for extracting the geometric contours that delineate agricultural fields and for characterizing the corresponding land use (in terms of cropping practices).

ECOMED

Participants: Cássio Fraga Dantas, Diego Marcos, Dino Ienco.

The collaboration, with a duration of 1 year, has covered the stipend for the master's internship of Bertille Temple.

Context: ECO-MED Ecologie et Médiation (Ecology and Mediation) is a consultancy specializing in studies, expertise and advice on the natural environment, applied to regional planning and the enhancement of natural environments. It has been working with developers, industrialists and public bodies since 2003. With its 3 branches in Marseille, Montpellier and Lyon, employing around fifty people, ECO-MED is present throughout the south-east of France and covers the whole of this territory in terms of projects. Since September 2011, ECO-MED has been recognized as a company involved in both fundamental and applied research. This makes it possible to promote all the protocols and methods that have been the fruit of ECO-MED's work since it was founded in October 2003.

Objectives: ECO-MED has collected data during the field campaigns of recent years. The objective is to assess the quality of the collected data (both ground truth and aerial imagery) using automatic deep learning methods for the classification and segmentation of natural habitats according to EUNIS-type reference systems, species habitats and ecologically functional physiognomic complexes.

ORUS

Participants: Cássio Fraga Dantas, Dino Ienco, Jocelyn Chanussot.

Context: ORUS is interested in the scientific expertise of the EVERGREEN and THOTH teams to provide advice, guidance and expertise on their research topic: "Processing Earth observation data using machine learning techniques, processing hyperspectral imagery and, more generally, machine learning methods for computer vision".

Objectives: The aim of the contract is to provide scientific advice in the form of a preliminary survey of the state of the art in recent techniques for the classification and analysis of hyperspectral data. This survey should pave the way for a second research collaboration that meets both the needs expressed by ORUS and the objective of exploring a line of work with scientific and innovative potential in terms of its contribution to the state of the art.

10 Partnerships and cooperations

10.1 International research visitors

10.1.1 Visits of international scientists

Juan Li

Status: PhD Student

Institution of origin: Beijing Normal University

Country: China

Dates: From January 2024 until November 2025