MODAL

MODAL - 2023

Keywords
1 Team members, visitors, external collaborators
2 Overall objectives
- 2.1 Context
- 2.2 Goals
3 Research program
4 Application domains
- 4.1 Economic world
- 4.2 Biology and health
5 Social and environmental responsibility
6 New software, platforms, open data
- 6.1 New software
- 6.2 New platforms
  - 6.2.1 MASSICCC Platform
7 New results
8 Bilateral contracts and grants with industry
- 8.1 Bilateral contracts with industry
- 8.2 Bilateral grants with industry
  - Withings
  - ADEO
  - Seckiot
  - Decathlon
  - ASYGN
  - HORIBA
9 Partnerships and cooperations
10 Dissemination
- 10.1 Promoting scientific activities
- 10.2 Teaching - Supervision - Juries
11 Scientific production

2023 Activity report, Project-Team MODAL

RNSR: 201020969D

Research center: Inria Centre at the University of Lille
In partnership with: CNRS, Université de Lille
Team name: MOdel for Data Analysis and Learning
In collaboration with: Laboratoire Paul Painlevé (LPP)
Domain: Applied Mathematics, Computation and Simulation
Theme: Optimization, machine learning and statistical methods

Keywords

Computer Science and Digital Science

A3.1.4. Uncertain data
A3.1.10. Heterogeneous data
A3.2.3. Inference
A3.3.2. Data mining
A3.3.3. Big data analysis
A3.4.1. Supervised learning
A3.4.2. Unsupervised learning
A3.4.5. Bayesian methods
A3.4.7. Kernel methods
A5.2. Data visualization
A5.9.2. Estimation, modeling
A6.2.3. Probabilistic methods
A6.2.4. Statistical methods
A6.3.3. Data processing
A9.2. Machine learning

1 Team members, visitors, external collaborators

Research Scientists

Christophe Biernacki [INRIA, Professor Detachement, HDR]
Benjamin Guedj [INRIA, Researcher]
Hemant Tyagi [INRIA, Researcher]

Faculty Members

Cristian Preda [Team leader, UNIV LILLE, Professor, HDR]
Sophie Dabo [UNIV LILLE, Professor, HDR]
Guillemette Marot [UNIV LILLE, Professor, HDR]

Post-Doctoral Fellow

Rim Essifi [INRIA, Post-Doctoral Fellow, until Aug 2023]

PhD Students

Reuben Adams [UCL]
François Bassac [Decathlon, CIFRE]
Clarisse Boinay [Seckiot]
Violaine Courrier [WITHINGS, from Sep 2023]
Clara Dubois [LABO TIMC, from Jun 2023]
Maxime Haddouche [UNIV LILLE]
Wilfried Heyse [UNIV LILLE, until Aug 2023]
Eglantine Karle [INRIA, until Oct 2023]
Etienne Kronert [WORLDLINE]
Issam Ali Moindjie [INRIA]
Axel Potier [ADEO]
Antonin Schrab [UCL]

Technical Staff

Ernesto Javier Araya Valdivia [INRIA, Engineer, from Mar 2023 until Oct 2023]
Rachid Boulkhir [INRIA, Engineer, until Sep 2023]
Guillaume Braun [INSEE, until Mar 2023]
Ismat Yahia Chaib Draa [ALICANTE, Engineer, until Nov 2023]
Louise Chen [INRIA, Engineer, from Nov 2023]

Interns and Apprentices

Paguidame Sambiani [INRIA, Intern, from Jul 2023 until Sep 2023]

Administrative Assistant

Anne Rejl [INRIA]

External Collaborator

Alain Celisse [UNIV PARIS I, HDR]

2 Overall objectives

2.1 Context

In several respects, modern society has strengthened the need for statistical analysis, from both the applied and the theoretical points of view. This need stems from the easier availability of data thanks to technological breakthroughs (storage, transfer, computing): data are now so widespread that they are no longer limited to large human organizations. The more or less conscious goal of such data availability is to improve the quality of the age-old statistical tasks, namely discovering new knowledge and making better predictions. These two central tasks can be referred to as unsupervised learning and supervised learning respectively, even if statistics is not limited to them and other names exist depending on the community. In short, the underlying hope is: “more data for better quality and more numerous results”.

However, today's data are increasingly complex. They gather mixed-type features (for instance continuous data mixed with categorical data), missing or partially missing items (like intervals) and numerous variables (high-dimensional situation). As a consequence, the target “better quality and more numerous results” of the previous adage (both parts matter: “better quality” and also “more numerous”) cannot be reached in a somewhat “manual” way, but must rely on some theoretical formalization and guarantees. Indeed, data can be so numerous and so complex (data can live in quite abstract spaces) that the “empirical” statistician is quickly overwhelmed. Since data are by nature subject to randomness, the probabilistic framework is a very sensible theoretical environment to serve as a general guide for modern statistical analysis.

2.2 Goals

Modal is a project-team working on today's complex data sets (mixed data, missing data, high-dimensional data), for classical statistical targets (unsupervised learning, supervised learning, regression etc.), with approaches relying on the probabilistic framework. This framework can be tackled through both model-based methods (such as mixture models, as a generic tool) and model-free methods (such as probabilistic bounds on empirical quantities). Furthermore, Modal is connected to the real world through applications, typically biological ones (some members have this expertise), but many others are also considered since the application coverage of the Modal methodology is very large. It is also important to note that, in return, applications are often real opportunities to raise new academic questions for the statistician (as in some projects handled by the Bilille platform and some bilateral contracts of the team).

From the point of view of academic communities, Modal can be seen as belonging simultaneously to both the statistical learning and the machine learning communities, as attested by its publications. In a sense, this is an opportunity to build a bridge between these two stochastic communities around a common but broad probabilistic framework.

3 Research program

3.1 Research axis 1: Unsupervised learning

Scientific challenges related to unsupervised learning are numerous: the validity of the clustering outcome, the ability to manage different kinds of data, the handling of missing data, the dimensionality of the data set etc. Many of them are addressed by the team, leading to publications, often accompanied by a dedicated package (sometimes upgraded to a software or even to a platform grouping several software packages). Because of the variety of the scope, this axis involves nearly all the permanent team members, often with PhD students and some engineers. The related works are always embedded in a probabilistic framework, typically model-based approaches but also model-free ones like PAC-Bayes (PAC stands for Probably Approximately Correct), because such a mathematical environment offers both a well-posed problem and a rigorous answer.

3.2 Research axis 2: Performance assessment

One main concern of the Modal team is to provide theoretical justifications for the procedures it designs. Such guarantees are important to avoid misleading conclusions resulting from unsuitable use. For example, one ingredient in proving these guarantees is the PAC framework, which leads to finite-sample concentration inequalities. More precisely, contributions to PAC learning rely on classical empirical process theory and on PAC-Bayesian theory. The Modal team exploits such non-asymptotic tools to analyze the performance of iterative algorithms (such as gradient descent), cross-validation estimators, online change-point detection procedures, ranking algorithms, matrix factorization techniques and clustering methods, for instance. The team also develops expertise in the formal study of the dynamics of algorithms related to mixture models (important models used in the previous unsupervised setting), like degeneracy for the EM algorithm or label switching for the Gibbs algorithm.
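
As a textbook illustration of what a finite-sample concentration inequality looks like (this is Hoeffding's inequality, chosen here as a generic example and not a result of the team), let $X_{1}, \dots, X_{n}$ be independent random variables with values in $[0, 1]$ and common mean $\mu$, and let $\bar{X}_{n}$ be their empirical mean. Then, for every $\varepsilon > 0$,

$$
\mathbb{P}\left( \left| \bar{X}_{n} - \mu \right| \geq \varepsilon \right) \leq 2 \exp\left( -2 n \varepsilon^{2} \right).
$$

Equivalently, for any $\delta \in (0, 1)$, with probability at least $1 - \delta$,

$$
\left| \bar{X}_{n} - \mu \right| \leq \sqrt{\frac{\ln(2/\delta)}{2n}}.
$$

The bound holds for every sample size $n$, not only asymptotically, which is the sense in which such guarantees are "non-asymptotic".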

3.3 Research axis 3: Functional data

Mainly due to technological advances, functional data are more and more widespread in many application domains. Functional data analysis (FDA) is concerned with the modeling of data such as curves, shapes, images or more complex mathematical objects, thought of as smooth realizations of a stochastic process (a data object valued in a space of possibly infinite dimension, such as the space of square-integrable functions). Time series are an emblematic example, even if FDA should not be limited to them (spectral data, spatial data etc.). Basically, FDA considers that data correspond to realizations of stochastic processes, usually assumed to lie in a metric, semi-metric, Hilbert or Banach space. One may consider independent or dependent (in time or space) functional data objects of different types (qualitative, quantitative, ordinal, multivariate, time-dependent, spatial-dependent etc.). The last decade saw a dynamic literature on parametric and non-parametric FDA approaches for different types of data and applications to various domains, covering tasks such as principal component analysis, clustering, regression and prediction.

3.4 Research axis 4: Applications motivating research

The fourth axis consists in translating real application issues into statistical problems that raise new (academic) challenges for the models developed in the Modal team. Cifre PhDs in industry and interdisciplinary projects with research teams in Health and Biology are at the core of this objective. The main originality of this objective lies in the use of statistics with complex data, including in particular ultra-high dimensional problems. We focus on real applications which cannot be solved by classical data analysis.

4 Application domains

4.1 Economic world

The Modal team applies its research to the economic world through the supervision of CIFRE PhDs with companies such as CACF (credit scoring), A-Volute (expert in 3D sound), Meilleur Taux (insurance comparator) and Worldline. It also has several contracts with companies such as COLAS, Nokia-Apsys/Airbus, Safety Line (through the PERF-AI consortium), Agence d'Urbanisme Métropole Européenne de Lille, ASYGN SAS (MEMS, joint Cytomems ANR project), HORIBA France SAS (Raman spectrometry), Withings (medical devices) and Seckiot (cyber-security).

4.2 Biology and health

The second main application domain of the team is biology and health. Some members of the team are involved in the direction of Bilille, the bioinformatics platform of Lille, and of OncoLille Institute. Some members of the team also co-supervise PhD students of Inserm teams.

5 Social and environmental responsibility

MODAL does not have any social and environmental responsibility.

6 New software, platforms, open data

6.1 New software

6.1.1 MixtComp.V4

Keywords:
Clustering, Statistics, Missing data, Mixed data
Functional Description:
MixtComp (Mixture Computation) is a model-based clustering package for mixed data originating from the Modal team (Inria Lille). It has been engineered around the idea of easy and quick integration of new univariate models, under the conditional independence assumption. New models will eventually become available from research carried out by the Modal team or by other teams. Currently, the central architecture of MixtComp is built and its functionality has been field-tested through industry partnerships. Five basic models (Gaussian, Multinomial, Poisson, Weibull, NegativeBinomial) are implemented, as well as two advanced models (Functional and Rank). MixtComp natively manages missing data (completely missing or missing by interval). MixtComp is used as an R package, but its internals are coded in C++ using state-of-the-art libraries for faster computation.
Release Contributions:
- New I/O system
- Replacement of regex library
- Improvement of initialization
- Criteria for stopping the algorithm
- Added management of partially missing data for several models
- User documentation
- Adding user features in R
URL:
https://github.com/modal-inria/MixtComp
Contact:
Christophe Biernacki
Participants:
Christophe Biernacki, Vincent Kubicki, Matthieu Marbac-Lourdelle, Serge Iovleff, Quentin Grimonprez, Etienne Goffinet
Partners:
Université de Lille, CNRS

6.1.2 cfda

Name:
Categorical functional data analysis
Keyword:
Functional data
Functional Description:

The R package cfda performs:

- descriptive statistics for categorical functional data

- dimension reduction and optimal encoding of states (extension of multiple correspondence analysis to functional data)
URL:
https://github.com/modal-inria/cfda
Contact:
Cristian Preda
Participants:
Cristian Preda, Quentin Grimonprez, Vincent Vandewalle
Partner:
Université de Lille

6.1.3 ClusPred

Name:
Simultaneous Semi-Parametric Estimation of Clustering and Regression
Keywords:
Regression, Clustering, Semi-parametric model, Finite mixture
Functional Description:
Parameter estimation of regression models with fixed group effects, when the group variable is missing while group-related variables are available. Parametric and semi-parametric approaches are considered.
URL:
https://cran.r-project.org/web/packages/ClusPred
Authors:
Matthieu Marbac-Lourdelle, Mohammed Sedki, Christophe Biernacki, Vincent Vandewalle
Contact:
Matthieu Marbac-Lourdelle

6.1.4 visCorVar

Name:
visualization of correlated variables in the context of statistical integration of omics data
Keywords:
Data integration, Visualization
Functional Description:
The R package visCorVar visualizes results from data integration performed with the function block.splsda (Bioconductor mixOmics package). The data integration is performed on different types of omic datasets (transcriptomics, metabolomics, metagenomics) in order to select the variables of an omic dataset that are correlated with the variables of the other omic datasets and with the response variables, and to predict the class membership of a new sample. These correlated variables can be visualized with correlation circles and networks.
URL:
https://gitlab.com/bilille/viscorvar
Contact:
Guillemette Marot
Participants:
Maxime Brunin, Guillemette Marot, Pierre Pericard
Partner:
Université de Lille

6.1.5 metaRNASeq

Name:
RNA-Seq data meta-analysis
Keywords:
Transcriptomics, Meta-analysis, Differential analysis, High throughput sequencing, Biostatistics
Functional Description:
metaRNASeq is a specialised software for RNA-seq experiments. It is an R package adapted from the metaMA package, which performs meta-analysis of microarray data. Both packages take advantage of empirical Bayesian approaches, which are especially appropriate in a high-dimensional context. The specificities of the two types of technologies nevertheless require adaptations to each one, which explains the development of two different packages. To facilitate their use by a large audience, a Galaxy web instance named SMAGEXP has been created and gathers the two packages.
Release Contributions:
Minimal maintenance was carried out to correct a bug reported by a user, which occurred on Windows systems but not on Linux. This bug was related to the treatment of missing values. Guillemette Marot, who created and largely contributed to the initial versions of the metaRNASeq package, handed over the maintenance in September 2021 to Samuel Blanck, engineer in the METRICS ULR2694 team (Univ. Lille, CHU Lille).
URL:
https://cran.r-project.org/web/packages/metaRNASeq/index.html
Contact:
Guillemette Marot
Participants:
Guillemette Marot, Andrea Rau, Samuel Blanck
Partners:
INRAE, Université de Lille

6.1.6 HDSpatialScan

Name:
Multivariate and Functional Spatial Scan Statistics
Keywords:
Functional data, Clustering, Spatial information, Multivariate data
Functional Description:
Detects spatial clusters of abnormal values in multivariate or functional data.
URL:
https://cran.r-project.org/web/packages/HDSpatialScan/index.html
Contact:
Sophie Dabo

6.1.7 MLGL

Name:
Multi-Layer Group Lasso
Keywords:
Variable selection, Statistical learning
Functional Description:
The MLGL R-package, standing for Multi-Layer Group-Lasso, implements a procedure of variable selection in the context of redundancy between explanatory variables, which holds true with high dimensional data. The MLGL approach combines variables aggregation and selection in order to improve interpretability and performance. First, a hierarchical clustering procedure provides at each level a partition of the variables into groups. Then, the set of groups of variables from the different levels of the hierarchy is given as input to group-Lasso, with weights adapted to the structure of the hierarchy. At this step, group-Lasso outputs sets of candidate groups of variables for each value of regularization parameter. The versatility offered by MLGL to choose groups at different levels of the hierarchy a priori induces a high computational complexity. MLGL however exploits the structure of the hierarchy and the weights used in group-Lasso to greatly reduce the final time cost. The final choice of the regularization parameter – and therefore the final choice of groups – is made by a multiple hierarchical testing procedure.
URL:
https://cran.r-project.org/web/packages/MLGL/index.html
Contact:
Guillemette Marot

6.2 New platforms

6.2.1 MASSICCC Platform

Participants: Christophe Biernacki, Julien Vandaele.

MASSICCC is a demonstration platform giving access, through a SaaS (software as a service) concept, to data analysis libraries developed at Inria. It allows obtaining results either directly through a website-specific display (specific and interactive visual outputs) or through an R data object download. The project started in October 2015 for two years and is shared between the Modal team (Inria Lille) and the Select team (Inria Saclay). In 2016, two packages were integrated: Mixmod and MixtComp (see the specific section about MixtComp). In 2017, the BlockCluster package was integrated, and particular attention to providing meaningful graphical outputs (for Mixmod, MixtComp and BlockCluster) directly in the web platform led to some specific developments. In 2019, a new version of the MixtComp software was developed. In 2020, Julien Vandaele joined the MODAL team as a research engineer to upgrade the MixtComp software and to replace the MASSICCC platform with three R notebooks dedicated to the three packages Mixmod, BlockCluster and MixtComp. All these notebooks can be found on the MODAL webpage.

7 New results

7.1 Axis 1: Co-clustering as a (very) parsimonious clustering

Participants: Christophe Biernacki.

We advocate that co-clustering is of particular interest for high-dimensional (HD) clustering of individuals, even if this is not its primary mission. Indeed, column clustering is recast as a strategy to control the variance of the estimation, the model dimension being driven by the number of groups of variables instead of the number of variables itself. A survey paper published in an international journal 14 shows the ability of co-clustering to outperform simple mixture row-clustering, even if co-clustering clearly corresponds to a misspecified model situation, revealing a promising way to efficiently address (very) HD clustering.

It is a joint work with Julien Jacques from University Lyon 2 and Christine Keribin from University Paris-Saclay.

7.2 Axis 1: Dealing with Missing Data in Model-based Clustering through a MNAR Model

Participants: Christophe Biernacki.

Since the 90s, model-based clustering has been widely used to classify data. Nowadays, with the increase of available data, missing values are more frequent. Traditional ways to deal with them consist in obtaining a filled data set, either by discarding missing values or by imputing them. In the first case, some information is lost; in the second case, the final clustering purpose is not taken into account in the imputation step. Thus, both solutions risk blurring the clustering estimation result. Alternatively, we defend the need to embed the missingness mechanism directly within the clustering modeling step. There exist three types of missing data: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). In all situations, logistic regression is proposed as a natural and flexible candidate model. In particular, its flexibility allows us to design some meaningful parsimonious variants, such as dependency on the missing values or dependency on the cluster label. In this unified context, standard model selection criteria can be used to select between such different missing data mechanisms, simultaneously with the number of clusters. The practical interest of our proposal is illustrated on data derived from medical studies suffering from many missing data.

A preprint has been submitted to an international journal 57 and an invited talk on this topic has been given at an international conference 32.

It is a joint work with Claire Boyer from Sorbonne Université, Gilles Celeux from Inria Saclay, Julie Josse from Inria Montpellier, Fabien Laporte from Institut Pasteur and Matthieu Marbac from ENSAI.
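
To make the role of logistic regression concrete, here is one possible way to write such a missingness model. The notation is an assumption introduced for illustration and is not taken from the report: $y_{ij}$ is the value of variable $j$ for individual $i$, $c_{ij} \in \{0, 1\}$ indicates that $y_{ij}$ is missing, and $z_{i}$ is the cluster label.

$$
\mathbb{P}\left( c_{ij} = 1 \mid y_{ij}, z_{i} = k \right) = \frac{1}{1 + \exp\left( -(\alpha_{kj} + \beta_{kj}\, y_{ij}) \right)}.
$$

Under this illustrative parameterization, setting $\beta_{kj} = 0$ leaves a dependency on the cluster label only, setting $\alpha_{kj} = \alpha_{j}$ and $\beta_{kj} = \beta_{j}$ leaves a dependency on the (possibly missing) value only, and setting $\beta_{kj} = 0$ with $\alpha_{kj} = \alpha_{j}$ gives a constant missingness probability per variable, which corresponds to MCAR. Each variant has a different number of parameters, which is what lets a penalized model selection criterion compare them.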

7.3 Axis 1: Gaussian-based Visualization of Gaussian and non-Gaussian Model-based Clustering

Participants: Christophe Biernacki.

A generic method is introduced to visualize, in a “Gaussian-like way” and onto $ℝ^{2}$, the results of Gaussian or non-Gaussian model-based clustering. The key point is to explicitly force a spherical Gaussian mixture visualization to inherit the within-cluster overlap present in the initial clustering mixture. The result is a particularly user-friendly drawing of the clusters, allowing any practitioner to have a thorough overview of the potentially complex clustering result. An entropic measure indicates the quality of the drawn overlap compared with the true one in the initial space. The proposed method is illustrated on four real data sets of different types (categorical, mixed, functional and network) and is implemented in the R package ClusVis. This work was previously published in an international journal, and was presented this year as an invited plenary session at the French classification conference 36 and as an invited talk at the main international workshop on model-based clustering 42.

This is a joint work with Matthieu Marbac from ENSAI and Vincent Vandewalle from University Côte d’Azur.

7.4 Axis 1: Levels Merging in the Latent Class Model

Participants: Christophe Biernacki.

The latent class model (LCM), dedicated to clustering categorical variables, suffers from the curse of dimensionality when the number of levels is large, a situation frequently encountered in practice. We propose to extend LCM to a natural model which limits the number of levels by merging them, a process which is also equivalent to a specific clustering of levels. Related estimation and model selection procedures are also presented and discussed. This work has been presented as an invited talk at two international conferences 34, 35.

7.5 Axis 1: Comparative study of clustering models for multivariate time series from connected medical objects

Participants: Christophe Biernacki, Violaine Courrier, Cristian Preda.

In healthcare, patient data are often collected in the form of multivariate time series, providing a comprehensive overview of a patient's health status over time. These data are generally scattered and episodic. However, connected medical objects can increase the frequency of data collection. The objective is to create unsupervised patient profiles from these time series. In the absence of labels, a predictive model can be used to predict future values while building a space of latent clusters, which is evaluated according to predictive performance. Using real data from the Withings company, we compare the static clustering approach MAGMACLUST, which creates a cluster at the scale of the entire time series, and the dynamic clustering approach DGM2, which allows an individual's membership in a group to change over time. This work will be presented at a conference in 2024 41.

7.6 Axis 1: Dynamic Ranking with the BTL Model: A Nearest Neighbor based Rank Centrality Method

Participants: Eglantine Karle, Hemant Tyagi.

Many applications such as recommendation systems or sports tournaments involve pairwise comparisons within a collection of $n$ items, the goal being to aggregate the binary outcomes of the comparisons in order to recover the latent strength and/or global ranking of the items. In recent years, this problem has received significant interest from a theoretical perspective with a number of methods being proposed, along with associated statistical guarantees under the assumption of a suitable generative model.

While these results typically collect the pairwise comparisons as one comparison graph $G$, in many applications (such as the outcomes of soccer matches during a tournament) the nature of pairwise outcomes can evolve with time. Theoretical results for such a dynamic setting are relatively limited compared to the aforementioned static setting. We study in this paper an extension of the classic BTL (Bradley-Terry-Luce) model for the static setting to our dynamic setup, under the assumption that the probabilities of the pairwise outcomes evolve smoothly over the time domain $[0, 1]$. Given a sequence of comparison graphs ${(G_{t^{'}})}_{t^{'} \in 𝒯}$ on a regular grid $𝒯 \subset [0, 1]$, we aim at recovering the latent strengths of the items $w_{t}^{*} \in ℝ^{n}$ at any time $t \in [0, 1]$. To this end, we adapt the Rank Centrality method, a popular spectral approach for ranking in the static case, by locally averaging the available data on a suitable neighborhood of $t$. When ${(G_{t^{'}})}_{t^{'} \in 𝒯}$ is a sequence of Erdős-Rényi graphs, we provide non-asymptotic $ℓ_{2}$ and $ℓ_{\infty}$ error bounds for estimating $w_{t}^{*}$, which in particular establish the consistency of this method in terms of $n$ and the grid size $|𝒯|$. We also complement our theoretical analysis with experiments on real and synthetic data. This work appeared in the Journal of Machine Learning Research 22.
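
The idea of combining Rank Centrality with local averaging can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the estimator analyzed in the paper: the function names, the uniform window of half-width `bandwidth`, and the choice to pool raw win counts over neighboring grid times are all hypothetical simplifications.

```python
def rank_centrality(wins, n_iter=2000):
    """Static Rank Centrality.

    wins[i][j] is the (possibly fractional) number of times item i beat item j.
    Returns a vector of nonnegative scores summing to 1 (estimated BTL strengths).
    """
    n = len(wins)
    # Maximum number of distinct opponents of any item, used to normalize rows.
    d_max = max(
        sum(1 for j in range(n) if j != i and wins[i][j] + wins[j][i] > 0)
        for i in range(n)
    )
    d_max = max(d_max, 1)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = wins[i][j] + wins[j][i]
            if i != j and total > 0:
                # Move from i to j in proportion to how often j beat i.
                P[i][j] = (wins[j][i] / total) / d_max
        P[i][i] = 1.0 - sum(P[i])  # self-loop keeps each row summing to 1
    # Power iteration for the stationary distribution of the Markov chain P.
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    s = sum(pi)
    return [p / s for p in pi]


def dynamic_rank_centrality(wins_by_time, grid, t, bandwidth):
    """Pool comparison data from grid times within `bandwidth` of t,
    then run static Rank Centrality on the pooled data (illustrative choice)."""
    n = len(wins_by_time[0])
    pooled = [[0.0] * n for _ in range(n)]
    for s, wins in zip(grid, wins_by_time):
        if abs(s - t) <= bandwidth:
            for i in range(n):
                for j in range(n):
                    pooled[i][j] += wins[i][j]
    return rank_centrality(pooled)
```

When the win fractions match the BTL probabilities $w_{j} / (w_{i} + w_{j})$ exactly, the stationary distribution is proportional to $w$, which is why the method recovers the strengths up to normalization. The bandwidth trades bias (strengths drift over time) against variance (few comparisons per time point).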

7.7 Axis 1&2: Dynamic Ranking and Translation Synchronization

Participants: Ernesto Araya, Eglantine Karle, Hemant Tyagi.

In many applications, such as sports tournaments or recommendation systems, we have at our disposal data consisting of pairwise comparisons between a set of $n$ items (or players). The objective is to use this data to infer the latent strength of each item and/or their ranking. Existing results for this problem predominantly focus on the setting consisting of a single comparison graph $G$. However, there exist scenarios (e.g., sports tournaments) where the pairwise comparison data evolves with time. Theoretical results for this dynamic setting are relatively limited, and this setting is the focus of this paper. We study an extension of the translation synchronization problem to the dynamic setting where the outcomes evolve smoothly over time, and derive efficient algorithms which are consistent (under a dynamic generative model) in terms of the number of time points. Experiments on synthetic and real data showcase the efficacy of the proposed methods.

This work appeared in the journal Information and Inference: a journal of the IMA 13.

7.8 Axis 1&2: Minimax Optimal Clustering of Bipartite Graphs with a Generalized Power Method

Participants: Guillaume Braun, Hemant Tyagi.

Clustering bipartite graphs is a fundamental task in network analysis, especially when the number of rows and columns of the adjacency matrix are of different order. Recent results provide an upper-bound for the misclustering rate when the columns (resp. rows) can be partitioned into $L = 2$ (resp. $K = 2$ ) communities. In this work, we introduce a new algorithm based on the power method and derive conditions for exact recovery in the general setting where $K \neq L \geq 2$ . We also derive a minimax lower bound on the misclustering error when $K = L$ , under a symmetric version of our model, which matches the corresponding upper bound up to a factor depending on $K$ .

This work appeared in the journal Information and Inference: a journal of the IMA 16.

7.9 Axis 1&2: Graph Matching via convex relaxation to the simplex

Participants: Ernesto Araya, Hemant Tyagi.

This paper addresses the Graph Matching problem, which consists of finding the best possible alignment between two input graphs, and has many applications in computer vision, network deanonymization and protein alignment. A common approach to tackle this problem is through convex relaxations of the NP-hard Quadratic Assignment Problem (QAP). Here, we introduce a new convex relaxation onto the unit simplex and develop an efficient mirror descent scheme with closed-form iterations for solving this problem. Under the correlated Gaussian Wigner model, we show that the simplex relaxation admits a unique solution with high probability. In the noiseless case, this is shown to imply exact recovery of the ground truth permutation. Additionally, we establish a novel sufficiency condition for the input matrix in standard greedy rounding methods, which is less restrictive than the commonly used “diagonal dominance” condition. We use this condition to show exact one-step recovery of the ground truth (holding almost surely) via the mirror descent scheme, in the noiseless setting. We also use this condition to obtain significantly improved conditions for the GRAMPA algorithm [Fan et al. 2019] in the noiseless setting. Our method is evaluated on both synthetic and real data, demonstrating superior statistical performance compared to existing convex relaxation methods with similar computational costs.

This work is currently under review in a journal 60.
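
A mirror descent step with closed-form iterations on the unit simplex can be sketched as follows. This is a hedged illustration only: it assumes the relaxed objective is $\| AX - XB \|_{F}^{2}$ over matrices $X$ with nonnegative entries summing to one, and it uses the entropic (exponentiated-gradient) update. The function names, the step size and the objective convention are assumptions, not details confirmed by the report.

```python
import math


def matmul(A, B):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(M):
    return [list(row) for row in zip(*M)]


def gradient(A, B, X):
    """Gradient of f(X) = ||AX - XB||_F^2, i.e. 2 (A^T R - R B^T) with R = AX - XB."""
    n = len(X)
    AX, XB = matmul(A, X), matmul(X, B)
    R = [[AX[i][j] - XB[i][j] for j in range(n)] for i in range(n)]
    AtR = matmul(transpose(A), R)
    RBt = matmul(R, transpose(B))
    return [[2.0 * (AtR[i][j] - RBt[i][j]) for j in range(n)] for i in range(n)]


def mirror_descent_step(A, B, X, step):
    """One entropic mirror descent step: X <- X * exp(-step * grad), renormalized.

    The multiplicative update keeps every entry nonnegative, and the
    renormalization keeps the entries summing to 1, so no projection is needed.
    """
    n = len(X)
    G = gradient(A, B, X)
    E = [[-step * G[i][j] for j in range(n)] for i in range(n)]
    shift = max(max(row) for row in E)  # subtract the max exponent for stability
    W = [[X[i][j] * math.exp(E[i][j] - shift) for j in range(n)] for i in range(n)]
    Z = sum(sum(row) for row in W)
    return [[W[i][j] / Z for j in range(n)] for i in range(n)]
```

Starting from the uniform matrix with entries $1/n^{2}$ and iterating this step gives a point of the simplex that would then be rounded to a permutation, for instance greedily; the rounding stage is omitted from this sketch.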

7.10 Axis 2: Learning linear dynamical systems under convex constraints

Participants: Hemant Tyagi.

We consider the problem of finite-time identification of linear dynamical systems from $T$ samples of a single trajectory. Recent results have predominantly focused on the setup where no structural assumption is made on the system matrix $A^{*} \in ℝ^{n \times n}$, and have consequently analyzed the ordinary least squares (OLS) estimator in detail. We assume prior structural information on $A^{*}$ is available, which can be captured in the form of a convex set $𝒦$ containing $A^{*}$. For the solution of the ensuing constrained least squares estimator, we derive non-asymptotic error bounds in the Frobenius norm that depend on the local size of $𝒦$ at $A^{*}$. To illustrate the usefulness of these results, we instantiate them for three examples, namely when (i) $A^{*}$ is sparse and $𝒦$ is a suitably scaled $ℓ_{1}$ ball; (ii) $𝒦$ is a subspace; (iii) $𝒦$ consists of matrices each of which is formed by sampling a bivariate convex function on a uniform $n \times n$ grid (convex regression). In all these situations, we show that $A^{*}$ can be reliably estimated for values of $T$ much smaller than what is needed for the unconstrained setting.

This work is currently under review in a journal 59 and is joint work with Denis Efimov (Inria Lille, Valse team).
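
For concreteness, the constrained least squares estimator can be written as follows. The state equation and the time indexing are assumptions made here for illustration (the report does not spell them out): the trajectory is taken to satisfy $x_{t+1} = A^{*} x_{t} + \eta_{t+1}$ for $t = 0, \dots, T-1$, with $\eta_{t}$ a noise term.

$$
\widehat{A} \in \arg\min_{A \in 𝒦} \; \sum_{t=0}^{T-1} \left\| x_{t+1} - A x_{t} \right\|_{2}^{2}.
$$

Taking $𝒦 = ℝ^{n \times n}$ gives back the OLS estimator. In example (i), one may take $𝒦 = \{ A : \sum_{i,j} |A_{ij}| \leq R \}$ for a radius $R$ chosen so that $A^{*} \in 𝒦$; the symbol $R$ is again illustrative notation.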

7.11 Axis 2: An estimation approach for the influential–imitator diffusion

Participants: Sophie Dabo-Niang.

This paper presents a numerical estimation procedure for the influential–imitator diffusion, an extension to the Bass model in which a population is partitioned into two segments: influentials (who influence each other) and imitators (whose choices are affected by the ones of influentials). Focusing on the estimation of the model parameters, we propose a maximum likelihood approach and investigate its numerical solvability, building on an asymptotic approximation of the underlying differential equation. Specifically, we develop a truncated series expansion, exhibiting an increasing accuracy when the spontaneous innovation decreases. After uncovering the theoretical properties of the proposed methodology, we propose a specialized block coordinate descent method for the numerical maximization of the likelihood function. Empirical and computational tests are provided using the Michell and West dataset about the cannabis consumption of a cohort of students over their second, third and fourth year at a secondary school in Glasgow. The estimated imitation pattern confirms the well-known hypothesis on peer influences, where the choices of popular children represent the leading effects to determine the habits of others.

It is a joint work with Ringo Thomas Tchouya (IMSP, Benin) and Stefano Nasini (IESEG, Lille) 31.

7.12 Axis 2: k-nearest neighbors prediction and classification for spatial data

Participants: Sophie Dabo-Niang.

This paper proposes a spatial $k$ -nearest neighbor method for nonparametric prediction of real-valued spatial data and supervised classification for categorical spatial data. The proposed method is based on a double nearest neighbor rule which combines two kernels to control the distances between observations and locations. It uses a random bandwidth in order to fit the distributions of the covariates more appropriately. The almost complete convergence with rate of the proposed predictor is established, and the almost sure convergence of the supervised classification rule is deduced. Finite sample properties are given for two applications of the $k$ -nearest neighbor prediction and classification rule.

It is a joint work with Mohamed Salem Ahmed (University of Lille, CERIM), Mohamed Attouch (University Sidi Bel Abbes, Algeria), Mamadou Ndiaye (UCAD, Senegal) 11.

7.13 Axis 2: FDR control for Online Anomaly Detection

Participants: Etienne Kronert, Alain Célisse, Dalila Hattab.

The goal of anomaly detection is to identify observations generated by a process that differs from a reference one. An accurate anomaly detector must ensure low false positive and false negative rates. However, in the online context such a constraint remains highly challenging due to the usual lack of control of the False Discovery Rate (FDR). In particular, the online framework makes it impossible to use classical multiple testing approaches such as the Benjamini-Hochberg (BH) procedure. Our strategy overcomes this difficulty by exploiting a local control of the "modified FDR" (mFDR). An important ingredient in this control is the cardinality of the calibration set used for computing empirical p-values, which turns out to be an influential parameter. This leads to a new strategy for tuning this parameter, which yields the desired FDR control over the whole time series. The statistical performance of this strategy is analyzed through theoretical guarantees, and its practical behavior is assessed by simulation experiments which support our conclusions. See for more details 54.
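
To make the two classical ingredients mentioned above concrete, the following sketch computes empirical p-values from a calibration set and applies the standard (offline) BH step-up procedure to a small batch of scores. It is not the online mFDR strategy of the paper. The function names, the scores and the level `alpha` are illustrative assumptions.

```python
def empirical_p_value(score, calibration_scores):
    """Empirical p-value: share of calibration scores at least as large as `score`.

    The +1 terms keep the p-value strictly positive, so its smallest possible
    value is 1 / (n + 1), where n is the calibration set cardinality.
    """
    n = len(calibration_scores)
    exceed = sum(1 for c in calibration_scores if c >= score)
    return (1 + exceed) / (n + 1)


def benjamini_hochberg(p_values, alpha):
    """Indices rejected by the BH step-up procedure at level `alpha`."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose ordered p-value passes its threshold
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


# Hypothetical data: 99 reference scores and a batch of 4 new scores.
calibration = [float(v) for v in range(1, 100)]
window_scores = [150.0, 50.0, 120.0, 10.0]
p_values = [empirical_p_value(s, calibration) for s in window_scores]
rejected = benjamini_hochberg(p_values, alpha=0.1)
```

With 99 calibration points the smallest attainable p-value is 0.01, which lies below the first BH threshold `alpha / m = 0.025` in this toy batch. A much smaller calibration set could make every rejection impossible, which gives an informal view of why its cardinality matters.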

7.14 Axis 2: Optimistic Dynamic Regret Bounds

Participants: Benjamin Guedj, Maxime Haddouche.

Online Learning (OL) algorithms were originally developed to guarantee good performance when comparing their output to the best fixed strategy. The question of performance with respect to dynamic strategies remains an active research topic. We develop in this work dynamic adaptations of classical OL algorithms based on the use of experts' advice and the notion of optimism. We also propose a constructivist method to generate this advice and eventually provide both theoretical and experimental guarantees for our procedures.

Joint work with Olivier Wintenberger (Sorbonne Université). See for more details 49.

7.15 Axis 2: Wasserstein PAC-Bayes Learning: Exploiting Optimisation Guarantees to Explain Generalisation

Participants: Benjamin Guedj, Maxime Haddouche.

PAC-Bayes learning is an established framework to both assess the generalisation ability of learning algorithms, and design new learning algorithms by exploiting generalisation bounds as training objectives. Most of the existing bounds involve a Kullback-Leibler (KL) divergence, which fails to capture the geometric properties of the loss function which are often useful in optimisation. We address this by extending the emerging Wasserstein PAC-Bayes theory. We develop new PAC-Bayes bounds with Wasserstein distances replacing the usual KL, and demonstrate that sound optimisation guarantees translate to good generalisation abilities. In particular we provide generalisation bounds for the Bures-Wasserstein SGD by exploiting its optimisation properties. See for details 48.

7.16 Axis 2: Generalization Bounds: Perspectives from Information Theory and PAC-Bayes

Participants: Benjamin Guedj.

A fundamental question in theoretical machine learning is generalization. Over the past decades, the PAC-Bayesian approach has been established as a flexible framework to address the generalization capabilities of machine learning algorithms, and design new ones. Recently, it has garnered increased interest due to its potential applicability for a variety of learning algorithms, including deep neural networks. In parallel, an information-theoretic view of generalization has developed, wherein the relation between generalization and various information measures has been established. This framework is intimately connected to the PAC-Bayesian approach, and a number of results have been independently discovered in both strands. In this monograph, we highlight this strong connection and present a unified treatment of generalization. We present techniques and results that the two perspectives have in common, and discuss the approaches and interpretations that differ. In particular, we demonstrate how many proofs in the area share a modular structure, through which the underlying ideas can be intuited. We pay special attention to the conditional mutual information (CMI) framework; analytical studies of the information complexity of learning algorithms; and the application of the proposed methods to deep learning. This monograph is intended to provide a comprehensive introduction to information-theoretic generalization bounds and their connection to PAC-Bayes, serving as a foundation from which the most recent developments are accessible. It is aimed broadly towards researchers with an interest in generalization and theoretical machine learning.

Joint work with Fredrik Hellström (UCL), Giuseppe Durisi (Chalmers), Maxim Raginsky (University of Illinois). See for details 50.

7.17 Axis 2: Comparing Comparators in Generalization Bounds

Participants: Benjamin Guedj.

We derive generic information-theoretic and PAC-Bayesian generalization bounds involving an arbitrary convex comparator function, which measures the discrepancy between the training and population loss. The bounds hold under the assumption that the cumulant-generating function (CGF) of the comparator is upper-bounded by the corresponding CGF within a family of bounding distributions. We show that the tightest possible bound is obtained with the comparator being the convex conjugate of the CGF of the bounding distribution, also known as the Cramér function. This conclusion applies more broadly to generalization bounds with a similar structure. This confirms the near-optimality of known bounds for bounded and sub-Gaussian losses and leads to novel bounds under other bounding distributions.

Joint work with Fredrik Hellström (UCL). See for more details 51.

7.18 Axis 2: Federated Learning with Nonvacuous Generalisation Bounds

Participants: Benjamin Guedj, Maxime Haddouche.

We introduce a novel strategy to train randomised predictors in federated learning, where each node of the network aims at preserving its privacy by releasing a local predictor but keeping secret its training dataset with respect to the other nodes. We then build a global randomised predictor which inherits the properties of the local private predictors in the sense of a PAC-Bayesian generalisation bound. We consider the synchronous case where all nodes share the same training objective (derived from a generalisation bound), and the asynchronous case where each node may have its own personalised training objective. We show through a series of numerical experiments that our approach achieves a comparable predictive performance to that of the batch approach where all datasets are shared across nodes. Moreover the predictors are supported by numerically nonvacuous generalisation bounds while preserving privacy for each node. We explicitly compute the increment on predictive performance and generalisation bounds between batch and federated settings, highlighting the price to pay to preserve privacy.

Joint work with Pierre Jobic (CEA). See for more details 52.

7.19 Axis 2: Learning via Wasserstein-Based High Probability Generalisation Bounds

Participants: Benjamin Guedj, Maxime Haddouche.

Minimising upper bounds on the population risk or the generalisation gap has been widely used in structural risk minimisation (SRM) – this is in particular at the core of PAC-Bayesian learning. Despite its successes and unfailing surge of interest in recent years, a limitation of the PAC-Bayesian framework is that most bounds involve a Kullback-Leibler (KL) divergence term (or its variations), which might exhibit erratic behavior and fail to capture the underlying geometric structure of the learning problem – hence restricting its use in practical applications. As a remedy, recent studies have attempted to replace the KL divergence in the PAC-Bayesian bounds with the Wasserstein distance. Even though these bounds alleviated the aforementioned issues to a certain extent, they either hold in expectation, are for bounded losses, or are nontrivial to minimize in an SRM framework. In this work, we contribute to this line of research and prove novel Wasserstein distance-based PAC-Bayesian generalisation bounds for both batch learning with independent and identically distributed (i.i.d.) data, and online learning with potentially non-i.i.d. data. Contrary to previous art, our bounds are stronger in the sense that (i) they hold with high probability, (ii) they apply to unbounded (potentially heavy-tailed) losses, and (iii) they lead to optimizable training objectives that can be used in SRM. As a result we derive novel Wasserstein-based PAC-Bayesian learning algorithms and we illustrate their empirical advantage on a variety of experiments.

Joint work with Umut Simsekli and Paul Viallard (EP SIERRA, CRI PRO).

See for more details 40.

7.20 Axis 2: A note on regularised NTK dynamics with an application to PAC-Bayesian training

Participants: Benjamin Guedj.

We establish explicit dynamics for neural networks whose training objective has a regularising term that constrains the parameters to remain close to their initial value. This keeps the network in a lazy training regime, where the dynamics can be linearised around the initialisation. The standard neural tangent kernel (NTK) governs the evolution during the training in the infinite-width limit, although the regularisation yields an additional term in the differential equation describing the dynamics. This setting provides an appropriate framework to study the evolution of wide networks trained to optimise generalisation objectives such as PAC-Bayes bounds, and hence potentially contributes to a deeper theoretical understanding of such networks.

Joint work with Eugenio Clerico (University of Oxford and Uni Pompeu Fabra).

See for more details 47.

7.21 Axis 3: Investigating spatial scan statistics for multivariate functional data

Participants: Sophie Dabo-Niang.

In environmental surveillance, cluster detection of environmental black spots is of major interest due to the adverse health effects of pollutants, as well as their known synergistic effect. Thus, this paper introduces three new spatial scan statistics for multivariate functional data, applicable for detecting clusters of abnormal air pollutant concentrations measured spatially at a very fine scale in northern France in October 2021, taking into account their correlations. Mathematically, our methodology is derived from a functional multivariate analysis of variance, an adaptation of the Hotelling $T^{2}$ -test statistic, and a multivariate extension of the Wilcoxon test statistic. The approaches were evaluated in a simulation study and then applied to the air pollution dataset.

It is a joint work with Camille Frévent (University of Lille, CERIM), Mohamed Salem Ahmed (University of Lille, CERIM), Michaël Genin (University of Lille, CERIM).

For more details, see 18.

7.22 Axis 3: On estimation and prediction in spatial functional linear regression model

Participants: Sophie Dabo-Niang.

We consider a spatial functional linear regression, where a scalar response is related to a square-integrable spatial functional process. We use a smoothing spline estimator for the functional slope parameter and establish a finite sample bound for the variance of this estimator. Then we give the optimal bound of the prediction error under mixing spatial dependence. Finally, we illustrate our results by simulations and by an application to ozone pollution forecasting at nonvisited sites.

It is a joint work with Stéphane Bouka (University of , CERIM), Guy-Martial Nkiet Ahmed (University of France Ville, Gabon), Michaël Genin (University of France Ville, Gabon). For more details, see 15.

7.23 Axis 3: Spatial Autocorrelation of Global Stock Exchanges Using Functional Areal Spatial Principal Component Analysis

Participants: Sophie Dabo-Niang.

This work focuses on functional data presenting spatial dependence. The spatial autocorrelation of stock exchange returns for 71 stock exchanges from 69 countries was investigated using the functional Moran’s I statistic, classical principal component analysis (PCA) and functional areal spatial principal component analysis (FASPCA). This work focuses on the period where the 2015–2016 global market sell-off occurred and proved the existence of spatial autocorrelation among the stock exchanges studied. The stock exchange return data were converted into functional data before performing the classical PCA and FASPCA. Results from the Monte Carlo test of the functional Moran’s I statistics show that the 2015–2016 global market sell-off had a great impact on the spatial autocorrelation of stock exchanges. Principal components from FASPCA show positive spatial autocorrelation in the stock exchanges. Regional clusters were formed before, after and during the 2015–2016 global market sell-off period. This work explored the existence of positive spatial autocorrelation in global stock exchanges and showed that FASPCA is a useful tool in exploring spatial dependency in complex spatial data.
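
For readers unfamiliar with the statistic, the sketch below computes the classical (scalar) Moran's I, $I = \frac{n}{W} \frac{\sum_{i,j} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}$ with $W = \sum_{i,j} w_{ij}$. The functional version used in the work is an extension of this quantity and is not reproduced here. The four-site chain and the values are toy assumptions.

```python
def morans_i(values, weights):
    """Classical Moran's I for scalar values and a spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations over all pairs of sites.
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


# Toy example: four sites on a line, each linked to its immediate neighbours.
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = morans_i([1.0, 2.0, 3.0, 4.0], chain_weights)        # neighbours alike
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain_weights)  # neighbours opposed
```

In this toy setting the smoothly varying values give a positive index (1/3) and the alternating values give a negative one (-1), matching the usual reading of positive and negative spatial autocorrelation.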

It is a joint work with Tzung Hsuen Khoo (University of Malaya, Malaysia), Dharini Pathmanathan (University of Malaya, Malaysia).

For more details, see 23.

7.24 Axis 3: Multivariate functional principal component analysis for endogenously stratified data

Participants: Sophie Dabo-Niang.

We address the problem of performing dimension reduction on multivariate functional data observed on different domains in an endogenously stratified sampling context. The aim is to propose a new multivariate functional principal component analysis (MFPCA) approach for data sampled by a stratification of a population according to a binary variable of interest. This estimation strategy is derived from a direct relationship between univariate and multivariate FPCA for finite Karhunen-Loève decompositions. The proposed methodology yields encouraging results and can be applied to data with measurement errors. Computational results on simulated data highlight the good performance of the proposed methodology compared to the classical MFPCA, which ignores the type of data sampling. A real-life application considering breast cancer cells data is also presented.

It is a joint work with Idris Christelle Judith Agonkoui (IMSP, Benin), Freedath Djibril Moussa (IMSP, Benin). For more details, see 66.

7.25 Axis 3: PLS regression approach for multivariate functional data with different domains

Participants: Issam Moindjie, Sophie Dabo, Cristian Preda.

Multivariate functional data is considered as sample paths of a multivariate valued stochastic process, $X = (X_{1}, ..., X_{d})$ . In this setting, each dimension $X_{i}$ , $i = 1, ..., d$ , is a stochastic process, $X_{i} = \{X_{i} (t), t \in ℐ_{i}\}$ , where $ℐ_{i}$ is some compact domain of $ℝ$ . The problems of linear regression and binary classification are addressed by PLS regularization techniques. For application purposes, decision tree methods combined with functional PLS regression are proposed. For more details, see 25.

7.26 Axis 3: Group lasso regression for spatially dependent functional data

Participants: Issam Moindjie, Sophie Dabo, Cristian Preda.

Multivariate functional data is considered under the assumption of spatial dependence between dimensions. Each dimension is associated with some (spatial) clusters with potentially different effects on a response variable. In the context of linear regression with multivariate functional data, a natural assumption is to consider the same regression coefficient (slope) function for all dimensions belonging to the same cluster. Fused and group lasso techniques are extended for this purpose. This work was submitted to the CSDA journal (55); see also 45.

7.27 Axis 3: Linear approximation for multivariate categorical functional data

Participants: Cristian Preda, Quentin Grimonprez.

Multivariate categorical functional data can be seen as one-dimensional categorical functional data but with a number of states equal to the product of the number of states for each dimension. This leads to a large computational complexity that can be avoided by using a linear approximation of the optimal encodings. Indeed, the optimal encodings are the conditional expectation of the principal components with respect to the functional random vector. In our approach this conditional expectation is considered as a linear form of the dimensions of the functional vector. See for more details 43 and 6.1.2.

7.28 Axis 4: Multi-layer group Lasso

Participants: Guillemette Marot.

Multi-Layer Group-Lasso (MLGL) is a procedure of variable selection in the context of redundancy between explanatory variables, which holds true with high-dimensional data. The proposed approach combines variable aggregation and selection in order to improve interpretability and performance. The associated R package is available on CRAN and its related publication 19, accepted in 2023 in Journal of Statistical Software, gives more details about the statistical procedure.

7.29 Axis 4: Research of biomarkers using penalised regressions

Participants: Guillemette Marot, Wilfried Heyse.

Thanks to Lasso logistic regression, a joint work with Hélène Sarter identified a novel 8-predictor signature to predict complicated disease course in pediatric-onset Crohn’s disease 28. The search for biomarkers gave us the opportunity to test various multi-block approaches in order to combine clinical and omics data. Finally, we retained a very simple approach, which performed the best and offered results that were further validated.

7.30 Axis 4: Research of biomarkers using competing risks models and clustering

Participants: Guillemette Marot, Wilfried Heyse.

When using Lasso penalized Cox regressions on proteomics data to predict heart failure after myocardial infarction, we did not manage to beat predictive models relying only on clinical data. Therefore, we changed strategy and finally used clustering to identify subtypes of patients based on proteins which could help predict heart failure 21. The methodology looks simple in the paper, but a lot of work went into defining the outcome to keep and choosing the best strategy. The models including only clinical data were already good, and it was challenging to obtain better results by including proteomics data. In this project, we experienced the need to take competing risks into account in the modeling and had to perform univariate analyses of the proteomics data before the multivariate analysis. This work relied on a strong collaboration with Florence Pinet, a specialist in proteomics, and Christophe Bauters, a cardiologist.

7.31 Axis 4: Statistical analysis of proteomic data with empirical Bayesian approaches

Participants: Guillemette Marot.

Our expertise in empirical Bayesian approaches for proteomics data analysis has led to a publication in Annals of Rheumatic Diseases (impact factor 28) 27. This is a joint work with Dr S. Sanges and Pr D. Launay. The proteomic analysis revealed potential biomarkers that may assist diagnosis and treatment of patients with systemic sclerosis-associated pulmonary arterial hypertension (SSc-PAH). Further biological validation in an independent cohort revealed that chemerin, which was highlighted in the exploratory analysis, was a reliable surrogate biomarker for pulmonary vascular resistance. We also used the same kind of empirical Bayesian approach with another type of proteomic data, mass spectrometry data, in order to study differential analysis between different strains of Hepatitis C virus 24. These differential analyses were complemented by partial least squares discriminant analysis. This last work was of interest not only for biology but also for the Bilille platform, as it helped set up normalisation and statistical analysis pipelines for the PLBS P3M platform.

7.32 Axis 4: Testing Abnormality of a Sequence of Graphs: Application to Cybersecurity

Participants: Christophe Biernacki, Clarisse Boinay, Cristian Preda.

The increasing number of cyber attacks on industrial networks puts human life and economies at risk. Firms usually implement fixed rules rather than anomaly detection to prevent such attacks. However, anomaly detection methods would allow for a more flexible grasp of deviations from normal behaviour. For instance, anomaly detection in graphs modeling industrial networks can sense changes in the behaviour of machines. In this work, we seek to establish whether the number of messages sent from one or more machines to one or more machines is normal or not. To this end, we first model interactions between IP addresses with dynamical graphs. Then, we construct a test statistic based on the likelihood of a graph, computed using generative models such as the stochastic block model and kernel estimators. Finally, we evaluate the power of the test in realistic and generic attack scenarios. This work has been presented at the main French conference in Statistics 38.

7.33 Axis 4: Bayesian spatiotemporal modelling for disease mapping: an application to preeclampsia and gestational diabetes in Florida, United States

Participants: Sophie Dabo-Niang.

Morbidities generally show patterns of concentration that vary by space and time. Disease mapping models are useful in estimating the spatiotemporal patterns of disease risks and are therefore pivotal for effective disease surveillance, resource allocation, and the development of prevention strategies. This study considers six spatiotemporal Bayesian hierarchical models based on two spatial conditional autoregressive priors. It could serve as a guideline on the development and application of Bayesian hierarchical models to assess the emerging risk trends, risk clustering, and spatial inequality trends, with estimation of the effects of covariables on the disease risk of interest. The method is applied to the Florida Birth Record data between 2006 and 2015 to study two cardiovascular risk factors: preeclampsia and gestational diabetes. High-risk clusters were detected in North Central Florida for preeclampsia and in Central Florida for gestational diabetes. While the adjusted disease trend was stable, spatial inequality peaked in 2011–2012 for both diseases. Exposure to PM2.5 in the first and/or second trimester increased the risk of preeclampsia and gestational diabetes, but the magnitude is less severe compared to previous studies. In conclusion, this study underscores the significance of selecting appropriate disease mapping models in estimating the intricate spatiotemporal patterns of disease risk and suggests the importance of localized interventions to reduce health disparities. The results also identified an opportunity to study potential risk factors of preeclampsia, as the spike of risk in North Central Florida cannot be explained by current covariables.

This work is the result of a two-month visit of Boubakari Ibrahimou (Florida International University, Miami, FL, USA) to Modal and the University of Lille.

It is a joint work with Ning Sun, Zoran Bursac, Ian Dryden, Roberto Lucchini, Boubakari Ibrahimou (Florida International University, Miami, FL, USA) 30.

7.34 Axis 4: Structural Changes in Temperature and Precipitation in MENA Countries

Participants: Sophie Dabo-Niang.

This paper evaluates the extent of climate variability in the Middle East and North Africa (MENA) region using time series structural change tests. The MENA region is highly susceptible to climate change, being one of the driest and most water-scarce regions in the world. The study aims to identify structural breaks in temperature and precipitation time series from 1901 to 2012. Specifically, a statistical analysis is performed based on a structural change model (Bai and Perron 1998, 2003a) for temperature and precipitation across 19 countries. The results indicate significant structural changes in temperature and precipitation patterns during the observation period, and suggest that climate variability has indeed begun to occur across the whole study area, with 1990 marking a turning point in terms of global warming. North African countries, Qatar, and the United Arab Emirates experienced a large number of breaks in temperature variables between 1901 and 2012, while other countries experienced fewer breaks. With regard to the seasonal aspect of precipitation, the individual rainfall Seasonality Index results demonstrate strong seasonal variability of rainfall from one year to another. Results show that rainfall in MENA countries is irregular throughout the year and that it ranges from seasonal to extremely seasonal throughout the study period. These findings have important implications for water resources management, agriculture, human health, and ecosystems in the region.
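
The summary above does not define the Seasonality Index. The sketch below assumes the common Walsh and Lawler definition, $SI = \frac{1}{R} \sum_{m=1}^{12} |x_m - R/12|$, where $x_m$ is the rainfall of month $m$ and $R$ the annual total. Both this choice of definition and the rainfall values are assumptions made for illustration and may differ from what the paper uses.

```python
def seasonality_index(monthly_rainfall):
    """Individual seasonality index for one year (assumed Walsh-Lawler form).

    Returns 0 when rainfall is spread evenly over the year and 11/6 (about
    1.83) when all rainfall falls in a single month.
    """
    if len(monthly_rainfall) != 12:
        raise ValueError("expected 12 monthly totals")
    annual = sum(monthly_rainfall)
    if annual <= 0:
        raise ValueError("annual rainfall must be positive")
    monthly_mean = annual / 12
    return sum(abs(m - monthly_mean) for m in monthly_rainfall) / annual


# Hypothetical years: perfectly even rainfall vs. a single rainy month.
uniform_year = seasonality_index([50.0] * 12)
single_month_year = seasonality_index([120.0] + [0.0] * 11)
```

Computing the index separately for each year, as done here, is what allows year-to-year variability of seasonality to be examined.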

See for more details 12.

It is a joint work with Hassan Amouzay (University Mohamed V, Rabat), Raja Chakir (INRAE, Paris), Ahmed El Ghini (University Mohamed V, Rabat).

7.35 Axis 4: Spatial Relative Risk of Upper Aerodigestive Tract Cancers Incidence in French Northern Region

Participants: Sophie Dabo-Niang.

In this work, we are interested in kernel estimation of the spatial relative risk function. We consider the case where covariates that may affect the spatial patterns of disease are contaminated by measurement errors. A finite sample study was carried out to illustrate our methodology with real cancer data. We estimate relative risk functions on upper aerodigestive tract (UADT) cancer data to investigate locations of high and low incidence concentration in the Nord-Pas-de-Calais (NPDC) region of France.

For more details, see 17. It is a joint work with Emad Darwich (University of Lille), Leila Hamdad, Hamid Haddadou (ESI, Algeria), Baba Thiam (University of Lille).

7.36 Axis 4: Functional, Multivariate Functional and Spatial PCA: Application to Covid-19 Data in the African Continent

Participants: Sophie Dabo-Niang.

The Covid-19 pandemic has negatively impacted many areas, including the economy and health care facilities, and has caused more than 5 million deaths worldwide. In this paper, we use functional data analysis methods to describe the evolution of the number of cases and the number of deaths of Covid-19 in Africa.

We perform functional principal component analysis, multivariate functional principal component analysis and spatial principal component analysis to better characterize the phenomena, and use spatial data to determine the impact of a region's neighborhood on the number of cases. The obtained results give us a better knowledge of the evolution of the pandemic on the African continent.

It is a joint work with Idris Si-Ahmed (ESI, Algeria), Mazamaesso Azeyou (AIMS, Senegal), Leila Hamdad (ESI, Algeria). See for more details 67.

8 Bilateral contracts and grants with industry

8.1 Bilateral contracts with industry

Diagrams Technologies startup

Participants: Christophe Biernacki, Cristian Preda.

Christophe Biernacki and Cristian Preda act as scientific experts for the Diagrams Technologies startup, which specializes in industrial data analysis and software dedicated to predictive maintenance. This startup is a spinoff of the MODAL team.

Program France-Relance : MODAL-Alicante

Participants: Cristian Preda.

The objective of this collaboration is to develop statistical learning models that explore the temporal dimension of health data within the framework of projects developed by the company ALICANTE and whose solutions are provided by the research work of the MODAL team. Ismat Draa and Rachid Boulkhir are part of this project.

Duration: 12/2021 - 12/2023 (2 years)

ADULM

Participants: Sophie Dabo-Niang, Cristian Preda.

The main goal of this project with the Lille Metropole Urban Development and Planning Agency (ADULM) is to design a tool for the Territorial Coherence Scheme (SCoT) to monitor urban developments and develop territorial observation.

Duration: 01/2021 - 12/2023 (3 years)

8.2 Bilateral grants with industry

Withings

Participants: Christophe Biernacki, Cristian Preda.

Withings is a French consumer electronics company which designs and innovates in connected devices, such as the first Wi-Fi scale on the market (introduced in 2009), an FDA-cleared blood pressure monitor, a smart sleep system, and a line of automatic activity tracking watches. It also provides B2B services for healthcare providers and researchers. A PhD program began in September 2023 on the topic of analysis of multivariate, sparse longitudinal data, with mixed covariates, from connected medical objects.

ADEO

Participants: Christophe Biernacki, Vincent Vandewalle.

Adeo is No. 1 in Europe and No. 3 worldwide in the DIY market. A PhD began in Dec. 2020 with Axel Potier under the supervision of Christophe Biernacki, Vincent Vandewalle, Matthieu Marbac (ENSAI) and Julien Favre (ADEO) on the topic of sales forecasting for “slow mover” items (that is, items sold in low quantities).

Seckiot

Participants: Christophe Biernacki, Cristian Preda.

Seckiot is a publisher of cybersecurity software to protect industrial systems & IoT. In December 2021, Clarisse Boinay began her Cifre PhD thesis (with AID, Agence de l'Innovation de Défense) with Seckiot on the topic of “anomaly detection and change point detection in contextual dynamic asynchronous graphs with applications in OT cybersecurity” under the co-supervision of Thomas Anglade (Seckiot), Christophe Biernacki and Cristian Preda.

Decathlon

Participants: Cristian Preda.

Decathlon is a brand specializing in the large-scale distribution of sports equipment and materials. In September 2022, François Bassac began his PhD thesis within the Inria-Decathlon partnership on the topic of predicting performances and injuries with training data, under the supervision of Cristian Preda.

Duration : 09/2022 - 08/2025 (3 years)

ASYGN

Participants: Sophie Dabo, Cristian Preda.

ASYGN is a company specialized in the signal processing chain. Modal is working with this company and LIMMS/CNRS-IIS to apply bioMEMS technology in the field of cancer.

Duration: 01/2022 - 12/2024 (3 years)

HORIBA

Participants: Sophie Dabo, Cristian Preda.

HORIBA is a company specialized in optical spectrometry. Modal is working with this company and CENTRALE Lille on Raman spectroscopy and Artificial Intelligence dedicated to synthesis in chemistry.

Duration: 07/2021 - 12/2026 (6 years)

9 Partnerships and cooperations

9.1 International initiatives

9.1.1 Associate Teams in the framework of an Inria International Lab or in the framework of an Inria International Program

Since 2020, Benjamin Guedj has been the founder and scientific director of the Inria London programme, an ambitious initiative to establish a joint research lab between Inria and University College London (UCL), framed within a broader bilateral Franco-British scientific cooperation.

9.2 International research visitors

9.2.1 Visits of international scientists

Participants: Sophie Dabo.

Michelle Carey (University College Dublin) visited Sophie Dabo (July 2023, one week)
Project title: Ireland and France Need Healthier Air for healthier Lungs: the Evidence
PHC Ulysses 2023

Participants: Sophie Dabo.

Sebastian Kuehnert (UC Davis, USA) visited Sophie Dabo (August-September 2023, one month)
Project title: Functional Spatial Time series

9.2.2 Visits to international teams

Participants: Sophie Dabo.

Sophie Dabo visited

University College Dublin (December 2023, one week)
University of Malaya (Malaysia, July-August 2023, one month)
University of Tokyo (Japan, August 2023, one week)
North West University (South Africa, February 2023, one week)
University of Vienna (Austria, April 2023, one week)
University College London (March 2023, one week)

9.3 European initiatives

9.3.1 H2020 projects

H2020 FAIR

Participants: Guillemette Marot.

Acronym: FAIR
Project title: Flagellin aerosol therapy as an immunomodulatory adjunct to the antibiotic treatment of drug-resistant bacterial pneumonia
Coordinator: JC Sirard (Inserm, CIIL)
Duration: 5-6 years (2020-2025)
Funding: 10 M euros
Partners: Inserm (France), Univ Lille (France), Freie Universitaet Berlin (DE), Epithelix (CH), Aerogen (IE), Statens Serum Institut (DK), CHRU Tours (France), Academisch Medisch Centrum bij de Universiteit van Amsterdam (NL), University of Southampton (UK), European respiratory society (CH)
Contribution: FAIR, a project coordinated by JC. Sirard (Inserm, CIIL), aims at evaluating an alternative adjunct strategy to standard-of-care antibiotics for treating pneumonia caused by antibiotic-resistant bacteria: activation of the innate immune system in the airways. Guillemette Marot is involved in this H2020 project both as the scientific head of the Bilille platform and as a researcher. At the beginning of the project, she contributed to the preliminary development of a tool to facilitate multi-omics data integration (visCorVar 6.1.4). In 2023, following the reorganisation of the whole project by the coordinator, she mostly contributed to longitudinal omics data analysis, by co-supervising an engineer with Pierre Pericard (Ulille, PLBS).

9.4 National initiatives

9.4.1 PEPR IA

Benjamin Guedj is a co-I of the project SHARP (PI: Rémi Gribonval, EP OCKHAM, CRI LYS) funded by the PEPR IA (2023-2027, overall funding 7M euros).

9.4.2 SIRIC EN-HOPE SMART4CBT

Participants: Sophie Dabo.

Acronym: EN-HOPE SMART4CBT
Project title: East North-Hematology Oncology Pediatric consortium offering a research program of Social sciences, Microenvironment and multiomics Analyses in RadioTherapy resistance For Children Brain Tumors
Coordinator: ENTZ-WERLE Natacha (Inserm, University Hospital of Strasbourg)
Duration: 5 years (2024-2028)
Funding: 3M euros
Partners: Inserm, CNRS (France), Univ Lille, University Hospital of Nancy (CHRU Nancy), Oscar Lambret Centre in Lille, University Hospital of Lille (CHU Lille), ICANS (Institut du CANcer de Strasbourg Europe) in Strasbourg, ICL (Institut de Cancérologie de Lorraine) in Nancy, University of Strasbourg, University of Lorraine
Contribution: INCA

9.4.3 ANR

APRIORI

Participants: Benjamin Guedj.

Type: ANR PRC
Acronym: APRIORI
Project title: PAC-Bayesian theory and algorithms for deep learning and representation learning
Coordinator: Emilie Morvant (Université Jean Monnet)
Duration: 2019–2023
Funding: 300k EUR
Partners: MODAL, Laboratoire Hubert Curien (UMR CNRS 5516)

BEAGLE

Participants: Benjamin Guedj [coordinator], Pascal Germain.

Type: ANR JCJC
Acronym: BEAGLE
Duration: 2019–2023
Project title: PAC-Bayesian theory and algorithms for agnostic learning
Funding: 180k EUR
Partners: Pierre Alquier (RIKEN AIP, Japan), Peter Grünwald (CWI, The Netherlands), Rémi Bardenet (UMR CRIStAL 9189)

Synapark

Participants: Guillemette Marot.

Type: ANR PRC
Acronym: Synapark
Project title: Evaluation of the role of parkin to alpha-synuclein-regulation in vitro, in vivo and in Parkinson's disease patient's blood samples
Coordinator: Christine Alves da Costa (Inserm, IPMC)
Duration: 42 months (2020–2024)
Funding: 540k euros
Partners: CNRS, Université Côte d'Azur, Univ. Lille, Inserm
Contribution: Statistical analysis of transcriptomics data

CYTOMEMS

Participants: Sophie Dabo, Cristian Preda.

Type: ANR AAPG
Acronym: CYTOMEMS
Project title: Smart MEMS Instrumentation for Biophysical flow Cytometry with Statistical Learning
Coordinator: Dominique Collard (CNRS)
Duration: 2022–2024
Funding: 600k EUR
Partners: MODAL, Laboratoire Hubert Curien (UMR LIMMS CNRS IMU 2820)

Oesomics

Participants: Guillemette Marot.

Type: ANR AAP Recherche translationnelle en santé
Acronym: Oesomics
Project title: Molecular signatures of esophageal atresia: towards the identification of the molecular causes of the different forms of esophageal atresia and prenatal diagnosis
Coordinator: Frédéric Gottrand (Univ. Lille, CHU Lille, Infinite)
Duration: 36 months (2022–2027)
Funding: 233k euros
Partners: CHU Lille, PRISM, PLBS-Goal, PLBS-bilille
Contribution: Statistical analysis of multi-omics (mainly transcriptomics and proteomics) data

TransEAsome

Participants: Guillemette Marot.

Type: AMI Maladies rares
Acronym: TransEAsome
Project title: Long term outcome of esophageal atresia: transomics profiles in adolescence
Coordinator: Frédéric Gottrand (Univ. Lille, CHU Lille, Infinite)
Duration: 72 months (2022–2027)
Funding: 1.4M euros
Partners: CHU Lille, Univ. Lille, Inserm NO, Inserm ADR - GO, CRACMO, FIMATHO
Contribution: Statistical analysis of multi-omics (mainly transcriptomics and proteomics) data

9.4.4 RHU and FHU

An RHU (recherche hospitalo-universitaire) is an excellence programme funded by the PIA (the French Investments for the Future programme) and selected by ANR. An FHU (fédération hospitalo-universitaire) is a federative project and a label required to apply for an RHU.

RHU PreciNASH

Participants: Guillemette Marot.

Acronym: PreciNASH
Project title: Non-alcoholic steato-hepatitis (NASH) from disease stratification to novel therapeutic approaches
Coordinator: François Pattou (Université de Lille, CHU Lille)
Duration: 7 years (2016–2023)
Partners: FHU Integra and Sanofi
Contribution: PreciNASH, a project coordinated by Pr. F. Pattou (UMR 859, EGID), aims at better understanding non-alcoholic steatohepatitis (NASH) and improving its diagnosis and care. In this RHU, Guillemette Marot supervised a two-year post-doc, as her team ULR 2694 METRICS is a member of the FHU Integra. She also supervised an engineer of the Bilille platform for three years for this project. METRICS is involved in WP1 for the development of a clinical-biological model for the prediction of NASH. Bilille is involved in the task of better stratifying patients using unsupervised clustering. Other partners of the FHU are UMR 859, UMR 1011 and UMR 8199, these last three teams being part of the labex EGID (European Genomic Institute for Diabetes). Sanofi is the main industrial partner of the RHU PreciNASH. More information on this project is available on the PreciNASH project page.

FHU PRECISE

Participants: Guillemette Marot, Christophe Biernacki.

Acronym: PRECISE
Project title: PREcision health in Complex Immune-mediated inflammatory diseaSEs
Coordinator: David Launay (U. Lille, CHU Lille)
Duration: 5 years (2021–2025)
Partners: CHU Lille, CHU Amiens, CHU Rouen, CHU Caen, Université de Lille, Université de Picardie, Université de Rouen, Inserm
Contribution: The objective of FHU PRECISE is to structure care, research and teaching related to the care of patients who suffer from complex IMID (Immune mediated inflammatory diseases) with an interdisciplinary approach. Guillemette Marot is the co-head with Vincent Sobanski of the WP2 workpackage, which aims at creating a «virtual patient» and clustering patients based on their clinical and omic profiles. In this WP, she is involved both in the analysis task with the Bilille platform and in the research task led by Christophe Biernacki, involving the MODAL team. This research task aims at combining complex data and integrating temporal structure in order to identify patients' care pathways. Guillemette Marot is also participating with the Bilille platform in WP3, which searches for a molecular signature predictive of the treatment response (resistance and complication).

9.4.5 Inria national initiatives

“Inria Challenge” ROAD-AI with Cerema

Participants: Vincent Vandewalle, Christophe Biernacki, Cristian Preda.

Cerema (Centre d'études et d'expertise sur les risques, l'environnement, la mobilité et l'aménagement - Centre for Studies on Risks, the Environment, Mobility and Urban Planning) is a public institution dedicated to supporting public policies, under the dual supervision of the ministry for ecological transition and the ministry for regional cohesion and local authority relations. MODAL is involved in the ROAD-AI (Routes et Ouvrages d'Art Diversiformes, Augmentés & Intégrés) “Inria Challenge”, with five other Inria teams (ACENTAURI, COATI, FUN, STATIFY, TITANE) covering statistics, robotics, telecommunications, sensor networks and 3D modeling. This four-year project (starting in 2021) aims at more sustainable, safer and more resilient transport infrastructures.

Program “Action Exploratoire” PATH: METRICS and CHU Lille

Participants: Sophie Dabo (coordinator), Christophe Biernacki, Guillemette Marot, Cristian Preda.

The research project is part of an Inria exploratory action carried out by a consortium of doctors, bio-statisticians and statisticians. The aim is to provide a better understanding of the key stages in the patient's care pathway by bringing together the producers of data as close to the patient as possible, those who manage them, those who pre-process them, and those who analyse them, in order to obtain results as close to the field as possible and to provide the most efficient feedback to the clinician and the patient.

The project, which is essentially interdisciplinary and exploratory, is a continuation of past collaborations between members of the two units Inria-MODAL and METRICS (University of Lille/CHU Lille). It could not be carried out without close collaboration between doctors and researchers in applied mathematics.

The analysis of care pathways and their adequacy to needs and resources has thus become a major scientific and administrative challenge. Although the digital data available for this purpose is increasing rapidly, the statistical methods and tools available to researchers and health authorities remain limited and inefficient.

The types of care pathways are very numerous. As part of this exploratory action, we propose to focus on two cases of application: 1) an ambulatory care pathway (city-hospital link); 2) an intra-hospital care pathway. This choice is justified by METRICS' solid expertise in these pathways, based on several years of research, as well as close links with clinicians who are experts in these issues.

Duration: 3 years (1/09/2021 - 31/12/2024)

9.4.6 Other national initiatives

Industrial Chair SmartDigiCat

Participants: Cristian Preda, Sophie Dabo.

SmartDigiCat is a project led by Sebastien Paul (Professor at Centrale Lille, researcher at Unité de Catalyse et Chimie du Solide (UCCS – UMR CNRS 8181)) and involving several companies (SOLVAY, HORIBA, TEAMCAT SOLUTIONS) and academic laboratories (UCCS, CRIStAL, Inria and Institut Eugène Chevreul).

The consortium of the SmartDigiCat chair will develop an innovative approach for safer and more environmentally-friendly catalytic processes design. The innovation will emerge from the powerful combination of high-throughput experiments, theoretical chemistry and artificial intelligence. The domains of application of the tools developed for catalysis will be extended, among others, to materials and formulations.

Cristian Preda and Sophie Dabo are involved in the artificial intelligence part of the project. This part requires functional data analysis tools and challenging developments, for example to optimize the chemical process in order to obtain a target spectrum.

Duration: 6 years (1/07/2021 - 31/12/2026)

French Institute of Bioinformatics (IFB) and EquipEx+ MuDiS4LS

Participants: Guillemette Marot.

Coordinators: IFB co-heads (changes in 2023)
Duration: 7 years (2021 – 2028)
Abstract: Bilille, the bioinformatics platform of Lille, is a member of IFB, the French Institute of Bioinformatics. IFB has obtained the funding of EquipEx+ MuDiS4LS (Mutualised Digital Spaces for FAIR data in Life and Health Science). As the scientific head of the Bilille platform, Guillemette Marot is also the scientific head of the Univ. Lille partner for this EquipEx+. As a researcher, she will participate in implementation studies involving integration of complex data (IS1 and IS4). More information is available from IFB.

9.4.7 Working groups

Sophie Dabo-Niang belongs to the following working groups:
- STAFAV (STatistiques pour l'Afrique Francophone et Applications au Vivant)
- ERCIM Working Group on computational and Methodological Statistics, Nonparametric Statistics Team
- Franco-African IRN (International Research Network) in Mathematics, funded by CNRS
- ONCOLille (Cancer Research Institute in Lille)
Benjamin Guedj belongs to the following working groups (GdR) of CNRS:
- ISIS (local referee for Inria Lille - Nord Europe)
- MaDICS
- MASCOT-NUM (local referee for Inria Lille - Nord Europe)
Guillemette Marot belongs to the StatOmique and the LEGO (machine learning for genomics) working groups.

9.5 Regional initiatives

Collaborations of the year linked to Bilille

Participants: Guillemette Marot.

Bilille, the bioinformatics platform of Lille, has offered opportunities for collaboration with teams in biology and health for projects with local partners. Guillemette Marot has supervised the data analysis part for the following research projects involving engineers from Bilille (only the names of the principal investigators of the project are given even if several partners are sometimes involved in the project):

CIIL, Y. Rouillé
LilNCog, D. Vieau
SCALab, Y. Coello
OncoThai, K. Chen
PLBS, J.-M. Saliou
PRISM, T. Cardon

Collaborations of the year linked to ONCOLille

Participants: Sophie Dabo.

SMMIL-E, C. Tarhan
LIMMS, D. Collard
Phycell, L. Lemonier
Canther, M. Cheock

10 Dissemination

Participants: Christophe Biernacki, Benjamin Guedj, Cristian Preda, Sophie Dabo, Guillemette Marot, Hemant Tyagi.

10.1 Promoting scientific activities

10.1.1 Scientific events: organisation

Christophe Biernacki and Hemant Tyagi organized in March 2023 a workshop dedicated to statistical learning on LARge scale GRaphs (LARGR).

Guillemette Marot organized in July 2023 a workshop on digital twins (both scientific and logistic organization).

Guillemette Marot organized in November 2023 two workshops related to the scientific days of the CNRS GDR BIM (Bioinformatique Moléculaire): StatOmique (logistic and scientific organization) and LEGO (logistic organization only).

General chair, scientific chair

Guillemette Marot was the scientific chair of the second morning session of the LEGO workshop.

Member of the conference program committees

Christophe Biernacki was a member of the program committee of an Inria/IIT Delhi workshop in New Delhi, related to the partnership between Inria and IIT Delhi. He also gave a talk on Digital Science for Disability, presenting an overview of the initiatives undertaken by Inria on this topic 33.

10.1.2 Journal

Member of the editorial boards

Cristian Preda is an Associate Editor for Methodology and Computing in Applied Probability.

Benjamin Guedj is an Associate Editor for the journals JMLR, TMLR, Information and Inference, Data-Centric Engineering.

Reviewer - reviewing activities

Christophe Biernacki acted as a reviewer for different journals (Statistics and Computing, Journal of Classification, Journal of Computational and Graphical Statistics...) and a conference (CAp 2023).

Guillemette Marot acted as a reviewer for the ANR evaluation committee CE45 (Mathematics and Numerical sciences for biology and health).

Cristian Preda acted as a reviewer for Computational Statistics Journal.

Benjamin Guedj is a reviewer for JMLR, TMLR, Annals of Statistics, EJS, and most of the top-tier machine learning conferences (AISTATS, COLT, ICML, NeurIPS).

10.1.3 Invited talks

Christophe Biernacki was invited to give a plenary talk 36 and several other talks 42, 32, 35, 57.

Hemant Tyagi gave a talk at the Inria/IIT Delhi workshop in New Delhi, and also an online talk at the City U. Hong Kong (Dept. of Mathematics).

10.1.4 Leadership within the scientific community

Christophe Biernacki has been Vice-head of the SFdS (Société Française de Statistique) since his election in June 2022. The SFdS is the French society specialized in statistics, whose mission is to promote the use and understanding of statistics and to foster its methodological developments.

Guillemette Marot is the scientific head of Bilille platform, labelled by IBiSA and member platform of the French Institute of Bioinformatics.

10.1.5 Scientific expertise

Cristian Preda gave a talk for the Inria Academy program on generative models for artificial intelligence. For more details, see the Inria Academy program.

10.1.6 Research administration

Since January 2020, Christophe Biernacki has acted as a deputy scientific director of Inria at the national level, in charge of the domain “Applied mathematics, computation and simulation”. Moreover, between October and December 2023, he was interim Director of the Inria research center at Lille.

Benjamin Guedj has been the founder and scientific director of Inria London since 2020.

10.2 Teaching - Supervision - Juries

10.2.1 Teaching

Christophe Biernacki gave four lessons on clustering for the “Ateliers statistiques de la SFdS” in June 2023 62, 63, 64.
Hemant Tyagi is teaching
- Master: Statistics I, 24h, M1, Centrale Lille, France
- Master: Statistics II, 24h, M1, Centrale Lille, France
Sophie Dabo-Niang is teaching
- Master: Spatial Statistics, 24h, M2, Université de Lille, France
- Master: Advanced Statistics, 24h, M2, Université de Lille, France
- Master: Multivariate Data Analyses, 24h, M2, Université de Lille, France
- Licence: Probability, 24h, L2, Université de Lille, France
- Licence: Multivariate Statistics, 24h, L3, Université de Lille, France
Guillemette Marot is teaching
- Licence: Biostatistics, 20h, L1, Université de Lille (Faculty of Medicine), France
- Master: Biostatistics, 50h, M1, Université de Lille (Faculty of Medicine), France
- Master: Supervised classification, 20h, M1, Polytech'Lille, France
- Master: Biostatistics, 86h, M1, Université de Lille (Departments of Computer Science and Biology), France
- Master: Artificial intelligence and health, M2, 3h, Université de Lille (Graduate school precision Health), France
- Master: Statistical analysis of omic data, 15h, M2, Université de Lille (Department of Mathematics), France
- Doctorat: Introduction to statistical analysis of omics data, 12h, Université de Lille (Faculty of Medicine), France
Cristian Preda is teaching
- Polytech'Lille engineering school: Linear Models, 48h, France
- Polytech'Lille engineering school: Advanced statistics, 48h, France
- Polytech'Lille engineering school: Biostatistics, 10h, France
- Polytech'Lille engineering school: Supervised clustering, 24h, France
Benjamin Guedj is teaching
- Probabilistic Modelling (M2, 30h), University College London, United Kingdom

10.2.2 Supervision

PhD in progress

Axel Potier works on sales prediction for low-turnover products. Started in November 2020 under the supervision of Christophe Biernacki, Matthieu Marbac, Vincent Vandewalle.
Clarisse Boinay works on anomaly detection and change point detection in contextual dynamic asynchronous graphs with applications in OT cybersecurity. Started in December 2021 under the supervision of Christophe Biernacki and Cristian Preda.
Violaine Courrier works on the analysis of multivariate, sparse longitudinal data, with mixed covariates, from connected medical objects. Started in September 2023 under the supervision of Christophe Biernacki and Cristian Preda.
Eglantine Karlé works on dynamic ranking and translation synchronization on dynamic graphs. Started in November 2020 under the supervision of Hemant Tyagi.
François Bassac works on functional data analysis for sport performance prediction. Started in September 2022 under the supervision of Cristian Preda.
Reuben Adams works on PAC-Bayes theory. Started in September 2020 under the supervision of Benjamin Guedj.
Clara Dubois works on functional data analysis with applications in Raman spectroscopy and chemical synthesis. Started in June 2023 under the supervision of Sophie Dabo.
Antonin Schrab focuses on designing kernel-based hypothesis tests for two-sample comparisons. Started in September 2020 under the supervision of Benjamin Guedj.
Maxime Haddouche works on statistical learning and PAC-Bayes theory. Started in September 2021 under the supervision of Benjamin Guedj.
Etienne Kronert works on anomaly detection in time series. Started in September 2020 under the supervision of Alain Celisse.

10.2.3 Juries

Christophe Biernacki acted as a reviewer for 5 PhD theses and for 1 HDR.
Cristian Preda acted as a reviewer for 1 HDR.
Benjamin Guedj acted as a reviewer for 3 PhD theses (Denmark, Germany, France).

11 Scientific production

11.1 Major publications

1 Pierre Alquier and Benjamin Guedj. Simpler PAC-Bayesian Bounds for Hostile Data. Machine Learning, 2018.
2 Parmeet Bathia, Serge Iovleff and G. Govaert. An R Package and C++ library for Latent block models: Theory, usage and applications. Journal of Statistical Software, 2016.
3 Christophe Biernacki and Alexandre Lourme. Unifying Data Units and Models in (Co-)Clustering. Advances in Data Analysis and Classification 1241, May 2018.
4 Alain Celisse. Optimal cross-validation in density estimation with the L2-loss. The Annals of Statistics, 42(5), 2014, 1879–1910.
5 Sophie Dabo-Niang, Camille Ternynck and Anne-Francoise Yao. Nonparametric prediction in the multivariate spatial context. Journal of Nonparametric Statistics, 28(2), 2016, 428–458.
6 Julie Dubois, Vanessa Dubois, Hélène Dehondt, Parisa Mazrooei, Claire Mazuy, Aurélien A. Sérandour, Céline Gheeraert, Penderia Guillaume, Eric Baugé, Bruno Derudas, Nathalie Hennuyer, Réjane Paumelle, Guillemette Marot, Jason S. Carroll, Mathieu Lupien, Bart Staels, Philippe Lefebvre and Jerome Eeckhoute. The logic of transcriptional regulator recruitment architecture at cis-regulatory modules controlling liver functions. Genome Research, 27(6), June 2017, 985–996.
7 Gaël Letarte, Pascal Germain, Benjamin Guedj and François Laviolette. Dichotomize and Generalize: PAC-Bayesian Binary Activated Deep Neural Networks. NeurIPS 2019, Vancouver, Canada, December 2019.
8 Matthieu Marbac, Christophe Biernacki and Vincent Vandewalle. Model-based clustering of Gaussian copulas for mixed data. Communications in Statistics - Theory and Methods, December 2016.
9 Cristian Preda, Quentin Grimonprez and Vincent Vandewalle. Categorical Functional Data Analysis. The cfda R Package. Mathematics, 9(23), December 2021, 31.
10 Hemant Tyagi and Jan Vybiral. Learning general sparse additive models from point queries in high dimensions. Constructive Approximation, January 2019.

11.2 Publications of the year

International journals

11 Mohamed-Salem Ahmed, Mamadou N’diaye, Mohammed Kadi Attouch and Sophie Dabo-Niang. k-nearest neighbors prediction and classification for spatial data. Journal of Spatial Econometrics, 4(1), November 2023, 12.
12 Hassan Amouzay, Raja Chakir, Sophie Dabo-Niang and Ahmed El Ghini. Structural Changes in Temperature and Precipitation in MENA Countries. Earth Systems and Environment, 7(2), May 2023, 359–380.
13 Ernesto Araya, Eglantine Karlé and Hemant Tyagi. Dynamic Ranking and Translation Synchronization. Information and Inference, 12(3), September 2023, 2224–2266.
14 Christophe Biernacki, Julien Jacques and C. Keribin. A Survey on Model-Based Co-Clustering: High Dimension and Estimation Challenges. Journal of Classification, July 2023.
15 Stéphane Bouka, Sophie Dabo-Niang and Guy Martial Nkiet. On estimation and prediction in spatial functional linear regression model. Lithuanian Mathematical Journal, 63(1), February 2023, 13–30.
16 Guillaume Braun and Hemant Tyagi. Minimax Optimal Clustering of Bipartite Graphs with a Generalized Power Method. Information and Inference, 12(3), September 2023, 1830–1866.
17 Sophie Dabo-Niang, Emad Darwich, Leila Hamdad and Baba Thiam. Spatial Relative Risk of Upper Aerodigestive Tract Cancers Incidence in French Northern Region. SN Computer Science, 4(1), January 2023, 30.
18 Camille Frévent, Mohamed-Salem Ahmed, Sophie Dabo-Niang and Michaël Genin. Investigating spatial scan statistics for multivariate functional data. Journal of the Royal Statistical Society: Series C Applied Statistics, 72(2), May 2023, 450–475.
19 Quentin Grimonprez, Samuel Blanck, Alain Celisse and Guillemette Marot. MLGL: An R package implementing correlated variable selection by hierarchical clustering and group-Lasso. Journal of Statistical Software, 106(3), March 2023.
20 Maxime Haddouche and Benjamin Guedj. PAC-Bayes Generalisation Bounds for Heavy-Tailed Losses through Supermartingales. Transactions on Machine Learning Research Journal, April 2023.
21 Wilfried Heyse, Vincent Vandewalle, Guillemette Marot, Philippe Amouyel, Christophe Bauters and Florence Pinet. Identification of patient subtypes based on protein expression for prediction of heart failure after myocardial infarction. iScience, 26(3), March 2023, 106171.
22 Eglantine Karlé and Hemant Tyagi. Dynamic Ranking with the BTL Model: A Nearest Neighbor based Rank Centrality Method. Journal of Machine Learning Research, 24(269), September 2023, 1–57.
23 Tzung Hsuen Khoo, Dharini Pathmanathan and Sophie Dabo-Niang. Spatial Autocorrelation of Global Stock Exchanges Using Functional Areal Spatial Principal Component Analysis. Mathematics, 11(3), January 2023, 674.
24 Esther Martin de Fourchambault, Nathalie Callens, Jean-Michel Saliou, Marie Fourcot, Oceane Delos, Nicolas Barois, Quentin Thorel, Santseharay Ramirez, Jens Bukh, Laurence Cocquerel, Justine Bertrand-Michel, Guillemette Marot, Yasmine Sebti, Jean Dubuisson and Yves Rouillé. Hepatitis C virus alters the morphology and function of peroxisomes. Frontiers in Microbiology, 14, September 2023, 1254728.
25 Issam-Ali Moindjié, Sophie Dabo-Niang and Cristian Preda. Classification of multivariate functional data on different domains with Partial Least Squares approaches. Statistics and Computing, October 2023, 5.
26 Violeta Raverdy, Estelle Chatelain, Guillaume Lasailly, Robert Caiazzo, Jimmy Vandel, Hélène Verkindt, Camille Marciniak, Benjamin Legendre, Pierre Bauvin, Naima Oukhouya-Daoud, Gregory Baud, Mikael Chetboun, Marie-Christine Vantyghem, Viviane Gnemmi, Emmanuelle Leteurtre, Bart Staels, Philippe Lefebvre, Philippe Mathurin, Guillemette Marot and Francois Pattou. Combining diabetes, sex, and menopause as meaningful clinical features associated with NASH and liver fibrosis in individuals with class II and III obesity: A retrospective cohort study. Obesity, 31, November 2023, 3066–3076.
27 Sébastien Sanges, Lisa Rice, Ly Tu, Eleanor Valenzi, Jean-Luc Cracowski, David Montani, Julio Mantero, Camille Ternynck, Guillemette Marot, Andreea Bujor, Eric Hachulla, David Launay, Marc Humbert, Christophe Guignabert and Robert Lafyatis. Biomarkers of haemodynamic severity of systemic sclerosis-associated pulmonary arterial hypertension by serum proteome analysis. Annals of the Rheumatic Diseases, February 2023, ard-2022-223237.
28 articleH.Hélène Sarter, G.Guillaume Savoye, G.Guillemette Marot, D.Delphine Ley, D.Dominique Turck, J.-P.Jean-Pierre Hugot, F.Francis Vasseur, A.Alain Duhamel, P.Pauline Wils, F.Fred Princen, J.-F.Jean-Frédéric Colombel, C.Corinne Gower-Rousseau, M.Mathurin Fumery, R.R Al Hameedi, M.M Al Khatib, S.S Al Turk, E.E Agoute, J.J Andre, M.M Antonietti, A.A Aouakli, A.A Armand, L.L Armengol-Debeir, I.I Aroichane, F.F Assi, J.J Aubet, E.E Auxenfants, A.A Avram, F.F Ayafi-Ramelot, K.K Azzouzi, D.D Bankovski, B.B Barbry, N.N Bardoux, P.P Baron, A.A Baudet, P.P Bayart, B.B Bazin, A.A Bebahani, J.J Becqwort, S.S Bellati, V.V Benet, H.H Benali, C.C Benard, C.C Benguigui, E.E Ben Soussan, A.A Bental, I.I Berkelmans, J.J Bernet, K.K Bernou, C.C Bernou-Dron, P.P Bertot, N.N Bertiaux-Vandaële, V.V Bertrand, E.E Billoud, N.N Biron, B.B Bismuth, M.M Bleuet, F.F Blondel, V.V Blondin, M.M Bobula, P.P Bohon, V.V Bondjemah, E.E Boniface, D.D Bonkovski, P.P Bonnière, E.E Bonvarlet, P.P Bonvarlet, A.A Boruchowicz, R.R Bostvironnois, M.M Boualit, A.A Bouazza, B.B Bouche, C.C Boudaillez, C.C Bourgeaux, M.M Bourgeois, A.A Bourguet, A.A Bourienne, H.H Boutaleb, A.A Bouthors, J.J Branche, G.G Bray, F.F Brazier, P.P Breban, M.M Bridenne, H.H Brihier, L.L Bril, V.V Brung-Lefebvre, P.P Bulois, P.P Burgiere, J.J Butel, J.J Canva, V.V Canva-Delcambre, J.J Capron, F.F Cardot, S.S Carette, P.P Carpentier, E.E Cartier, J.J Cassar, M.M Cassagnou, J.J Castex, P.P Catala, S.S Cattan, S.S Catteau, B.B Caujolle, G.G Cayron, C.C Chandelier, M.M Chantre, J.J Charles, T.T Charneau, M.M Chavance-Thelu, A.A Cheny, D.D Chirita, A.A Choteau, J.J Claerbout, P.P Clergue, H.H Coevoet, G.G Cohen, R.R Collet, M.M Colin, J.J Colombel, S.S Coopman, L.L Cordiez, J.J Corvisart, A.A Cortot, F.F Couttenier, J.J Crinquette, V.V Crombe, I.I Dadamessi, H.H Daoudi, V.V Dapvril, T.T Davion, S.S Dautreme, J.J Debas, S.S Decoster, N.N Degrave, F.F Dehont, C.C Delatre, R.R Delcenserie, D.D Delesalle, O.O Delette, T.T Delgrange, 
L.L Delhoustal, J.J Delmotte, S.S Demmane, G.G Deregnaucourt, P.P Descombes, J.J Desechalliers, P.P Desmet, P.P Desreumaux, G.G Desseaux, P.P Desurmont, A.A Devienne, E.E Devouge, M.M Devred, A.A Devroux, A.A Dewailly, S.S Dharancy, A.A Di Fiore, D.D Djedir, R.R Djedir, W.W Doleh, M.M Dreher-Duwat, R.R Dubois, C.C Duburque, P.P Ducatillon, J.J Duclay, B.B Ducrocq, F.F Ducrot, P.P Ducrotte, A.A Dufilho, C.C Duhamel, D.D Dujardin, C.C Dumant-Forest, J.J Dupas, F.F Dupont, Y.Y Duranton, A.A Duriez, N.N Duveau, K.K El Achkar, M.M El Farisi, C.C Elie, M.M Elie-Legrand, A.A Elkhaki, M.M Eoche, E.E Essmaeel, D.D Evrard, J.J Evrard, A.A Fatome, B.B Filoche, L.L Finet, M.M Flahaut, C.C Flamme, D.D Foissey, P.P Fournier, M.M Foutrein-Comes, P.P Foutrein, D.D Fremond, T.T Frere, P.P Gallais, C.C Gamblin, S.S Ganga, R.R Gerard, G.G Geslin, Y.Y Gheyssens, N.N Ghossini, S.S Ghrib, T.T Gilbert, B.B Gillet, D.D Godart, P.P Godard, J.J Godchaux, R.R Godchaux, G.G Goegebeur, O.O Goria, F.F Gottrand, P.P Gower, B.B Grandmaison, M.M Groux, C.C Guedon, L.L Guerbeau, M.M Gueroult-Dero, J.J Guillard, L.L Guillem, F.F Guillemot, D.D Guimberd, B.B Haddouche, S.S Hakim, D.D Hanon, V.V Hautefeuille, P.P Heckestweiller, G.G Hecquet, J.J Hedde, H.H Hellal, P.P Henneresse, B.B Heyman, M.M Heraud, S.S Herve, P.P Hochain, L.L Houssin-Bailly, P.P Houcke, B.B Huguenin, S.S Iobagiu, S.S Istanboli, A.A Ivanovic, I.I Iwanicki-Caron, E.E Janicki, M.M Jarry, J.J Jeu, J.J Joly, C.C Jonas, A.A Jouvenet, F.F Katherin, A.A Kerleveo, A.A Khachfe, A.A Kiriakos, J.J Kiriakos, O.O Klein, M.M Kohut, R.R Kornhauser, D.D Koutsomanis, J.J Laberenne, E.E Lacotte, G.G Laffineur, M.M Lagarde, A.A Lalanne, A.A Lalieu, P.P Lannoy, J.J Lapchin, M.M Laprand, D.D Laude, R.R Leblanc, P.P Lecieux, S.S Lecleire, N.N Leclerc, C.C Le Couteulx, J.J Ledent, J.J Lefebvre, P.P Lefiliatre, C.C Le Goffic, C.C Legrand, A.A Le Grix, P.P Lelong, B.B Leluyer, C.C Lemaitre, C.C Lenaerts, G.G Lepeut, L.L Lepileur, A.A Leplat, E.E 
Lepoutre-Dujardin, H.H Leroi, M.M Leroy, P.P Le Roy, B.B Lesage, J.J Lesage, X.X Lesage, J.J Lesage, I.I Lescanne-Darchis, J.J Lescut, D.D Lescut, B.B Leurent, P.P Levy, M.M Lhermie, L.L Libier, A.A Lion, B.B Lisambert, I.I Loge, F.F Loire, J.J Loreau, S.S Louf, A.A Louvet, L.L Lubret, M.M Luciani, D.D Lucidarme, J.J Lugand, O.O Macaigne, D.D Maetz, D.D Maillard, H.H Mancheron, O.O Manolache, A.A Marks-Brunel, C.C Marre, R.R Marti, F.F Martin, G.G Martin, E.E Marzloff, P.P Mathurin, J.J Mauillon, V.V Maunoury, J.J Maupas, M.M Medam Djomo, C.C Mechior, Z.Z Melki, B.B Mesnard, P.P Metayer, L.L Methari, B.B Meurisse, F.F Meurisse, L.L Michaud, X.X Mirmaran, P.P Modaine, A.A Monthe, L.L Morel, P.P Mortier, E.E Moulin, O.O Mouterde, N.N Mozziconaci, J.J Mudry, M.M Nachury, M.M Ngo, E.Eric N’guyen Khac, B.B Notteghem, V.V Ollevier, A.A Ostyn, A.A Ouraghi, B.B Oussadou, D.D Ouvry, B.B Paillot, C.C Painchart, N.N Panien-Claudot, C.C Paoletti, A.A Papazian, B.B Parent, B.B Pariente, J.J Paris, P.P Patrier, T.T Paupard, B.B Pauwels, M.M Pauwels, E.E Penninck, R.R Petit, M.M Piat, S.S Piotte, C.C Plane, B.B Plouvier, E.E Pollet, P.P Pommelet, D.D Pop, C.C Pordes, G.G Pouchain, P.P Prades, A.A Prevost, J.J Prevost, G.G Quartier, B.B Quesnel, A.A Queuniet, J.J Quinton, A.A Rabache, P.P Rabelle, G.G Raclot, S.S Ratajczyk, D.D Rault, V.V Razemon, N.N Reix, T.T Renaut-Vantroys, M.M Revillion, G.G Riachi, C.C Richez, P.P Robinson, J.J Rodriguez, J.J Roger, J.J Roux, A.A Rudelli, A.A Saber, G.G Savoye, P.P Schlossberg, D.D Sefrioui, M.M Segrestin, D.D Seguy, C.C Seminur, M.M Serin, A.A Seryer, F.F Sevenet, N.N Shekh, J.J Silvie, V.V Simon, C.C Spyckerelle, N.N Talbodec, N.N Tavernier, H.H Tchandeu, A.A Techy, J.J Thelu, A.A Thevenin, H.H Thiebault, J.J Thomas, J.J Thorel, C.C Thuillier, G.G Tielman, M.M Tode, J.J Toisin, J.J Tonnel, J.J Touchais, P.P Toumelin, Y.Y Touze, J.J Tranvouez, C.C Triplet, N.N Triki, D.D Turck, S.S Uhlen, E.E Vaillant, C.C Valmage, D.D Vanco, N.N 
Vandaele-Bertiaux, H.H Vandamme, E.E Vanderbecq, E.E Vander Eecken, P.P Vandermolen, P.P Vandevenne, L.L Vandeville, A.A Vandewalle, C.C Vandewalle, P.P Vaneslander, J.J Vanhoove, A.A Vanrenterghem, C.C Vanveuren, P.P Varlet, I.I Vasies, G.G Verbiese, J.J Verlynde, G.G Vernier-Massouille, P.P Vermelle, C.C Verne, P.P Vezilier-Cocq, B.B Vigneron, M.M Vincendet, J.J Viot, Y.Y Voiment, A.A Wacrenier, L.L Waeghemaecker, J.J Wallez, M.M Wantiez, F.F Wartel, J.J Weber, J.J Willocquet, N.N Wizla, E.E Wolschies, O.O Zaharia, S.S Zaoui, A.A Zalar, B.B Zaouri, A.A Zellweger, C.C Ziade, L.L Beaugerie, M.M Allez, F.F Ruemmele, A.A Lamer and M.M Roy. A Novel 8-Predictors Signature to Predict Complicated Disease Course in Pediatric-onset Crohn’s Disease: A Population-based Study. Inflammatory Bowel Diseases, June 2023. HAL DOI
29 article Antonin Schrab, Ilmun Kim, Mélisande Albert, Béatrice Laurent, Benjamin Guedj and Arthur Gretton. MMD Aggregated Two-Sample Test. Journal of Machine Learning Research 24(194), June 2023, 1-81. HAL
30 article Ning Sun, Zoran Bursac, Ian Dryden, Roberto Lucchini, Sophie Dabo-Niang and Boubakari Ibrahimou. Bayesian spatiotemporal modelling for disease mapping: an application to preeclampsia and gestational diabetes in Florida, United States. Environmental Science and Pollution Research 30(50), September 2023, 109283-109298. HAL DOI
31 article Ringo Thomas Tchouya, Stefano Nasini and Sophie Dabo-Niang. An estimation approach for the influential–imitator diffusion. Computers and Operations Research 159, November 2023, 106315. HAL DOI

International peer-reviewed conferences

32 inproceedings Christophe Biernacki, Claire Boyer, Gilles Celeux, Julie Josse, Fabien Laporte, M. Marbac, Aude Sportisse and Vincent Vandewalle. Impact of missing data on mixtures and clustering with illustrations in Biology and Medicine. SPSR 2023 - The 24th annual Conference of the Romanian Society of Probability and Statistics, Bucharest, Romania, April 2023. HAL
33 inproceedings Christophe Biernacki. Digital Science for Disability Overview of the initiatives undertaken by Inria. Inria/IIT DELHI workshop, New Delhi, India, October 2023. HAL
34 inproceedings C. Biernacki. Levels Merging in the Latent Class Model. Statistical Learning Sustainability and Impact Evaluation, Ancona, Italy, June 2023. HAL
35 inproceedings Christophe Biernacki. Levels Merging in the Latent Class Model. CFE-CMStatistics 2023, Berlin, Germany, December 2023. HAL
36 inproceedings Christophe Biernacki, Vincent Vandewalle and Matthieu Marbac. Clustering: from modeling to visualizing Mapping clusters as spherical Gaussians. SFC 2023 - Rencontres de la Société Francophone de Classification, Strasbourg, France, July 2023. HAL
37 inproceedings Felix Biggs and Benjamin Guedj. Tighter PAC-Bayes Generalisation Bounds by Leveraging Example Difficulty. AISTATS 2023 - 26th International Conference on Artificial Intelligence and Statistics, Valencia, Spain, October 2022. HAL
38 inproceedings C. Boinay, C. Biernacki and C. Preda. Graphs in OT Testing Graph Abnormality Application to a Real OT Data Set Ongoing & Future Works References. 54e Journées de Statistique, Bruxelles, Belgium, July 2023. HAL
39 inproceedings Paul Viallard, Maxime Haddouche, Umut Şimşekli and Benjamin Guedj. Learning via Wasserstein-Based High Probability Generalisation Bounds. NeurIPS 2023 - Thirty-seventh Conference on Neural Information Processing Systems, New Orleans, United States, June 2023. HAL DOI
40 inproceedings Paul Viallard, Maxime Haddouche, Umut Şimşekli and Benjamin Guedj. Learning via Wasserstein-Based High Probability Generalisation Bounds. NeurIPS 2023 Workshop on Optimal Transport and Machine Learning (OTML'23), New Orleans, United States, December 2023. HAL

National peer-reviewed Conferences

41 inproceedings Violaine Courrier, Christophe Biernacki, Cristian Preda and Benjamin Vittrant. Comparative study of clustering models for multivariate time series from connected medical devices. EGC 2024 - 24ème Conférence Francophone sur l'Extraction et Gestion des Connaissances, Dijon, France, January 2024. HAL

Conferences without proceedings

42 inproceedings Christophe Biernacki, Matthieu Marbac and Vincent Vandewalle. Clustering: from modeling to visualizing Mapping clusters as spherical Gaussians. Working Group on Model-Based Clustering, Pittsburgh, United States, July 2023. HAL
43 inproceedings Cristian Preda and Quentin Grimonprez. Linear approximation for multivariate categorical functional data analysis. The 24th Conference of the Romanian Society of Probability and Statistics, Bucharest, Romania, April 2023. HAL
44 inproceedings Cristian Preda. Learning with categorical functional data. The Tenth Congress of Romanian Mathematicians, 2023, Pitesti, Romania, June 2023. HAL

Doctoral dissertations and habilitation theses

45 thesis Issam-Ali Moindjié. Linear models for multivariate functional data. Université de Lille, December 2023. HAL

Reports & preprints

46 misc Felix Biggs, Antonin Schrab and Arthur Gretton. MMD-FUSE: Learning and Combining Kernels for Two-Sample Testing Without Data Splitting. June 2023. HAL
47 misc Eugenio Clerico and Benjamin Guedj. A note on regularised NTK dynamics with an application to PAC-Bayesian training. December 2023. HAL
48 misc Maxime Haddouche and Benjamin Guedj. Wasserstein PAC-Bayes Learning: Exploiting Optimisation Guarantees to Explain Generalisation. April 2023. HAL
49 misc Maxime Haddouche, Benjamin Guedj and Olivier Wintenberger. Optimistic Dynamic Regret Bounds. January 2023. HAL
50 misc Fredrik Hellström, Giuseppe Durisi, Benjamin Guedj and Maxim Raginsky. Generalization Bounds: Perspectives from Information Theory and PAC-Bayes. September 2023. HAL
51 misc Fredrik Hellström and Benjamin Guedj. Comparing Comparators in Generalization Bounds. October 2023. HAL
52 misc Pierre Jobic, Maxime Haddouche and Benjamin Guedj. Federated Learning with Nonvacuous Generalisation Bounds. October 2023. HAL
53 misc Ilmun Kim and Antonin Schrab. Differentially Private Permutation Tests: Applications to Kernel Methods. October 2023. HAL
54 misc Etienne Krönert, Alain Célisse and Dalila Hattab. FDR control for Online Anomaly Detection. December 2023. HAL
55 misc Issam-Ali Moindjié, Cristian Preda and Sophie Dabo-Niang. Fusion regression methods with repeated functional data. November 2023. HAL
56 misc Antonin Schrab, Wittawat Jitkrittum, Zoltán Szabó, Dino Sejdinovic and Arthur Gretton. Discussion of `Multiscale Fisher's Independence Test for Multivariate Dependence'. November 2023. HAL
57 misc Aude Sportisse, Matthieu Marbac, Fabien Laporte, Gilles Celeux, Claire Boyer, Christophe Biernacki and Julie Josse. Accompanying note: Model-based Clustering with Missing Not At Random Data. December 2023. HAL
58 misc Aude Sportisse, Matthieu Marbac, Fabien Laporte, Gilles Celeux, Claire Boyer, Julie Josse and Christophe Biernacki. Model-based Clustering with Missing Not At Random Data. December 2023. HAL
59 misc Hemant Tyagi and Denis Efimov. Learning linear dynamical systems under convex constraints. August 2023. HAL
60 misc Ernesto Araya Valdivia and Hemant Tyagi. Graph Matching via convex relaxation to the simplex. November 2023. HAL

Other scientific publications

61 article Julie Dubois-Chevalier, Céline Gheeraert, Alexandre Berthier, Clémence Boulet, Vanessa Dubois, Loïc Guille, Marie Fourcot, Guillemette Marot, Karine Gauthier, Laurent Dubuquoy, Bart Staels, Philippe Lefebvre and Jérôme Eeckhoute. An extended transcription factor regulatory network controls hepatocyte identity. EMBO Reports, July 2023. HAL DOI

11.3 Other

Educational activities

62 unpublished Christophe Biernacki. Clustering : une vision unifiée pour une utilisation éclairée - Partie 1 : Méthodes exploratoires en clustering. June 2023, Doctoral, France. HAL
63 unpublished C. Biernacki. Clustering : une vision unifiée pour une utilisation éclairée - Partie 2 : Evaluation d'une méthode de clustering. June 2023, Doctoral, France. HAL
64 unpublished C. Biernacki. Clustering : une vision unifiée pour une utilisation éclairée - Partie 3 : Formalisation par modèles de mélange. June 2023, Doctoral, France. HAL
65 unpublished C. Biernacki. Clustering : une vision unifiée pour une utilisation éclairée - Partie 4 : Traitement de la grande dimension & co-clustering. June 2023, Doctoral, France. HAL

11.4 Cited publications

66 article Christelle Judith Agonkoui, Freedath Djibril Moussa and Sophie Dabo-Niang. Multivariate functional principal component analysis for endogenously stratified data. Afrika Statistika 17(4), October 2022, 3321-337. HAL DOI
67 inproceedings Idris Si-Ahmed, Mazamaesso Azeyou, Leila Hamdad and Sophie Dabo-Niang. Functional, Multivariate Functional and Spatial PCA: Application to Covid-19 Data in the African Continent. 12th International Conference on Information Systems and Advanced Technologies ICISAT 2022, Lecture Notes in Networks and Systems 624, Virtual conference, France, Springer International Publishing, August 2022, 318-328. HAL DOI

