PASTA is a joint research team between Inria - Nancy Grand Est, CNRS and University of Lorraine, located at Institut Élie Cartan of Lorraine.

PASTA aims to construct and develop new methods and techniques by promoting and interweaving stochastic modeling and statistical tools to integrate, analyze and enhance real data.

The specificity and the identity of PASTA are:

The leading direction of our research is to develop the topic of data-enriched spatio-temporal stochastic models from a mathematical perspective. Specifically, we jointly leverage major tools of probability and statistics: data analysis and the analytical study of stochastic processes. We aim at exploring three different aspects of the same phenomenon: shape, time and environment. These mathematical methodologies are intended for solving real-life problems through interdisciplinary and industrial partnerships.

Our research program develops three interwoven axes:

In particular, we are interested in stochastic dynamical systems evolving in intricate configuration spaces. These configuration spaces could be spatial positions, graphs, physical spaces with singularities, spaces of measures, spaces of chemical compounds, and so on.

While facing a new modeling question, we have to construct the appropriate class of models among
what we call the meta-models.
Meta-models and then models are selected according to the properties to be simulated or inferred
as well as the objectives to be reached.
Among other examples of such meta-models which we regularly use, let us mention
diffusion processes, Gibbs measures, multivariate time series and random graphs.
On these topics, the team has an intensive research experience from different perspectives.

Finding the balance between usability, interpretability and realism is our first guide. This is the keystone in modeling, and the main difference with black-box approaches in machine learning. Our second guide is to study the related mathematical issues in modeling, simulation and inference. Models are sources of interesting open mathematical questions. We are eager to expand the “capacity” of the models by exploring their mathematical properties, providing simulation algorithms or proposing more efficient ones, as well as new inference procedures with statistical guarantees.

To study and apply the class of stochastic models we consider, we have to handle the following questions:

Our main application domains are: insurance, geophysics, geology, medicine, astronomy and finance.

We aim at providing new tools regarding the modeling, simulation and inference of spatio-temporal stochastic processes and other dynamical random systems living in large state-spaces. As such, there are many application domains which we consider.

In particular, we have partnerships with practitioners in: cosmology, geophysics, healthcare systems, insurance, and telecom networks.

We detail below our actions in the most representative application domains.

Geophysics is a domain which requires the application of a broad range of mathematical tools related to probability and statistics while more and more data are collected. There are several domains in which we develop our methodology in relation with practitioners in the field.

On such topics, we hold long standing interdisciplinary collaborations with INRAE Grenoble, the RING Team (GeoRessources, University of Lorraine), IMAR (Institute of Mathematics of the Romanian Academy) in Bucharest. We also have the support of the interdisciplinary LUE Deepsurf project (University of Lorraine).

We have longstanding and continuous cooperation with astronomers and cosmologists in France, Spain and Estonia. In particular, we are interested in using tools from spatial statistics to detect galaxies and other star patterns, such as filaments. Such developments require us to design specific point processes giving appropriate morpho-statistical distributions, as well as specific inference algorithms which are based on Monte Carlo simulation and able to handle large volumes of data.

Graphs are essential to model complex systems such as the relations between agents, the spatial distribution of points that are connected such as stars, the connections in telecom networks, and so on. We develop various directions of the study of random graphs that are motivated by immediate applications:

We have longstanding collaborations on these topics with Agence de Biomédecine (ABM), Le Foyer (insurance company, Luxembourg), INRAE (Avignon), Dyogene (Inria Paris), Lip 6, UTC, LORIA (computer science laboratory, Nancy), University of Buenos Aires, Northwestern University and LAAS (CNRS, Toulouse).

A. Lejay was the main organizer and a member of the scientific committee of the International Conference on Stochastic Pathwise Analysis held online in March 2022. Initially planned to take place at CIRM (Marseille), this online conference gathered around 90 participants, including the most prominent researchers in the field, on the topics of rough paths theory, regularity structures, and various applications, including ones in data science.

We have a strong interest in the fragmentation equation for understanding snow or rock avalanches. Our point of view is to explore the probabilistic representations of transport equations in this framework as well as the possibilities they offer. The underlying stochastic process represents the typical evolution of the mass of a rock or of a snow aggregate subject to successive random breakages.
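As an illustration of the kind of process described above, here is a minimal sketch of the mass of a single fragment subject to successive random breakages. It is not the model studied by the team: the constant breakage rate, the uniform law of the retained mass fraction, and the names `simulate_fragment_mass`, `breakage_rate` and `horizon` are all assumptions made for this example.

```python
import random


def simulate_fragment_mass(initial_mass, breakage_rate, horizon, seed=0):
    """Track the mass of one fragment under successive random breakages.

    Illustrative assumptions (not taken from the report):
    - breakages occur at a constant rate (exponential waiting times);
    - at each breakage the tracked fragment keeps a fraction of its mass
      drawn uniformly on [0, 1).
    """
    rng = random.Random(seed)
    time, mass = 0.0, initial_mass
    path = [(time, mass)]
    while True:
        time += rng.expovariate(breakage_rate)  # waiting time to next breakage
        if time > horizon:
            break
        mass *= rng.random()  # fraction of the mass that is retained
        path.append((time, mass))
    return path


path = simulate_fragment_mass(initial_mass=1.0, breakage_rate=2.0, horizon=5.0)
for time, mass in path[:5]:
    print(f"t = {time:.3f}, mass = {mass:.5f}")
```

Averaging a functional of such paths over many independent runs gives a Monte Carlo estimate of the corresponding quantity under the assumed breakage law.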

In particular, we have developed in [30] the connections between various probabilistic representations of the fragmentation equation.

We have also studied a probabilistic representation of the fragmentation equation with abrasion and developed suitable numerical Monte Carlo methods. This work is performed in an on-going collaboration with Caroline Le Bouteiller (INRAE) on the topic of rock fragmentation.

With Lucian Beznea (IMAR, Bucharest) and Oana Lupaşcu-Stamate (ISMMA, Bucharest), we developed a scaling property for the continuous time fragmentation processes related to a stochastic model for the fragmentation phase of an avalanche. We highlight in this framework numerical methods based on the stochastic differential equation of fragmentation and prove the fractal property of the solution [22].

With Oana Lupaşcu-Stamate (ISMMA, Bucharest) we are developing the asymptotic behavior of an avalanche in a particular sand-pile model. This is done by combining results for discrete random processes and numerical procedures introduced in our previous works.

Hawkes processes represent a common class of self-excited stochastic processes.
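To make the self-excitation mechanism concrete, the following sketch simulates a univariate Hawkes process by thinning. It assumes an exponential kernel, so that the conditional intensity is `mu + sum(alpha * exp(-beta * (t - t_i)))` over past events `t_i`. The kernel and the parameter names `mu`, `alpha`, `beta` are choices made for this example and are not taken from the team's work.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times of a Hawkes process on (0, horizon] by thinning.

    Assumed intensity: lambda(t) = mu + sum_i alpha * exp(-beta * (t - t_i)).
    Between events the intensity only decays, so its current value is a
    valid upper bound for proposing the next candidate time.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # self-exciting part of the intensity at time t
    while True:
        bound = mu + excitation
        wait = rng.expovariate(bound)  # candidate inter-arrival time
        t += wait
        if t > horizon:
            return events
        excitation *= math.exp(-beta * wait)  # decay up to the candidate time
        if rng.random() * bound <= mu + excitation:  # accept w.p. lambda/bound
            events.append(t)
            excitation += alpha  # each event raises the intensity by alpha


events = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, horizon=50.0, seed=1)
print(len(events), "events simulated")
```

With an exponential kernel, the process is stable when `alpha / beta < 1`, which is the case for the values used here.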

We have studied the use of Hawkes processes in the context of insurance. In particular, we built a recommendation system for insurance products based on individual probabilities of life events [32]. This system is tested on the database from the insurance company Le Foyer (Luxembourg).

The numerical approximation of stochastic differential equations (SDEs), and in particular the design of new methodologies to approximate hitting times of SDEs, is a challenging problem which is important in a large class of application fields such as geophysics, finance, insurance and biology.
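For context, the standard baseline against which such methodologies are usually compared is the Euler-Maruyama scheme with discrete monitoring of the barrier. The sketch below shows that baseline only; it is not one of the methods developed by the team. The function name, its arguments, and the restriction to an upper barrier with a start below it are assumptions made for this example.

```python
import math
import random


def euler_first_hitting_time(drift, diffusion, x0, level, dt, horizon, seed=0):
    """Approximate the first time a one-dimensional SDE reaches `level`.

    Baseline Euler-Maruyama scheme for dX = drift(X) dt + diffusion(X) dW,
    started at x0 < level and monitored only on the time grid.
    Returns the first grid time at which X >= level, or None if the level
    is not reached before `horizon`.
    """
    rng = random.Random(seed)
    x = x0
    n_steps = int(round(horizon / dt))
    sqrt_dt = math.sqrt(dt)
    for n in range(1, n_steps + 1):
        x += drift(x) * dt + diffusion(x) * sqrt_dt * rng.gauss(0.0, 1.0)
        if x >= level:
            return n * dt
    return None


# Ornstein-Uhlenbeck-type example: dX = -X dt + dW, barrier at level 1.
tau = euler_first_hitting_time(
    drift=lambda x: -x, diffusion=lambda x: 1.0,
    x0=0.0, level=1.0, dt=1e-3, horizon=20.0, seed=42,
)
print("approximate hitting time:", tau)
```

Because the path is only observed on the grid, crossings that occur between two grid points are missed, so this scheme tends to overestimate the hitting time. This is one reason why dedicated approximation methods are of interest.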

With Samuel Herrmann (University of Burgundy) we made progress on this topic by developing new methods for the strong convergence and pathwise approximation of one-dimensional SDEs. In particular we developed a new technique for the path approximation of one-dimensional stochastic processes, more precisely the Brownian motion and families of stochastic differential equations sharply linked to the Brownian motion (usually known as L and G-classes). We are interested here in the

Together with Samuel Herrmann (University of Burgundy) and Cristina Zucca (University of Torino) we are working on the exact simulation of the hitting times of multi-dimensional diffusions with a grant from University of Burgundy.

With Denis Villemonais (IECL, University of Lorraine), we constructed an estimator of the stickiness parameter of the sticky Brownian motion and other general diffusion processes from high frequency observations. This work is based on the construction of suitable estimators for the local time and the occupation time. Besides, this work provides a construction of sticky stochastic differential equations.

With Renaud Marty (IECL, University of Lorraine), we are studying an invariant embedding problem, which consists in solving a differential equation whose initial and terminal conditions are linked by a linear relation. Using tools from the theory of rough paths, we consider Rough Differential Equations, which extend ordinary differential equations to equations driven by rough signals. In particular, we apply our results to equations driven by a Brownian motion while avoiding all the difficulties related to the use of anticipative stochastic calculus.

We are studying an expansion of the maximum likelihood estimator using formal series expansions. The aim of this work is to understand the lack of Gaussianity in the non-asymptotic regime. We apply this expansion to the estimator of the skewness parameter of a skew Brownian motion, whose asymptotic mixed normality is also proved.

In collaboration with Paolo Pigato (University Tor Vergata, Roma) we previously studied parameter estimation for the linear drift of the self Vasicek model which follows a two-regime Ornstein-Uhlenbeck dynamic. The model fits well the behavior in financial markets related to crisis periods. In addition, we provided a test for detecting the presence of two regimes, and in recent months we extended our results to multiple thresholds as well. These results will improve the paper under revision [36]. After considering high frequency observations, we study new estimators for low frequency observations and the presence of several regimes.

We have made various advances in the analysis and optimization of stochastic matching models:

In an ongoing collaboration with Ohad Perry (Northwestern University, USA), we have obtained (in-)stability criteria for parallel service systems with routing errors. These models have applications in the optimization of large call centers and server farms. These results have been published in [17].

In an ongoing collaboration with Mohamed Habib Diallo Aoudi and Vincent Robin (UTC), we have proposed a Markov exploration algorithm, coupled with the construction of the Configuration model, to represent a coupling algorithm on a large random graph, which is a typical model for

, dating websites and online advertisement. We obtain a tractable estimate of the matching coverage via fluid approximation, the accuracy of which is illustrated by extensive simulations, for various graph degree distributions. These results can be found in the submitted paper [25].
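The notion of matching coverage can be illustrated with a small simulation. The sketch below is not the Markov exploration algorithm of the submitted paper, which builds the graph and the matching jointly and relies on a fluid approximation. Here the graph is drawn first from a configuration model, a greedy matching is computed afterwards, and the coverage is measured empirically. All function names and the degree sequence are assumptions made for this example.

```python
import random


def configuration_model(degrees, seed=0):
    """Pair half-edges uniformly at random; self-loops and repeats are dropped."""
    if sum(degrees) % 2:
        raise ValueError("the sum of the degrees must be even")
    rng = random.Random(seed)
    stubs = [node for node, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    neighbors = {node: set() for node in range(len(degrees))}
    for u, v in zip(stubs[::2], stubs[1::2]):
        if u != v:  # discard self-loops; the sets absorb multi-edges
            neighbors[u].add(v)
            neighbors[v].add(u)
    return neighbors


def greedy_matching(neighbors, seed=0):
    """Visit nodes in random order; match each to a random free neighbor."""
    rng = random.Random(seed)
    order = list(neighbors)
    rng.shuffle(order)
    partner = {}
    for u in order:
        if u in partner:
            continue
        free = sorted(v for v in neighbors[u] if v not in partner)
        if free:
            v = rng.choice(free)
            partner[u], partner[v] = v, u
    return partner


degrees = [3] * 1000  # hypothetical 3-regular degree sequence
graph = configuration_model(degrees, seed=1)
partner = greedy_matching(graph, seed=2)
print("empirical matching coverage:", len(partner) / len(degrees))
```

The resulting matching is maximal: every unmatched node has only matched neighbors. Repeating the experiment over several degree sequences gives empirical coverages of the kind a fluid approximation aims to predict.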

Following the research internship of Nicolas Lengert (now at University of Luxembourg) in 2020, we also investigate the dual approach of stochastic matching as the construction of maximal couplings in large labelled random graphs generated from a root compatibility graph.

In an ongoing collaboration with Eustache Besançon (Telecom Paristech), Laurent Decreusefond (Telecom Paristech) and Laure Coutin (Université Paul Sabatier), we investigate the speed of convergence in the functional Central Limit Theorem for Continuous Time Markov Chains (CTMC), by using stochastic analysis tools (Malliavin calculus and the Stein method). These results allow us to characterize the accuracy (and thereby the confidence interval) in diffusion approximations of many practical processes, as shown in the submitted paper [27].

In a collaboration with Anne Philippe (University of Nantes) and Lluis Hurtado-Gil (UPM, Madrid), we developed in [19] a new algorithm for statistical inference and analysis of spatial patterns assumed to be realizations of Gibbs point processes. This approach has a general character and it contributes to the existing methods based on Approximate Bayesian Computation, by providing control properties of the proposed solution. Results on simulated data and real data are presented. The real data application fits an inhomogeneous area interaction point process to cosmological data. The obtained results validate two important aspects of the galaxy distribution in our universe: proximity of the galaxies from the cosmic filament network; and the territorial clustering at a given range of interaction.

In a cooperation with Lluis Hurtado-Gil (UPM, Madrid), and Vincent Martinez and Pablo Arnalte Mur (Observatori Astronòmic de la Universitat de València) we propose in [13] a morphostatistical characterization of the galaxy distribution through spatial statistical modeling based on inhomogeneous Gibbs point processes. The galaxy distribution is supposed to exhibit two components. The first one is related to the major geometrical features exhibited by the observed galaxy field, here, its corresponding filamentary pattern. The second one is related to the interactions exhibited by the galaxies. Gibbs point processes are statistical models able to integrate these two aspects in a probability density, controlled by some parameters. Several such models are fitted to real observational data via the ABC shadow algorithm. This algorithm provides simultaneous parameter estimation and posterior-based inference, hence allowing the derivation of the statistical significance of the obtained results.

The huge amount of temporal data available nowadays in numerous scientific fields requires dedicated analysis and prediction methods. Stochastic temporal point processes are certainly one of the popular approaches available to model time series. While point processes have been successfully applied in many application domains, they need strong assumptions. For instance, the conditional intensity is often supposed to follow a particular parametric function, hence fixing the structure of the event distribution: purely random or independent, clustered or regular. Recent papers investigate the use of models from machine learning dedicated to sequential event analysis, namely recurrent neural networks (RNN). These RNNs are expected to be versatile enough to automatically adapt to the data, without the need for choosing the character of the event distribution. This work presents an introduction to the so-called neural point processes and discusses numerical experiments. In particular, the presented real data application considers seismic data from the Guadeloupe region.

This work was done during the internship of P.-A. Simon, co-supervised by R. Stoica and F. Sur (Tangram project-team, Inria NGE). The paper was presented at the RING 2021 conference [20].

We are extending our respective results on high frequency approximation of the local time of oscillating-skew-sticky diffusion processes. The purpose is to estimate the parameters and to model some critical behavior in financial markets related to crisis periods.

With Lucian Beznea (IMAR, Bucharest) and Oana Lupaşcu-Stamate (ISMMA, Bucharest) we are developing a stochastic approach for the two-dimensional Navier-Stokes equation in a bounded domain. More precisely, we consider the vorticity equation and construct a specific non-local branching process. This approach is new and, if successful, could lead to important advances, since it would also provide a new numerical algorithm.

Lucian Beznea (IMAR, Bucharest) visited the PASTA team for two weeks in September.

S. Mazzonetto is an assistant professor; P. Moyal and R. Stoica are professors. They have full teaching duties with lectures at all levels of the university. We mention here only lectures at master 1 and master 2 levels as well as responsibilities.

A. Lejay is an editor of the project Success Stories (AMIES and FSMP) dedicated to creating 2-page sheets that present successful collaborations between industry and academia.

M. Deaconu gave an interview at the French Embassy in Bucharest and to the Institut Français de Roumanie for the action Women in Sciences, 12 February 2021.

S. Mazzonetto participated in the organization of the exhibition “les mathématiques se conjuguent aussi au féminin”.