GEOSTAT is a research project that investigates the analysis of certain classes of natural complex signals (physiological time series, turbulent universe and earth observation data sets) by determining, in acquired signals, the properties predicted by the commonly admitted or new physical models that best fit the phenomenon. Consequently, when the statistical properties discovered in the signals do not match closely enough those predicted by accepted physical models, we question the validity of existing models or propose, whenever possible, modifications or extensions of them. A new direction of research is being developed in the team, based on the CONCAUST exploratory action and the associated team COMCAUSA (accepted in February 2021), proposed by N. Brodu with the Complexity Sciences Center, Physics Department, UC Davis, USA.

An important aspect of the methodological approach is that we do not rely on a predetermined "universal" signal processing model to analyze natural complex signals. Instead, we take into consideration existing approaches in nonlinear signal processing (wavelets, multifractal analysis tools such as log-cumulants or the micro-canonical multifractal formalism, time-frequency analysis, etc.), which are used to determine the micro structures or other micro features inside the acquired signals. Then, statistics of these micro data are computed and compared to the behaviour expected from the theoretical physical models used to describe the phenomenon from which the data are acquired. From there different possibilities can be contemplated:

GEOSTAT is a research project in nonlinear signal processing that builds on these considerations: it regards signals as realizations of complex extended dynamical systems. The driving approach is to describe the relations between complexity (or information content) and the geometric organization of information in a signal. For instance, for signals that are acquisitions of turbulent fluids, the organization of information may be related to the effective presence of a multiscale hierarchy of coherent structures, of multifractal nature, which is strongly related to intermittency and multiplicative cascade phenomena; determining this geometric organization unlocks key nonlinear parameters and features associated with these signals, and helps in understanding and analyzing their dynamical properties. We use this approach to derive novel solution methods for super-resolution and data fusion in Universe Sciences acquisitions. Specific advances are obtained in GEOSTAT by using this type of statistical/geometric approach to obtain validated dynamical information from signals acquired in Universe Sciences, e.g. Oceanography or Astronomy. The research in GEOSTAT encompasses nonlinear signal processing and the study of emergence in complex systems, with a strong emphasis on geometric approaches to complexity. Consequently, research in GEOSTAT is oriented towards the determination, in real signals, of quantities or phenomena, usually unattainable through linear methods, that are known to play an important role both in the evolution of the dynamical systems whose acquisitions are the signals under study, and in the compact representation of the signals themselves.

Signals studied in GEOSTAT belong to two broad classes:

Every signal conveys, as a measurement experiment, information on the physical system of which it is an acquisition. As a consequence, it seems natural that signal analysis or compression should make use of physical modelling of phenomena: the goal is to find new methodologies in signal processing that go beyond the simple problem of interpretation. The physics of disordered systems, and specifically the physics of (spin) glasses, is putting forward new algorithmic resolution methods in various domains such as optimization and compressive sensing, with significant success notably for heuristics for NP-hard problems. Similarly, the physics of turbulence introduces phenomenological approaches involving multifractality. Energy cascades are indeed closely related to geometrical manifolds defined through random processes. At these structures’ scales, information in the process is lost by dissipation (close to the lower bound of the inertial range). However, the whole cascade is encoded in the geometric manifolds, through long- or short-distance correlations depending on the case. How do these geometrical manifold structures organize in space and time or, in other words, how does the scale entropy itself cascade? To unify these two notions, a description in terms of the free energy of a generic physical model is sometimes possible, such as an elastic interface model in a random nonlinear energy landscape: this is for instance the correspondence between the compressible stochastic Burgers equation and directed polymers in a disordered medium. Thus, trying to unlock the fingerprints of cascade-like structures in acquired natural signals becomes a fundamental problem, from both theoretical and applied viewpoints.

The research described in this section is a collaborative effort of GEOSTAT, CNRS LEGOS (Toulouse), CNRS LAM (Marseille Laboratory for Astrophysics), MERCATOR (Toulouse), IIT Roorkee, Moroccan Royal Center for Teledetection (CRST), Moroccan Center for Science CNRST, Rabat University, University of Heidelberg. Researchers involved:

The analysis and modeling of natural phenomena, especially those observed in geophysical sciences and in astronomy, are influenced by statistical and multiscale phenomenological descriptions of turbulence; indeed these descriptions are able to explain the partition of energy within a certain range of scales. A particularly important aspect of the statistical theory of turbulence lies in the discovery that the support of the energy transfer is spatially highly non-uniform; in other words, it is intermittent. Because the Fourier transform lacks localization, linear methods do not succeed in unlocking the multiscale structures and cascading properties of variables that are of primary importance according to the physics of the phenomena. This is why new approaches, such as DFA (Detrended Fluctuation Analysis), time-frequency analysis, variations on curvelets, etc. have appeared during the last decades. Recent advances in dimensionality reduction, notably in Compressive Sensing, go beyond the Nyquist rate in sampling theory using nonlinear reconstruction, but data reduction occurs at random places, independently of the geometric localization of information content; this can be very useful for acquisition purposes, but has a lower impact in signal analysis. We are successfully making use of a microcanonical formulation of the multifractal theory, based on predictability and reconstruction, to study the turbulent nature of interstellar molecular or atomic clouds. Another important result obtained in GEOSTAT is the effective use of multiresolution analysis associated with optimal inference along the scales of a complex system. The multiresolution analysis is performed on dimensionless quantities given by the singularity exponents, which properly encode the geometrical structures associated with multiscale organization.
This is applied successfully in the derivation of high resolution ocean dynamics, or the high resolution mapping of gaseous exchanges between the ocean and the atmosphere; the latter is of primary importance for a quantitative evaluation of global warming. Understanding the dynamics of complex systems is recognized as a new discipline, which makes use of theoretical and methodological foundations coming from nonlinear physics, the study of dynamical systems and many aspects of computer science. One of the challenges is related to the question of emergence in complex systems: large-scale effects measurable macroscopically from a system made of huge numbers of interacting agents. Some quantities related to nonlinearity, such as Lyapunov exponents, Kolmogorov-Sinai entropy, etc. can be computed at least in the phase space. Consequently, knowledge from acquisitions of complex systems (which include complex signals) could be obtained from information about the phase space. A result from F. Takens about strange attractors in turbulence has motivated the theoretical determination of nonlinear characteristics associated with complex acquisitions. Emergence phenomena can also be traced inside complex signals themselves, by trying to localize information content geometrically. Fundamentally, in the nonlinear analysis of complex signals there are broadly two approaches: characterization by attractors (embedding and bifurcation) and time-frequency, multiscale/multiresolution approaches.
In real situations, the phase space associated with the acquisition of a complex phenomenon is unknown. It is however possible to relate, inside the signal's domain, local predictability to local reconstruction and to deduce relevant information associated with multiscale geophysical signals. A multiscale organization is a fundamental feature of a complex system; it can for example be related to the cascading properties of turbulent systems. We make use of this kind of description when analyzing turbulent signals: intermittency is observed within the inertial range and is related to the fact that, in the case of FDT (fully developed turbulence), symmetry is restored only in a statistical sense, a fact that has consequences for the quality of any nonlinear signal representation by frames or dictionaries.

The example of FDT as a standard "template" for developing general methods that apply to a vast class of complex systems and signals is of fundamental interest because, in FDT, the existence of a multiscale hierarchy critical exponents which explain the macroscopic properties of a system around critical points, and the quantitative characterization of universality classes, which allow the definition of methods and algorithms that apply to general complex signals and systems, and not only turbulent signals: signals which belong to the same universality class share a common statistical organization. During the past decades, canonical approaches permitted the development of a well-established analogy taken from thermodynamics in the analysis of complex signals: if

The team is working on a new class of models for physical systems, starting from measured data and accounting for their dynamics. The idea is to statistically describe the evolution of a system in terms of causally-equivalent states, that is, states that lead to the same predictions. Transitions between these states can be reconstructed from data, leading to a theoretically optimal predictive model. In practice, however, no algorithm is currently able to reconstruct these models from data in a reasonable time and without substantial discrete approximations. Recent progress now allows a continuous formulation of predictive causal models. Within this framework, more efficient algorithms may be found. The broadened class of predictive models promises a new perspective on structural complexity in many applications.
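
The notion of causally-equivalent states can be illustrated on a toy symbolic sequence. The sketch below is only a crude discrete approximation of the kind the paragraph above says current algorithms are limited to, and it is not the team's continuous formulation: it groups length-k histories whose empirical next-symbol distributions agree within a tolerance. The function names, the history length `k`, the tolerance `tol` and the restriction to one-step predictions are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def next_symbol_distributions(seq, k):
    """Empirical P(next symbol | last k symbols) for every observed history."""
    counts = defaultdict(Counter)
    for i in range(k, len(seq)):
        counts[seq[i - k:i]][seq[i]] += 1
    alphabet = sorted(set(seq))
    dists = {}
    for hist, c in counts.items():
        total = sum(c.values())
        dists[hist] = tuple(c[a] / total for a in alphabet)
    return dists


def group_histories(dists, tol=0.05):
    """Greedy grouping: histories with near-identical predictions share a state.

    Assumption: only the next symbol is predicted, whereas true causal
    states require equality of the whole future distribution.
    """
    states = []  # list of (representative distribution, member histories)
    for hist in sorted(dists):
        d = dists[hist]
        for rep, members in states:
            if max(abs(a - b) for a, b in zip(rep, d)) < tol:
                members.append(hist)
                break
        else:
            states.append((d, [hist]))
    return [members for _, members in states]


# Toy periodic sequence: "01" and "10" both predict "0" next, "00" predicts "1".
sequence = "001" * 50
print(group_histories(next_symbol_distributions(sequence, k=2)))
```

On this periodic example the histories "01" and "10" fall in the same group because they yield the same one-step prediction, while "00" forms its own group.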

Phonetic and sub-phonetic analysis: We developed a novel algorithm for automatic detection of Glottal Closure Instants (GCI) from speech signals using the Microcanonical Multiscale Formalism (MMF). This state-of-the-art algorithm is considered a reference in this field. We made a Matlab implementation available to the community. We showed that, in the case of clean speech, our algorithm performs almost as well as a recent state-of-the-art method. In the presence of different types of noise, we showed that our method is considerably more accurate (particularly for very low SNRs). Moreover, our method has lower computational times and relies neither on an estimate of the pitch period nor on any critical choice of parameters. Using the same MMF, we also developed a method for phonetic segmentation of speech signals. We showed that this method outperforms state-of-the-art ones in terms of accuracy and efficiency.

Pathological speech analysis and classification: we made a critical analysis of some widely used methodologies in pathological speech classification. We then introduced some novel methods for extracting some common features used in pathological speech analysis and proposed more robust techniques for classification.

Speech analysis of patients with Parkinsonism: with our collaborators from the Czech Republic, we started preliminary studies of some machine learning issues in the field, which are essentially due to the small amount of training data.

Data are often acquired at the highest possible resolution, but that scale is not necessarily the best for modeling and understanding the system from which the data were measured. The intrinsic properties of natural processes do not depend on the arbitrary scale at which data are acquired; yet usual analysis techniques operate at the acquisition resolution. When several processes interact at different scales, the identification of their characteristic scales from empirical data becomes a necessary condition for properly modeling the system. A classical method for identifying characteristic scales is to look at the work done by the physical processes, that is, the energy they dissipate over time. The assumption is that this work matches the most important action of each process on the studied natural system, which is usually a reasonable assumption. In the framework of time-frequency analysis, the power of the signal can be easily computed in each frequency band, each band itself matching a temporal scale.
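
The band-power computation mentioned above can be sketched as follows. This is a minimal illustration based on a plain periodogram, not the team's analysis pipeline; the function name `band_power`, the sampling rate, the band edges and the synthetic two-tone signal are assumptions chosen for the example.

```python
import numpy as np


def band_power(signal, fs, bands):
    """Sum of periodogram power of `signal` in each (f_low, f_high) band.

    `fs` is the sampling frequency in Hz; each band roughly corresponds
    to temporal scales between 1/f_high and 1/f_low seconds.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                      # remove the constant component
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    psd = np.abs(spectrum) ** 2 / x.size  # unnormalized periodogram
    return {
        (lo, hi): float(psd[(freqs >= lo) & (freqs < hi)].sum())
        for lo, hi in bands
    }


# Synthetic signal: a strong 5 Hz process and a weak 20 Hz process.
fs = 100.0
t = np.arange(1000) / fs
x = np.sin(2 * np.pi * 5 * t) + 0.1 * np.sin(2 * np.pi * 20 * t)
powers = band_power(x, fs, [(1, 10), (10, 30)])
print(powers)  # the (1, 10) Hz band dominates
```

Under this energy-based criterion the dominant band identifies the characteristic scale, here a period of about 0.2 s.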

However, in open and dissipating systems, energy dissipation is a prerequisite and thus not necessarily the most useful metric to investigate. In fact, most natural, physical and industrial systems we deal with fall into this category, while balanced quasi-static assumptions are practical approximations only for scales well below the characteristic scale of the processes involved. Open and dissipative systems are not locally constrained by the inevitable rise in entropy, which allows mesoscopic ordered structures to be maintained through time. And, according to information theory, more order and less entropy mean that these structures have a higher information content than the rest of the system, which usually gives them an important functional role.

We propose to identify characteristic scales not only with energy dissipation, as usual in signal processing analysis, but most importantly with information content. Information theory can be extended to look at which scales are most informative (e.g. multi-scale entropy ,

Building on these notions, it should also be possible to fully automate the modeling of a natural system. Once characteristic scales are found, causal relationships can be established empirically. They are then clustered together in internal states of a special kind of Markov models called

This research topic involves the Geostat team and is used to set up an InnovationLab with

Sparsity can be used in many ways and there exist various sparse models in the literature; for instance minimizing the

We have shown that the two powerful concepts of sparsity and scale invariance can be exploited to design fast and efficient imaging algorithms. A general framework has been set up for using non-convex sparsity by applying a first-order approximation. When using a proximal solver to estimate a solution of a sparsity-based optimization problem, sparse terms are always separated into subproblems that take the form of a proximal operator. Estimating the proximal operator associated with a non-convex term is thus the key component for using efficient solvers in non-convex sparse optimization. Using this strategy, only the shrinkage operator changes, and thus the solver has the same complexity in both the convex and non-convex cases. While a few previous works have also proposed to use non-convex sparsity, their choice of the sparse penalty is rather limited to functions like the
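
The remark that only the shrinkage operator changes between the convex and non-convex cases can be illustrated with a generic proximal gradient loop. The sketch below is not the team's first-order approximation framework: it swaps in two textbook shrinkage rules, soft thresholding (proximal operator of the convex l1 penalty) and hard thresholding (proximal operator of the non-convex l0 penalty), and the names `soft_threshold`, `hard_threshold` and `proximal_gradient` are hypothetical.

```python
import numpy as np


def soft_threshold(v, lam):
    """Proximal operator of lam * ||x||_1 (convex penalty)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def hard_threshold(v, lam):
    """Proximal operator of lam * ||x||_0 (non-convex penalty)."""
    return np.where(np.abs(v) > np.sqrt(2.0 * lam), v, 0.0)


def proximal_gradient(A, y, lam, shrink, n_iter=200):
    """Minimize 0.5 * ||A x - y||^2 + lam * penalty(x).

    The loop is identical for both penalties; only `shrink` differs,
    so the per-iteration cost is the same in both cases.
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = shrink(x - step * grad, step * lam)
    return x


# Same solver, two shrinkage rules (identity operator for readability).
y = np.array([3.0, -0.5, 1.0])
print(proximal_gradient(np.eye(3), y, 1.0, soft_threshold))
print(proximal_gradient(np.eye(3), y, 0.5, hard_threshold))
```

With the identity operator the solver reduces to a single shrinkage of `y`, which makes the difference between the two rules easy to read: soft thresholding shrinks the surviving coefficient, hard thresholding keeps it unchanged.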

Edge aware smoothing: given an input image

where

We solve sub-problem

GeoStat is participating in the Covid-19 Inria mission: Vocal biomarkers of respiratory diseases.

Observations of the interstellar medium (ISM) show a complex density and velocity structure, which is in part attributed to turbulence. Consequently, the multifractal formalism should be applied to observation maps of the ISM in order to characterize its turbulent and multiplicative cascade properties. However, the multifractal formalism, even in its more advanced and recent canonical versions, requires a large number of realizations of the system, which usually cannot be obtained in astronomy. We present a self-contained introduction to the multifractal formalism in a "microcanonical" version, which allows us, for the first time, to compute precise turbulence characteristic parameters from a single observational map without the need for averages in a grand ensemble of statistical observables (e.g., a temporal sequence of images). We compute the singularity exponents and the singularity spectrum for both observations and magnetohydrodynamic simulations, which include key parameters to describe turbulence in the ISM. For the observations we focus on the 250 µm Herschel map of the Musca filament. Scaling properties are investigated using spatial 2D structure functions, and we apply a two-point log-correlation magnitude analysis over various lines of the spatial observation, which is known to be directly related to the existence of a multiplicative cascade under precise conditions. It reveals a clear signature of a multiplicative cascade in Musca with an inertial range from 0.05 to 0.65 pc. We show that the proposed microcanonical approach provides singularity spectra that are truly scale invariant, as required to validate any method used to analyze multifractality. The obtained singularity spectrum of Musca, which is sufficiently precise for the first time, is clearly not as symmetric as usually observed in log-normal behavior. We claim that the singularity spectrum of the ISM toward Musca features a more log-Poisson shape. 
Since log-Poisson behavior is claimed to exist when dissipation is stronger for rare events in turbulent flows, in contrast to more homogeneous (in volume and time) dissipation events, we suggest that this deviation from log-normality could trace enhanced dissipation in rare events at small scales, which may explain, or is at least consistent with, the dominant filamentary structure in Musca. Moreover, we find that subregions in Musca tend to show different multifractal properties: While a few regions can be described by a log-normal model, other regions have singularity spectra better fitted by a log-Poisson model. This strongly suggests that different types of dynamics exist inside the Musca cloud. We note that this deviation from log-normality and these differences between subregions appear only after reducing noise features, using a sparse edge-aware algorithm, which have the tendency to "log-normalize" an observational map. Implications for the star formation process are discussed. Our study establishes fundamental tools that will be applied to other galactic clouds and simulations in forthcoming studies.
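
As an illustration of the spatial 2D structure functions mentioned above, the following sketch estimates S_p(r), the mean of |f(x + r) - f(x)|^p over horizontal and vertical increments, and fits a scaling exponent by log-log regression. It is a generic textbook estimator under simplifying assumptions (axis-aligned integer lags, no noise reduction, no masking), not the processing chain used in the publication; the function names and the test field are hypothetical.

```python
import numpy as np


def structure_function(field, lags, p):
    """S_p(r): mean of |increment|^p over horizontal and vertical lags r (pixels)."""
    f = np.asarray(field, dtype=float)
    values = []
    for r in lags:
        dx = np.abs(f[:, r:] - f[:, :-r]) ** p  # horizontal increments
        dy = np.abs(f[r:, :] - f[:-r, :]) ** p  # vertical increments
        values.append((dx.sum() + dy.sum()) / (dx.size + dy.size))
    return np.array(values)


def scaling_exponent(field, lags, p):
    """Slope of log S_p(r) versus log r, i.e. the exponent zeta_p."""
    s = structure_function(field, lags, p)
    slope, _ = np.polyfit(np.log(lags), np.log(s), 1)
    return float(slope)


# Sanity check on a smooth ramp, for which S_p(r) = r^p and zeta_p = p.
rows, cols = np.mgrid[0:64, 0:64]
ramp = rows + cols
print(scaling_exponent(ramp, [1, 2, 4, 8], p=2))
```

For a multifractal field, the exponents zeta_p obtained for several orders p would depart from a straight line in p, which is the kind of behaviour the log-normal and log-Poisson models discussed above describe.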

Publication: Astronomy & Astrophysics. This publication was selected for the "Highlights" section of Astronomy & Astrophysics.

Concaust Exploratory Action


The associate team Comcausa was created as part of the Inria@SiliconValley international lab, between Inria Geostat and the Complexity Sciences Center at the University of California, Davis. This team is managed by Nicolas Brodu (Inria) and Jim Crutchfield (UC Davis), and the full list of collaborators is given on the web site. We organized a series of online seminars in which we invited team members and external researchers to present their results. This online seminar series was a unifying moment during the Covid lockdowns and fostered new collaborations. Additional funding was obtained from the Templeton Foundation (co-PIs Nicolas Brodu, Jim Crutchfield, Sarah Marzen) in the form of 2×1 years of post-doctoral funding, to work on bioacoustic signatures in whale communication signals. We recruited Alexandra Jurgens for one year, renewable, on an exploratory research topic: seeking new methods for inferring how much information is being transferred at every scale in a signal. Nicolas Brodu is actively co-supervising her on this program, which she may pursue at Inria in fall 2022 on the Concaust exploratory action post-doc budget.

More preliminary results from this associate team were obtained on CO₂ and water flux in ecosystems (collaboration between Nicolas Brodu, Yao Liu and Adam Rupe). Nicolas Brodu presented these at the yearly meeting of the ICOS network on monitoring stations, run mostly by INRAE. This in turn led to the writing of a proposal for the joint Inria-INRAE « Agroécologie et numérique » PEPR, which passed the pre-selection phase in December 2021: this project is being proposed, jointly with a partner at INRAE, as one of the 10 flagship projects retained for round 1 of this PEPR. The final decision on whether this PEPR will be funded will be made in 2022. Similarly, preliminary results from the El Niño data (collaboration between Nicolas Brodu and Luc Bourrel) have led to the submission of an ANR proposal. This ANR funding would allow us to extend the work of the post-doctoral researcher whom we will co-supervise on the Concaust budget. The Associated Team budget for 2021 could only be partially used, as travel was restricted for most of the year. An extensive lab tour was still possible (Nov.-Dec. 2021), during which Nicolas Brodu met with most US associate team members. This tour was scientifically fruitful and we are currently preparing articles detailing our new results.

In the early stage of the disease, the symptoms of Parkinson's disease (PD) are similar to those of atypical Parkinsonian disorders (APD) such as Progressive Supranuclear Palsy (PSP) and Multiple System Atrophy (MSA). The early differential diagnosis between PD and APD, and between APD groups, is thus a very challenging task. It turns out that speech disorder is an early symptom common to PD and APD. The goal of our research is to develop a digital biomarker based on speech analysis in order to assist neurologists in their diagnosis. We identified distinctive speech features for discrimination between PD and MSA-P, the variant of MSA where Parkinsonism dominates. These features were inferred from French-speaking patients and consist in the detection of distortions in the production of particular voiced consonants (plosives and fricatives). We also continued our work on differential diagnosis between PSP and MSA in Czech-speaking patients. We designed two composite speech indices based on two categorizations of speech features: production subsystems and dysarthria type. These new indices led to high-accuracy discrimination not only between PSP and MSA but also between MSA/PSP and PD. These results made us confident enough to move on to the second clinical phase of the Voice4PD-MSA project, the validation phase, though we did not reach the expected number of patient inclusions in the first phase (because of the pandemic). The clinical protocol of this second phase will be submitted to CPP in January 2022.

Publications: PhD thesis (B. Das), Proc. Interspeech 2021 (K. Daoudi et al.).

GeoStat continues its contribution to the Covid-19 mission of Inria via the VocaPnée project, in partnership with AP-HP and co-directed by K. Daoudi and Thomas Similowski, head of the pulmonology and resuscitation service at La Pitié-Salpêtrière hospital and UMR-S 1158. The objective of the VocaPnée project is to bring together skills available at Inria to develop and validate a vocal biomarker for the remote monitoring of patients at home suffering from acute respiratory diseases (such as Covid-19) or chronic ones (such as asthma). This biomarker will then be integrated into a telemedicine platform, ORTIF or COVIDOM for Covid, to assist doctors in assessing the patient's respiratory status. VocaPnée is divided into 2 longitudinal clinical studies: a hospital study, as a proof of concept, followed by another in telemedicine. The former received clearance from the CPP and we are preparing the file for CNIL clearance. In this context, the ADT project VocaPy started in November for a duration of 2 years. The goal of VocaPy is to develop a Python library dedicated to pathological speech analysis and the design of vocal biomarkers. VocaPy works in synergy with the ADT VocaPnée-Infra, dedicated to the development of an infrastructure to interact with protected health data.

Hyperspectral images are corrupted by a combination of Gaussian and impulse noise. On the one hand, the traditional approach of handling the denoising problem using the maximum a posteriori criterion is often restricted by the time-consuming iterative optimization process and by the design of hand-crafted priors needed to obtain an optimal result. On the other hand, discriminative learning-based approaches offer fast inference from a trained model, but are highly sensitive to the noise level used for training. A discriminative model trained with a loss function that does not accord with the Bayesian degradation process often leads to sub-optimal results. In this paper, we design the training paradigm so as to emphasize the role of loss functions, similarly to what is observed in model-based optimization methods. As a result, loss functions derived in a Bayesian setting and employed in neural network training boost the denoising performance. Extensive analysis and experimental results on synthetically corrupted and real hyperspectral datasets suggest the potential applicability of the proposed technique under a wide range of homogeneous and heterogeneous noisy settings.
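
To make the link between a noise model and a training loss concrete, the sketch below writes the negative log-likelihoods of two standard noise models: i.i.d. Gaussian noise, which yields a squared L2 loss, and i.i.d. Laplace noise, used here as a common heavy-tailed surrogate for impulse noise, which yields an L1 loss. The weighted combination `mixed_loss` and all parameter names are assumptions for illustration; the loss actually derived in the paper is not reproduced here.

```python
import numpy as np


def gaussian_nll(residual, sigma=1.0):
    """Negative log-likelihood of i.i.d. Gaussian noise, up to a constant (squared L2)."""
    r = np.asarray(residual, dtype=float)
    return float(np.sum(r ** 2) / (2.0 * sigma ** 2))


def laplace_nll(residual, b=1.0):
    """Negative log-likelihood of i.i.d. Laplace noise, up to a constant (L1)."""
    r = np.asarray(residual, dtype=float)
    return float(np.sum(np.abs(r)) / b)


def mixed_loss(pred, target, sigma=1.0, b=1.0, weight=0.5):
    """Hypothetical training loss blending both likelihood terms."""
    r = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return weight * gaussian_nll(r, sigma) + (1.0 - weight) * laplace_nll(r, b)


# Two small Gaussian-like errors and one impulse-like outlier.
residual = np.array([0.1, -0.1, 10.0])
print(gaussian_nll(residual), laplace_nll(residual))
```

The outlier dominates the squared L2 term far more than the L1 term, which is why a loss matched to the actual degradation process matters when impulse noise is present.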

The InnovationLab with I2S has been extended for one year, starting 1 February 2021.

During the Inria-i2s Innovationlab partnership (2017-2020), several iterative optimization-based image processing algorithms were developed. Such methods are able to provide significant image quality improvements with respect to the previous i2s image processing methods. However, such algorithms seek the minimum of a mathematical function iteratively. In some cases, their convergence can be significantly slow and demand an execution time prohibitive for production goals.

This motivated a deep-learning approach in order to reduce computational time. We called this approach Emulation. A convolutional neural network (CNN) is designed and trained to reproduce the results obtained by the optimization approach. This makes it possible to take advantage of the quality level of an expensive iterative optimization algorithm at a lower computational cost and with a shorter execution time.

Iterative and deep learning methods both make use of numerical optimization to solve a mathematical problem. In the case of deep learning, however, the optimization takes place during training, which is performed offline. Once trained, a CNN can be used online to process an image with a fast execution time (which is commonly referred to as inference). This approach was shown to be effective: it gave comparable image quality levels with an execution time 5 times shorter.

A. Rashidi has pursued the implementation of deconvolution and denoising algorithms within the algorithms of I2S. A fast image deconvolution algorithm is used to demonstrate the resolution enhancement of Terahertz images acquired by a video-rate camera. Our algorithm is based on a variable splitting technique and uses, for the first time in an image deconvolution application, a family of sparsity-inducing regularizers; it is also suitable for practical industrial applications under computational constraints. The proposed process substantially enhances the quality and resolution of THz images.

A first piece of work concerned the classical I2S camera, called "Eagle".

Industrial quality assessment of the images gave encouraging results. The images showed better quality and contrast.

Following on, the super-resolution algorithm was integrated into the new I2S camera, called "Xtra". We are currently verifying the effectiveness and quality of the results with standard measurement methods.

Publication: 46th International Conference on Infrared, Millimeter, and Terahertz Waves.

InnovationLab with the I2S company, with the start scheduled after the first COPIL in January 2019. This InnovationLab has been extended for one year, starting February 2021.

GENESIS Project (Geostat, Laboratoire d'Astrophysique de Bordeaux, Physics Inst. (Köln University)): 5-year contract, 2017-2022 (initially 3 years, extended).

CovidVoice project: Inria Covid-19 mission. The CovidVoice project evolved into the VocaPnée project, in partnership with AP-HP and co-directed by K. Daoudi and Thomas Similowski, head of the pulmonology and resuscitation service at La Pitié-Salpêtrière hospital and UMR-S 1158. The objective of the VocaPnée project is to bring together all the skills available at Inria to develop and validate a vocal biomarker for the remote monitoring of patients at home suffering from an acute respiratory disease (such as Covid) or a chronic one (such as asthma). This biomarker will then be integrated into a telemedicine platform, ORTIF or COVIDOM for Covid, to assist doctors in assessing the patient's respiratory status. VocaPnée is divided into 2 longitudinal pilot clinical studies: a hospital study and another in telemedicine.

In this context, a voice data collection platform was developed by Inria's SED. This platform is used to collect data from healthy controls. It will then be migrated to the AP-HP servers to collect patient data.

The ADT (IA Plan) project VocaPy, led by K. Daoudi, started in November 2021 for a duration of 2 years. The goal of VocaPy is to develop a Python library dedicated to pathological speech analysis and the design of vocal biomarkers. Ms Zhe Li was recruited as an engineer on this project.

H. Yahia is a member of the editorial board of "Frontiers in Fractal Physiology" journal.

H. Yahia was a member of the PhD thesis jury of H. MAHAMAT. Thesis defended on March 30, 2021, 10.30 am, Université de Bourgogne, UFR Sciences et Techniques - BP 47870 21078 Dijon cedex.