

Section: Application Domains

Data analysis and local regression

Our expertise in data analysis and advanced statistical methods has given rise to a large number of interdisciplinary collaborations. Among these, the most scientifically challenging are the following:

(i) Peanut allergy: A recent direct application of factorial analysis techniques concerned a study of allergic patients. The project focused on peanut allergy and aimed at predicting the severity of an allergic reaction from a set of biological parameters. No rigorous discriminant analysis had previously been performed in this context, and the article [2] has been regarded as an achievement in this direction.

(ii) Fetal pathology: Ongoing work on local regression techniques is related to Fetal Biometry, a line of investigation arising from a collaboration between our team and the Centre de Placentologie et Foetopathologie de la Maternité Régionale de Nancy, under the direction of Professor Bernard Foliguet. The methods used in Fetal Biometry are usually based on comparing measured values with predicted values derived from reference charts or equations established for a normal population. However, maternal and pregnancy characteristics turn out to have a significant influence on in-utero Fetal Biometry. We will thus produce models from which customized fetal biometric size charts can be constructed. To evaluate them, classical and polynomial regression can be used, but these are not the most appropriate for the kind of data we have to handle. Hence, we plan to use local regression estimation to perform this evaluation.
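As an illustration of what local regression estimation means here, the following is a minimal sketch of a kernel-weighted local linear fit at a single point. It is not the team's actual procedure: the function name `local_linear_fit`, the Gaussian kernel, the bandwidth value and the data (made-up gestational ages and measurements) are all assumptions chosen for the example.

```python
import math


def local_linear_fit(xs, ys, x0, bandwidth):
    """Estimate the regression function at x0 by a kernel-weighted
    straight-line fit (local linear regression).

    Hypothetical helper for illustration; Gaussian kernel assumed.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        d = x - x0
        w = math.exp(-0.5 * (d / bandwidth) ** 2)  # weight decays away from x0
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    if s0 == 0.0:
        raise ValueError("no observation carries weight near x0")
    det = s0 * s2 - s1 * s1
    if abs(det) < 1e-12:
        # Degenerate design (e.g. a single distinct x): fall back to the
        # kernel-weighted mean.
        return t0 / s0
    # Intercept of the weighted least-squares line centred at x0.
    return (s2 * t0 - s1 * t1) / det


# Synthetic data, for illustration only: gestational age (weeks)
# against a made-up, exactly linear measurement.
weeks = [float(w) for w in range(20, 31)]
sizes = [2.0 * w + 1.0 for w in weeks]
estimate = local_linear_fit(weeks, sizes, x0=24.5, bandwidth=2.0)
```

On exactly linear data such as this, a local linear fit reproduces the line, which makes a convenient sanity check. On real measurements the bandwidth controls the trade-off between smoothness and fidelity and would have to be chosen from the data.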

(iii) Cohort analysis: Some medical teams in Nancy are faced with an overwhelming amount of data that calls for a serious statistical assessment. Among these, let us mention the Stanislas cohort handled at the Centre Alexis Vautrin, which provides a huge amount of data that could allow a sharp identification of the biological characteristics involved in cardiovascular diseases. As often in Biostatistics, one is then faced with very high-dimensional data, from which we hope to extract a small number of significant variables that predict cardiovascular risk accurately. Moreover, these characteristics should be meaningful to practitioners. Our objective is thus to design an appropriate variable selection method, together with a classification procedure, in this demanding context.
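To make the idea of "variable selection plus classification" concrete, here is a deliberately simple sketch: a univariate filter that ranks variables by a standardized difference of class means, followed by a nearest-centroid classifier restricted to the retained variables. This is only one possible scheme, not the procedure the team intends to design. The function names `rank_variables` and `nearest_centroid` and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def rank_variables(rows, labels):
    """Rank column indices by |difference of class means| / average spread.

    Hypothetical univariate filter for two classes labelled 0 and 1.
    """
    scores = []
    for j in range(len(rows[0])):
        pos = [r[j] for r, lab in zip(rows, labels) if lab == 1]
        neg = [r[j] for r, lab in zip(rows, labels) if lab == 0]
        spread = (pstdev(pos) + pstdev(neg)) / 2 or 1.0  # guard against zero spread
        scores.append((abs(mean(pos) - mean(neg)) / spread, j))
    return [j for _, j in sorted(scores, reverse=True)]


def nearest_centroid(rows, labels, keep):
    """Build a classifier that uses only the columns listed in `keep`."""
    centroids = {}
    for lab in set(labels):
        members = [r for r, l in zip(rows, labels) if l == lab]
        centroids[lab] = [mean(m[j] for m in members) for j in keep]

    def predict(row):
        # Assign the class whose centroid is closest in squared distance.
        return min(
            centroids,
            key=lambda lab: sum(
                (row[j] - c) ** 2 for j, c in zip(keep, centroids[lab])
            ),
        )

    return predict


# Toy data, invented for illustration: only the second variable
# separates the two groups.
rows = [
    [1.0, 5.1, 0.3], [1.2, 5.0, 0.1], [0.9, 4.9, 0.2],
    [1.1, 9.0, 0.2], [1.0, 9.2, 0.3], [1.2, 8.9, 0.1],
]
labels = [0, 0, 0, 1, 1, 1]
ranking = rank_variables(rows, labels)
predict = nearest_centroid(rows, labels, keep=ranking[:1])
```

A filter of this kind keeps the retained variables directly interpretable, which matches the requirement that they be meaningful to practitioners. It ignores interactions between variables, so it should be read as a baseline only.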

Let us also mention the collaboration now starting with the INSERM team of Prof. Jean-Louis Guéant and the INRIA team Orpailleur (particularly with Marie-Dominique Desvignes and Malika Smail). The goal of this collaboration is to extract biological markers for different diseases (cognitive decline; inflammatory intestinal diseases; liver cancer). To this end, the INSERM team provides us with several data cohorts with a large number of variables and subjects. As for the Stanislas cohort, our objective is to design an appropriate variable selection method, together with a classification procedure, in this demanding context. The originality of this work is that it combines our own techniques with those developed by the Orpailleur team, which are based on symbolic tools. We hope that this experience will enrich both points of view and give rise to new methods of data analysis.