Section: New Results
Regression and machine learning
Participants: E. Albuisson, R. Azaïs (Inria, Lyon), T. Bastogne, L. Batista, K. Duarte, S. Ferrigno, A. Gégout-Petit, P. Guyot, J.-M. Monnez, N. Sahki, S. Mézières
To detect changes in the health state of lung-transplanted patients, we have begun to work on the detection of breaks in multivariate physiological signals. Based on CUSUM statistics, we use dynamic detection thresholds [27]. A more general talk on statistical learning and connected patients was given at the workshop "Evaluation des objets en santé connectée" [35].
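As an illustration of CUSUM-based detection, here is a minimal sketch of a one-sided (upward) CUSUM on a single signal. It is not the multivariate procedure of [27]: the function name, the `slack` parameter and the way the threshold is passed (a constant, or a function of the time index to mimic a threshold that varies over time) are assumptions made for this example.

```python
def cusum_first_alarm(signal, target_mean, slack, threshold):
    """Return the index of the first alarm of an upward CUSUM, or None.

    signal      : iterable of floats (one physiological variable)
    target_mean : in-control mean of the signal (assumed known here)
    slack       : allowance subtracted at each step (often half the shift to detect)
    threshold   : a float, or a callable t -> float for a time-varying threshold
    """
    s = 0.0
    for t, x in enumerate(signal):
        # Accumulate deviations above target_mean + slack, never going below zero.
        s = max(0.0, s + (x - target_mean - slack))
        h = threshold(t) if callable(threshold) else threshold
        if s > h:
            return t
    return None


# Hypothetical usage: the mean shifts from 0 to 2 at index 20.
signal = [0.0] * 20 + [2.0] * 20
alarm = cusum_first_alarm(signal, target_mean=0.0, slack=0.5, threshold=4.0)
```

A downward shift can be monitored the same way by applying the function to the negated signal and negated target mean.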
We consider the analysis of cardiomyocyte (cardiac cell) signals for the cardiotoxicity assessment of new pharmaceutical compounds in preclinical assays. The experimental data are impedance signals measuring the contractility of cardiomyocytes [39], [4], field potential signals measuring their functionality, or fluorescence signals measuring the activity of some ion channels such as calcium pumps (Ca2+). At this preclinical level, our main contribution is the estimation of important characteristics such as the field potential duration [17], and the identification of cardiotoxic events such as early afterdepolarization.
We have also developed new methods for the analysis of electrocardiograms at the patient level, more precisely for the estimation of parameters such as the RR and QT intervals in long and noisy signals provided by wearable sensors [24], [30], [23], [25], [29].
We also study the efficacy of a new biomarker in radiotherapy. The objective is to compute a score able to predict the risk of radiosensitivity for patients undergoing radiotherapy [19], [20].
We are also developing a new method to characterize the potential interactions between nanoparticles and biological compounds of complex media such as blood. This new method aims at predicting risks related to the biodistribution and toxicity of the nanoparticles [16], [36].
In [7], we present a methodology for constructing a short-term event risk score from an ensemble predictor built using bootstrap samples, two classification rules (logistic regression and linear discriminant analysis for mixed data, continuous or categorical), and random selections of variables in the construction of the predictors. We establish a property of linear discriminant analysis for mixed data and define an event risk measure by an odds-ratio. This methodology is applied to heart failure patients on whom biological, clinical and medical history variables were measured, and the results obtained from our data are detailed.
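The general idea of such an ensemble score can be sketched as follows. This is only an illustration under strong assumptions, not the method of [7]: it uses a single classification rule (a plain gradient-descent logistic regression, no linear discriminant analysis), continuous variables only, and aggregates the models by averaging their log-odds. All function names and the definition of `odds_ratio` relative to a reference profile are hypothetical.

```python
import math
import random


def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def fit_logistic(rows, labels, n_iter=100, lr=0.1):
    """Gradient-descent logistic regression; returns [bias, w1, ..., wp]."""
    n, p = len(rows), len(rows[0])
    w = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for x, y in zip(rows, labels):
            s = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
            err = sigmoid(s) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def fit_ensemble(rows, labels, n_models=10, n_vars=2, seed=0):
    """Fit one logistic model per bootstrap sample and random variable subset."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample (with replacement)
        cols = rng.sample(range(p), n_vars)          # random selection of variables
        sub_rows = [[rows[i][c] for c in cols] for i in idx]
        sub_labels = [labels[i] for i in idx]
        models.append((cols, fit_logistic(sub_rows, sub_labels)))
    return models


def risk_score(models, x):
    """Average of the linear predictors (log-odds) over the ensemble."""
    total = 0.0
    for cols, w in models:
        total += w[0] + sum(wj * x[c] for wj, c in zip(w[1:], cols))
    return total / len(models)


def odds_ratio(models, x, x_ref):
    """Odds of the event for profile x relative to a reference profile x_ref."""
    return math.exp(risk_score(models, x) - risk_score(models, x_ref))
```

Averaging log-odds keeps the aggregated score linear in the variables, so that the ratio between two profiles is easy to interpret; other aggregation rules are possible.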
The study [8] addresses the problem of sequential least squares multidimensional linear regression, particularly in the case of a data stream, using a stochastic approximation process. To avoid the numerical explosion that can occur and to reduce the computing time so that as many incoming data as possible are taken into account, we propose using online standardized data instead of raw data, and using several observations per step or all observations up to the current step. We define and study the almost sure convergence of three processes with online standardized data: a classical process with a variable step-size and a varying number of observations per step, an averaged process with a constant step-size and a varying number of observations per step, and a process with a variable or constant step-size that uses all observations up to the current step. Their convergence is obtained under more general assumptions than the classical ones. These processes are compared to classical processes on 11 datasets, first for a fixed total number of observations used and then for a fixed processing time. Analyses indicate that the third process typically yields the best results.
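To make the idea of online standardization concrete, here is a minimal sketch of a stochastic gradient process for least squares regression in which each incoming observation is standardized with running means and standard deviations. It is a simplified illustration with one observation per step and a decreasing step-size, not one of the three processes studied in [8]; the class name, the step-size form `step_scale / n**step_power` and the default values are assumptions.

```python
import math


class OnlineStandardizedRegression:
    """Stochastic approximation of least squares on online standardized data."""

    def __init__(self, n_features, step_scale=0.5, step_power=0.7):
        self.p = n_features
        self.n = 0
        # Running means and sums of squared deviations (Welford's algorithm);
        # the last slot holds the response variable.
        self.mean = [0.0] * (n_features + 1)
        self.m2 = [0.0] * (n_features + 1)
        self.theta = [0.0] * n_features  # coefficients on the standardized scale
        self.step_scale = step_scale
        self.step_power = step_power

    def _std(self, j):
        return math.sqrt(self.m2[j] / (self.n - 1))

    def partial_fit(self, x, y):
        row = list(x) + [y]
        self.n += 1
        for j, v in enumerate(row):
            delta = v - self.mean[j]
            self.mean[j] += delta / self.n
            self.m2[j] += delta * (v - self.mean[j])
        if self.n < 2:
            return  # standard deviations are not defined yet
        stds = [self._std(j) for j in range(self.p + 1)]
        if min(stds) == 0.0:
            return  # skip the update while a variable is still constant
        z = [(row[j] - self.mean[j]) / stds[j] for j in range(self.p + 1)]
        residual = sum(t * zj for t, zj in zip(self.theta, z[:self.p])) - z[self.p]
        step = self.step_scale / self.n ** self.step_power
        self.theta = [t - step * residual * zj
                      for t, zj in zip(self.theta, z[:self.p])]

    def raw_coefficients(self):
        """Map the standardized coefficients back to the raw scale."""
        sd_y = self._std(self.p)
        coef = [t * sd_y / self._std(j) for j, t in enumerate(self.theta)]
        intercept = self.mean[self.p] - sum(
            c * m for c, m in zip(coef, self.mean[:self.p]))
        return coef, intercept
```

Because the standardized variables have comparable scales, a single step-size sequence can be used for all coordinates, which is the practical motivation for standardizing before applying the gradient step.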
Many articles have been devoted to the problem of recursively estimating the eigenvectors and eigenvalues, in decreasing order, of the expectation of a random matrix using an i.i.d. sample of it. In [43], we make the following contributions. The convergence of a normed process is proved under more general assumptions: the random matrices are not assumed to be i.i.d., and a new data mini-batch or all data up to the current step are taken into account at each step without storing them. Three types of processes are studied. This is applied to online principal component analysis of a data stream, assuming that the data are realizations of a random vector.
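A normed recursive process of this kind can be illustrated with the classical Oja-type iteration for the first principal direction. The sketch below is an assumption-laden simplification and not the processes of [43]: it handles one observation per step, estimates only the first eigenvector, and assumes the data are already centered. The function name and the default step-size `1/n` are hypothetical choices.

```python
import math


def oja_first_component(stream, dim, step=lambda n: 1.0 / n):
    """Estimate the first principal direction of a centered data stream.

    stream : iterable of vectors (lists of length dim), assumed centered
    step   : callable n -> step-size at step n (n starts at 1)
    """
    x = [1.0 / math.sqrt(dim)] * dim  # arbitrary unit starting vector
    for n, z in enumerate(stream, start=1):
        # B_n x with B_n = z z^T, computed without forming the matrix.
        proj = sum(zi * xi for zi, xi in zip(z, x))
        a = step(n)
        x = [xi + a * proj * zi for xi, zi in zip(x, z)]
        # Normalization keeps the iterate on the unit sphere.
        norm = math.sqrt(sum(xi * xi for xi in x))
        x = [xi / norm for xi in x]
    return x
```

Each observation is used once and then discarded, so memory use does not grow with the length of the stream. The estimate is defined up to its sign.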
In epidemiology, we are working with clinicians to study fetal development in the last two trimesters of pregnancy. We have data from the "Service de foetopathologie et de placentologie" of the "Maternité Régionale Universitaire" (CHU Nancy) and from the EDEN cohort (INSERM). We propose to use nonparametric estimation methods to obtain reference curves for fetal and child growth. In addition, we want to develop a test, based on Z-scores, to detect slope breaks in the fetal development curves (work in progress).
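As a rough illustration of how a Z-score can be derived from a nonparametric reference curve, the sketch below uses Gaussian kernel weights to estimate a local mean and a local standard deviation at a given age. The choice of kernel smoothing, the bandwidth and all function names are assumptions for this example; the source does not specify which nonparametric estimator is used.

```python
import math


def local_mean_sd(ages, values, age0, bandwidth):
    """Kernel-weighted mean and standard deviation of values around age0."""
    weights = [math.exp(-0.5 * ((a - age0) / bandwidth) ** 2) for a in ages]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    # Spread around the local mean; a crude estimate that also absorbs
    # the trend of the curve inside the kernel window.
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    return mean, math.sqrt(var)


def z_score(ages, values, age0, value0, bandwidth=1.0):
    """Z-score of a new measurement value0 observed at age0."""
    mean, sd = local_mean_sd(ages, values, age0, bandwidth)
    return (value0 - mean) / sd
```

A measurement equal to the local mean gets a Z-score of zero, and a sequence of Z-scores computed along one pregnancy could then be monitored for a change of slope.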