Section: New Results
Statistical learning methods for high-dimensional data
Automatic analysis of cell populations
We have developed two approaches to classifying cell populations from the high-dimensional data produced by flow cytometry assays. The first is based on a sequential extension of Dirichlet process mixtures, and the second on a simple binary tree classification that achieves very high performance:
Hejblum BP, Alkhassim C, Gottardo R, Caron F, Thiébaut R. Sequential Dirichlet process mixtures of multivariate skew t-distributions for model-based clustering of flow cytometry data. Annals of Applied Statistics. In press.
Commenges D, Alkhassim C, Gottardo R, Hejblum B, Thiébaut R. cytometree: A binary tree algorithm for automatic gating in cytometry analysis. Cytometry A. 2018;93:1132-1140.
Missing value treatment in longitudinal high-dimensional supervised problems
Poor blood sample quality introduces a large number of missing values in sequencing data. Furthermore, strong technical biases may force the analyst to discard the affected samples, in which case all the data for a given day are missing.
We have developed a regularized SVD-based method that exploits the temporal structure of the missing values, through a multi-block approach, to estimate them. The objective is to predict univariate or multivariate regression responses, and also to handle classification problems. The regularization applies soft-thresholding to the covariance matrices, which yields a natural selection of both covariates and responses through a single hyperparameter to be tuned.
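The soft-thresholding step can be illustrated with a minimal sketch. This is not the method itself: the multi-block temporal structure is left out, missing entries are simply replaced by column means as a stand-in for the actual estimation, and the function names, the `lam` hyperparameter, and the toy data are all assumptions made for the illustration.

```python
import numpy as np


def soft_threshold(a, lam):
    """Elementwise soft-thresholding: shrink each entry toward zero by lam."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def sparse_cross_covariance(X, Y, lam):
    """Soft-threshold the empirical cross-covariance of X (n x p) and Y (n x q).

    Assumption for this sketch: NaN entries are replaced by the column mean,
    i.e. set to zero after centering.
    """
    Xc = np.nan_to_num(X - np.nanmean(X, axis=0))
    Yc = np.nan_to_num(Y - np.nanmean(Y, axis=0))
    C = Xc.T @ Yc / (X.shape[0] - 1)
    C_thr = soft_threshold(C, lam)
    # A covariate (row) or response (column) is dropped when every one of
    # its covariances has been shrunk to exactly zero.
    keep_x = np.any(C_thr != 0, axis=1)
    keep_y = np.any(C_thr != 0, axis=0)
    return C_thr, keep_x, keep_y


# Toy data: the first covariate is strongly related to the response,
# the second only weakly.
X = np.array([[1.0, 1.0], [2.0, -1.0], [3.0, 1.0], [4.0, -1.0]])
Y = np.array([[2.0], [4.0], [6.0], [8.0]])
C_thr, keep_x, keep_y = sparse_cross_covariance(X, Y, lam=2.0)

# The SVD of the thresholded matrix gives sparse directions linking X and Y.
U, s, Vt = np.linalg.svd(C_thr, full_matrices=False)
```

In this sketch a single value of `lam` controls how many covariates and responses survive, which mirrors the single hyperparameter mentioned above.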
Left-censored data treatment in high-dimensional supervised problems
Data can be left-censored by either the limit of detection or the limit of quantification. We have developed a regularized method for handling high-dimensional exposure data in the presence of censored values. It was developed in the field of HIV but could be applied to other fields.
Soret P, Avalos M, Wittkop L, Commenges D, Thiébaut R. Lasso regularization for left-censored Gaussian outcome and high-dimensional predictors. BMC Med Res Methodol. 2018 Dec 4;18(1):159.
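To make the idea of left censoring combined with lasso regularization concrete, the following is a minimal sketch of a penalized negative log-likelihood for a left-censored Gaussian outcome. It is an illustration under stated assumptions, not the published estimator: the function names, the argument layout, and the treatment of `sigma` as known are all hypothetical choices, and no optimizer is included.

```python
import math


def norm_logpdf(z):
    """Log-density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_logcdf(z):
    """Log of the standard normal CDF at z.

    Not numerically robust for very negative z, where erfc underflows to 0.
    """
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def penalized_censored_nll(beta, intercept, sigma, X, y, censored, limit, lam):
    """Lasso-penalized negative log-likelihood with left-censored outcomes.

    Observed outcomes contribute the Gaussian density; censored outcomes
    contribute the probability of falling below their limit (of detection
    or quantification). The intercept is left unpenalized.
    """
    nll = 0.0
    for xi, yi, ci, li in zip(X, y, censored, limit):
        mu = intercept + sum(b * x for b, x in zip(beta, xi))
        if ci:
            nll -= norm_logcdf((li - mu) / sigma)
        else:
            nll -= norm_logpdf((yi - mu) / sigma) - math.log(sigma)
    return nll + lam * sum(abs(b) for b in beta)


# Two subjects: one observed outcome and one below a limit of 0.0.
value = penalized_censored_nll(
    beta=[0.0], intercept=0.0, sigma=1.0,
    X=[[1.0], [2.0]], y=[0.0, None],
    censored=[False, True], limit=[0.0, 0.0], lam=0.0,
)
```

Minimizing this objective over `beta` for increasing values of `lam` sets more coefficients to exactly zero, which is what makes the approach usable with high-dimensional predictors.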