Section: New Results

Classification trees, functional data, applications in biology

Participants : Valérie Monbet, Audrey Poterie.

This is a collaboration with Jean-François Dupuy (INSA Rennes) and Laurent Rouvière (Université de Haute Bretagne, Rennes).

Classification and discriminant analysis methods have developed considerably over the past 20 years. Fisher's linear discriminant analysis (LDA) remains the basic, standard approach. As the structure and dimension of the data become more complex in a wide range of applications, for instance with functional data, there is a need for more flexible nonparametric classification and discriminant analysis tools. This need is most acute when the ratio of learning sample size to number of covariates is low, the covariates are highly correlated and the covariance matrix is nearly degenerate, or when most of the many covariates are only weak predictors of the class labels. For some data, such as spectrometry data, only some parts of the observed curves are discriminant, which leads naturally to working with groups of variables.

We proposed a classification tree based on groups of variables. Like usual tree-based methods, the algorithm partitions the feature space into M regions by recursively performing binary splits. The main difference is that each split is based on a group of variables rather than a single variable: the boundary between the two classes is the hyperplane that minimizes the Bayes risk in the space generated by the selected group of variables. We demonstrate on several toy examples and on real spectrometry data that the proposed group-based tree algorithm performs at least as well as the standard CART algorithm and group Lasso logistic regression.
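
To make the idea of a split based on a group of variables concrete, the following is a minimal sketch of how a single node could be split. It is an illustration under stated assumptions, not the implementation used in this work: the hyperplane is obtained with a two-class Fisher LDA rule (a stand-in for the Bayes-risk-minimizing hyperplane, which it matches only under Gaussian classes with a common covariance), groups are compared with the Gini impurity, and the names gini, lda_direction, best_group_split and the toy data are hypothetical.

```python
import numpy as np

def gini(y):
    """Gini impurity of a vector of 0/1 labels (0 for an empty node)."""
    if y.size == 0:
        return 0.0
    p = np.mean(y)
    return 2.0 * p * (1.0 - p)

def lda_direction(X, y, ridge=1e-6):
    """Two-class Fisher LDA on X; returns (w, b) for the rule w @ x + b > 0.

    Assumption: LDA stands in for the Bayes-optimal hyperplane.
    Both classes must be present in the node.
    """
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance, with a small ridge for numerical stability.
    S = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (len(y) - 2)
    S = S + ridge * np.eye(X.shape[1])
    w = np.linalg.solve(S, mu1 - mu0)
    # Midpoint threshold, shifted by the log ratio of class frequencies.
    b = -0.5 * w @ (mu0 + mu1) + np.log(len(X1) / len(X0))
    return w, b

def best_group_split(X, y, groups):
    """Try one hyperplane per group; keep the group with the lowest weighted Gini."""
    best = None
    for name, cols in groups.items():
        Xg = X[:, cols]
        w, b = lda_direction(Xg, y)
        right = Xg @ w + b > 0
        n_right = right.sum()
        impurity = (n_right * gini(y[right])
                    + (len(y) - n_right) * gini(y[~right])) / len(y)
        if best is None or impurity < best[0]:
            best = (impurity, name, w, b)
    return best

# Toy data (hypothetical): six covariates in three groups; only "g2" is informative.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 6))
X[:, 2:4] += 2.0 * y[:, None]
groups = {"g1": [0, 1], "g2": [2, 3], "g3": [4, 5]}

impurity, name, w, b = best_group_split(X, y, groups)
print(name, round(impurity, 3))
```

A full tree would apply best_group_split recursively to the two child nodes until a stopping rule is met. The choice of impurity criterion and stopping rule here is again an assumption made for the sketch.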