

Section: New Results

Block regression for variable clustering: Application to genetic data

Participants : Christophe Biernacki, Julien Jacques, Loïc Yengo.

Genome-wide association (GWA) studies have implicated numerous single nucleotide polymorphisms (SNPs) in the etiology of common diseases. Nevertheless, the most significantly associated SNPs explain only a small part of the expected heritability of those diseases. Many recent studies investigating this missing heritability have considered interactions between genes and/or environmental factors as a plausible and promising explanation. Analyzing all of the variants jointly, or even a large number of them (hundreds of thousands), raises a high-dimensionality problem that most regression-based methods cannot handle. To address this issue, one either reduces the number of variants to be analyzed (shrinkage approaches) or groups them according to some similarity measure. We introduce here a regression model that simultaneously clusters the variants sharing similar effect sizes and selects the most informative clusters. The model parameters are estimated by maximum likelihood. The challenges of this research lie in finding efficient algorithms for the clustering part and in studying the consistency of our estimators, for which the classical asymptotic theory does not apply [33], [40].
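The dimension reduction behind this idea can be illustrated with a minimal sketch. It is not the estimation procedure of the model itself: it assumes that the cluster labels are already known, whereas the model described above estimates them together with the effects by maximum likelihood. If all variants of a cluster share one effect size, the linear predictor only depends on the within-cluster sums of the variants, so a regression on p variants reduces to a regression on K cluster sums. All names (`aggregate_by_cluster`, `fit_block_regression`), the simulated genotypes and the chosen effect sizes are hypothetical.

```python
import numpy as np

def aggregate_by_cluster(X, labels, n_clusters):
    """Sum the columns of X within each cluster; returns an (n, n_clusters) matrix."""
    p = X.shape[1]
    Z = np.zeros((p, n_clusters))      # one-hot cluster membership matrix
    Z[np.arange(p), labels] = 1.0
    return X @ Z

def fit_block_regression(X, y, labels, n_clusters):
    """Least-squares fit with one shared coefficient per cluster.

    Assumption: labels are known. Estimating them is the hard part
    and is not attempted here.
    """
    S = aggregate_by_cluster(X, labels, n_clusters)
    design = np.column_stack([np.ones(len(y)), S])   # intercept + cluster sums
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

# Simulated data (hypothetical): 1000 variants coded 0/1/2, three clusters.
rng = np.random.default_rng(0)
n, p, K = 200, 1000, 3
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
labels = rng.integers(0, K, size=p)
b_true = np.array([0.0, 0.5, -0.3])   # cluster 0 carries no effect
y = 1.0 + X @ b_true[labels] + rng.normal(scale=1.0, size=n)

intercept, b_hat = fit_block_regression(X, y, labels, K)
print("estimated cluster effects:", np.round(b_hat, 3))
```

Here p = 1000 exceeds n = 200, so ordinary least squares on the individual variants is not identifiable, while the aggregated design has only K + 1 = 4 columns. A cluster whose shared effect is estimated close to zero, such as cluster 0 in this simulation, plays the role of an uninformative cluster.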