Section: Application Domains
Work in this application domain started recently, with two main lines of research: dimension reduction of genetic datasets and prediction tasks using genetic data (such as the prediction of past human demography).
Flora Jay collaborated with Kevin Caye and colleagues (TIMC-IMAG, Grenoble), who developed an R package for inferring coefficients of genetic ancestry using matrix factorization, alternating quadratic programming and projected least squares algorithms. The extension of ancestry inference and visualization methods to temporal data (for paleogenetics applications) remains to be done.
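To make the idea of ancestry inference by matrix factorization concrete, the following is a minimal sketch in Python with NumPy, not the R package mentioned above. It assumes a genotype matrix G (individuals by loci, scaled to [0, 1]) is approximated by a product Q F, where each row of Q holds the ancestry coefficients of one individual. The function names, the toy data and the crude clip-and-renormalise projection are all illustrative assumptions.

```python
import numpy as np

def project_rows_to_simplex(M, eps=1e-9):
    # Crude projection (an assumption of this sketch): clip negative
    # entries, then renormalise each row so that it sums to 1.
    M = np.clip(M, eps, None)
    return M / M.sum(axis=1, keepdims=True)

def ancestry_als(G, K, n_iter=50, seed=0):
    """Approximate G (n individuals x L loci) by Q @ F with K ancestral groups."""
    rng = np.random.default_rng(seed)
    n, L = G.shape
    Q = project_rows_to_simplex(rng.random((n, K)))
    for _ in range(n_iter):
        # Least squares for F with Q fixed, clipped to valid frequencies.
        F = np.clip(np.linalg.lstsq(Q, G, rcond=None)[0], 0.0, 1.0)
        # Least squares for Q with F fixed (solve G.T ~ F.T @ Q.T),
        # followed by the projection step.
        Qt = np.linalg.lstsq(F.T, G.T, rcond=None)[0]
        Q = project_rows_to_simplex(Qt.T)
    return Q, F

# Toy admixed dataset (simulated, hypothetical): 40 individuals, 300 loci.
rng = np.random.default_rng(1)
n, L, K = 40, 300, 2
Q_true = rng.dirichlet(np.ones(K), size=n)
F_true = rng.uniform(0.05, 0.95, size=(K, L))
G = rng.binomial(2, Q_true @ F_true) / 2.0

Q_hat, F_hat = ancestry_als(G, K)
```

Each row of Q_hat is non-negative and sums to one, so it can be read as a vector of ancestry proportions. The recovered groups are only defined up to a relabelling of the K components.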
The demographic history of one or several populations (of any organism) can be partially reconstructed using modern or ancient genetic data. A common approach in the population genetics field is to simulate pseudo-datasets for which the demographic parameters are known and summarize them into handcrafted features. These features are then used as a reference panel in an Approximate Bayesian Computation (likelihood-free) framework. Flora Jay has been developing such methods for application to whole-genome data.
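The simulate, summarize and compare loop described above can be sketched as a basic ABC rejection sampler. This is a toy illustration under strong assumptions: the Poisson "simulator" stands in for a real population-genetic simulator, the single parameter theta, the uniform prior and the two summary features are hypothetical, and none of this is the method developed for whole-genome data.

```python
import numpy as np

def simulate(theta, n_loci, rng):
    # Toy stand-in for a demographic simulator: one mutation count per locus.
    return rng.poisson(theta, size=n_loci)

def summarize(data):
    # Handcrafted features: mean and variance of the counts.
    return np.array([data.mean(), data.var()])

def abc_rejection(observed, n_sims=5000, keep_frac=0.02, seed=0):
    rng = np.random.default_rng(seed)
    s_obs = summarize(observed)
    # Reference panel: parameters drawn from the prior and their summaries.
    thetas = rng.uniform(0.1, 20.0, size=n_sims)
    stats = np.array([summarize(simulate(t, observed.size, rng)) for t in thetas])
    # Scale each feature, then keep the simulations closest to the observation.
    scale = stats.std(axis=0)
    dist = np.linalg.norm((stats - s_obs) / scale, axis=1)
    n_keep = max(1, int(keep_frac * n_sims))
    return thetas[np.argsort(dist)[:n_keep]]

# Pseudo-observed data generated with a known parameter value (theta = 5).
rng = np.random.default_rng(42)
observed = simulate(5.0, 200, rng)
posterior = abc_rejection(observed)
```

The accepted values in posterior form an approximate sample from the posterior distribution of theta. In practice the quality of the approximation depends heavily on how informative the chosen features are, which is what motivates working from raw data instead.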
An open challenge in the field is to skip the summary step and handle raw genetic variation data directly. Théophile Sanchez, who did a 6-month internship in TAU, started his PhD in October 2017 and is currently designing deep learning architectures that are suitable for multi-genome data. In particular, these networks should be invariant to the permutation of individual genomes and flexible to the input size (see Section 7.2.7).
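One generic way to obtain permutation invariance is to apply the same transformation to every individual and then pool with a symmetric function such as the mean. The sketch below illustrates that principle only; it is not the architecture under development. The weights are random rather than learned, and all names and sizes are hypothetical. In this simplified form the network accepts any number of individuals but a fixed number of sites.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sites, hidden = 50, 16

# Hypothetical, randomly initialised weights; a real model would learn them.
W1 = rng.normal(size=(n_sites, hidden)) / np.sqrt(n_sites)
W2 = rng.normal(size=(hidden, 1)) / np.sqrt(hidden)

def forward(genotypes):
    """genotypes: array of shape (n_individuals, n_sites) with 0/1 alleles."""
    h = np.tanh(genotypes @ W1)   # same map applied to every individual
    pooled = h.mean(axis=0)       # symmetric pooling over individuals
    return (pooled @ W2).item()   # scalar prediction

x = rng.integers(0, 2, size=(20, n_sites)).astype(float)
perm = rng.permutation(x.shape[0])

out = forward(x)              # original ordering of individuals
out_perm = forward(x[perm])   # shuffled ordering, same output up to rounding
out_small = forward(x[:7])    # a different number of individuals also works
```

Because the mean does not depend on row order, shuffling the individuals leaves the prediction unchanged, and the same weights can be reused for samples of different sizes.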