Section: Research Program

Multivariate decompositions

Multivariate decompositions provide a way to model complex data such as brain activation images: for instance, one might be interested in extracting an atlas of brain regions from a given dataset, such as regions exhibiting similar activity during a protocol, across multiple protocols, or even in the absence of any protocol (resting state). These data can often be factorized into spatio-temporal components, which can be estimated through regularized Principal Component Analysis (PCA) algorithms that share some common steps with regularized regression.

Let 𝐗 be a neuroimaging dataset written as an (n_subjects, n_voxels) matrix, after proper centering; the model reads

$$\mathbf{X} = \mathbf{A}\mathbf{D} + \epsilon, \tag{5}$$

where 𝐃 represents a set of n_comp spatial maps, hence a matrix of shape (n_comp, n_voxels), and 𝐀 the associated subject-wise loadings. While traditional PCA and independent component analysis (ICA) are limited to reconstructing components 𝐃 within the space spanned by the rows of 𝐗, it seems desirable to add constraints such as sparsity and/or smoothness on the rows of 𝐃, which represent spatial maps, as this makes these maps easier to interpret in the context of neuroimaging. This yields the following estimation problem:

$$\min_{\mathbf{D},\mathbf{A}} \|\mathbf{X} - \mathbf{A}\mathbf{D}\|^2 + \Psi(\mathbf{D}) \quad \text{s.t.} \quad \|\mathbf{A}_i\| = 1 \;\; \forall i \in \{1, \dots, n_{\mathrm{comp}}\}, \tag{6}$$

where (𝐀_i), i ∈ {1..n_comp}, denotes the columns of 𝐀. Ψ can be chosen as in Eq. (2) in order to enforce smoothness and/or sparsity constraints.

The problem is not jointly convex in all the variables, but each penalization given in Eq. (2) yields a convex problem in 𝐃 for fixed 𝐀, and conversely. This readily suggests an alternating optimization scheme, where 𝐃 and 𝐀 are estimated in turn until convergence to a local optimum of the criterion. As in PCA, the extracted components can be ranked according to the amount of fitted variance. Importantly, the estimated PCA model can also be interpreted as a probabilistic model of the data, assuming a high-dimensional Gaussian distribution (probabilistic PCA).
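The alternating scheme described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in this research program. Eq. (2) is not reproduced in this section, so Ψ is assumed here to be a plain ℓ1 penalty, Ψ(𝐃) = λ‖𝐃‖₁. The unit-norm constraint of Eq. (6) is relaxed to ‖𝐀_i‖ ≤ 1, so that each half-step is a proximal (projected) gradient step on a convex subproblem. The names `alternating_decomposition`, `rank_components`, `lam` and `n_iter` are hypothetical, and ranking by the energy of each rank-one term is only one possible proxy for fitted variance.

```python
import numpy as np


def soft_threshold(Z, t):
    """Proximal operator of t * ||.||_1 (entrywise shrinkage)."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)


def objective(X, A, D, lam):
    """Criterion of Eq. (6) with the assumed penalty Psi(D) = lam * ||D||_1."""
    return np.sum((X - A @ D) ** 2) + lam * np.sum(np.abs(D))


def alternating_decomposition(X, n_comp, lam=0.1, n_iter=100, seed=0):
    """Alternate one proximal gradient step on D and one projected step on A."""
    rng = np.random.default_rng(seed)
    n_subjects, n_voxels = X.shape
    # Random loadings with unit-norm columns; spatial maps start at zero.
    A = rng.standard_normal((n_subjects, n_comp))
    A /= np.linalg.norm(A, axis=0)
    D = np.zeros((n_comp, n_voxels))
    history = [objective(X, A, D, lam)]
    for _ in range(n_iter):
        # D-step (A fixed): gradient step on the fit term, then soft-thresholding.
        # L_D is the Lipschitz constant of the gradient of ||X - A D||^2 in D.
        L_D = max(2.0 * np.linalg.norm(A.T @ A, 2), 1e-12)
        grad_D = 2.0 * A.T @ (A @ D - X)
        D = soft_threshold(D - grad_D / L_D, lam / L_D)
        # A-step (D fixed): gradient step, then projection of each column
        # onto the unit ball (assumed relaxation of the constraint ||A_i|| = 1).
        L_A = 2.0 * np.linalg.norm(D @ D.T, 2)
        if L_A > 0:
            grad_A = 2.0 * (A @ D - X) @ D.T
            A = A - grad_A / L_A
            A /= np.maximum(np.linalg.norm(A, axis=0), 1.0)
        history.append(objective(X, A, D, lam))
    return A, D, history


def rank_components(A, D):
    """Order components by the squared Frobenius norm of their term A_i D_i."""
    energy = np.sum(A ** 2, axis=0) * np.sum(D ** 2, axis=1)
    return np.argsort(-energy)
```

With step sizes set to the inverse Lipschitz constants, neither half-step can increase the criterion, so `history` is non-increasing. This matches the convergence to a local optimum mentioned above, although it gives no guarantee of reaching a global one.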

Ultimately, the main limitation of these algorithms is their memory cost: holding datasets with both a large dimension and a large number of samples (as in recent neuroimaging cohorts) in memory leads to inefficient computation. To address this issue, online methods are particularly attractive.
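To make the online idea concrete, the sketch below processes the data in mini-batches of subjects, so that only one batch and the current components are held in memory at any time. It is an assumption-laden illustration: the text does not specify which online method is intended, and this example uses a mini-batch Oja-style subspace update for the unregularized model of Eq. (5), without the penalty Ψ. The data are assumed to be centered beforehand, and the names `online_subspace`, `iter_batches` and `step` are hypothetical.

```python
import numpy as np


def iter_batches(X, batch_size):
    """Yield consecutive blocks of rows (subjects); stands in for a data stream."""
    for start in range(0, X.shape[0], batch_size):
        yield X[start:start + batch_size]


def online_subspace(batches, n_voxels, n_comp, step=1.0, seed=0):
    """Mini-batch Oja-style update of an orthonormal basis of spatial maps."""
    rng = np.random.default_rng(seed)
    # Orthonormal starting basis, stored as an (n_voxels, n_comp) matrix.
    W, _ = np.linalg.qr(rng.standard_normal((n_voxels, n_comp)))
    for X_b in batches:
        # Batch estimate of (covariance @ W), computed without ever forming
        # the (n_voxels, n_voxels) covariance matrix.
        G = X_b.T @ (X_b @ W) / X_b.shape[0]
        # Gradient-like step followed by re-orthonormalization via QR.
        W, _ = np.linalg.qr(W + step * G)
    # Rows of the returned array play the role of the spatial maps D.
    return W.T
```

A call such as `online_subspace(iter_batches(X, 20), n_voxels=X.shape[1], n_comp=3)` makes a single pass over the subjects. The memory footprint is governed by the batch size and by n_comp × n_voxels, rather than by the total number of subjects.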