Section: Research Program

Inverse problems in Neuroimaging

Many problems in neuroimaging can be framed as forward and inverse problems. For instance, the neuroimaging inverse problem consists in predicting individual information (behavior, phenotype) from neuroimaging data, while the forward problem consists in fitting neuroimaging data with high-dimensional (e.g. genetic) variables. Solving these problems entails the definition of two terms: a loss that quantifies the goodness of fit of the solution (does the model explain the data reasonably well?), and a regularization scheme that represents a prior on the expected solution of the problem. In particular, some priors enforce specific properties of the solution, such as sparsity, smoothness or piecewise constancy.

Let us detail the model used in the inverse problem. Let $\mathbf{X}$ be a neuroimaging dataset written as an $(n_{subj}, n_{voxels})$ matrix, where $n_{subj}$ and $n_{voxels}$ are the number of subjects under study and the image size respectively; $\mathbf{Y}$ an array of values that represent characteristics of interest in the observed population, written as an $(n_{subj}, n_f)$ matrix, where $n_f$ is the number of characteristics that are tested; and $\beta$ an array of shape $(n_{voxels}, n_f)$ that represents a set of pattern-specific maps. In the first place, we may consider the columns $\mathbf{Y}_1, .., \mathbf{Y}_{n_f}$ of $\mathbf{Y}$ independently, yielding $n_f$ problems to be solved in parallel:

$$\mathbf{Y}_i = \mathbf{X} \beta_i + \epsilon_i, \quad i \in \{1, .., n_f\},$$

where the vector $\beta_i$ is the $i$-th column of $\beta$. As the problem is clearly ill-posed, it is naturally handled in a regularized regression framework:

$$\hat{\beta}_i = \operatorname{argmin}_{\beta_i} \|\mathbf{Y}_i - \mathbf{X} \beta_i\|^2 + \Psi(\beta_i), \tag{1}$$

where Ψ is an adequate penalization used to regularize the solution:

$$\Psi(\beta; \lambda_1, \lambda_2, \eta_1, \eta_2) = \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2 + \eta_1 \|\nabla \beta\|_1 + \eta_2 \|\nabla \beta\|_2 \tag{2}$$

with $\lambda_1, \lambda_2, \eta_1, \eta_2 \geq 0$ (this formulation particularly highlights the fact that convex regularizers are norms or quasi-norms). In general, only one or two of these constraints are considered (i.e. enforced with a non-zero coefficient):

  • When $\lambda_1 > 0$ only (Lasso), and to some extent when $\lambda_1, \lambda_2 > 0$ only (elastic net), the optimal solution $\beta$ is (possibly very) sparse, but may not exhibit a proper image structure; it does not fit well with the intuitive concept of a brain map.

  • Total Variation regularization (see Fig. 1) is obtained for $\eta_1 > 0$ only, and typically yields a piecewise constant solution. It can be combined with the Lasso to enforce both sparsity and sparse variations.

  • Smooth Lasso is obtained with $\eta_2 > 0$ and $\lambda_1 > 0$ only, and yields smooth, compactly supported spatial basis functions.
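The role of each coefficient in the penalty (2) can be made concrete by evaluating it on a small 3D map. The sketch below is illustrative only: it assumes that the $\eta_1$ and $\eta_2$ terms act on the spatial gradient of the map (as the Total Variation and smooth Lasso cases suggest), and it approximates that gradient with forward finite differences. The helper names `spatial_gradient` and `penalty` are hypothetical.

```python
import numpy as np

def spatial_gradient(beta_map):
    """Forward finite differences along each axis (zero at the far boundary).

    Assumption: this discretization stands in for the gradient operator;
    other schemes (central differences, masked grids) are equally plausible.
    """
    grads = []
    for axis in range(beta_map.ndim):
        g = np.zeros_like(beta_map, dtype=float)
        sl = [slice(None)] * beta_map.ndim
        sl[axis] = slice(0, -1)
        # difference between each voxel and its next neighbor along `axis`
        g[tuple(sl)] = np.diff(beta_map, axis=axis)
        grads.append(g)
    return np.stack(grads)  # shape (ndim, *beta_map.shape)

def penalty(beta_map, lam1=0.0, lam2=0.0, eta1=0.0, eta2=0.0):
    """Evaluate the four-term penalty on a map stored as an n-D array."""
    grad = spatial_gradient(beta_map)
    return (lam1 * np.abs(beta_map).sum()          # l1 norm: sparsity
            + lam2 * np.sqrt((beta_map ** 2).sum())  # l2 norm
            + eta1 * np.abs(grad).sum()            # l1 norm of gradient (anisotropic TV)
            + eta2 * np.sqrt((grad ** 2).sum()))   # l2 norm of gradient: smoothness

# A map that is zero everywhere except on a small cubic blob
blob = np.zeros((4, 4, 4))
blob[1:3, 1:3, 1:3] = 1.0
lasso_term = penalty(blob, lam1=1.0)  # counts the active voxels
tv_term = penalty(blob, eta1=1.0)     # counts the jumps across the blob surface
```

On such a piecewise constant map the $\eta_1$ term only grows with the surface of the blob, not with its volume, which is why Total Variation favors a few well-delimited regions.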

Figure 1. Example of the regularization of a brain map with total variation in an inverse problem. The problem here consists in predicting the spatial scale of an object presented as a stimulus, given functional neuroimaging data acquired during the observation of an image. Training and testing are performed across individuals. Unlike other approaches, Total Variation regularization yields a sparse and well-localized solution that enjoys particularly high accuracy.

The performance of the predictive model can simply be evaluated as the amount of variance in $\mathbf{Y}_i$ fitted by the model, for each $i \in \{1, .., n_f\}$. This can be computed through cross-validation, by learning $\hat{\beta}_i$ on some part of the dataset, and then estimating the residual $\mathbf{Y}_i - \mathbf{X} \hat{\beta}_i$ using the remainder of the dataset.
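The cross-validation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimator used in practice: a squared-$\ell_2$ (ridge) penalty is swapped in for $\Psi$ because it has a closed-form solution, the folds are drawn at random over subjects, and the function names `ridge_fit` and `cv_explained_variance` are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Stand-in estimator: squared-l2 penalized least squares, in dual form."""
    n = X.shape[0]
    # beta = X^T (X X^T + alpha I)^{-1} y, cheap when n_subj << n_voxels
    return X.T @ np.linalg.solve(X @ X.T + alpha * np.eye(n), y)

def cv_explained_variance(X, Y, fit=ridge_fit, alpha=1.0, n_folds=5, seed=0):
    """Held-out fraction of variance explained, one score per column of Y."""
    n_subj, n_f = Y.shape
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_subj), n_folds)
    ss_res = np.zeros(n_f)  # squared residuals accumulated over test folds
    ss_tot = np.zeros(n_f)  # squared deviations from the training mean
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n_subj), test_idx)
        y_mean = Y[train_idx].mean(axis=0)
        for i in range(n_f):
            # learn beta_i on the training subjects only
            beta_i = fit(X[train_idx], Y[train_idx, i] - y_mean[i], alpha)
            # evaluate the residual Y_i - X beta_i on the left-out subjects
            pred = X[test_idx] @ beta_i + y_mean[i]
            ss_res[i] += np.sum((Y[test_idx, i] - pred) ** 2)
            ss_tot[i] += np.sum((Y[test_idx, i] - y_mean[i]) ** 2)
    return 1.0 - ss_res / ss_tot
```

A score close to 1 means that the model fitted on the training subjects explains most of the variance of $\mathbf{Y}_i$ in unseen subjects; a score at or below 0 means that it does no better than predicting the training mean.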

This framework is easily extended by considering:

  • Grouped penalization, where the penalization explicitly includes a prior clustering of the features, i.e. voxel-related signals, into given groups. This is particularly important to include external anatomical priors on the relevant solution.

  • Combined penalizations, i.e. a mixture of simple and group-wise penalizations, that allow some variability to fit the data in different populations of subjects, while keeping some common constraints.

  • Logistic regression, where a logistic non-linearity is applied to the linear model so that it yields a probability of classification in a binary classification problem.

  • Robustness to between-subject variability: it makes little sense for a learned model to depend dramatically on the particular observations used for learning. This is an important issue, as this kind of robustness is somewhat at odds with sparsity requirements.

  • Multi-task learning: if several target variables are thought to be related, it might be useful to constrain the estimated parameter vector β to have a shared support across all these variables.

    For instance, when one of the variables $\mathbf{Y}_i$ is not well fitted by the model, the estimation of the other variables $\mathbf{Y}_j$, $j \neq i$, may provide constraints on the support of $\beta_i$ and thus improve the prediction of $\mathbf{Y}_i$. Yet this does not impose constraints on the non-zero coefficients of $\beta_i$.

    $$\mathbf{Y} = \mathbf{X} \beta + \epsilon, \tag{3}$$


    $$\hat{\beta} = \operatorname{argmin}_{\beta = (\beta_i),\, i = 1..n_f} \sum_{i=1}^{n_f} \|\mathbf{Y}_i - \mathbf{X} \beta_i\|^2 + \lambda \sum_{j=1}^{n_{voxels}} \sqrt{\sum_{i=1}^{n_f} \beta_{i,j}^2} \tag{4}$$
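The multi-task problem (4) can be illustrated with a short proximal gradient loop. The sketch below assumes that the penalty in (4) is the $\ell_{2,1}$ norm, i.e. the sum over voxels of the $\ell_2$ norm of the coefficients across target variables, which is the form that enforces a shared support. The solver choice (plain ISTA with a fixed step) and the names `prox_l21` and `multitask_l21` are assumptions made for illustration; many other solvers would work.

```python
import numpy as np

def prox_l21(B, t):
    """Row-wise group soft-thresholding: prox of t * sum_j ||B[j, :]||_2."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    # rows whose norm is below t are set exactly to zero, for all targets at once
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return B * scale

def multitask_l21(X, Y, lam, n_iter=500):
    """Proximal gradient on sum_i ||Y_i - X b_i||^2 + lam * sum_j ||B[j, :]||_2.

    B has shape (n_voxels, n_f): one row per voxel, one column per target.
    """
    n_voxels, n_f = X.shape[1], Y.shape[1]
    B = np.zeros((n_voxels, n_f))
    # Lipschitz constant of the gradient of the squared loss
    L = 2.0 * np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ B - Y)           # gradient of the data-fit term
        B = prox_l21(B - grad / L, lam / L)      # gradient step, then group shrinkage
    return B
```

Because the thresholding acts on whole rows of $\beta$, a voxel is either discarded for every target variable or kept for all of them, while the values of the retained coefficients remain free to differ across targets.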