Section: Software
The SpaCEM3 program
Participants: Senan James Doyle, Florence Forbes.
This software, developed by present and past members of the team, is the result of several research developments on clustering with the EM algorithm and Markov models. The current version, 2.09, is released under the CeCILL-B license.
Main features. The approach is based on the EM algorithm for clustering and on Markov Random Fields (MRF) to account for dependencies. In addition to standard clustering tools based on independent Gaussian mixture models, SpaCEM3 provides:
Unsupervised clustering of dependent objects. Dependencies are encoded via a graph, which need not be regular, and data sets are modelled with Markov random fields and mixture models (e.g. MRF and hidden MRF). The available Markov models include extensions of the Potts model, with the possibility of defining more general interaction models.
Supervised clustering of dependent objects when the standard hidden MRF (HMRF) assumptions, namely non-correlated and unimodal noise models, do not hold. The learning and test steps are based on recently introduced Triplet Markov models.
Model selection criteria (BIC, ICL and their mean-field approximations) that select the "best" HMRF for the data.
A specific setting to account for high-dimensional observations.
An integrated framework to deal with missing observations under the Missing At Random (MAR) hypothesis, with prior imputation (KNN, mean, etc.), online imputation (as a step of the algorithm), or no imputation.
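The core approach, EM for clustering combined with an MRF prior to account for dependencies, can be illustrated with a small pure-Python sketch. It is not SpaCEM3's implementation or API: it assumes a 2D grid of scalar observations with 4-connected neighbours, univariate Gaussian class densities, a Potts prior whose interaction strength `beta` is fixed by hand (a full implementation would typically also estimate it), and a mean-field E-step in which each site sees its neighbours' current posterior probabilities rather than hard labels. The names `mean_field_em` and `neighbours` are hypothetical.

```python
import math
import random


def neighbours(i, j, rows, cols):
    """Yield the 4-connected neighbours of site (i, j) on a rows x cols grid."""
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < rows and 0 <= nj < cols:
            yield ni, nj


def mean_field_em(y, n_classes, beta=1.0, n_iter=20):
    """Cluster a 2D grid of scalars with a hidden Potts model (hypothetical sketch).

    beta, the Potts interaction strength, is held fixed; each class has a
    univariate Gaussian density; the E-step is a mean-field approximation in
    which neighbours contribute current posterior probabilities, not labels.
    """
    rows, cols = len(y), len(y[0])
    sites = [(i, j) for i in range(rows) for j in range(cols)]
    values = sorted(y[i][j] for i, j in sites)
    n = len(values)
    # Start the class means at evenly spaced quantiles, with a shared variance.
    means = [values[(2 * k + 1) * n // (2 * n_classes)] for k in range(n_classes)]
    mu = sum(values) / n
    variances = [max(sum((v - mu) ** 2 for v in values) / n, 1e-6)] * n_classes
    # t[i][j][k] approximates P(z_ij = k | y); start from uniform probabilities.
    t = [[[1.0 / n_classes] * n_classes for _ in range(cols)] for _ in range(rows)]

    for _ in range(n_iter):
        # E-step: Gaussian log-density plus the mean field beta * sum_j t_jk,
        # sweeping the sites and updating them in place.
        for i, j in sites:
            logs = []
            for k in range(n_classes):
                field = sum(t[a][b][k] for a, b in neighbours(i, j, rows, cols))
                log_dens = -0.5 * ((y[i][j] - means[k]) ** 2 / variances[k]
                                   + math.log(2 * math.pi * variances[k]))
                logs.append(log_dens + beta * field)
            top = max(logs)  # subtract the max before exponentiating, for stability
            weights = [math.exp(s - top) for s in logs]
            total = sum(weights)
            t[i][j] = [w / total for w in weights]
        # M-step: posterior-weighted update of each class mean and variance.
        for k in range(n_classes):
            w = sum(t[i][j][k] for i, j in sites)
            if w == 0.0:
                continue  # empty class: keep its previous parameters
            means[k] = sum(t[i][j][k] * y[i][j] for i, j in sites) / w
            var = sum(t[i][j][k] * (y[i][j] - means[k]) ** 2 for i, j in sites) / w
            variances[k] = max(var, 1e-6)  # floor avoids degenerate classes

    # Label each site with its most probable class under the approximation.
    labels = [[max(range(n_classes), key=lambda k: t[i][j][k]) for j in range(cols)]
              for i in range(rows)]
    return labels, means, variances


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 6x6 image: left half around 0, right half around 5, plus Gaussian noise.
    image = [[(0.0 if j < 3 else 5.0) + rng.gauss(0.0, 0.5) for j in range(6)]
             for _ in range(6)]
    labels, means, _ = mean_field_em(image, n_classes=2, beta=1.0)
    for row in labels:
        print(row)
    print("estimated means:", [round(m, 2) for m in means])
```

Working in the log domain keeps the E-step stable when class variances become small. Increasing `beta` strengthens the spatial smoothing, pulling isolated noisy sites toward the labels of their neighbours.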
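The BIC and ICL model selection criteria can be illustrated in the simpler situation where an (approximate) observed-data log-likelihood is already available. For an HMRF this log-likelihood is intractable because of the MRF normalising constant, which is why approximations such as mean-field ones are needed; the sketch simply takes the value as input. It uses the "lower is better" convention BIC = -2 log L + K log n and the common entropy-based approximation ICL = BIC + 2 × (entropy of the posterior class probabilities). The function names are hypothetical, not part of SpaCEM3.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """BIC in the 'lower is better' form: -2 log L + K log n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def icl(log_likelihood, n_params, n_obs, posteriors):
    """Entropy-based ICL approximation: BIC + 2 * sum_i sum_k -t_ik log t_ik.

    posteriors holds one probability vector (t_i1, ..., t_iK) per observation.
    Crisp assignments add no penalty; fuzzy ones are penalised.
    """
    entropy = -sum(p * math.log(p) for row in posteriors for p in row if p > 0.0)
    return bic(log_likelihood, n_params, n_obs) + 2.0 * entropy


def select_model(scores):
    """Return the key (e.g. a number of classes) with the smallest criterion value."""
    return min(scores, key=scores.get)
```

For instance, computing `icl` for models with K = 2, 3 and 4 classes and passing the values to `select_model` as a dictionary keyed by K keeps the K with the lowest criterion.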
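The prior-imputation option can be illustrated with two simple imputers. The sketch assumes the data are stored as rows of floats with `None` marking missing entries, and that every column has at least one observed value; this representation and the function names are hypothetical, not SpaCEM3's input format. With prior imputation, the missing values are filled in once before clustering, whereas online imputation updates them as a step of the algorithm.

```python
import math


def column_means(data):
    """Mean of the observed (non-None) entries of each column."""
    means = []
    for c in range(len(data[0])):
        observed = [row[c] for row in data if row[c] is not None]
        means.append(sum(observed) / len(observed))  # assumes >= 1 observed value
    return means


def mean_impute(data):
    """Replace every missing entry by its column mean."""
    means = column_means(data)
    return [[means[c] if v is None else v for c, v in enumerate(row)] for row in data]


def knn_impute(data, k=2):
    """Replace every missing entry by its average over the k nearest rows.

    Distances use only the coordinates observed in both rows, and a row can
    serve as a neighbour only if it observes the coordinate being imputed.
    Falls back to the column mean when no such row exists.
    """
    means = column_means(data)
    result = []
    for r, row in enumerate(data):
        new_row = list(row)
        for c, v in enumerate(row):
            if v is not None:
                continue
            candidates = []
            for s, other in enumerate(data):
                if s == r or other[c] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    # Root-mean-square difference over the shared coordinates.
                    dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                    candidates.append((dist, other[c]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            new_row[c] = (sum(val for _, val in nearest) / len(nearest)
                          if nearest else means[c])
        result.append(new_row)
    return result
```

Both functions return a new data set and leave the input unchanged, so the original missingness pattern remains available, for example to compare imputation strategies.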
The software is available at http://spacem3.gforge.inria.fr. A user manual in English, together with example data sets, is available on this web site. More recently, the INRA Toulouse unit has joined the project to promote the software within the bioinformatics community [75].