

Section: Software and Platforms

Data Mining

Classification and Clustering Methods

Participants : Marc Csernel, Yves Lechevallier [co-correspondent], Brigitte Trousse [co-correspondent].

We developed and maintained a collection of clustering and classification software, written in C++ and/or Java:

Supervised methods

Unsupervised methods : partitioning methods

Unsupervised methods : agglomerative methods

A Web interface, developed in C++ and running on our internal Apache Web server, is available for the following methods: SCluster, Div, Cdis, CCClust.

Previous versions of the above software have been integrated in the SODAS 2 software [61], which resulted from the European project ASSO (Analysis System of Symbolic Official data, 2001-2004). SODAS 2 supports the analysis of multidimensional complex data (numerical and non-numerical) coming from databases, mainly in statistical offices and administrations, using Symbolic Data Analysis [39]. This software is registered at APP (Agence de la Protection des Programmes). For the latest version of the SODAS 2 software, see [60], [79].

In 2013, a new release of the MND (Dynamic Clustering Method for Mixed Data) algorithm was produced, based on [80] (cf. Section 6.2.5), and used for clustering user profiles and analysing changes in user behaviour (cf. Section 6.5.4).

Extracting Sequential Patterns with Low Support

Participant : Brigitte Trousse [correspondent] .

Two methods for extracting sequential patterns with low support were developed by D. Tanasa in his thesis (see Chapter 3 in [72] for more details), in collaboration with F. Masseglia and B. Trousse:

These methods have been successfully applied to various Web logs since 2005.

Mining Data Streams

Participant : Brigitte Trousse [correspondent] .

In Marascu's thesis (2009) [57], a collection of software was developed for knowledge discovery and security in data streams. Three clustering methods for mining sequential patterns in data streams were developed in Java:

Such methods take batches of data in the format "Client-Date-Item" and provide clusters of sequences and their centroids in the form of an approximate sequential pattern calculated with an alignment technique.
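To make the input format concrete, here is a minimal Java sketch of how one batch of "Client-Date-Item" records could be regrouped into one sequence per client before any clustering step. It is not the SCDS code: the class name `BatchSequences`, the record `Event`, and the method `toSequences` are hypothetical, and dates are assumed to be ISO-formatted strings so that lexicographic order matches chronological order. Clustering and alignment are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BatchSequences {
    /** One record of a batch: client, date, item (field names are illustrative). */
    record Event(String client, String date, String item) {}

    /**
     * Groups a batch by client. Each client's sequence is a list of itemsets,
     * ordered by date; items sharing a date form one itemset.
     * Assumption: dates are ISO strings (e.g. "2013-01-02"), so sorting the
     * strings also sorts them chronologically.
     */
    static Map<String, List<List<String>>> toSequences(List<Event> batch) {
        // client -> (date -> items seen on that date); TreeMap keeps keys sorted
        Map<String, TreeMap<String, List<String>>> byClient = new TreeMap<>();
        for (Event e : batch) {
            byClient.computeIfAbsent(e.client(), k -> new TreeMap<>())
                    .computeIfAbsent(e.date(), k -> new ArrayList<>())
                    .add(e.item());
        }
        // Drop the dates, keeping only the date-ordered itemsets per client
        Map<String, List<List<String>>> sequences = new TreeMap<>();
        for (Map.Entry<String, TreeMap<String, List<String>>> entry : byClient.entrySet()) {
            sequences.put(entry.getKey(), new ArrayList<>(entry.getValue().values()));
        }
        return sequences;
    }

    public static void main(String[] args) {
        List<Event> batch = List.of(
            new Event("c1", "2013-01-02", "b"),
            new Event("c1", "2013-01-01", "a"),
            new Event("c2", "2013-01-01", "a"),
            new Event("c1", "2013-01-02", "c"));
        // prints {c1=[[a], [b, c]], c2=[[a]]}
        System.out.println(toSequences(batch));
    }
}
```

Sequences built this way are the kind of objects that the methods described above would then cluster and summarise with an approximate sequential pattern per cluster.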

In 2010, the Java code of one method, called SCDS, was integrated in the MIDAS demonstrator, and a C++ version was implemented by F. Masseglia for the CRE contract with Orange Labs (with the delivery of a licence), together with a visualisation module (in Java).

It has been tested on the following data:

In 2012, within the context of the ELLIOT contract, SCDS was integrated as a Web service (Java version) in the first version of the FOCUSLAB platform: a demonstration was made on the San Raffaele Hospital media use case at the first ELLIOT review in Brussels. We applied the SCDS Web service to data from two other use cases, in Logistics (BIBA) and Green Services (Inria) [38].

The three C++ codes written for the CRE (Orange Labs) have been deposited at APP. The Java code will be deposited at APP in 2014.