One of the consequences of the increasing ease of use and significant cost reduction of computer systems is the production and exchange of more and more digital and multimedia documents. These documents are fundamentally heterogeneous in structure and content as they usually contain text, images, graphics, video and sounds.
Information retrieval can no longer rely on text-based queries alone; it will have to be multi-modal and to integrate all the aspects of the multimedia content. In particular, the visual content has a major role and represents a central vector for the transmission of information. The description of that content by means of image analysis techniques is less subjective than the usual keyword-based annotations, whenever they exist. Moreover, being independent from the query language, the description of visual content is becoming paramount for the efficient exploration of a multimedia stream.
In the IMEDIA group we focus on the intelligent access by visual content. With this goal in mind, we develop methods that address key issues such as content-based indexing, interactive search and image database navigation, in the context of multimedia content.
Content-based image retrieval systems provide help for automatic search and assist human decisions. The user remains in charge, the only one able to take the final decision. The numerous research activities in this field during the last decade have proved that retrieval based on visual content is feasible. Nevertheless, current practice shows that a usability gap remains between the designers of these techniques and methods and their potential users.
One of the main goals of our research group is to reduce the gap between real usages and the functionalities resulting from our research on visual content-based information retrieval. We therefore strive to design methods and techniques that can address realistic scenarios, which often lead to exciting methodological challenges.
Among the "usage" objectives, an important one is the ability, for the user, to express his specific visual interest for a part of a picture. It allows him to better target his intention and to formulate it more accurately. Another goal in the same spirit is to express subjective preferences and to provide the system with the ability to learn those preferences. When dealing with any of these issues, we keep in mind the importance of the scalability of such interactive systems in terms of indexing and response times. Of course, the acceptable values of these times and how critical they are depend heavily on the domain (specific or generic) and on the cost of errors.
Our research work is then at the intersection of several scientific specialities. The main ones are image analysis, pattern recognition, statistical learning, human-machine interaction and database systems. It is structured into the following main themes:
Image indexing: this part mainly concerns modeling the visual aspect of images, by means of image analysis techniques. It leads to the design of image signatures that can then be obtained automatically.
Clustering and statistical learning: generic and fundamental methods for solving problems of pattern recognition, which are central in the context of image indexing.
Interactive search and personalization: to let the system take into account the preferences of the user, who usually expresses subjective or high-level semantic queries.
Cross-media indexing, and in particular bimodal text + image indexing, which addresses the challenge of combining those two media for more efficient indexing and retrieval.
More generally, the research work and the academic and industrial collaborations of the IMEDIA team aim to answer the complex problem of the intelligent access to multimedia content.
The final CHORUS conference identified cross-disciplinary challenges and recommendations in the domain of search engine technology. It was a great success: in addition to high representatives of the European Commission, the conference was attended by major industrial (e.g. Yahoo!, Thomson, Philips, Exalead, etc.) and academic stakeholders of the search engine community (including representatives from North America and Japan).
Pl@ntNet project: beginning of the Pl@ntNet project “Plant Computational Identification & Collaborative Information System”
http://
- Developing cutting-edge transdisciplinary research at the frontier between integrative systematics and computational sciences, based on the exploitation of large datasets, knowledge and expertise on plant morphology, anatomy, taxonomy, ecology, biogeography and uses.
- Providing free, easy-access software tools and methods for plant identification and for the collection, management, sharing and exploitation of botanical data.
- Promoting citizen science as a powerful means to enrich databases with new information on plants and to meet the need for capacity building in agronomy, botany and ecology.
SHREC 2009 - Content-based retrieval of 3D generic models: our 3D alignment method, coupled with one of our 2D/3D descriptors, the MDLA approach, ranked first in the SHREC 2009 Generic Shape Retrieval Contest with respect to the precision-recall measures.
We group the existing problems in the domain of content-based image indexing and retrieval in the following themes: image indexing, pattern recognition, personalisation and cross-media indexing. In the following we give a short introduction to each of these themes.
Image indexing: the process of extracting from a document (here a picture) compact, structured and significant visual features that will be used and compared during the interactive search.
The goal of the IMEDIA team is to provide users with the ability to perform content-based search in image databases in a way that is both intelligent and intuitive. When formulated in concrete terms, this problem gives rise to several mathematical and algorithmic challenges.
To represent the content of an image, we are looking for a representation that is compact (less data and more semantics), relevant (with respect to the visual content and the users) and fast to compute and compare. The choice of the feature space consists in selecting the significant features, the descriptors for those features and, finally, the encoding of those descriptors as image signatures.
We deal both with generic databases, in which images are heterogeneous (for instance, search of Internet images), and with specific databases, dedicated to a specific application field. The specific databases are usually provided with a ground truth and have homogeneous content (faces, medical images, fingerprints, etc.).
Note that for specific databases one can develop dedicated and optimal features for the application considered (face recognition, etc.). In contrast, generic databases require generic features (colour, texture, shape, etc.).
We must not only distinguish generic and specific signatures, but also local and global ones. They correspond respectively to queries concerning parts of pictures or entire pictures. In this case, we can again distinguish approximate and precise queries. In the latter case one has to be provided with various descriptions of parts of images, as well as with means to specify them as regions of interest. In particular, we have to define both global and local similarity measures.
When the computation of signatures is over, the image database is finally encoded as a set of points in a high-dimensional space: the feature space.
A second step in the construction of the index can be valuable when dealing with very high-dimensional feature spaces. It consists in pre-structuring the set of signatures and storing it efficiently, in order to reduce access time for future queries (a trade-off between access time and storage cost). In this second step, we have to address problems that have been dealt with for some time in the database community, but which arise here in a new context: image databases. The diversity of the feature spaces we deal with forces us to design specific methods for structuring each of these spaces.
Statistical learning and classification methods are of central interest for content-based image retrieval .
We consider here both supervised and unsupervised methods. Depending on our knowledge of the contents of a database, we may or may not be provided with a set of labelled training examples. For the detection of known objects, methods based on hierarchies of classifiers have been investigated. In this context, face detection was a main topic, as it can automatically provide high-level semantic information about video streams. For a collection of pictures whose content is unknown, e.g. in a navigation scenario, we are investigating techniques that adaptively identify homogeneous clusters of images, which is a challenging problem due to the configuration of the feature space.
Object detection is the most straightforward solution to the challenge of content-based image indexing. Classical approaches (artificial neural networks, support vector machines, etc.) are based on induction: they construct generalisation rules from training examples. The generalisation error of these techniques can be controlled, given the complexity of the models considered and the size of the training set.
Our research on object detection addresses the design of invariant kernels and algorithmically efficient solutions, as well as boosting methods for similarity learning. We have developed several algorithms for face detection based on a hierarchical combination of simple two-class classifiers. Such architectures concentrate the computation on ambiguous parts of the scene and achieve error rates as good as those of far more expensive techniques.
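To make the hierarchical combination of simple two-class classifiers more concrete, here is a minimal sketch (our own illustration, not the team's actual detector): cheap stages reject obvious non-face windows immediately, so only ambiguous windows reach the more expensive stages.

```python
from typing import Callable, List, Sequence, Tuple

# A stage is any scoring function plus a rejection threshold.
Stage = Tuple[Callable[[Sequence[float]], float], float]

def cascade_detect(window: Sequence[float], stages: List[Stage]) -> bool:
    """Run a window through increasingly expensive two-class stages.

    Each stage scores the window; if the score falls below the stage's
    threshold, the window is rejected immediately, so most of the image is
    discarded by the first, cheapest stages and only ambiguous windows
    reach the expensive ones.
    """
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False          # early rejection: clearly not a face
    return True                   # survived every stage: report a detection

if __name__ == "__main__":
    # Toy stages: a cheap coarse filter followed by a more selective one.
    stages = [
        (lambda w: sum(w) / len(w), 0.2),
        (lambda w: max(w) - min(w), 0.5),
    ]
    print(cascade_detect([0.1, 0.3, 0.9, 0.4], stages))
```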
Unsupervised clustering techniques automatically define categories and are for us a matter of visual knowledge discovery. We need them in order to:
Solve the "page zero" problem by generating a visual summary of a database that takes into account all the available signatures together.
Perform image segmentation by clustering local image descriptors.
Structure and sort the signature space for either global or local signatures, allowing a hierarchical search that is necessarily more efficient, as it only requires "scanning" the representatives of the resulting clusters.
Given the complexity of the feature spaces we are considering, this is a very difficult task. Noise and class overlap challenge the estimation of the parameters for each cluster. The main aspects that define the clustering process and inevitably influence the quality of the result are the clustering criterion, the similarity measure and the data model.
We investigate a family of clustering methods based on competitive agglomeration, which allows us to cope with our primary requirements: estimate the unknown number of classes, handle noisy data and deal with overlapping classes (by using fuzzy memberships that delay the decision as long as possible).
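A minimal sketch of the flavour of such methods (a simplified fuzzy c-means loop with cluster pruning, not the exact competitive agglomeration algorithm): clusters whose fuzzy cardinality drops below a threshold are discarded, so the number of classes is estimated rather than fixed.

```python
import numpy as np

def fuzzy_clustering_with_pruning(X, n_init=10, m=2.0, min_card=1.0,
                                  n_iter=50, seed=0):
    """Fuzzy c-means style loop that discards weak clusters.

    Starts with an over-estimated number of centres and removes any
    cluster whose fuzzy cardinality falls below `min_card`, which is
    the spirit of competitive-agglomeration clustering.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_init, replace=False)]
    u = None
    for _ in range(n_iter):
        # distances to every current centre (eps avoids division by zero)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-9
        # fuzzy memberships: closer centres get larger membership
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
        # prune clusters whose total membership (cardinality) is too small
        card = u.sum(axis=0)
        keep = card >= min_card
        u, centers = u[:, keep], centers[keep]
        # update the surviving centres
        w = u ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
    return centers, u

if __name__ == "__main__":
    X = np.vstack([np.random.randn(100, 2) + c for c in ([0, 0], [6, 6])])
    centers, _ = fuzzy_clustering_with_pruning(X)
    print("estimated number of clusters:", len(centers))
```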
We are studying here approaches that allow a reduction of the "semantic gap". There are several ways to deal with the semantic gap. A first, prior task is to optimise the fidelity of the physical content descriptors (image signatures) to the visual appearance of the images. The objective of this preliminary step is to bridge what we call the numerical gap. To minimise the numerical gap, we have to develop efficient image signatures. The weakness of visual retrieval results, due to the numerical gap, is often confusingly attributed to the semantic gap. We think that providing richer user-system interaction allows the user to express his preferences and to focus on his semantic visual-content target.
Rich user expression comes in a variety of forms:
allow the user to indicate his satisfaction (or dissatisfaction) with the system's retrieval results, a method commonly called relevance feedback. In this case, the user's reaction expresses, more generally, a subjective preference and can therefore compensate for the semantic gap between visual appearance and user intention,
provide precise visual query formulation that allows the user to select precisely his region of interest and discard the image parts that are not representative of his visual target,
provide interactive visualisation tools to help the user when querying and browsing the database,
provide a mechanism to search for the user's mental image when no starting image example is available. Several approaches are being investigated; as an example, we can mention logical composition from a visual thesaurus. Besides, learning methods related to information theory are also being developed for efficient relevance feedback models in several contexts, including mental image retrieval.
We have described, up to now, our research approaches using the visual content alone. But when additional information is available, it may prove complementary and potentially valuable in improving the results returned to the user. We may cite here metadata (file name, date of creation, caption, etc.) but also the textual annotations that are sometimes available. We must note that annotations usually carry high-level information related to prior knowledge of the context. The use of these sources of information means that we can speak of multimedia indexing.
We can think of several approaches for combining textual and visual information in the context of indexing and retrieval. As examples, we may cite the automatic textual annotation of images based on similarities between visual signatures or the propagation of textual annotations relying on the interaction between textual ontologies and visual ontologies. We also investigate methods that allow automatic textual annotation from visual content analysis. This part of our research activities is yet another solution for the reduction of the "semantic gap".
Security applications. Examples: identifying faces or fingerprints (biometrics). Biometrics is an interesting specific application from both a theoretical and an applicative (recognition, surveillance, ...) point of view. Two PhD theses were defended on themes related to biometrics. Our team also worked with a database of images of stolen objects and a database of images seized during searches (for fighting paedophilia).
Audio-visual applications. Examples: looking for a specific shot in a movie, documentary or TV news programme; presenting a video summary; helping archivists annotate the contents; detecting copies of a given material in a TV stream or on the web. Our team collaborates with INA (French TV archives), IRT (German broadcasters) and the press agencies AFP and Belga in the context of a European project. Text annotation is still very important in such applications, so cross-media access is crucial.
Scientific applications. Examples: environmental image databases (fauna and flora); satellite image databases (ground typology); medical image databases (finding images of a pathological character for educational or investigation purposes). We have an ongoing project on multimedia access to biodiversity collections for species identification.
Culture, art and design. IMEDIA has been contacted by the French Ministry of Culture and by museums for their image archives.
Finding a specific texture for the textile industry, illustrating an advertisement with an appropriate picture. IMEDIA is working with a picture library that provides images for advertising agencies. IMEDIA is involved in the TRENDS European project, dedicated to providing designers (CRF Fiat, Stile Bertone) with advanced content selection and visualisation tools.
IKONA is a framework for building Content Based Image Retrieval software prototypes. It has been designed and implemented in our team during the last four years . The current version is fully generic and is highly adaptable to any CBIR scenario thanks to its level of abstraction. As a research environment, IKONA offers support to the researchers in their work by providing stable and tested tools. As an application, it can easily be deployed and used by non-specialist users.
IKONA is based on a client/server architecture. The communication between the two components is achieved through a proprietary network protocol. It is a set of commands the server understands and a set of answers it returns to the client. The communication protocol is extensible, i.e. it is easy to add new functionalities without disturbing the overall architecture. It is also modular and therefore can be replaced by any new or existing protocol dealing with multimedia information retrieval.
The main processes are on the server side. They can be separated in two main categories:
off-line processes: data analysis, feature extraction and structuring
on-line processes: answering client requests
The images are characterised with global signatures that are implemented in the server:
Generic signatures: Colour, Shape and Texture features investigated at the IMEDIA Group.
Specific signatures: Faces and signatures for fingerprints.
Annotations: Some keywords.
Besides, two local signatures are included: the region-based description and the point-based one. The server uses image signatures and offers several types of query paradigms, available to the user through the graphical interfaces of the clients:
query by global example: The user selects an entire image as visual query.
partial queries: the user is looking for regions in images that are visually similar to the selected region;
relevance feedback on global and partial queries: the user interacts with the system in a feedback loop, giving positive and negative examples to help the system identify the category of images he or she is interested in;
mental image search: two different methods are investigated. The first is target image search with a relevance feedback model based on mutual information; the second consists in logical query composition.
We have developed two main clients that can communicate with the server. A good starting point for exploring the possibilities offered by IKONA is our web demo, available at http://www-roc.inria.fr/cgi-bin/imedia/circario.cgi/bio_diversity?select_db=1. This CGI client is connected to a running server with several generalist and specific image databases, including more than 23,000 images. It features query by example searches, switch database functionality and relevance feedback for image category searches. The second client is a desktop application. It offers more functionalities. More screen-shots describing the visual searching capabilities of IKONA are available at http://www-rocq.inria.fr/imedia/cbir-demo.html.
The architecture of this client/server software and several visual signatures were the subject of a deposit at the APP. It is distributed to INA, AFP, INRA, the Ministry of the Interior, the JRC and Alinari.
PMH is a general-purpose software library dedicated to locality sensitive hashing in metric spaces for approximate similarity search. It allows large datasets of content descriptors, usually represented by high-dimensional feature vectors, to be indexed and exploited efficiently. The construction of the index and the required memory space are linear in the dataset size, while the nearest-neighbour search algorithm is sublinear in the dataset size.
PMH is globally related to Locality Sensitive Hashing (LSH) methods, which have been proved to be the most efficient ones for approximate similarity search in large, high-dimensional datasets. Contrary to classical LSH methods (such as the ones used in the MIT E2LSH package), PMH includes a multi-probe search algorithm which drastically reduces the memory space complexity, making it possible to deal with datasets several orders of magnitude larger. Since our multi-probe algorithm is based on a probabilistic control of the buckets' success probability, it also allows the quality of the approximate search to be controlled accurately. Finally, the PMH library is far more generic than competing libraries (such as FLANN or LSHKIT). It allows the use of different metric types (L1, L2, Hamming, inner product, weighted distances, etc.), different data types (binary, float, sparse, non-vectorial, etc.), different query types (K nearest neighbours, range queries, probabilistic queries, empirical models, etc.) and different hashing function families (random projections with different distributions, kernel-based projections, optimised projections such as PCA or LDA, etc.).
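The sketch below is purely illustrative and independent of the PMH implementation; it shows the core multi-probe idea: random-projection hash keys index the data, and at query time neighbouring buckets are probed in addition to the query's own bucket, so a single small table can reach the recall that classical LSH obtains only with many tables.

```python
import itertools
from collections import defaultdict

import numpy as np

class MultiProbeLSH:
    """Toy multi-probe LSH index over L2 vectors.

    Keys are signs of random projections; at query time we also probe
    buckets whose key differs from the query key in up to `n_flips` bits,
    instead of building many independent hash tables.
    """

    def __init__(self, dim, n_bits=10, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(n_bits, dim))   # random hyperplanes
        self.table = defaultdict(list)

    def _key(self, x):
        return tuple(int(b) for b in (self.planes @ x > 0))

    def add(self, idx, x):
        self.table[self._key(x)].append((idx, x))

    def query(self, q, k=5, n_flips=1):
        key = self._key(q)
        candidates = []
        # probe the query bucket plus all buckets at Hamming distance <= n_flips
        for r in range(n_flips + 1):
            for flips in itertools.combinations(range(len(key)), r):
                probe = tuple(b ^ (i in flips) for i, b in enumerate(key))
                candidates.extend(self.table.get(probe, []))
        # exact re-ranking of the (small) candidate set
        candidates.sort(key=lambda item: np.linalg.norm(item[1] - q))
        return [idx for idx, _ in candidates[:k]]

if __name__ == "__main__":
    data = np.random.randn(1000, 64)
    index = MultiProbeLSH(dim=64)
    for i, v in enumerate(data):
        index.add(i, v)
    print(index.query(data[0]))
```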
Notably, the PMH library is the core technology for the scalability issues addressed by the VITALAS European project and is fully integrated in the resulting VITALAS multimedia search engine. It has been successfully applied to multi-user, real-time content-based retrieval in 20 million Flickr images and to real-time local search of small objects in a 100K-image collection (including 120 million SIFT features).
Millions of users interact with search engines daily. Most existing popular search engines allow users to express their search intent by issuing the query as a list of keywords. However, keyword queries are usually ambiguous. This ambiguity often leads to unsatisfying search results. For example, the query “apple” covers several different topics: fruit, smart phone, computer and so on. Heterogeneous search results need to be combined and structured efficiently and generically.
We propose to use clustering techniques that rely only on ranked nearest-neighbour information (and not directly on features or similarity measures). Such methods have proved to be very promising.
We notably use an a contrario principle to normalise the connectivity information.
The goal is to easily fuse different sources of information without any learning or prior knowledge, and to produce both mono-source and multi-source clusters within the same clustering result.
The first step is to consider all objects as candidate cluster centres and to compute a significance score for each centre from its nearest neighbours, including an oracle selection step to decide which modalities are most significant for each candidate cluster.
Because this step is time consuming, we construct, for each modality, a shared-neighbour intersection matrix at the beginning of the process.
This optimisation accelerates our algorithm so that a user can quickly get an overview of the different clusters, together with the modalities used.
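As a rough illustration of the ranked-neighbour principle (assumed details, not the project's exact scoring): each modality contributes only its k-nearest-neighbour lists, and a shared-neighbour matrix counts how many neighbours two items have in common, from which candidate cluster centres can be scored without touching the underlying features.

```python
import numpy as np

def knn_lists(dist, k):
    """Indices of the k nearest neighbours of every item (self excluded)."""
    order = np.argsort(dist, axis=1)
    return order[:, 1:k + 1]

def shared_neighbour_matrix(nn):
    """S[i, j] = number of nearest neighbours items i and j have in common."""
    n, _ = nn.shape
    member = np.zeros((n, n), dtype=int)
    member[np.arange(n)[:, None], nn] = 1          # neighbour membership indicator
    return member @ member.T

def centre_score(S, nn, i):
    """Score a candidate centre by how strongly its neighbours share neighbours."""
    return S[i, nn[i]].mean()

if __name__ == "__main__":
    X = np.random.randn(50, 16)                    # one visual modality
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    nn = knn_lists(dist, k=10)
    S = shared_neighbour_matrix(nn)
    scores = [centre_score(S, nn, i) for i in range(len(X))]
    print("best candidate centre:", int(np.argmax(scores)))
```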
We experimented with our approach on the Exalead corpus (http://www.exalead.com/search/) and found very interesting multi-source clusters for different queries.
We plan to evaluate our work in the scope of re-ranking rather than clustering, since there is no evaluation dataset for web search clustering.
For now, the different information sources that we use are mostly visual ones (bags of features, global features, etc.). We would like to test our fusion (re-ranking/clustering) algorithm on different modalities and see how we perform compared to the state of the art.
An example of a structured result for the query “Flag” is shown in Figure (see also http://www-roc.inria.fr/~hamzaoui/InterfaceExa.html).
This year, we pursued our work on 3D model retrieval and indexing in several directions.
A new global descriptor, called the 3D Gaussian descriptor (3DGA), derived from the Gauss transform, has been proposed in and . It consists in a spatial description of the model built from the Gaussian law and obtained by a summation over the surface of the model (see figure ). The 3DGA descriptor is efficient but less effective than our 2D/3D descriptors for generic models. Nevertheless, it may be useful for describing 3D models that have a significant part of their surface hidden when computing their 2D projections.
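A crude sketch of the underlying idea (our own simplification, not the published 3DGA computation): the Gauss transform of surface samples is evaluated on a regular 3D grid, so each cell accumulates the Gaussian contributions of nearby surface points and the resulting volume acts as a spatial signature.

```python
import numpy as np

def gauss_transform_descriptor(points, grid_res=8, sigma=0.1):
    """Volumetric Gaussian descriptor of a 3D point set.

    `points` are surface samples of a model, assumed normalised to [0, 1]^3.
    Each grid cell stores the sum of Gaussian contributions of all samples,
    i.e. a discretised Gauss transform of the surface.
    """
    axes = np.linspace(0.0, 1.0, grid_res)
    gx, gy, gz = np.meshgrid(axes, axes, axes, indexing="ij")
    grid = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)        # (res^3, 3)
    # squared distances between every grid node and every surface sample
    d2 = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    desc = np.exp(-d2 / (2.0 * sigma ** 2)).sum(axis=1)
    return desc / np.linalg.norm(desc)                           # L2-normalised signature

if __name__ == "__main__":
    pts = np.random.rand(500, 3)           # stand-in for mesh surface samples
    print(gauss_transform_descriptor(pts).shape)   # (512,) for an 8^3 grid
```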
Our 3D alignment method has again demonstrated its good performance. Indeed, our alignment method, coupled with the MDLA descriptor (AL-MDLA), won the generic track of the SHREC 2009 contest (see figure ).
Moreover, this result has been reinforced by the detailed evaluations made by Mohamed Chaouch in his thesis on the main 3D generic shapes databases: once again the AL-MDLA approach obtained the best retrieval performances in all the cases. These results confirmed the importance of an appropriate choice of a 3D alignment method during the normalisation step of the retrieval process and the effectiveness of our 2D/3D descriptor when retrieving 3D models inside a database of 3D generic models.
Our alignment work has also been extended to reduce the number of reference frames that can be associated to a 3D model to find its natural pose among the 48 coordinate systems associated to the alignment axes. The principle of the extension is detailed in and in . It is based on observations of human perception w.r.t. the vertical symmetries of the models.
An interactive tool has been developed by Skander El Fekih during his master's thesis . Figure shows examples of reduced sets of models reference frames proposed to the user by the tool.
The main difficulty in 2D shape recognition is that shapes of objects can vary within the same semantic class. These variations, called deformations, can be due to multiple reasons: the objects may be viewed from different perspectives, the objects may be structurally different (in the case of articulated and deformable objects), or objects may have a different scale. In general, a normalization step to achieve invariance under all possible deformations is required before the recognition process. The normalization consists of three steps. The first step centers the objects to achieve translation invariance. The second step normalizes the scale of the objects. The third step aligns the objects to achieve rotation invariance. Most existing normalization methods are efficient solutions for centering and scaling. However, alignment remains unsolved.
Humans achieve this task efficiently by placing objects in the way that they are most commonly seen in their surroundings. Finding a technique that simulates this behavior is challenging. Results from psychological tests on human perception and recent 3D alignment methods show that symmetry is an important factor that contributes to such intuitive alignment. Based on this, we propose a new approach to automatically align 2D shapes in an intuitive way. Inspired by an idea related to 3D alignment , this approach is based on two types of symmetry: reflective symmetry and local translational symmetry. The reflective symmetry is used as a criterion to validate the principal component analysis (PCA) alignment.
In case the PCA alignment is rejected, an alternative technique is proposed, which is based on the local translational symmetry. This is defined as the repetition of the same geometrical properties along a given direction. In our algorithm, we used two representations of a shape: its boundary and its surface. We show that the surface representation, which takes into account all points of the shape, often works better than the boundary representation; it can be argued that points on the periphery are more sensitive to deformations. In general, compared to other alignment approaches, our method rapidly and efficiently computes intuitive alignments, such as the ones presented in figure .
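A simplified sketch of the alignment principle (illustrative implementation details): the shape points are rotated into their PCA frame and a reflective-symmetry score of the result is computed, which can then be used to accept or reject the PCA alignment.

```python
import numpy as np

def pca_align(points):
    """Rotate a 2D point set into its principal-axes frame (translation removed)."""
    centred = points - points.mean(axis=0)
    cov = np.cov(centred.T)
    _, vecs = np.linalg.eigh(cov)            # columns = principal directions
    return centred @ vecs[:, ::-1]           # major axis first

def reflection_score(points, n_bins=32):
    """Crude reflective-symmetry score with respect to the vertical axis.

    The aligned shape is rasterised into a small occupancy grid and
    compared with its left-right mirror; 1.0 means perfectly symmetric.
    """
    norm = points - points.min(axis=0)
    norm = norm / (norm.max() + 1e-9)
    grid = np.zeros((n_bins, n_bins), dtype=bool)
    idx = np.minimum((norm * (n_bins - 1)).astype(int), n_bins - 1)
    grid[idx[:, 1], idx[:, 0]] = True
    mirror = grid[:, ::-1]
    return (grid & mirror).sum() / max((grid | mirror).sum(), 1)

if __name__ == "__main__":
    # an ellipse-like shape: PCA alignment should be accepted (high symmetry)
    t = np.linspace(0, 2 * np.pi, 400)
    shape = np.stack([3 * np.cos(t), np.sin(t)], axis=1)
    aligned = pca_align(shape)
    print("symmetry score:", round(reflection_score(aligned), 3))
```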
In the scope of the Pl@ntNet project, we are working on plant identification. Previous work on Orchidae of Laos showed that precise identification is possible using images of their leaves. In that case, the leaves were scanned and appropriately cropped in order to retain only the relevant information. We are now extending this preliminary work to grape identification, first by using a regular digital camera and second by evaluating several shooting protocols. The latter aim at being more realistic with respect to working conditions in the field.
Contour-based shape descriptors, such as the one presented in , have interesting discriminative properties and should address all these issues. Before the regions of interest can be described, a segmentation has to be performed. The segmentation algorithm should ideally work with few, intuitive parameters and should be fast. The original watershed transform, along with some of its improvements and extensions , were interesting candidates for this task.
Our work consisted in implementing and evaluating the original watershed on images under varying shooting conditions. We first focused on images with a relatively homogeneous background, with either controlled or uncontrolled illumination conditions. In the semi-supervised version of the watershed, an image marking the inside of each region of interest is needed. We postponed the automatic choice of the markers to future work and used manually placed markers.
The details of this work are presented in and an example of segmentation is shown in figure .
These results mainly show that the watershed transform is able to address the extraction of regions of interest. Some work should be done in order to address less controlled shooting conditions and automatic processing.
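A minimal sketch of the marker-controlled watershed step, using scikit-image as an example library and manually chosen seed positions standing in for the annotation step (the file name and marker coordinates are hypothetical):

```python
import numpy as np
from skimage import color, filters, io
from skimage.segmentation import watershed

def segment_leaf(image_rgb, inside_marker, background_marker):
    """Marker-controlled watershed on the gradient of a grayscale image.

    `inside_marker` and `background_marker` are (row, col) seeds that would
    come from manual annotation (automatic marker placement is future work).
    """
    gray = color.rgb2gray(image_rgb)
    gradient = filters.sobel(gray)                 # flood the gradient image
    markers = np.zeros(gray.shape, dtype=int)
    markers[inside_marker] = 1                     # label 1: region of interest
    markers[background_marker] = 2                 # label 2: background
    labels = watershed(gradient, markers)
    return labels == 1                             # binary mask of the leaf

if __name__ == "__main__":
    img = io.imread("leaf.jpg")                    # hypothetical input image
    mask = segment_leaf(img, inside_marker=(200, 300), background_marker=(10, 10))
    print("leaf pixels:", int(mask.sum()))
```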
The extensions of this work are threefold. First, we are investigating automatic marker placement. The visual cues on which we base this work are the colour and the vein network; indeed, the vein network of the grape families is almost always visible. Second, the segmentation should be robust to varying illumination conditions and particularly to shadows. We propose to enhance the currently used gradients for that purpose. Third, partial image description inside the regions should make the final identification step robust to frequently occurring occlusions. Finally, we would also like to extend these investigations to flower segmentation.
In recent years the resolution of satellite images has increased significantly, reaching nowadays 41 cm/pixel in the panchromatic band with the GeoEye-1 sensor. Consequently, new challenges arise for an accurate land-cover interpretation of spectrally and spatially highly heterogeneous data. Because of this heterogeneity, satellite images are ambiguous and their classification remains a difficult task despite many thoughtful attempts. Indeed, most existing classification methods are only suitable for a specific range of resolutions and, on the whole, they fail when the resolution is high. In order to overcome this shortcoming, we proceed in as follows. First, we perform a multi-cue combination by incorporating various features such as color, texture and edges in a single unified discriminative model. Given a high-resolution satellite image database, we learn an appropriate dictionary which consists of meaningful cue clusters, namely color clusters, textons and shapemes. Second, we adopt a probabilistic modeling approach to resolve uncertainties and intra-region variabilities, as well as to enforce global labeling consistency. In fact, we define a Discriminative Random Field (DRF) model on an adjacency graph of superpixels which focuses directly on the conditional distribution of the labels $L$ given the image observations $X$ and the learned parameters. Our DRF model captures similarity, proximity and familiar configuration so that a powerful discrimination is ensured. In order to capture contextual interactions of the labels as well as of the data, we define in a non-homogeneous discriminative model with spatially dependent association and pairwise potentials. Third, we take a feature selection approach based on sharing boosting to learn the feature functions efficiently and to discriminate the regions of interest powerfully despite the content complexity. Finally, we apply a cluster sampling algorithm , which combines the representational advantages of DRF and graph-cut approaches, to infer the globally optimal labeling.
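To make the adjacency graph of superpixels concrete, here is a small, self-contained sketch (our own illustration, independent of the actual DRF implementation) that derives the graph edges from a superpixel label image; the pairwise potentials of the DRF are defined on exactly these edges.

```python
import numpy as np

def superpixel_adjacency(labels):
    """Set of undirected edges between superpixels that share a boundary.

    `labels` is a 2-D array of superpixel ids (e.g. produced by SLIC);
    two superpixels are adjacent when their labels touch horizontally
    or vertically.
    """
    edges = set()
    # horizontally adjacent pixels with different labels
    horiz = labels[:, :-1] != labels[:, 1:]
    for a, b in zip(labels[:, :-1][horiz], labels[:, 1:][horiz]):
        edges.add(tuple(sorted((int(a), int(b)))))
    # vertically adjacent pixels with different labels
    vert = labels[:-1, :] != labels[1:, :]
    for a, b in zip(labels[:-1, :][vert], labels[1:, :][vert]):
        edges.add(tuple(sorted((int(a), int(b)))))
    return edges

if __name__ == "__main__":
    toy = np.array([[0, 0, 1],
                    [0, 2, 1],
                    [2, 2, 1]])
    print(sorted(superpixel_adjacency(toy)))   # [(0, 1), (0, 2), (1, 2)]
```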
We train and test our model on high-resolution SPOT-5 satellite images. Our method is suitable for any range of resolution, since we just need to perform training on the appropriate database. Promising results are obtained, as shown in figures and .
The non-homogeneous DRF model provides better results than the homogeneous DRF model, which demonstrates the importance of integrating contextual information. In figure , we illustrate results obtained by our homogeneous DRF model for urban area extraction. In future work, we plan to learn the weighting parameters of the potentials and to extend our model to a multi-scale framework.
The description and recognition of textures in satellite images has attracted growing attention in recent years. In , an approach for texture retrieval based on a novel type of image representation is presented: the Local Binary Pattern Correlograms (LBPCs). Our representation is obtained by first extracting the most informative points in the image. Then, we compute local binary patterns around these interest points. Furthermore, we propose a novel texture feature by computing the correlogram of the LBPs computed around the interest points. Our new LBPCs combine the potential of local and global descriptors. Local descriptors, represented by local features extracted around interest points, are characterised by their robustness to occlusions, scale and geometric transformations. Global descriptors, represented by correlograms, are very informative about the overall visual structure of an object. The LBP occurrence correlogram proves to be a very powerful texture feature. Our proposed LBP Correlograms have been tested on a real SPOT image database. The experimental results show good average retrieval accuracy, and excellent results are achieved compared with some state-of-the-art methods.
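A rough sketch of the representation (illustrative parameter choices, not the exact configuration of the paper): local binary patterns are computed around detected interest points, and a correlogram counts how often pairs of LBP codes co-occur at given spatial distances.

```python
import numpy as np
from skimage.feature import corner_harris, corner_peaks, local_binary_pattern

def lbp_correlogram(gray, n_points=8, radius=1, dist_edges=(0, 8, 16, 32)):
    """LBP correlogram around interest points of a grayscale image.

    Returns a (n_codes, n_codes, n_distance_bins) co-occurrence tensor:
    entry [a, b, d] counts pairs of interest points at a distance falling in
    bin d whose LBP codes are a and b.
    """
    lbp = local_binary_pattern(gray, n_points, radius, method="uniform")
    n_codes = n_points + 2                               # uniform LBP code count
    points = corner_peaks(corner_harris(gray), min_distance=5)
    codes = lbp[points[:, 0], points[:, 1]].astype(int)
    corr = np.zeros((n_codes, n_codes, len(dist_edges) - 1))
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = np.linalg.norm(points[i] - points[j])
            b = np.searchsorted(dist_edges, d) - 1
            if 0 <= b < corr.shape[2]:
                corr[codes[i], codes[j], b] += 1
                corr[codes[j], codes[i], b] += 1
    return corr / max(corr.sum(), 1)                     # normalised signature

if __name__ == "__main__":
    img = np.random.rand(128, 128)                        # stand-in for a SPOT tile
    print(lbp_correlogram(img).shape)
```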
In Figure , the precision-recall curve of our proposed approach (LBPCs) is compared with the curves of the other approaches: (I) LBPCs combining the monochrome and opponent LBP and using three neighbourhood sets, (II) LBPCs with only one neighbourhood set, (III) MLBP Histogram [7] and (IV) traditional correlograms. It clearly shows that the performance of our method is better than the others. There is not much difference in accuracy between our proposed LBPCs and the LBPCs combining monochrome and opponent LBP; our method is faster and requires less memory to store the index.
The characterization, evaluation and use of plant biodiversity is based on the precise and efficient identification of its components and especially of the species. The identification keys issued from systematic botany mainly rely on characteristics that are ineffective in many real-world situations. The development of the inventory of species, of community ecology and of the monitoring of self-propagating plants is limited because it requires an active and continuing involvement of the very few highly specialized botanists. The collaboration between the UMR AMAP and the IMEDIA team aims to address this challenge by exploiting image analysis and recognition in a generic interactive species identification system. Since the identification process should be interactive, we decided to further explore relevance feedback on sets of local image features that describe regions of interest of an image. In the case under focus here, such regions would correspond to plant organs whose attributes are potentially relevant for identification. It should be noted that the problem we address is very difficult, since there is significant variability in pose and the relevant plant organs often correspond to sets of patches that are scattered in a region of interest.
During the first year of Wajih Ouertani's PhD, we studied and tested state-of-the-art kernels for matching sets of vectors, in order to extend SVM-based relevance feedback to the use of local features. The first experiments concerned the Pyramid Match Kernel (PMK) , for which interesting results are reported in the literature on object class recognition. PMK is based on a hierarchical uniform quantization of the feature space and represents a set of local features as a multi-resolution histogram. The kernel is obtained as a modified histogram intersection, with level-specific weights. Our experiments on the Graz-02 dataset with several descriptors including SIFT show that there are two significant problems with this approach. The first is related to PMK and more specifically to the construction of the hierarchy. Consider a quantization level that is too coarse and unable to provide enough discrimination to separate local features that have low similarity. When moving from this level to the next, finer level, each quantization interval is divided by two in all dimensions. If the dimension of the description space is high, the resulting quantization intervals are too small and local features that should be considered similar actually fall in different intervals. To address this issue, we investigated the random histogram representation of feature sets , associated with a linear kernel; this representation appears better able to avoid this kind of problem.
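As an illustration of the random histogram idea (assumed details): each set of local features is mapped to a fixed-length vector by quantising random projections, after which a plain linear kernel can be used.

```python
import numpy as np

def random_histogram(features, n_projections=64, n_bins=16, seed=0):
    """Fixed-length histogram representation of a set of local features.

    Each random projection maps every feature to a scalar, which is then
    quantised into `n_bins` bins; concatenating the per-projection bin
    counts gives a vector on which a plain linear kernel can be used,
    avoiding the very fine cells of a high-dimensional uniform grid.
    """
    rng = np.random.default_rng(seed)
    dim = features.shape[1]
    directions = rng.normal(size=(n_projections, dim))
    proj = features @ directions.T                        # (n_features, n_projections)
    # quantise each projection into equal-width bins over a fixed range
    lo, hi = -3.0, 3.0
    bins = np.clip(((proj - lo) / (hi - lo) * n_bins).astype(int), 0, n_bins - 1)
    hist = np.zeros((n_projections, n_bins))
    for p in range(n_projections):
        np.add.at(hist[p], bins[:, p], 1)
    return hist.ravel() / max(len(features), 1)

if __name__ == "__main__":
    region_a = np.random.randn(200, 128)    # e.g. SIFT descriptors in a region
    region_b = np.random.randn(150, 128)
    ha, hb = random_histogram(region_a), random_histogram(region_b)
    print("linear-kernel value:", float(ha @ hb))
```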
The second problem is not specific to the use of PMK but rather to the fact that all the local features selected in a region of interest are used as a single (positive or negative) example. Set kernels tend to “bind” the local features in a set together, so it becomes harder to ignore those that are actually irrelevant (e.g. they come from the background) or to be robust to strong occlusions. We explore solutions based on replacing a large set of local features by several localised subsets of features.
Another issue is object/noise separation: since the relevant plant organs often correspond to sets of patches that are scattered in a region of interest, many of the local features falling in the region selected by the user actually belong to the background or have in their description a strong influence from the background. It is then necessary to find appropriate feature selection solutions in order to reduce the level of such noise.
As part of our work, several programs and software modules were developed to handle, integrate and evaluate this type of feedback. In order to evaluate the performance in the target application, a botanical database with a local ground truth (region-based annotations) was prepared by AMAP using annotation software developed by IMEDIA.
In the frame of the Pl@ntNet project, we began to work on classification methods helping botanists identify plant species. One field of investigation which has recently started concerns “multiple biological criteria” classification. Indeed, botanists are used to observing and analysing specimens according to various visual aspects, various “characteristics” or “biological criteria”, in order to identify the taxonomy of a plant and to discriminate between plants.
Figure shows an example of this biological description for one specimen. This sample, an “Ebenaceae Diospyros Elliotii” plant in its natural environment, is represented by several pictures where each picture is annotated by a set of labels, i.e. some usual botanical characteristics such as “bark”, “flowers”, “inflorescence”, “limb margin”, “leaf”, “petiole”, etc. These annotations in this botanical context lead us to an original image classification problem where each individual sample in the training data is represented by several multi-labelled pictures. Moreover, each class (i.e. each species) is represented by several specimens which do not necessarily cover the same botanical characteristics. Furthermore, this is a challenging classification problem because even the flora of a limited geographical area can contain several hundred species.
Our first investigations are centred on a hierarchical classification model, an extension of previous work on information fusion . This classification method combines the visual signatures of the partial and complementary views of the known species with the botanical expert annotations. Our next challenge is to take into account the botanical expertise of a user in an interactive approach in order to improve the classification performance.
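A very small sketch of the late-fusion idea behind such a hierarchical scheme (our own simplification): each annotated view (bark, leaf, flower, ...) gets its own per-organ classifier, and the species scores of the views available for a specimen are combined into a single prediction.

```python
import numpy as np

def fuse_views(view_scores, weights=None):
    """Combine per-organ species scores into one prediction for a specimen.

    `view_scores` maps an organ label (e.g. "leaf", "bark") to a vector of
    species scores produced by the classifier trained for that organ; views
    that were not photographed for this specimen are simply absent.
    """
    organs = list(view_scores)
    if weights is None:
        weights = {o: 1.0 for o in organs}            # uniform fusion by default
    total = sum(weights[o] * np.asarray(view_scores[o], dtype=float)
                for o in organs)
    total /= sum(weights[o] for o in organs)
    return int(np.argmax(total)), total

if __name__ == "__main__":
    # hypothetical scores over 4 candidate species from two available views
    scores = {"leaf": [0.1, 0.6, 0.2, 0.1], "bark": [0.2, 0.3, 0.4, 0.1]}
    species, fused = fuse_views(scores)
    print("predicted species index:", species, fused)
```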
In the scope of a use case of the VITALAS European project, we worked on a new content-based retrieval framework applied to logo retrieval in large natural image collections. The first contribution is a new challenging dataset, called BelgaLogos , which was created in collaboration with professionals of the BELGA press agency in order to evaluate logo retrieval technologies in real-world scenarios. The dataset as well as baseline results have been made available to the community on a dedicated web page (http://www-rocq.inria.fr/imedia/belga-logo.html) and exchanges with other partners have started on the topic.
The second and main contribution is a new visual query expansion method using an a contrario thresholding strategy in order to improve the accuracy of expanded query images. Whereas previous methods based on the same paradigm used a purely hand-tuned fixed threshold, we provide a fully adaptive method enhancing both genericity and effectiveness. This new technique has been evaluated on both the OxfordBuilding dataset and our new BelgaLogos dataset. Results show that the proposed technique outperforms both the baseline method and the previous state-of-the-art visual query expansion method. Mean Average Precision results on the BelgaLogos dataset are provided in Table . More details can be found in .
Logo name | Baseline Qset1 | Baseline Qset2 | Qexp a contrario Qset1 | Qexp a contrario Qset2
Adidas | 7.8 | 0.7 | 13.3 | 0.7 |
Adidas-text | 5.6 | 1.1 | 7.8 | 1.1 |
Base | 14.4 | 38.9 | 21.5 | 58.2 |
Bouygues | 18.2 | 11.3 | 18.6 | 15.3 |
Citroën | 6.1 | 4.5 | 38.4 | 4.5 |
Citroën-text | 5.3 | 0.1 | 18.8 | 0.1 |
CocaCola | 23.0 | 0.1 | 48.6 | 0.1 |
Cofidis | 26.0 | 55.2 | 26.6 | 65.3 |
Dexia | 16.6 | 29.3 | 24.0 | 51.3 |
Ecusson | 1.1 | 0.1 | 5.9 | 0.1 |
Eleclerc | 78.1 | 74.1 | 80.6 | 80.1 |
Ferrari | 24.7 | 7.5 | 41.4 | 17.5 |
Gucci | 50.0 | 0.0 | 50.0 | 0.0 |
Kia | 32.8 | 61.3 | 67.5 | 75.6 |
Mercedes | 9.7 | 18.5 | 15.0 | 19.2 |
Nike | 1.4 | 1.2 | 3.5 | 2.6 |
Peugeot | 20.0 | 20.7 | 20.2 | 23.2 |
US President | 64.3 | 60.3 | 96.6 | 100.0 |
Puma | 8.6 | 2.2 | 20.0 | 2.2 |
Puma-text | 51.6 | 0.7 | 56.6 | 0.7 |
Quick | 24.4 | 39.0 | 41.4 | 56.6 |
Roche | 50.0 | 0.2 | 50.0 | 0.2 |
SNCF | 33.3 | 27.9 | 35.4 | 33.7 |
StellaArtois | 32.7 | 31.8 | 39.3 | 43.4 |
TNT | 22.5 | 2.5 | 33.54 | 4.4 |
VRT | 11.1 | 5.8 | 12.53 | 11.2 |
All | 20.8 | 19.0 | 34.11 | 25.7 |
In the scope of the VITALAS project, we developed a graphical interface that exploits the temporal relationships of images within videos. The tests were conducted on a database of 10 hours of news videos (approximately 75,000 images). The interface combines the classical similarity search of Maestro with the temporal information available from news events. Based on this information, we allow a user to navigate more efficiently through a large collection of audio-visual data. Indeed, the navigation combines image similarity search with temporal relationships by proposing two views. The first one shows an unordered similarity search and, for each image, the videos and its time stamp within each video. The events that are closer to the beginning of the videos are, for instance, more related to the hot news. The second view shows the main topics and the key frames of the videos associated with the selected images, along with their temporal occurrence. The main topics are drawn from a clustering of the whole database. The clusters with a large number of elements are considered as structuring each video (reports, interviews, jingles, ...), while clusters of smaller size are considered as providing information on the topics covered by the videos. In the example shown in figure , we used maps as entry points to the semantic content of news reports, which in this case are the events related to identifiable and geographically located parts of the world. The first view, on the left, reveals that the first map is used in three different videos within the 10 hours, and might be a location covered by a series of reports. When an image is selected, a time-line view (on the right) is shown. This view stresses the contents co-occurring with the map within the same time period. It may then provide information on events, people, polls or popular opinions that, to some extent, are related to this geographical event and might be hard to infer from the visual similarity alone.
We demonstrated the functionalities of this interface during the VITALAS annual review.
The need for watching movies is perpetually increasing due to the spread of the Internet and the growing popularity of video-on-demand services. The large mass of movies stored on the Internet or on VOD servers needs to be structured to accelerate browsing. We propose in a new system, called "The Scene Pathfinder", that aims at segmenting movies into scenes to give users the opportunity to have non-sequential access and to watch particular scenes of a movie. This helps them judge the movie quickly and decide whether to buy or download it, avoiding a waste of time and money. The proposed approach is multimodal (see also , , ). We use both visual and auditory information to accomplish the segmentation. We rely on the assumption that every movie scene is either an action or a non-action scene. Non-action scenes are generally characterised by static backgrounds and occur in the same place. For this reason, we rely on content information and on the Kohonen map to extract these kinds of scenes (agglomerations of shots). Action scenes are characterised by high tempo and motion. For this reason, we rely on tempo features and on Fuzzy C-Means to classify shots and to localise the action zones. The two processes are complementary. Indeed, the over-segmentation that may occur when extracting action scenes from content information is repaired by the fuzzy clustering. Our system has been tested on a varied database and the results obtained show the merit of our approach (compared to ) and that our assumptions are well founded.
In figure , we present our framework. We divide the scenes of a movie into two important classes: action scenes and non-action scenes. To detect non-action scenes (dialogue, monologue, landscape, romance, ...) we use content information and the Kohonen map to discover agglomerations of shots (scenes) having common backgrounds and objects. On the other hand, we use audio-visual tempo features and the Fuzzy C-Means classifier to delimit the core of action scenes (fights, car chases, war, gunfire, ...) and to remedy the over-segmentation that may occur in action scenes.
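As a minimal illustration of the Kohonen-map step (a toy self-organising map over shot signatures, not the actual system): visually similar shots are mapped to nearby units of a small 2-D grid, and agglomerations of shots on the map suggest non-action scenes sharing the same background.

```python
import numpy as np

def train_som(shots, grid=(6, 6), n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small self-organising (Kohonen) map on shot signatures.

    Returns the grid of weight vectors and the best-matching unit of every
    shot; shots falling on the same or neighbouring units form the
    agglomerations used to hypothesise non-action scenes.
    """
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h, w, shots.shape[1]))
    coords = np.dstack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"))
    for t in range(n_iter):
        x = shots[rng.integers(len(shots))]
        # best-matching unit for the sampled shot
        d = np.linalg.norm(weights - x, axis=2)
        bmu = np.unravel_index(np.argmin(d), d.shape)
        # decaying learning rate and neighbourhood radius
        frac = t / n_iter
        lr, sigma = lr0 * (1 - frac), sigma0 * (1 - frac) + 0.5
        influence = np.exp(-((coords - bmu) ** 2).sum(axis=2) / (2 * sigma ** 2))
        weights += lr * influence[..., None] * (x - weights)
    bmus = [np.unravel_index(np.argmin(np.linalg.norm(weights - s, axis=2)),
                             grid) for s in shots]
    return weights, bmus

if __name__ == "__main__":
    shots = np.random.rand(80, 32)      # stand-in for per-shot visual signatures
    _, bmus = train_som(shots)
    print("first shots mapped to units:", bmus[:5])
```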
Unlike search-by-similarity techniques, which have to some extent become reliable over the past few years, object retrieval still has many open issues. In fact, searching for a concept raises various problems related to feature extraction (e.g. invariance to viewpoint, illumination, affine transformations, etc.) and machine learning (robustness, over-fitting, genericity, computation time, etc.).
Our research focuses on building concise and powerful models that make it possible to retrieve objects in large heterogeneous image collections. To this end, we integrated a feature selection algorithm based on both boosting and lasso techniques.
Contrary to most training algorithms used in automatic object recognition, this new algorithm generates sparse models from the complete space of all the local features of the training images. The intuitive idea is to add an extra term to the loss function. This term represents a constraint which causes shrinkage of the solutions towards zero.
Given a loss function $L(y, f(x))$, where $f(x) = \sum_{t} \alpha_t h_t(x)$ is the sum of base learners, the objective is to minimize the following function:

$$\sum_{i=1}^{n} L(y_i, f(x_i)) + \lambda \sum_{t} |\alpha_t|,$$

where $n$ is the number of training examples and $\lambda \geq 0$ is a parameter that controls the amount of shrinkage applied. The larger the $\lambda$ coefficient, the sparser the model. Sparsity is known to be a good tradeoff between model simplicity and good category representation. In addition, sparsity tends to favor interpretability, which is practical for subsequent human interaction.
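A compact sketch of the shrinkage idea (an illustrative forward-stagewise variant, not the exact algorithm of the team): at each boosting round the coefficient of the selected base learner is soft-thresholded by lambda, so weak contributions are driven exactly to zero and the final model stays sparse.

```python
import numpy as np

def soft_threshold(a, lam):
    """Lasso shrinkage operator: pushes small coefficients exactly to zero."""
    return np.sign(a) * max(abs(a) - lam, 0.0)

def sparse_boost(X, y, n_rounds=50, lam=0.1):
    """Forward-stagewise boosting with an L1 penalty on the coefficients.

    Base learners are single-feature "stumps" h_j(x) = sign(x_j); at each
    round the best-correlated learner gets a soft-thresholded coefficient,
    so only a sparse subset of features ends up in the model.
    """
    n, d = X.shape
    coef = np.zeros(d)
    residual = y.astype(float)
    H = np.sign(X)                                  # responses of all base learners
    for _ in range(n_rounds):
        # least-squares fit of every learner to the current residual
        alphas = H.T @ residual / n
        j = int(np.argmax(np.abs(alphas)))
        step = soft_threshold(alphas[j], lam)
        if step == 0.0:                             # everything shrunk away: stop
            break
        coef[j] += step
        residual -= step * H[:, j]
    return coef

if __name__ == "__main__":
    X = np.random.randn(300, 40)
    y = np.sign(X[:, 3] + 0.5 * X[:, 7])            # only two informative features
    coef = sparse_boost(X, y)
    print("non-zero coefficients:", np.flatnonzero(coef))
```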
Preliminary experiments carried out on the PascalVOC database using two successful state-of-the-art descriptors (SIFT and SURF) are promising.
The bag-of-visual-words is a popular representation for images that has proven to be quite effective for automatic annotation.
The main idea behind bag-of-visual-words is to represent an image as a collection of visual patches and to compute a histogram counting the occurrences of these patches as a global signature. This representation can then be used in any learning framework to address the automatic annotation problem. It is simple to implement and provides current state-of-the-art performance on several evaluation benchmarks. One of the main characteristics of bags of visual words is their orderless nature. The spatial position of the visual patches is dropped and never used. On the one hand, this choice brings flexibility and robustness to the representation, as it is able to deal with changes in viewpoint or occlusion. On the other hand, the spatial relations between patches could be useful to describe the internal structure of objects or to highlight the importance of contextual visual information for these objects.
We extend this representation in order to include weak geometrical information by using visual word pairs. We choose to consider the co-occurrence of words in a predefined local neighborhood of each patch. Thus, we only consider the distance between two patches, whatever their relative orientation. This way, we include both contextual and structural information in our new visual signature.
Following our previous work, we choose to extract standard low-level visual patches on a regular grid before creating the pairs and we use SVMs as a learning strategy.
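The following sketch shows one possible way (assumed details) to build the word-pair signature: patches on a regular grid are quantised into visual words, and every pair of words whose patches lie within a fixed spatial radius increments a pair-occurrence histogram that is appended to the classical word histogram.

```python
import numpy as np

def word_pair_signature(positions, words, vocab_size, radius=2.0):
    """Bag-of-words histogram augmented with local word-pair co-occurrences.

    `positions` are the grid coordinates of the patches and `words` their
    visual-word indices; pairs are counted whenever two patches are closer
    than `radius`, ignoring their relative orientation.
    """
    # classical orderless histogram
    unary = np.bincount(words, minlength=vocab_size).astype(float)
    # unordered pair histogram indexed by (min_word, max_word)
    pairs = np.zeros((vocab_size, vocab_size))
    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            if np.linalg.norm(positions[i] - positions[j]) <= radius:
                a, b = sorted((words[i], words[j]))
                pairs[a, b] += 1
    signature = np.concatenate([unary, pairs[np.triu_indices(vocab_size)]])
    return signature / max(signature.sum(), 1.0)

if __name__ == "__main__":
    grid = np.array([(r, c) for r in range(8) for c in range(8)], dtype=float)
    words = np.random.randint(0, 20, size=len(grid))   # toy visual-word labels
    print(word_pair_signature(grid, words, vocab_size=20).shape)
```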
On a standard image database (Pascal VOC 2007), we achieve 10% higher annotation performances by considering the word pairs.
Embedding the word pairs in a standard bag-of-visual-words representation brings very significant improvement for an automatic annotation task. The weak geometrical information they encode is complementary to the standard words occurrences histogram.
This work has been published in . The overall system is described in the thesis .
Most recent and effective recognition techniques are based on high-dimensional and sparse representations induced by the large number of local visual features. Classifiers learned on such representations are usually applied to test images one by one, so the complexity in a retrieval context is intrinsically linear in the dataset size. Hence, we explored an efficient boosting strategy in order to reduce the retrieval complexity when using feature-rich representations of images. For the learning step we used the AdaBoost algorithm with a weak learner based on distances between training local features. Instead of predicting the scores of the images one by one, we perform $T$ range queries in the dataset according to the parameters of the $T$ weak classifiers, using the a posteriori multi-probe locality sensitive hashing similarity search structure . Each range query returns the set of features $R_t$ lying within the range of the corresponding weak classifier, i.e. $R_t = \{x : d(x, x_t) \leq r_t\}$, where $x_t$ and $r_t$ denote the reference feature and the range of the $t$-th weak classifier.
Experiments on Caltech 256 dataset show that the technique is about 250 times faster than the naive exhaustive method with surprisingly better performances (see Table ).
We also applied the proposed method to a real time relevance feedback mechanism based on freely selected image regions. Experiments show that the active learning provides significant effectiveness improvements (see figure and , for more details).
As every year, the latest research results have been integrated.
The OS-specific dependencies on which we were working in 2008 have now been tested on several platforms, such as different Unices (32- and 64-bit) and Mac OS X, and with different compilers. The program now runs safely on all these architectures. To achieve this, we progressively moved to the Boost library (threads, unit tests, graphs, linear algebra support), which is known for its quality, portability and permissive licence. The dependencies of the core software were also cleaned up, and we are now able to build the software easily from scratch, which was previously possible only at the expense of a lot of time. Particular care was taken concerning the licence issues of these dependencies.
A lot of effort was also put into the quality and stability of the program, in order to develop new functionalities without introducing regressions and to safely deploy MAESTRO on remote servers within the scope of industrial partnerships (needs of VITALAS, Pl@ntNet, ...). Automatic reports are built at each modification of the code repository, for several architectures and platforms, in a distributed manner.
During 2009, we finalised, jointly with INA, release version v1 of the PMH library. The global architecture of the library was modified in order to make it more flexible and generic. Several parts of the indexing chain have been isolated in independent modules: a Transformer module that projects the original data into a new feature space, a scalar Quantizer module, a multi-dimensional Quantizer module that creates the hash keys, and a query modeller that learns the prior probability tables for a given query model. Several new functionalities were also added, the most important one being the ability to search directly on binary compressed signatures instead of original-space signatures. The second main new functionality is a two-level hierarchy of hash tables that overcomes the previous limit on hash key sizes induced by memory limitations. Other new functionalities include new metrics, new scalar quantisations and new hash function families. Discussions with INA regarding the licensing of the software are under way, with the objective of an open-source licence next year. Finally, new research results on random maximum margin hashing have been implemented for experiments and should be fully integrated in the next few months.
It is a joint project with AMAP (CIRAD, Montpellier) and Tela Botanica, an international botanical network with 8,500 members and an active collaborative web platform (10,000 visits /day).
The project is financially supported by the Agropolis International Foundation (
http://
Dissemination:
Project presentation at SIA 2009 (“La botanique numérique” meeting, Salon International de l'Agriculture) February 23, 2009, Paris, France.
Posters ( , ) and demo at e-Biosphere 09, June 1-3 2009, London, U.K.
Presentation at the XIII Congrès Forestier Mondial in Buenos Aires, in October 2009 .
Presentations at the Taxonomic Database Working Group annual conference , (TDWG 2009), November 9-13 2009, Montpellier, France.
The PhD thesis of Wajih Ouertani, financed by INRA, in the context of a strategic collaboration between INRIA and INRA, addresses interactive species identification through advanced relevance feedback mechanisms based on local image information.
The project "R2I - Recherche Interactive d'Images" is a joint project which aims at designing new methods for interactive image search. The final goal of this project is a system which can index about one billion of images and provide users with advanced interaction capabilities. The partners are the company Exalead, a leader in the area of corporate network indexing and a specialist for user-centered approaches, the INRIA project-team Imedia, a research group with a strong background in interactive search of multi-media documents, as well as LEAR and the University of Caen, both specialists in object recognition. Amel Hamzaoui begun her PhD thesis inside the R2I project (see section for more details).
“Video & image indexing and retrieval in the large scale” (
http://
A presentation of the VITALAS project (Video and Image Indexing and Retrieval in the Large Scale) was given at the International Symposium of the THESEUS Research Program kick-off talks, in Berlin, Germany, in June 2009.
A demo of the second version of the VITALAS system was given at the CHORUS final conference in Brussels, Belgium, in May 2009.
VITALAS participated in the TRECVID-2009 evaluation (High-Level Feature Extraction and Interactive Search tasks, ).
CHORUS is a Coordination Action in the field of audio-visual search engines, accepted in call 6 of the 6th Framework Programme (
http://
The EU coordination action CHORUS organised the international CHORUS conference (held May 26-28, 2009 in Brussels, Belgium) to present its final report, which identified cross-disciplinary challenges and recommendations in the domain of search engine technology. In addition to high representatives of the European Commission, the conference was attended by major industrial (e.g. Yahoo!, Thomson, Philips, Exalead, etc.) and academic stakeholders of the search engine community (including representatives from North America and Japan).
A joint collaboration with Shin'Ichi Satoh and Michael Houle has been in place since 2006. Several visits and exchanges have taken place between IMEDIA and NII. The main topics are social web mining, scalable clustering and object recognition.
Don Geman has been a regular visiting professor for several years; the scientific topics addressed are related to relevance feedback and mental category image search.
The CIVE (Classification d'Images d'espèces VEgétales) project is a collaborative project between AMAP, INRIA, ISI (Institut Supérieur d'Informatique de Tunisie) and Sup'Com (Ecole Supérieure de Communication de Tunis). It is financed by both the Tunisian Universities and INRIA.
Participation in the “Conférence annuelle de la société savante des naturalistes en Tunisie” in Hammamet, in November 2009.
In 2007, IMEDIA organised the first international benchmark on video copy detection technologies, as a "live" event during the ACM CIVR 2007 conference (
http://
Demos of IKONA/MAESTRO software have been presented at:
CHORUS final conference, May 26-27, 2009, in Brussels, Belgium;
Salon Européen de la recherche et de l'innovation, June 3-5, 2009, in Paris, France.
Pl@ntNet: La Tribune (March 3, 2009) “L'herbier collaboratif arrive”, Les Echos (March 2009) “La reconnaissance vidéo au service de la botanique”, RFI and France 3.
See also
http://
VITALAS: Les Echos (June 26, 2009) “Le moteur de recherche vidéo européen Vitalas entre en piste”.
European dissemination during the Final Chorus Conference:
http://
General co-Chair of ACM Multimedia Information Retrieval (ACM MIR 2010 - March 29-31 - Philadelphia, Pennsylvania):
http://
Co-chair of "Track V: Multimedia and Document Analysis, Processing and Retrieval" in ICPR 2010: International Conference on Pattern Recognition 2010 (23-26 August
Istanbul),
http://
Chair of "Brave New Ideas" in ACM Multimedia 2010 25-29 October 2010, Florence, Italy (
http://
Founding member of the ACM ICMR "ACM International Conference on Multimedia Retrieval" born from the fusion of: ACM MIR (International Conference on Multimedia Information Retrieval) and ACM CIVR (International Conference on Image and Video Retrieval)
Member of the steering committee of ACM ICMR (4 years)
Program Chair of final Chorus conference
http://
Member of the Scientific Advisory Board of the Japanese project "Multimedia Web Social Analysis and Mining" supported by MEXT (the Japanese Ministry of Research).
Member of an academic think-tank for the PPP European initiative.
Scientific coordinator of VITALAS IP FP6;
Scientific coordinator of CHORUS CA FP6;
Expert for ESF (European Science Foundation :
http://
Expert for the EC for FP7 preparation, participation to several expert meetings.
Expert for NWO (Netherlands).
Elected member in the Steering Board of NEM ETP (Networked and Electronic Media European Technology Platform) and acting as INRIA representative
Invited speaker in the "Session: Content"
http://
Organizers: European Commission, NICT - Tokyo, October 2009.
Member of the Future Internet of Content (FCN) cluster within FIA. - Scientific committee of the "Search and Discovery" session at FIA (Future Internet Assembly), Stockholm, November 2009.
French expert for COST ICT Domain (intergovernmental network for European Cooperation in the field of Scientific and Technical Research)
Member of ACM - SIGMM committee and of ACM Multimedia Information Retrieval International Conference steering committee
Member of the Editorial board of scientific journals: I3, PRA
Member of several technical program committees (TPC) of major international conferences: ACM MM, ACM CIVR, ACM MIR, IEEE ICME, IEEE ICPR, CBMI, SAMT, WIAMIS...
Responsibilities within INRIA: member of the COPIL SDRH of INRIA (comité de pilotage national sur les priorités et la prospective de la politique des ressources humaines), member of the “Comité d'animation scientifique” of the research topic "Perception, cognition, interaction" of INRIA, member of the direction team of the CRI Paris-Rocquencourt representing the researchers, and member of the BCP (Bureau du comité des projets) of the CRI Paris-Rocquencourt.
Co-organizer of the ISIS Workshop “Scalability in multimedia information retrieval” (
http://
Scientific expert for the French National Research Agency (ANR), call “Programme Blanc”, and for PENEK (Cyprus).
On leave from CNAM between March and August 2009 (“Congé pour Recherches”) at INRIA Rocquencourt and New Jersey Institute of Technology (USA).
Journal reviewer: IEEE Transactions on Neural Networks, Information Science, Multimedia Tools and Applications, Pattern Recognition Letters.
Member of the steering committee of Pl@ntNet: project assistant-coordinator, WP3 co-leader (platform specifications and development) and WP5 co-leader (dissemination).
Journal Reviewer: IEEE Transactions on Image Processing.
Technical Programme committee member for SAMT 2009, the international conference on Semantic and Digital Media Technologies.
Member of the steering committee of VITALAS IP FP6 (leader of WP2 "Enabling technologies: Media Content Description and Summarisation").
Member of the steering committee of Pl@ntNet: WP2 co-leader.
Member of the steering committee of VITALAS IP FP6 (leader of WP7:“User interface and visualisation”),
Member of the steering committee of Pl@ntNet (WP5 co-leader: “Dissemination”).
Member of the Humanities and Social Sciences committee for the “Blanc” and “Young researcher” 2009 programmes of the French National Research Agency (ANR),
Member of the steering committee of the CNRS GDR IG (Informatique Graphique) ;
Member of the technical programme committee of the First International Conference on Advances in Multimedia (MMEDIA 2009),
Member of the editorial board of the “Revue Electronique Francophone d'Informatique Graphique”.
Journal Reviewer: International Journal of Computer Vision, IEEE Transactions on Visualization and Computer Graphics, Computers & Graphics.
Year 2008
20h course on multimedia indexing at ISI and SupCom Tunis.
In charge of the course “Multimedia Databases” of the Master in computer science of the University Paris Dauphine.
24h training course on “Advanced C++ programming” for researchers and PhD students at INRIA Rocquencourt in February 2009.
24h of lab sessions (TP) on Java Database Connectivity (JDBC), CNAM 3rd year (NFA011).
192 hours of teaching in the Mathematics and Computer Science Department of Reims Champagne-Ardenne University;
In charge of the course "Image Acquisition and Analysis" of the Master "Engineering, Images and Knowledge" of Reims Champagne-Ardenne University.