One of the consequences of the increasing ease of use and significant cost reduction of computer systems is the production and exchange of more and more digital and multimedia documents. These documents are fundamentally heterogeneous in structure and content as they usually contain text, images, graphics, video and sounds.
Information retrieval can no longer rely on text-based queries alone; it must be multi-modal and integrate all the aspects of the multimedia content. In particular, the visual content plays a major role and represents a central vector for the transmission of information. The description of that content by means of image analysis techniques is less subjective than the usual keyword-based annotations, when the latter exist at all. Moreover, being independent of the query language, the description of visual content is becoming paramount for the efficient exploration of a multimedia stream.
In the IMEDIA group we focus on intelligent access to multimedia documents by means of their visual content. With this goal in mind, we develop methods that address key issues such as content-based indexing, interactive search and image database navigation, in the context of multimedia content.
Content-based image retrieval systems support automatic search and assist human decisions. The user remains in charge: he is the only one able to take the final decision. The numerous research activities in this field during the last decade have proven that retrieval based on visual content is feasible. Nevertheless, current practice shows that a usability gap remains between the designers of these techniques and their potential users.
One of the main goals of our research group is to reduce the gap between real usages and the functionalities resulting from our research on visual content-based information retrieval. We therefore strive to conceive methods and techniques that can address realistic scenarios, which often leads to exciting methodological challenges.
Among the "usage" objectives, an important one is the ability, for the user, to express his specific visual interest in a part of a picture. It allows him to better target his intention and to formulate it more accurately. Another goal in the same spirit is to let the user express subjective preferences and to provide the system with the ability to learn them. When dealing with any of these issues, we keep in mind the importance of the response time of such interactive systems. Of course, how critical the response time is, and what value it should take, depends heavily on the domain (specific or generic) and on the cost of errors.
Our research work thus lies at the intersection of several scientific fields, the main ones being image analysis, pattern recognition, statistical learning, human-machine interaction and database systems.
Our work is structured into the following main themes:
Image indexing: this part mainly concerns modeling the visual aspect of images, by means of image analysis techniques. It leads to the design of image signatures that can then be obtained automatically.
Clustering and statistical learning: generic and fundamental methods for solving pattern recognition problems, which are central in the context of image indexing.
Interactive search and personalization: to let the system take into account the preferences of the user, who usually expresses subjective or high-level semantic queries.
Cross-media indexing, and in particular bimodal text + image indexing, which addresses the challenge of combining these two media for more efficient indexing and retrieval.
More generally, the research work and the academic and industrial collaborations of the IMEDIA team aim to answer the complex problem of the intelligent access to multimedia content.
We group the existing problems in the domain of content-based image indexing and retrieval into the following themes: image indexing, pattern recognition, personalization and cross-media indexing. In the following we give a short introduction to each of these themes.
Image indexing is the process of extracting from a document (here a picture) compact, structured and significant visual features that will be used and compared during the interactive search.
The goal of the IMEDIA team is to provide the user with the ability to perform content-based searches in image databases in a way that is both intelligent and intuitive. When formulated in concrete terms, this problem gives rise to several mathematical and algorithmic challenges.
To represent the content of an image, we are looking for a representation that is compact (less data and more semantics), relevant (with respect to the visual content and the users) and fast to compute and compare. The choice of the feature space consists in selecting the significant features, the descriptors for those features and finally the encoding of those descriptors as image signatures.
We deal both with generic databases, in which images are heterogeneous (for instance, search of Internet images), and with specific databases, dedicated to a specific application field. The specific databases are usually provided with a ground truth and have a homogeneous content (faces, medical images, fingerprints, etc.).
Note that for specific databases one can develop dedicated, optimal features for the application considered (face recognition, etc.). Generic databases, on the contrary, require generic features (color, texture, shape, etc.).
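As a toy illustration of a generic global signature, the sketch below computes a joint RGB histogram and compares two signatures with an L1 distance. The bin count and the distance choice are arbitrary values for the example, not the group's actual descriptors:

```python
import numpy as np

def color_signature(image, bins=4):
    """Global color signature: a joint RGB histogram, normalized to sum to 1.
    `image` is an (H, W, 3) uint8 array."""
    # Quantize each channel into `bins` levels and build a joint histogram.
    q = (image.astype(np.int32) * bins) // 256           # values in 0..bins-1
    idx = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def l1_distance(s1, s2):
    """Histogram dissimilarity: 0 for identical signatures, up to 2."""
    return float(np.abs(s1 - s2).sum())

# A flat red image and a flat blue image are maximally dissimilar,
# while two identical red images match exactly.
red = np.zeros((8, 8, 3), dtype=np.uint8); red[..., 0] = 255
blue = np.zeros((8, 8, 3), dtype=np.uint8); blue[..., 2] = 255
```

A specific database would instead call for dedicated features (e.g. face-specific signatures), as noted above.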
We must distinguish not only generic and specific signatures, but also local and global ones, which correspond respectively to queries concerning parts of pictures or entire pictures. For partial queries, we can further distinguish approximate and precise queries. In the latter case one must be provided with various descriptions of parts of images, as well as with means to specify them as regions of interest. In particular, we have to define both global and local similarity measures.
Also, since the arrival of Anne Verroust-Blondet, we have been investigating the problem of 3D model description, in order to complete our approach to the description of visual appearance in 2D and 3D.
Once the computation of signatures is over, the image database is encoded as a set of points in a high-dimensional space: the feature space.
A second step in the construction of the index can be valuable when dealing with very high-dimensional feature spaces. It consists in pre-structuring the set of signatures and storing it efficiently, in order to reduce the access time for future queries (a tradeoff between access time and storage cost). In this second step, we have to address problems that have long been studied in the database community, but that arise here in a new context: image databases. The diversity of the feature spaces we deal with forces us to design specific methods for structuring each of these spaces. A collaboration on this topic is under way with Michel Scholl (INRIA/CNAM).
Statistical learning and classification methods are of central interest for content-based image retrieval.
We consider here both supervised and unsupervised methods. Depending on our knowledge of the contents of a database, we may or may not be provided with a set of labeled training examples. For the detection of known objects, methods based on hierarchies of classifiers have been investigated. In this context, face detection has been a main topic, as it can automatically provide high-level semantic information about video streams. For a collection of pictures whose content is unknown, e.g. in a navigation scenario, we are investigating techniques that adaptively identify homogeneous clusters of images, a challenging problem due to the configuration of the feature space.
Object detection is the most straightforward solution to the challenge of content-based image indexing. Classical approaches (artificial neural networks, support vector machines, etc.) are based on induction: they construct generalization rules from training examples. The generalization error of these techniques can be controlled as a function of the complexity of the models considered and the size of the training set.
Our research on object detection addresses the design of invariant kernels and algorithmically efficient solutions. We have developed several algorithms for face detection based on a hierarchical combination of simple two-class classifiers. Such architectures concentrate the computation on ambiguous parts of the scene and achieve error rates as good as those of far more expensive techniques. The computational efficiency we are looking for has the effect of a regularization constraint: it favors structurally simple classifiers, which have good generalization properties.
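The coarse-to-fine rejection principle can be sketched as follows; the stage classifiers and thresholds below are invented for the illustration and are far simpler than the team's actual face detectors:

```python
import numpy as np

def cascade_detect(window, stages):
    """Evaluate a window through a hierarchy of two-class classifiers.
    `stages` is a list of (classifier, threshold) pairs ordered from
    cheapest to most expensive; a window is rejected as soon as one
    stage scores below its threshold, so computation concentrates on
    ambiguous regions (hypothetical interface, not the team's code)."""
    for clf, thr in stages:
        if clf(window) < thr:
            return False          # early rejection: most windows stop here
    return True                   # survived all stages: candidate detection

# Toy stages: a mean-intensity test (cheap) then a variance test (costlier).
stages = [
    (lambda w: w.mean(), 0.2),
    (lambda w: w.var(), 0.01),
]
dark_flat = np.full((8, 8), 0.1)                  # rejected by the first, cheap stage
textured = np.random.RandomState(0).rand(8, 8)    # passes both stages
```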
Beside this work focusing on the trade-off between error rate and computational cost, we are working on the design of invariant kernels for vision. We have worked on the scale invariance of kernel methods based on the triangular kernel, and we have unified kernel methods and the matching of points of interest by designing matching kernels.
Such matching kernels bring the high invariance of matching schemes to the view-based representations that underlie support vector machines and other kernel methods.
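For reference, the triangular kernel mentioned above is simply K(x, y) = -||x - y||. Its homogeneity under a rescaling of the inputs is what underlies the scale invariance, as this small check illustrates (the data are arbitrary):

```python
import numpy as np

def triangular_kernel(x, y):
    """Conditionally positive definite triangular kernel K(x, y) = -||x - y||."""
    return -np.linalg.norm(x - y)

# The kernel is homogeneous: rescaling all inputs by a factor a > 0 rescales
# the Gram matrix by the same factor, so decision functions built on it are
# invariant to a global change of scale.
rng = np.random.RandomState(0)
x, y = rng.rand(5), rng.rand(5)
a = 3.0
```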
Unsupervised clustering techniques automatically define categories and are, for us, a matter of visual knowledge discovery. We need them in order to:
Solve the "page zero" problem by generating a visual summary of a database that takes into account all the available signatures together.
Perform image segmentation by clustering local image descriptors.
Structure and sort out the signature space, for either global or local signatures, allowing a hierarchical search that is more efficient since it only requires "scanning" the representatives of the resulting clusters.
Given the complexity of the feature spaces we are considering, this is a very difficult task. Noise and class overlap challenge the estimation of the parameters of each cluster. The main aspects that define the clustering process, and inevitably influence the quality of the result, are the clustering criterion, the similarity measure and the data model.
We investigate a family of clustering methods based on competitive agglomeration that allows us to cope with our primary requirements: estimating the unknown number of classes, handling noisy data and dealing with overlapping classes (by using fuzzy memberships that delay the hard decision as much as possible).
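A deliberately simplified sketch of this idea: fuzzy memberships plus a competition step that discards clusters with too small a fuzzy cardinality, so the number of classes is estimated rather than fixed in advance. This is an illustration, not the exact competitive agglomeration algorithm:

```python
import numpy as np

def competitive_agglomeration(X, c0=8, min_card=2.0, iters=30, m=2.0, seed=0):
    """Start from an overestimated number of clusters c0, run fuzzy
    C-means-style updates, and discard clusters whose fuzzy cardinality
    drops below `min_card`. All parameter values are illustrative."""
    rng = np.random.RandomState(seed)
    centers = X[rng.choice(len(X), c0, replace=False)]
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1) + 1e-12
        u = (1.0 / d2) ** (1.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)       # fuzzy memberships
        card = u.sum(axis=0)                    # fuzzy cardinality of each cluster
        keep = card > min_card                  # competition: drop weak clusters
        um = u.T ** m
        centers = (um @ X)[keep] / um.sum(axis=1)[keep, None]
    return centers

# Two well-separated blobs: starting from 8 clusters, the surviving centers
# all end up inside one of the two blobs.
rng = np.random.RandomState(1)
X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + 8.0])
centers = competitive_agglomeration(X)
```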
We study here approaches that allow for a reduction of the "semantic gap". There are several ways to deal with it. A first, preliminary step is to optimize the fidelity of the physical-content descriptors (image signatures) to the visual appearance of the images; its objective is to bridge what we call the numerical gap, which requires designing efficient image signatures. The weakness of visual retrieval results that is due to the numerical gap is often confusingly attributed to the semantic gap. We think that providing richer user-system interaction allows the user to express his preferences and to focus on his semantic visual-content target.
Rich user expression comes in a variety of forms:
allow the user to indicate his satisfaction (or dissatisfaction) with the system's retrieval results, a method commonly called relevance feedback. In this case, the user's reactions express a subjective preference and can therefore compensate for the semantic gap between visual appearance and user intention,
provide precise visual query formulation that allows the user to select his region of interest exactly and to discard the image parts that are not representative of his visual target,
provide a mechanism to search for the user's mental image when no starting image example is available. Several approaches are investigated; as an example, we can mention logical composition from a visual thesaurus. Besides, learning methods related to information theory are also developed for efficient relevance feedback models in several study contexts, including mental image retrieval.
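The relevance-feedback loop of the first item can be sketched with a classical Rocchio-style update in signature space. This textbook rule is a stand-in for the information-theoretic models mentioned above, and the weights are conventional values, not the team's:

```python
import numpy as np

def rocchio_update(query, positives, negatives, alpha=1.0, beta=0.75, gamma=0.25):
    """One relevance-feedback iteration: move the query signature toward
    the mean of the examples marked relevant and away from the mean of
    those marked irrelevant."""
    q = alpha * query
    if len(positives):
        q = q + beta * np.mean(positives, axis=0)
    if len(negatives):
        q = q - gamma * np.mean(negatives, axis=0)
    return q

# A query at the origin, two positive examples at [1, 0] and one negative
# at [-1, 0]: the updated query moves toward the positives.
q_new = rocchio_update(np.zeros(2),
                       np.array([[1.0, 0.0], [1.0, 0.0]]),
                       np.array([[-1.0, 0.0]]))
```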
We have described, up to now, our research approaches using the visual content alone. But when additional information is available, it may prove complementary and valuable in improving the results returned to the user. We may cite here metadata (file name, date of creation, caption, etc.) but also the textual annotations that are sometimes available. Note that annotations usually carry high-level information related to prior knowledge of the context. Using these sources of information means that we can properly speak of multimedia indexing.
We can think of several approaches for combining textual and visual information in the context of indexing and retrieval. As examples, we may cite the automatic textual annotation of images based on similarities between visual signatures or the propagation of textual annotations relying on the interaction between textual ontologies and visual ontologies. We also investigate methods that allow automatic textual annotation from visual content analysis. This part of our research activities is yet another solution for the reduction of the "semantic gap".
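A minimal sketch of automatic annotation by visual similarity, using a plain nearest-neighbor vote instead of the ontology-based propagation described above (all names and data are illustrative):

```python
import numpy as np

def propagate_annotations(signatures, keywords, query_sig, k=3):
    """Propagate to an unannotated query image the keywords of its k
    visually nearest annotated images, keeping only keywords supported
    by a strict majority of the neighbors."""
    d = np.linalg.norm(signatures - query_sig, axis=1)
    nearest = np.argsort(d)[:k]
    votes = {}
    for i in nearest:
        for kw in keywords[i]:
            votes[kw] = votes.get(kw, 0) + 1
    return {kw for kw, v in votes.items() if v > k / 2}   # strict majority

# Two annotated beach images near the query, one distant mountain image.
sigs = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
kws = [{"beach", "sea"}, {"beach"}, {"mountain"}]
```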
Security applications. Examples: identifying faces or fingerprints (biometrics). Biometrics is an interesting specific application from both a theoretical and an applicative (recognition, surveillance, ...) point of view. Two PhD theses were defended on themes related to biometrics. Our team has also worked with a database of images of stolen objects and a database of images seized during searches (for fighting pedophilia). We are currently collaborating with the Ministry of the Interior.
Multimedia. Examples: looking for a specific shot in a movie, documentary or TV news program; presenting a video summary. Our team collaborates with the TV channel TF1 in the context of a RIAM project. Text annotation is still very important in such applications, so cross-media access is crucial.
Scientific applications. Examples: environmental image databases (fauna and flora); satellite image databases (ground typology); medical image databases (finding images of a pathological character for educational or investigation purposes). We have an ongoing project on multimedia access to biodiversity collections.
Culture, art and education. Examples: encyclopaedic research, query by example on paintings or drawings, query by a detail of an image. IMEDIA has been contacted by the French Ministry of Culture and by museums for their image archives.
Finding a specific texture for the textile industry, illustrating an advertisement by an appropriate picture. IMEDIA is working with a picture library that provides images for advertising agencies.
Telecommunications. Examples: image representation and content-based queries stand as the basis of MPEG-4 and MPEG-7. IMEDIA does not contribute to their normative aspects but follows the latest results of the MPEG-7 group. Note that the signatures developed by IMEDIA can be used with this standard.
IKONA is a framework for building content-based image retrieval (CBIR) software prototypes. It has been designed and implemented in our team over the last four years. The current version is fully generic and highly adaptable to any CBIR scenario thanks to its level of abstraction. As a research environment, IKONA supports researchers in their work by providing stable, tested tools. As an application, it can easily be deployed and used by non-specialist users.
IKONA is based on a client/server architecture. The communication between the two components is achieved through a proprietary network protocol. It is a set of commands the server understands and a set of answers it returns to the client. The communication protocol is extensible, i.e. it is easy to add new functionalities without disturbing the overall architecture. It is also modular and therefore can be replaced by any new or existing protocol dealing with multimedia information retrieval.
The main processes are on the server side. They can be separated into two main categories:
offline processes: data analysis, feature extraction and structuring
online processes: answering client requests
The images are characterized with global signatures implemented in the server:
Generic signatures: the color, shape and texture features investigated in the IMEDIA group.
Specific signatures: face signatures and fingerprint signatures.
Annotations: some keywords.
Besides, two local signatures are included: the region-based description and the point-based one. The server uses image signatures and offers several types of query paradigms, available to the user through the graphical interfaces of the clients:
query by global example: the user selects an entire image as the visual query;
partial queries: the user looks for regions in images that are visually similar to the selected region;
relevance feedback on global and partial queries: the user interacts with the system in a feedback loop, giving positive and negative examples to help the system identify the category of images he or she is interested in;
mental image search: two different methods are investigated. The first is target image search with a relevance feedback model based on mutual information; the second consists in logical query composition.
We have developed two main clients that can communicate with the server. A good starting point for exploring the possibilities offered by IKONA is our web demo, available at http://www-rocq.inria.fr/cgi-bin/imedia/ikona. This CGI client is connected to a running server with several generalist and specific image databases, totalling more than 23,000 images. It features query-by-example searches, a switch-database functionality and relevance feedback for image category searches. The second client is a desktop application, which offers more functionalities. More screenshots describing the visual search capabilities of IKONA are available at http://www-rocq.inria.fr/imedia/cbir-demo.html.
The architecture of this client/server software and several visual signatures have been registered with the APP (the French software protection agency).
Several categories of image descriptors are studied in the IMEDIA group. Some of them, the global descriptors in particular, allow querying large databases (about 500,000 images) in real time on standard hardware. Other descriptors, such as the local descriptors involving points of interest, currently only allow querying small databases (about 3,000 images). Our objective is to scale up the descriptors developed at IMEDIA. For the moment, we focus on the local approaches, which do not yet allow real-time responses for the databases encountered in various applications.
This year, our first contribution was to experimentally assess whether the so-called curse of dimensionality phenomenon occurs for various categories of distributions, at the dimensions encountered with local descriptors (usually between 8 and 30). We compare the performance of a single sphere query when the collection is indexed by a tree structure (an SR-tree in our experiments) to that of a sequential scan. The tested distributions were:
synthetic clustered distributions (DC);
synthetic uniform distributions (DU);
real distributions (DR), obtained by extracting and locally describing points of interest from a set of generalist images.
Some of the experiments are illustrated in the figure, which shows the ratio of the sequential scan CPU time over the CPU time obtained with the SR-tree traversal, versus the number of neighbors, for DR and DC.
Clearly, for the real dataset as well as for the clustered distribution, the curse of dimensionality is not reached. The speed-up for dimension d = 29 is significant even for large numbers of neighbors. In contrast, as reported earlier in the literature, there is no gain with the uniform dataset (not shown here). These trends were confirmed by the study of the ratio of nodes accessed during the tree traversal, which remains small for the DR and DC distributions. All the experiments performed show that when the dimension of the space is moderate, i.e. between 8 and 30, the tree structure indeed performs well: it exhibits a significant gain with respect to the sequential scan even in a 29-dimensional space.
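Since no SR-tree is available in standard libraries, the sketch below reproduces its pruning principle with a one-level ball partition: a cell is visited only if its bounding ball can intersect the query sphere, so a sphere query on clustered data touches only a fraction of the points while returning exactly the sequential-scan result. All sizes and parameters are illustrative:

```python
import numpy as np

def build_cells(data, n_cells=50, seed=0):
    """Group points into cells, each with a centroid and a covering radius
    (a one-level stand-in for the nodes of an SR-tree)."""
    rng = np.random.RandomState(seed)
    centroids = data[rng.choice(len(data), n_cells, replace=False)]
    assign = np.argmin(((data[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    cells = []
    for c in range(n_cells):
        idx = np.flatnonzero(assign == c)
        if len(idx):
            center = data[idx].mean(0)
            radius = np.linalg.norm(data[idx] - center, axis=1).max()
            cells.append((idx, center, radius))
    return cells

def sphere_query(data, cells, q, r):
    """Return indices of points within distance r of q, pruning whole cells
    whose bounding ball cannot intersect the query sphere."""
    hits, visited = [], 0
    for idx, center, radius in cells:
        if np.linalg.norm(center - q) - radius > r:
            continue                      # the whole cell lies outside the sphere
        visited += len(idx)
        d = np.linalg.norm(data[idx] - q, axis=1)
        hits.extend(idx[d <= r].tolist())
    return sorted(hits), visited

# Clustered data in dimension 16 (the moderate-dimension regime above).
rng = np.random.RandomState(1)
data = np.vstack([rng.randn(200, 16) * 0.3 + rng.randn(16) * 5 for _ in range(20)])
cells = build_cells(data)
q = data[0] + 0.01
hits, visited = sphere_query(data, cells, q, r=1.0)
scan_hits = sorted(np.flatnonzero(np.linalg.norm(data - q, axis=1) <= 1.0).tolist())
```

On such clustered data the query visits only the cells of one blob, mirroring the gain observed for the DC and DR distributions.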
When objects or parts of images are described with a set of local descriptors, searching the feature space is usually done independently and sequentially for each local descriptor. We have studied the multiple-query approaches existing in the database community before applying and adapting them to the retrieval of groups of local descriptors. Two directions have been investigated:
Reduction of the I/O costs, by studying a new approach for searching the multidimensional structure. Many multidimensional structures exist; the one considered is the SR-tree, which achieves good performance for feature spaces based on local descriptors (see the previous section);
Reduction of the CPU costs, by exploiting relations between the distances computed from several query points to points of the feature space, for instance the triangle inequality. The structure considered is again the SR-tree, but the proposed improvements could be applied to any tree structure. We have revisited the two previously proposed lemmas and proposed three novel ones.
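The triangle-inequality idea behind such lemmas can be sketched as follows: once the distances to a first query point are known, they provide free lower bounds on the distances to a nearby second query point, and every point whose bound exceeds the search radius is discarded without a distance computation. This is a deliberately simple, single-lemma version with illustrative data:

```python
import numpy as np

def pruned_second_query(points, q1, q2, radius):
    """Range search around q2 reusing distances already computed for q1.
    By the triangle inequality, d(q2, p) >= |d(q1, p) - d(q1, q2)|, so any
    point whose lower bound exceeds `radius` is pruned for free."""
    d_q1 = np.linalg.norm(points - q1, axis=1)      # paid once, for q1's search
    d_q1q2 = np.linalg.norm(q1 - q2)
    lower = np.abs(d_q1 - d_q1q2)                   # free lower bounds for q2
    candidates = np.flatnonzero(lower <= radius)    # only these need a real distance
    d_q2 = np.linalg.norm(points[candidates] - q2, axis=1)
    return candidates[d_q2 <= radius], len(candidates)

rng = np.random.RandomState(0)
pts = rng.rand(10000, 8)
q1 = rng.rand(8)
q2 = q1 + 0.01                                      # two nearby query descriptors
res, n_computed = pruned_second_query(pts, q1, q2, radius=0.3)
```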
The figure illustrates our approach: it presents the CPU time consumed by similarity search with multiple queries, with and without some of the lemmas. The evaluation was done on a dataset of 1 million real points of interest. The results show that the joint use of these lemmas accelerates the search, whatever the dimension.
All these studies are carried out in collaboration with the CEDRIC/Vertigo research group, in the context of Nouha Bouteldja's thesis at the CEDRIC laboratory, and have been accepted for publication at an international conference.
This work is done in collaboration with the INA (the French National Audiovisual Institute), within the scope of the CIFRE thesis of Julien Law-To. The main application considered is the real-time monitoring of huge video databases (about 100,000 hours). At present, we focus more particularly on content-based copy detection (CBCD) for video.
In order to protect the INA from the piracy of its videos, we have to link the TV broadcast on one side to the video database on the other (monitoring), in order to find identical sequences. The CBCD system must be robust to common transformations used in TV post-production, such as zooming, cropping, shifting, etc.
Local descriptors have proved very useful for image indexing, with applications to object or sub-image retrieval. In the computer vision community, a number of recent techniques have been proposed to identify points of interest or regions of interest in images. When directly applied to image sequences, one of the drawbacks of such descriptors is their spatio-temporal redundancy. For applications like real-time monitoring, it is necessary to build a very compact description. In this context, two kinds of promising approaches can be investigated:
The spatio-temporal information can be exploited jointly, instead of working on the time and spatial dimensions separately. See for example the work of Laptev on space-time interest points;
To be more compact and more significant, the image description should involve new signatures inspired by pre-attentive human vision and the focalization of attention. Such features have been widely studied by the neurophysiological community and can be modeled mathematically. See for example the work of Heidemann on focus-of-attention from local color symmetries.
This year, we have investigated the first class of approaches. Studying the trajectories of the points of interest is an efficient way to avoid temporal redundancy and, moreover, an interesting way to strongly fingerprint the video sequence. By modeling and labeling the different kinds of behaviour of those points according to their trajectories, the CBCD system can be improved. Such a content-based video description is richer and more compact than the usual ones. It is also generic, and thus opens new possibilities for video retrieval, such as:
Specific queries on video sequences, such as retrieving objects with a particular behaviour;
Segmentation of the video based on the behaviour of the points of interest (camera motion, cuts, credits, ...).
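A minimal sketch of trajectory-based behaviour labelling; the two labels and the threshold are illustrative, and the actual system uses a richer behaviour model:

```python
import numpy as np

def label_trajectory(points, motion_thresh=1.0):
    """Label the behaviour of a tracked interest point from its trajectory:
    a point that barely moves over the sequence is tagged 'background',
    the others 'moving'. `points` is a (T, 2) array of (x, y) positions
    over T frames."""
    displacement = np.linalg.norm(points - points[0], axis=1).max()
    return "moving" if displacement > motion_thresh else "background"

still = np.tile([10.0, 20.0], (30, 1))             # a static background point
drift = np.cumsum(np.full((30, 2), 0.2), axis=0)   # a steadily moving point
```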
Based on this description, the current experiments show an improvement in the quality of the monitoring process and in its flexibility. This work has been submitted to the international conference CVPR 2006.
Within the BIOTIM project, we have designed a new shape descriptor called the Directional Fragment Histogram (DFH). This shape descriptor is computed from the outline of the plant, and codes both local and global directional information. Local information is computed from segments taken randomly from the external contour; the accumulation of all possible segments of a specific length provides the global information. Several versions of this descriptor are studied and tested on different image databases. The implemented DFH versions are based respectively on Freeman chain codes, gradient angles, curvatures, and the combination of gradient angles and curvatures. In our evaluation, the following two kinds of image databases are used:
Botanical databases
The Arabidopsis database provided by the INRA Institute. It consists of about 400 images.
The Swedish leaves database, taken from the web site of the Swedish Museum of Natural History. This database is composed of 1125 images and 15 classes; each class represents a tree species.
The Smithsonian leaves database, provided through the collaboration with the NSF project "An Electronic Field Guide: Plant Exploration in the 21st Century". It contains 134 classes comprising 1520 images: see http://www.cfar.umd.edu/~gaaga/leaf/leaf.html.
Generic Objects databases
The MPEG-7 CE-Shape-1 database, taken from the web page of Professor Latecki (http://www.cis.temple.edu/~latecki/research.html). It contains 1400 images and 70 classes.
The Kimia database is composed of three subsets, containing respectively 99, 256 and 1032 silhouette images: see http://www.lems.brown.edu/vision/software/index.html.
The ETH-80 database includes 3280 images classified into 80 objects : see http://www.vision.ethz.ch/projects/categorization/eth80-db.html.
We evaluate the gradient version of the DFH as well as some existing shape descriptors on the Swedish database: the IKONA shape descriptor based on the Hough transform, the Edge Orientation Histogram (EOH), the MPEG-7 CSS descriptor and the CCH descriptor. The figures show the precision and ROC graphs, respectively.
As illustrated in these graphs, our DFH shape descriptor outperforms the others. Note that the CSS-MPEG7 descriptor gives comparable performance; however, its computation time is about 250 times that of the DFH. Both CCH and EOH give lower results because they focus only on the global shape, without taking into account the local distribution of directions. The figure shows "partial query" results on two leaves taken from the Swedish database.
The performance of the different versions of the DFH depends on the content of the shape database. In future work, we intend to improve our DFH descriptor to overcome this limitation.
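The core DFH idea, accumulating a histogram of directions over randomly sampled contour fragments, can be sketched as follows. The chord angle stands in for the chain-code, gradient and curvature variants, and all parameter values are illustrative:

```python
import numpy as np

def directional_fragment_histogram(contour, frag_len=5, n_frags=500, bins=8, seed=0):
    """Toy DFH: accumulate, over randomly chosen contour fragments of a
    fixed length, a histogram of the fragments' directions (here the angle
    of the fragment's chord, taken modulo pi so direction is orientation-free).
    `contour` is an (N, 2) array of ordered points on a closed outline."""
    rng = np.random.RandomState(seed)
    n = len(contour)
    hist = np.zeros(bins)
    for _ in range(n_frags):
        start = rng.randint(n)
        a, b = contour[start], contour[(start + frag_len) % n]   # closed: wrap around
        angle = np.arctan2(b[1] - a[1], b[0] - a[0]) % np.pi
        hist[int(angle / np.pi * bins) % bins] += 1
    return hist / hist.sum()

# A square outline: most chords lie along the two axis directions, so the
# histogram concentrates on the horizontal (bin 0) and vertical (bin 4) bins.
side = np.arange(10, dtype=float)
square = np.vstack([
    np.stack([side, np.zeros(10)], axis=1),              # bottom edge
    np.stack([np.full(10, 10.0), side], axis=1),         # right edge
    np.stack([10.0 - side, np.full(10, 10.0)], axis=1),  # top edge
    np.stack([np.zeros(10), 10.0 - side], axis=1),       # left edge
])
hist = directional_fragment_histogram(square)
```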
The development and application of various remote sensing platforms result in the production of huge amounts of satellite image data. Therefore, there is an increasing need for effective querying and browsing in these image databases. Region-Based Image Retrieval (RBIR) is a powerful tool, since it allows searching for images containing objects similar to those of a reference image. It requires the satellite image to be segmented into a number of regions. Segmentation consists in partitioning the image into non-overlapping regions that are homogeneous with regard to some characteristics, such as spectral response and texture. Remotely sensed images contain both textured and non-textured regions. This is even more true today with high-resolution images such as IKONOS, SPOT-5 and QuickBird data. In order to cope with this content heterogeneity, we propose an adaptive variational segmentation algorithm.
We have implemented and compared the efficiency of several existing 2D/3D shape descriptors, i.e. descriptors built from 2D views of 3D models. We have improved the efficiency of the descriptors based on silhouettes or depth-buffer images. The 2D shape signatures are extracted from classical Fourier Transform (1DFFT for silhouettes and 2DFFT for depth-buffer images). We have introduced a new approach based on relevance index, which takes into account the diversity of information contained in the projection images. It was evaluated on the Princeton database (907 models) and these retrieval results (cf. Figure ) show its performance and robustness in the 3D-model retrieval process. This work is described in and is supported by the European Network of Excellence DELOS II within "Description, Matching and Retrieval by Content of 3D Objects" Project (Task 3.8 of JPA2).
There is an increasing variety of content-based image retrieval scenarios involving the use of local descriptors (object class recognition, object and scene recognition, content-based copy detection). Enhancing the performance of these scenarios by using the geometric distribution or the relative positions of the local descriptors is an active research area. During this year, we have shown that in the copy detection scenario, the robust estimation of a global geometric transformation model after the search greatly improves the discrimination of the detection (paper submitted to IEEE Transactions on Multimedia). However, for other scenarios, using the geometry remains a challenging task: including the geometric distribution in the descriptor itself often leads to a lack of robustness during the search for similar local descriptors, whereas post-processing techniques are generally highly time-consuming and thus limited to very small datasets. Moreover, in most of them, the geometric consistency is limited to rigid transformation models, which cannot enforce the matching when two geometric distributions are dependent but not linearly related. For the past few months, we have been investigating the use of non-parametric geometric consistency measurements, such as mutual information and the robust correlation ratio, and we plan to combine them with robust local geometric properties that could be included in the descriptor itself, in order to limit the number of matches during the second step.
This work was done in collaboration with the NII (National Institute of Informatics, Japan) within the scope of Alexis Joly's visit to the NII (July 2005). Local features are well suited to content-based image retrieval because of their locality, their local uniqueness and their high information content. However, as they are selected only according to the local information content of the image, there is no guarantee that they will be distinctive in a large set of images. A local feature corresponding to a high saliency in the image can be highly redundant in some specific databases, such as the TV news database stored at the NII, in which textual characters are extremely frequent. To overcome this issue, we propose to select relevant local features directly according to their discrimination power in a specific set of images. By computing the density of the local features in a source database with a new fast non-parametric density estimation technique, it is indeed possible to quickly select the most rare local features in a large set of images. The figure illustrates the difference between the 20 most salient points of an image and the 20 most rare points according to their density in a large image database.
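The selection principle can be sketched with a plain k-nearest-neighbor density estimate, a simple stand-in for the fast estimator developed with the NII: descriptors whose k-th neighbor is far away lie in low-density regions and are therefore rare. The data here are synthetic:

```python
import numpy as np

def rarest_features(descriptors, k=10, n_select=20):
    """Estimate each descriptor's density by the distance to its k-th
    nearest neighbor (large distance = low density = rare) and return
    the indices of the n_select rarest descriptors."""
    d = np.linalg.norm(descriptors[:, None, :] - descriptors[None], axis=2)
    np.fill_diagonal(d, np.inf)                 # ignore self-distances
    kth = np.sort(d, axis=1)[:, k - 1]          # distance to the k-th neighbor
    return np.argsort(kth)[::-1][:n_select]     # largest k-NN distance first

# A dense, redundant cluster of descriptors plus a few isolated ones:
# the isolated descriptors (indices 100..104) are selected as the rarest.
rng = np.random.RandomState(0)
dense = rng.randn(100, 8) * 0.1
rare = rng.randn(5, 8) + 10.0
X = np.vstack([dense, rare])
sel = rarest_features(X, k=3, n_select=5)
```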
Query by Visual Thesaurus is a new query alternative that overcomes the absence of a starting example image and offers the possibility to combine multiple visual patches in order to retrieve the target mental image. The visual thesaurus is obtained by means of region categorization into visual patches that serve as representatives of the database regions.
We introduce a novel semantic region labelling criterion based on the dispersion of points of interest. This point-based coherence criterion (PCC) labels regions through the topological and spatial dispersion of points of interest. This dual region/point description strengthens the "page zero" construction process and is not computationally expensive, since PCC is evaluated on single coarse regions using the Harris color point detector, which captures the local photometric variability on small sites (a few pixels). The use of points of interest was motivated by their ability to finely describe coarsely segmented regions. Thus our approach tends to add semantic knowledge to low-level features by labelling rough regions into homogeneous and textured classes. The novelty of our approach is the unsupervised region categorization and the construction of a visual region summary that handles both photometric attributes and the point-based description.
Gouet-Brunet and Boujemaa proposed a color version of the Harris detector for image description. The idea of the Harris detector is that interest points are local maxima of a measure derived from the second moment matrix \mu(x, \sigma_I, \sigma_D), which summarizes the gradient distribution in a local neighborhood of a point x:

\mu(x, \sigma_I, \sigma_D) = \sigma_D^2 \, g(\sigma_I) * \sum_{i \in \{R,G,B\}} \begin{pmatrix} L_{i,x}^2(x, \sigma_D) & L_{i,x} L_{i,y}(x, \sigma_D) \\ L_{i,x} L_{i,y}(x, \sigma_D) & L_{i,y}^2(x, \sigma_D) \end{pmatrix}

where \sigma_I is the integration scale, \sigma_D the derivation scale, g the Gaussian, and L_i the image smoothed by a Gaussian for each channel i \in \{R, G, B\}.
As far as points of interest are concerned, a homogeneous region is likely to contain fewer points than a textured one, because points capture the local photometric variability around a very small site. As a matter of fact, presumably homogeneous patches should be well described by means of classical color moments, whereas textured or non-homogeneous regions are better characterized by point attributes.
Let P = \{P_k, k \in \{1, ..., n\}\} be the set of points of interest detected on the region R_j. We superimpose a grid Cell = \{Cell(i, j)\} on the region.
The idea of PCC is to separate the effective points contributing to the description of a texture from those that are only due to coarse segmentation. To do so, we build a point histogram whose bin (i, j) counts the interest points located in the (i, j)-th cell:

H(i, j) = \mathrm{card}\,\{P_k \in P : P_k \in Cell(i, j)\}
Once all points have been visited, the histogram is binarized, i.e. all filled cells are switched on, while empty ones are switched off (Figure ). Each cell is visited once, so that we keep only the cells containing at least one point of interest (effective cells).
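The grid histogram and its binarization can be sketched as follows (a minimal illustration; the cell geometry and the way the occupancy feeds the final homogeneous/textured decision are simplified assumptions):

```python
import numpy as np

def pcc_occupancy(points, bbox, n_cells=4):
    """Binarized point histogram over an n_cells x n_cells grid.
    points: (n, 2) interest-point coordinates inside the region,
    bbox: (xmin, ymin, xmax, ymax) bounding box of the region.
    Returns the boolean occupancy grid and the fraction of effective
    (non-empty) cells, used here as a rough texture indicator."""
    xmin, ymin, xmax, ymax = bbox
    ix = np.clip(((points[:, 0] - xmin) / (xmax - xmin) * n_cells).astype(int),
                 0, n_cells - 1)
    iy = np.clip(((points[:, 1] - ymin) / (ymax - ymin) * n_cells).astype(int),
                 0, n_cells - 1)
    hist = np.zeros((n_cells, n_cells), dtype=int)
    np.add.at(hist, (iy, ix), 1)       # point histogram: bin (i, j) counts
    occupancy = hist > 0               # binarization: switched-on cells
    return occupancy, occupancy.mean()
```

A textured region scatters its interest points across many cells, whereas a coarsely over-segmented homogeneous region concentrates them in few cells.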
For the visual thesaurus construction, we started with region clustering using mean color descriptors. This first level of classification brings together all visually similar regions into a set of representatives. This step is followed by PCC computation and labelling for all classes. PCC discriminates well between different region structures, labelling them either homogeneous or textured. Figure shows, for example, three classes obtained by mean color classification; each class is divided into two classes using PCC: homogeneous and textured.
Many of the available image databases have keyword annotations associated with the images. As keywords and visual features provide complementary information, using both sources of information is an advantage in many applications. We address the challenge of semantic gap reduction using a hybrid visual and conceptual representation of the content within an active relevance feedback context. In , we introduce a new feature vector, based on the keyword annotations available for the images, which makes use of conceptual information extracted from an external lexical database.
We rely on an external ontology, defining semantic relations between concepts, to find good candidates for the core concepts and to define the feature vectors for sets of keywords. WordNet is a well-known general-purpose ontology that organizes nouns, verbs, adjectives and adverbs into synsets (sets of words having similar meanings), each representing one underlying lexical concept. The concepts are linked by semantic relations of various types, such as synonymy, hypernymy and hyponymy.
The core concepts we need for building the conceptual feature vectors should allow us to evaluate the conceptual similarity between keywords w that are mapped to different concepts c(w) in the ontology. We must then rely on the hypernymy/hyponymy subgraph in WordNet linking the concepts associated with all the keywords in the database to the most generic concepts. For every concept corresponding to a keyword annotating an image, we find all the paths in the ontology that lead to the most generic concepts. The paths obtained for all the keywords in the database define the hypernym graph. A small set (compared to the number of different keywords) of core concepts is then selected; good candidates are super-concepts of several c(w) concepts that are relatively close to them; also, the core concepts must be balanced among all the branches containing c(w) concepts.
For any image in the database, we project the keywords associated with it onto the set of core concepts, through the use of several semantic similarity functions. The resulting feature vectors have the advantage of being compact, comprehensible (each dimension corresponds to a core concept) and easy to integrate in any CBIR system. Experimental evidence shows that the joint use of the visual descriptor and the new conceptual descriptor dramatically improves the quality of the results, both in a Query By Example (QBE) context and with Relevance Feedback (RF), as illustrated in the following figures. A detailed presentation of our method can be found in . Our relevance feedback framework was described in .
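A toy illustration of projecting keywords onto core concepts follows; a hand-written mini-ontology with invented names stands in for WordNet, and the path-based similarity is only one simple choice among the several semantic similarity functions mentioned above:

```python
# hypothetical mini-ontology: child -> parent (hypernym) links
HYPERNYMS = {
    "dog": "canine", "wolf": "canine", "canine": "animal",
    "cat": "feline", "feline": "animal", "animal": "entity",
    "oak": "tree", "pine": "tree", "tree": "plant", "plant": "entity",
}
CORE_CONCEPTS = ["animal", "plant"]

def hypernym_path(word):
    """Path from a word up to the most generic concept."""
    path = [word]
    while path[-1] in HYPERNYMS:
        path.append(HYPERNYMS[path[-1]])
    return path

def similarity(word, concept):
    """Path-based similarity: 1 / (1 + #edges from word up to concept),
    0 if the concept is not a hypernym of the word."""
    path = hypernym_path(word)
    return 1.0 / (1 + path.index(concept)) if concept in path else 0.0

def conceptual_vector(keywords):
    """One dimension per core concept: best similarity over the keywords."""
    return [max(similarity(w, c) for w in keywords) for c in CORE_CONCEPTS]
```

The resulting vector is compact and interpretable: each dimension says how strongly the image's annotation relates to one core concept.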
We have pursued the work performed with Marie-Luce Viaud from INA (Institut National de l'Audiovisuel) within a collaboration agreement between INRIA and INA: in order to propose a graphic interface allowing the user to elaborate new search strategies in a database, as presented in , a tool that automatically computes an Euler-like diagrammatic representation of the result is under construction. This year, we have implemented a system that automatically draws a planar extended Euler diagram from any configuration involving fewer than nine sets. Each set corresponds to the database documents having a given term in their associated description. Our results have been presented at Euler Diagrams 2005, a workshop co-organized with INA (cf. examples in figure ).
Content-based image retrieval (CBIR) can be much improved by providing a relevant organization of image collections. While it is easy to apply standard unsupervised clustering algorithms to the descriptors of the images in a database, the results of this fully automatic categorization are rarely satisfactory. The problem we are interested in can be stated as follows: simple semantic information, in the form of pairwise (must-link or cannot-link) constraints between data items or class labels for some items, is available, and the structure of the entire data set has to be discovered with respect to this semantic knowledge.
We assume that users can easily evaluate whether two images should be in the same category or rather in different categories, so they can easily define the constraints mentioned above. Following previous work by Demiriz et al. , Wagstaff et al. and Basu et al. , in we introduced Pairwise-Constrained Competitive Agglomeration (PCCA), a fuzzy semi-supervised clustering algorithm, recalled in the next paragraph. In the original version of PCCA we did not make further assumptions regarding the data, so the pairs of items for which the user is required to define constraints are randomly selected. But in many cases, such assumptions regarding the data are available. We argue in that quite general assumptions let us perform a more adequate, active selection of the pairs of items and thus significantly reduce the number of constraints required for achieving a desired level of performance.
In , we proposed an algorithm named PCCA that belongs to the family of search-based semi-supervised clustering methods. It is based on the Competitive Agglomeration (CA) algorithm , a fuzzy partitional algorithm that does not require the user to specify the number of clusters to be found. Using the same notations as in , PCCA minimizes the objective function

J(V, U) = \sum_{i=1}^{C} \sum_{r=1}^{N} u_{ri}^2 \, d^2(x_r, \mu_i) + \alpha \Big( \sum_{(x_r, x_s) \in \mathcal{M}} \sum_{i=1}^{C} \sum_{j=1, j \neq i}^{C} u_{ri} u_{sj} + \sum_{(x_r, x_s) \in \mathcal{C}} \sum_{i=1}^{C} u_{ri} u_{si} \Big) - \beta \sum_{i=1}^{C} \Big[ \sum_{r=1}^{N} u_{ri} \Big]^2

under the constraint \sum_{i=1}^{C} u_{ri} = 1 for every item r, where \mathcal{M} and \mathcal{C} denote the sets of must-link and cannot-link constraints, u_{ri} the membership of item x_r to cluster i, and \mu_i the cluster prototypes.
It can be shown (see ) that the equation for updating the memberships takes the form

u_{rs} = u_{rs}^{FCM} + u_{rs}^{Constraints} + u_{rs}^{Bias}

The first term, u_{rs}^{FCM}, comes from FCM and only focuses on the distances between data items and prototypes. The second term, u_{rs}^{Constraints}, takes the supervision into account: memberships are reinforced or deprecated according to the pairwise constraints given by the user. The third term, u_{rs}^{Bias}, leads to a reduction of the cardinality of spurious clusters, which are discarded when their cardinality drops below a threshold.
To make this semi-supervised clustering approach attractive for the user, it is important to minimize the number of constraints he has to provide for reaching some given level of quality. This can be done by asking the user to define must-link or cannot-link constraints for the pairs of data items that are expected to have the strongest corrective effect on the clustering algorithm (i.e. that are maximally informative).
When using PCCA we consider that the similarities between data items provide relatively reliable information regarding the target categorization and that the constraints only help in finding the most relevant clusters. There is then little uncertainty in identifying well-separated compact clusters. To be maximally informative, the supervision effort (i.e. the constraints) should rather be spent on defining those clusters that are neither compact nor well separated from their neighbours. One can note that this is consistent with the findings in regarding unsupervised clustering.
Following these remarks, we consider that the least well-defined cluster at iteration t is the one with the smallest density at that iteration.
When the least well-defined cluster is found, we need to identify the data items near its boundary. In the fuzzy setting, one can consider that a data item represented by the vector x_r is assigned to cluster s if u_{rs} is the highest among its membership degrees. The data items at the boundary are those having the lowest membership values to this cluster among all the items assigned to it.
As already mentioned, in real configurations there is very often no sharp boundary between clusters, so a fuzzy partition is often better suited for the determination of membership-based boundaries. To this end, after finding the least well-defined cluster with the above criterion, we consider a virtual boundary that is only defined by a membership threshold and will usually be larger than the true one (this is why we call it the "extended" boundary). The items whose membership values are closest to this threshold are considered to be on the boundary, and the user is asked to provide constraints directly between these items. We should note that the extended boundary will probably contain ambiguous points shared with nearby clusters.
Non-redundancy is complementary to ambiguity in maximizing the amount of information provided by the constraints. The non-redundancy criterion iteratively chooses, from an augmented set S of feature points lying on the boundary of the least well-defined cluster, the vectors x_j that maximize the lowest of the distances d(x_i, x_j) over all the items x_i already included in the selected non-redundant set. This can be expressed as:

x_j = \arg\max_{x \in S} \min_{x_i} d(x_i, x)

where S is the augmented set of points and the x_i are the already chosen points.
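This non-redundancy criterion amounts to a greedy farthest-first (max-min) traversal of the boundary points; a minimal sketch, with the starting point chosen arbitrarily:

```python
import numpy as np

def select_non_redundant(boundary, n_select, start=0):
    """Greedy max-min selection: from the boundary points of the least
    well-defined cluster, repeatedly pick the point whose smallest distance
    to the already selected points is largest.
    boundary: (n, d) array; returns indices of the selected points."""
    chosen = [start]
    # minimum distance from every point to the current selected set
    d_min = np.linalg.norm(boundary - boundary[start], axis=1)
    while len(chosen) < n_select:
        nxt = int(np.argmax(d_min))  # x_j maximizing min_i d(x_i, x_j)
        chosen.append(nxt)
        d_min = np.minimum(d_min,
                           np.linalg.norm(boundary - boundary[nxt], axis=1))
    return chosen
```

Each newly selected point is as far as possible from all previously selected ones, so the user is never asked to constrain two nearly identical pairs.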
In the following, we use the name AFCC (Active Fuzzy Competitive Clustering) for the resulting algorithm.
We compared AFCC to the original PCCA, to the CA algorithm (unsupervised clustering) and to PCKmeans (semi-supervised clustering).
The comparison shown here is performed on a ground-truth database composed of images of different phenotypes of Arabidopsis thaliana, corresponding to slightly different genotypes. This scientific image database comes from studies of gene expression. A sample of the images is shown in .
Figure presents the dependence between the percentage of well-categorized data points and the number of pairwise constraints considered. We provide as a reference the graphs for the CA algorithm and for K-means, both ignoring the constraints (unsupervised learning). The correct number of classes was directly provided to K-means and PCKmeans. CA, PCCA and AFCC were initialized with a significantly larger number of classes and found the appropriate number by themselves.
These experimental results clearly show that the user can significantly improve the quality of the categories obtained by providing a simple form of supervision, the pairwise constraints. With a similar number of constraints, PCCA performs significantly better than PCKmeans by making a better use of the available constraints. The fuzzy clustering process directly takes into account the pairwise constraints thanks to the signed constraint terms in the equation for updating the memberships. The active selection of constraints (AFCC) further reduces the number of constraints required for reaching such an improvement. The number of constraints becomes very low with respect to the number of items in the dataset.
Object classification with images represented by sets of local features is a challenging task for kernel-based methods. We introduce a new kernel which operates on sets of vectors, named the intermediate matching kernel , . It is based on a new and flexible matching approach which improves results over the GCS kernel . The matching procedure is guided by an intermediate set of vectors P = \{p_1, ..., p_n\}. Indeed, an explicit mapping m_p depending on a vector p defines the core of the proposed kernel. One possible choice, but not the only one, for m_p is to associate with each element B of the input space X (a set of vectors from R^d) the vector of B nearest to p:

m_p(B) = \arg\min_{x \in B} \|x - p\|

This mapping is applied to B_1 and B_2, the two sets of vectors to be compared; the matched pair of vectors is then compared using any positive definite kernel k operating on vectors, such as the RBF kernel. The sequence of parametric mappings m_{p_1}, ..., m_{p_n} is applied separately to the two sets of vectors, and the intermediate matching kernel is finally obtained as a sum over the matched pairs. More formally, for any two sets of vectors B_1 and B_2, the intermediate matching kernel can be defined as:

K(B_1, B_2) = \sum_{i=1}^{n} k\big(m_{p_i}(B_1), m_{p_i}(B_2)\big)

By construction, K is positive definite. An important fact to notice is that the positive definiteness of the intermediate matching kernel is ensured when the matching set P is chosen independently from B_1 and B_2. We define P as the set of class centers obtained by clustering all vectors of the training sample; however, other approaches can be used.
Local jets:
| Kernels | valid. error (%) | test error (%) |
| Matching kernel (loser-take-nothing) | 19.66 ± 0.20 | 19.66 ± 0.40 |
| Matching kernel (winner-take-all) | 13.46 ± 0.87 | 13.1 ± 0.69 |
| GCS-based kernel | 8.93 ± 0.53 | 9.33 ± 1.00 |
| Intermediate matching kernel | 8.33 ± 0.54 | 8.93 ± 0.16 |
Table summarizes performance comparisons with other kernels that operate on sets of vectors. The task is image classification, with images represented by sets of local jets around interest points. The intermediate matching kernel yielded the best results, with a validation error of 8.33% and a test error of 8.93%. In this experiment, the size of the matching set is 40.
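A numpy sketch of the intermediate matching kernel, with an RBF base kernel and a hypothetical matching set P (in practice, e.g. k-means centers of the training descriptors; the `gamma` value is an illustrative assumption):

```python
import numpy as np

def rbf(x, y, gamma=0.5):
    """RBF base kernel on individual vectors."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def intermediate_matching_kernel(B1, B2, P, base_kernel=rbf):
    """K(B1, B2) = sum_i k(m_{p_i}(B1), m_{p_i}(B2)), where m_p(B) is the
    vector of B nearest to p. B1, B2: (n1, d), (n2, d) sets of local
    descriptors; P: (m, d) matching set chosen independently of B1 and B2
    (which preserves positive definiteness)."""
    total = 0.0
    for p in P:
        x = B1[np.argmin(((B1 - p) ** 2).sum(1))]  # m_p(B1)
        y = B2[np.argmin(((B2 - p) ** 2).sum(1))]  # m_p(B2)
        total += base_kernel(x, y)
    return total
```

The kernel value can then be fed to any kernel machine (e.g. an SVM) exactly like a vector kernel, even though the inputs are variable-size sets of descriptors.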
Tuning hyper-parameters is a necessary step to improve the performance of learning algorithms. For SVMs, adjusting the kernel parameters may improve results drastically. Parameter tuning is usually performed by cross-validation, sweeping the parameter space. When the number of parameters increases, as in the case of multiple kernel parameters, the complexity of such a grid search is exponential in the number of optimized parameters. The gradient descent approach introduced in significantly reduces the search for the optimal parameters. We define the LCCP (Log Convex Concave Procedure) , derived from the CCCP (Convex ConCave Procedure) optimization framework , for optimizing kernel parameters by minimizing the radius-margin bound. To apply the LCCP, we prove, for a suitable choice of kernels, that the radius is log-convex and the margin is log-concave . The LCCP is more efficient than the gradient descent technique since it ensures that the minimized criterion decreases monotonically and converges to a local minimum without a step-size search. We apply the LCCP to the optimization of the parameters \theta_i in the following kernel:

K_\theta(x, y) = -\sum_i \theta_i |x_i - y_i|

which is an extension of the L1-distance kernel:

K_{L1}(x, y) = -\sum_i |x_i - y_i|

We also recall the L2-distance kernel:

K_{L2}(x, y) = -\|x - y\|^2

The above kernels can be shown to be conditionally positive definite , .
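The kernels themselves are straightforward to write down; the sketch below assumes non-negative per-dimension weights \theta_i (the quantities tuned by the LCCP):

```python
import numpy as np

def weighted_l1_kernel(x, y, theta):
    """K_theta(x, y) = -sum_i theta_i * |x_i - y_i|, the multiple-parameter
    extension of the L1-distance kernel (theta_i = 1 recovers it)."""
    return -np.sum(theta * np.abs(x - y))

def l2_distance_kernel(x, y):
    """K(x, y) = -||x - y||^2 (conditionally positive definite)."""
    return -np.sum((x - y) ** 2)
```

With one \theta_i per feature dimension, a grid search over all weights would be exponential in the dimension, which is what motivates the LCCP optimization above.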
Table summarizes the average test errors for different data sets and clearly shows the interest of multiple kernel parameters, despite a preliminary weighting of the databases.
There is growing interest in shape recognition, driven by the increasing need for retrieval applications such as finding leaf species. In this work, we introduced a new shape description method targeted at 2D shape contours. A contour is a one-dimensional manifold parameterized by its curvilinear abscissa s; we consider a uniform sampling of the contour points according to s.
It is shown in our work , that the triangular kernel, defined as k(x, y) = -\|x - y\|^p with p \in \,]0, 2[, achieves similarity invariance, and in particular scale invariance, for many kernel methods including the support vector machine and kernel principal component analysis (KPCA). Given a shape contour, the top d eigenvalues of KPCA on the contour points are used as its shape description. These eigenvalues are rotation and translation invariant when using the Gaussian kernel, and can be normalized to be also scale invariant when using the triangular kernel. Notice that computing this description does not require sampling the curve according to an ordered curvilinear abscissa, which might be unavailable and difficult to find, mainly for complex contours.
This shape description is also scale invariant when using the linear kernel; nevertheless, the dimension of the underlying eigenspace then does not exceed two: we have at most two non-null eigenvalues when solving KPCA with the linear kernel.
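A minimal sketch of the eigenvalue-based descriptor (plain numpy KPCA on the contour points with the triangular kernel; the normalization by the leading eigenvalue is a simplified stand-in for the scale normalization discussed above):

```python
import numpy as np

def kpca_shape_descriptor(contour, d=5, p=1.9):
    """Top-d eigenvalues of centered KPCA on 2-D contour points with the
    triangular kernel k(x, y) = -||x - y||^p, p in ]0, 2[.
    contour: (n, 2) array of contour points (no ordering required)."""
    diff = contour[:, None, :] - contour[None, :, :]
    K = -np.linalg.norm(diff, axis=2) ** p       # triangular kernel matrix
    n = len(contour)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    Kc = H @ K @ H                               # centered Gram matrix
    eigvals = np.sort(np.linalg.eigvalsh(Kc))[::-1][:d]
    return eigvals / eigvals[0]                  # ratios are scale invariant
```

Distances are preserved by rotation and translation, and scaling by a factor s multiplies every eigenvalue by s^p, so the eigenvalue ratios are invariant to the full similarity group.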
We ran our experiments on the Smithsonian and Swedish databases, consisting respectively of 1525 and 1125 images of leaves. The Smithsonian dataset has 135 categories with different cardinalities, while the Swedish set has 15 classes, each containing 75 images. Notice that only the external contours are used for shape description, even though this is not sufficient to make the "best" predictions, as color and internal structures are also prominent.
Figure ( ) shows the precision-recall curves for different values of the parameter p of the triangular kernel. The precision reaches its highest value for p = 1.9 and drops dramatically when p is close to its upper bound (i.e., p = 2). Figure ( ) shows the precision-recall on the Swedish dataset for d = 100 dimensions using our shape description. Comparisons with representative state-of-the-art work are reported using linear PCA (d = 2) and other shape descriptors including the edge orientation histogram (EOH) , Hough and Curvature Scale Space (CSS) . When the recall is less than 30%, the precision of KPCA is better than that of the other descriptors, including CSS.
This year, the Ikona/Maestro software has been fully re-engineered to improve its modularity. Most of the changes concern the internal architecture of the server part. In this new version, the interfaces of all the main components have been unified. These changes will ease the integration of new descriptors, new query paradigms and new functionalities.
In the aceMedia European integrated project, IMEDIA is in charge of the development of the "Intelligent Search and Retrieval" application module. This module brings together the software of four research teams that work on different multimedia information retrieval paradigms. The first version was delivered in June and has been integrated in a global client-side application: the PCS User Interface. Further improvements of this module, as well as its integration in a web-based application, are planned for next year.
Co-supervision of a PhD thesis within the CIFRE framework. The main topic is optimal fine-grained visual signatures for the monitoring of INA video collections.
The partners of this project are the IMEDIA and ATOLL teams of INRIA Rocquencourt, the CEDRIC laboratory of the CNAM Paris, the LIFO laboratory of the University of Orléans, the Institute of Research for Development (IRD) and the National Institute for Research in Agriculture (INRA). BIOTIM is coordinated by IMEDIA. The project is financially supported by the French National Science Fund (FNS).
This project concerns the conception and development of content description methods for content-based indexing and retrieval of aerial and satellite images. This work is done jointly with the ARIANA project team (Sophia Antipolis), ENST-Paris (CNRS) and the URISA research team from Sup'Com (School of Engineering, Tunis). One of the objectives is to make the connection with symbolic and semantic feature queries in the context of satellite image repositories.
This project is part of the IMVN (image, video and digital life) competitiveness cluster in the Île-de-France region. It aims to develop a framework for an advanced multimedia search engine. The main partner is Thales.
"Integrating knowledge, semantics and content for user-centred intelligent media services" in the 6th Framework Program. The consortium of this project is composed of 15 industrial and academic European partners (Alinari, Belgavox, DCU, France Telecom, Fraunhofer, INRIA, ITI, Motorola, Philips, QMUL, Telefonica, Thomson, UAM, UKarlsruhe).
"Multimedia Understanding through Semantics, Computation and Learning" in the 6th Framework Programme. This network of excellence is composed of 42 European academic institutions. Nozha Boujemaa chairs the Workpackage "Single Media Processing" and is deputy scientific coordinator of the network.
"Network of excellence on Digital Libraries" in the 6th Framework Programme. This network of excellence is composed of 44 European academic institutions for the period 2004-2007.
ViMining is an associated research team composed of IMEDIA group and the team of Pr. Shin'ichi Satoh from the National Institute of Informatics (NII), Japan. The major topics of common interest are : detection and description of semantic video events; organisation of the feature space; cross-media indexing and retrieval.
For more information, see http://www-rocq.inria.fr/imedia/vimining/index.html and http://www-direction.inria.fr/international/EQUIPES_ASSOCIEES/index.eng.htm
This project involves the URISA research team from the Sup'Com school of engineering in Tunis. It aims at developing unsupervised classification methods in order to segment satellite images and organize visual database indexes.
Information about past and on-going projects is also detailed at http://www-rocq.inria.fr/imedia/projects.html.
Keynote speaker for the international conference ESA-EUSC'05: Image Information Mining - Theory and Application to Earth Observation ;
Scientific expert for the European commission, invited to the consultation workshop "Challenges of Future Search Engines" held at Brussels during September 2005 for the preparation of FP7 calls ;
Scientific expert for NWO Research agency (Netherlands - CATCH projects) ;
Organizer of a special session "Machine Learning for Visual Information Retrieval" for the 7th ACM SIGMM International Workshop on Multimedia Information Retrieval (ACM-MIR'05) ;
"Tutorials Chair" for ECDL'05, as well as member of several technical program committees, among them ICME'05 and MIR'05 ;
Deputy scientific coordinator of the Muscle NoE (Network of Excellence, FP6), member of the steering committee and scientific coordinator of WP5 "Single Modality". Co-organizer (with Valerie Gouet) of several scientific meetings for WP5 ;
In charge of international relations at INRIA Rocquencourt and member of the "Bureau du Comité des Projets" until October 2005 ;
Since September 2005, member of the "National Evaluation Commission" of INRIA.
PhD Jury member :
Chairman : INSA Lyon.
Reviewer : Ecole Centrale Lyon, Université de Nice, Université De Reims.
HDR Jury member :
Reviewer : Antoine Tabbone - Université de Nancy.
Member of the scientific commission (section CNU 27) of CNAM ;
Member of the CNAM/CEDRIC laboratory council ;
Scientific expert for the French research program ARA "Masses of Data" ;
Deputy scientific leader of the work package WP5 in the Muscle Network of Excellence ;
Leader of the WP5/task 2 in the Muscle Network of Excellence ;
Co-organiser (with Nozha Boujemaa) of the WP5 Third Scientific meeting in the Muscle Network of Excellence, 27-29 April 2005, Paris ;
Co-organiser (with Nozha Boujemaa) of the WP5 First focus meeting in the Muscle Network of Excellence, 1-2 December 2005, Rocquencourt ;
Conference program committee member of the International Workshop on Computer Vision meets Databases (CVDB'05), in conjunction with ACM Sigmod, June 17, 2005, Baltimore, MD, USA ;
Jury member of several engineer diplomas at CNAM ;
Reviewer for conference papers : ICME'05, CVDB'05, Acivs'05 ;
Reviewer for a journal paper : IVC.
Active member of the JPSearch ad-hoc group (ISO/IEC JTC 1/SC 29/WG 1 - Coding of still pictures) that aims to produce standards to facilitate management, search, and retrieval of content in the form of still pictures.
Journal Reviewer : Journal of Mathematical Imaging and Vision ;
Jury member :
PhD of Sabri Boughorbel, Paris Orsay University, July 13, 2005.
Member of the program committee of the 11th International Conference on Computer Analysis of Images and Patterns (CAIP'2005). Member of the reading committee of the Colloque du Groupement de Recherche en Traitement du Signal et de l'Image (GRETSI'2005).
AFIG President (Association Française d'informatique Graphique) ;
Member of the Executive Committee (Conseil d'Administration) of the French chapter of Eurographics ;
Co-organizer of Euler Diagrams 2005 workshop ( http://www-rocq.inria.fr/imedia/euler2005/) with Marie-Luce Viaud of INA.
30 hours of lab sessions (TP) on Java, 1st year, IUT Vélizy, May 2005.
192 hours in the Computer Science Department of CNAM;
National responsible for the course "Computer Vision" of the Master research STIC - Computer Science of CNAM (6 ECTS - 60 hours);
In charge of the course "Image indexing and retrieval" in the master SAR of Paris 6 (7,5 hours).
6-hour course on "Object Detection", Master STIC, CNAM, November 2005.
24 hours of lab sessions (TP) on "Computer Vision", Master Sciences de l'ingénieur, Pierre and Marie Curie University, October 2005.
3-hour course on "Object Detection", Master Sciences de l'ingénieur, Pierre and Marie Curie University, November 2005.
9-hour course on Computational Geometry in the "Computer Vision" option of the last year of the engineering degree at the ENSTA school (École Nationale Supérieure de Techniques Avancées, Paris).