Linked media stands out today as a major challenge, with numerous potential applications across all areas of multimedia. The rapid growth of ubiquitous Internet access and the resulting convergence of media on the network open countless opportunities for linked media and make this challenge all the more central. New applications centered on the notion of linked media are emerging, such as second-screen applications and recommendation services. However, for lack of adequate technology, linking related content is mostly left to human operators in current applications, or to user behavior analysis, e.g., via collaborative filtering, which considers the content only indirectly. This severely limits the opportunities offered by a web of media in terms of creativity, scalability, representativeness and completeness, thus hampering the spread of linked media and the development of innovative services in the Internet of media.
Most of the research effort in automatic multimedia content analysis has so far been devoted to describing and indexing content, on top of which core information retrieval and recommendation tasks are built to develop multimedia applications. This general philosophy rests on a vision in which documents are treated as isolated entities, i.e., as basic units indexed and analyzed regardless of other content items and of context. Considering documents in isolation has enabled key progress in large-scale content-based analysis and retrieval: e.g., generic descriptors, efficient techniques for content-based analysis, fast retrieval methodology. But ignoring the links, implicit or explicit, between content items is also a strong assumption, with direct consequences on algorithms and applications, both in terms of performance and in terms of possibilities.
Linkmedia investigates a number of key issues related to multimedia collections structured with explicit links: Can we discover what characterizes a collection and gives it coherence? Are there repeating motifs that create natural links and deserve characterization and semantic interpretation? How can explicit links be created from pairwise distances? What structure should a linked collection have? How do we explain the semantics of a link? How can explicit links be used to improve information retrieval? To improve the user experience? Within this general framework, the global objective of Linkmedia is to develop the scientific, methodological and technological foundations facilitating or automating the creation, description and exploitation of multimedia collections structured with explicit links. In particular, we target key contributions in the following areas:
designing efficient methods dedicated to multimedia indexing and unsupervised motif discovery: efficiently comparing content items on a large scale and finding repeating motifs in an unsupervised manner are two key ingredients of multimedia linking based on a low-level representation of the content;
improving techniques for structuring and semantic description: better description of multimedia content at a semantic—i.e., human interpretable—level, making explicit the implicit structure when it exists, is still required to make the most of multimedia data and to facilitate the creation of links to a precise target at a semantic level;
designing and experimenting approaches to multimedia content linking and collection structuring: exploiting low-level and semantic content-based proximity to create explicit links within a collection requires specific methodology departing from pairwise comparison and must be confronted with real data;
studying new paradigms for the exploitation of linked multimedia content as well as new usages: explicit links within media content collections change how such data is processed by machines and ultimately consumed by humans in ways that have yet to be invented and studied.
Linkmedia is a multidisciplinary research team, with multimedia data as the main object of study. We are guided by the data and their specificities—semantically interpretable, heterogeneous and multimodal, available in large amounts, unstructured and disconnected—as well as by the related problems and applications.
With multimedia data at the center, orienting our choices of methods and algorithms and serving as a basis for experimental validation, the team is directly contributing to the following scientific fields:
multimedia: content-based analysis; multimodal processing and fusion; multimedia applications;
computer vision: compact description of images; object and event detection;
natural language processing: topic segmentation; information extraction;
information retrieval: high-dimensional indexing; approximate k-nn search; efficient set comparison.
Linkmedia also takes advantage of advances in the following fields, adapting recent developments to the multimedia area:
signal processing: image processing; compression;
machine learning: deep architectures; structured learning; adversarial learning;
security: data encryption; differential privacy;
data mining: time series mining and alignment; pattern discovery; knowledge extraction.
Research activities in Linkmedia are organized along three major lines of research which build upon the scientific domains already mentioned.
As an alternative to supervised learning techniques, unsupervised approaches have recently emerged that aim to discover patterns and events of interest directly from the data. In the absence of prior knowledge about what we are interested in, meaningfulness can be judged on one of three main criteria: unexpectedness, saliency and recurrence. The last criterion posits that repeating patterns, known as motifs, are potentially meaningful, leading to recent work on the unsupervised discovery of motifs in multimedia data , , .
Linkmedia seeks to develop unsupervised motif discovery approaches which are both accurate and scalable. In particular, we consider the discovery of repeating objects in image collections and the discovery of repeated sequences in video and audio streams. Research activities are organized along the following lines:
developing the scientific basis for scalable motif discovery: sparse histogram representations; efficient co-occurrence counting; geometry- and time-aware indexing schemes;
designing and evaluating accurate and scalable motif discovery algorithms applied to a variety of multimedia content: exploiting efficient geometry- or time-aware matching functions; fast approximate dynamic time warping; symbolic representations of multimedia data, in conjunction with existing symbolic data mining approaches;
developing methodology for the interpretation, exploitation and evaluation of motif discovery algorithms in various use cases: image classification; video stream monitoring; transcript-free natural language processing (NLP) for spoken documents.
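The fast approximate dynamic time warping mentioned above speeds up the classical DTW recurrence. As a point of reference, here is a minimal exact DTW with an optional Sakoe-Chiba band constraint, one standard way of trading a little accuracy for speed (function and parameter names are ours, not those of any specific system of the team):

```python
def dtw(a, b, band=None):
    """Dynamic time warping distance between two numeric sequences.
    band: optional Sakoe-Chiba half-width; constraining the warping
    path to |i - j| <= band is a classic approximation that makes the
    cost linear in the band width instead of quadratic."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        lo = 1 if band is None else max(1, i - band)
        hi = m if band is None else min(m, i + band)
        for j in range(lo, hi + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # best of insertion, deletion, and match moves
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

For instance, `dtw([1, 2, 3], [1, 2, 2, 3])` is 0, since the repeated 2 can be absorbed by the warping path.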
Content-based analysis has received much attention since the early days of multimedia, with extensive use of supervised machine learning for all modalities , . Progress in large-scale entity and event recognition in multimedia content has made available general-purpose approaches able to learn from very large data sets and to perform reasonably well in a large number of cases. Current solutions are however limited to simple, homogeneous information and can hardly handle structured information such as hierarchical descriptions, tree-structured or nested concepts.
Linkmedia aims at expanding techniques for multimedia content modeling, event detection and structure analysis. The main transverse research lines that Linkmedia will develop are as follows:
context-aware content description targeting (homogeneous) collections of multimedia data: latent variable discovery; deep feature learning; motif discovery;
secure description to enable privacy and security aware multimedia content processing: leveraging encryption and obfuscation; exploring adversarial machine learning in a multimedia context; privacy-oriented image processing;
multilevel modeling with a focus on probabilistic modeling of structured multimodal data: multiple kernels; structured machine learning; conditional random fields.
Creating explicit links between media content items has been considered on several occasions, with the goal of seeking and discovering information by browsing, as opposed to information retrieval via ranked lists of relevant documents. Content-based link creation was initially addressed in the hypertext community for well-structured texts and was recently extended to multimedia content , , . The problem of organizing collections with links remains largely unsolved for large heterogeneous collections of unstructured documents, with many issues deserving attention: linking at a fine semantic grain; selecting relevant links; characterizing links; evaluating links; etc.
Linkmedia targets pioneering research on media linking by developing scientific ground, methodology and technology for content-based media linking directed to applications exploiting rich linked content such as navigation or recommendation. Contributions are concentrated along the following lines:
algorithms for linked media and content-based link authoring in multimedia collections: time-aware graph construction; multimodal hypergraphs; large-scale k-nn graphs;
link interpretation and characterization to provide link semantics for interpretability: text alignment; entity linking; intension vs. extension;
linked media usage and evaluation: information retrieval; summarization; data models for navigation; link prediction.
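As one concrete instance of the large-scale k-nn graphs listed above, a brute-force k-nn graph over a small set of descriptors can be sketched as follows (a toy baseline with names of our choosing; at real scale the exhaustive distance matrix would be replaced by approximate indexing):

```python
import numpy as np

def knn_graph(X, k):
    """Build a k-nearest-neighbor graph over the row vectors of X.
    Returns, for each item, the indices of its k nearest neighbors
    under Euclidean distance (the item itself is excluded)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # forbid self-loops
    return np.argsort(d, axis=1)[:, :k]  # one neighbor list per node

# toy usage: two tight clusters should only link internally
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
nbrs = knn_graph(X, 1)
```

On this toy input each point links to its cluster mate, i.e., `nbrs` is `[[1], [0], [3], [2]]`.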
Regardless of ingestion and storage issues, media asset management—archiving, describing and retrieving multimedia content—has become a key factor and a huge business for content and service providers. Most content providers, with television channels at the forefront, rely on multimedia asset management systems to annotate, describe, archive and search for content. So do archivists such as the Institut National de l'Audiovisuel, the Nederlands Instituut voor Beeld en Geluid or the British Broadcasting Corporation, as well as media monitoring companies such as Yacast in France. Protecting copyrighted content is another aspect of media asset management.
One of the most visible application domains of linked multimedia content is that of multimedia portals on the Internet. Search engines now offer many features for image and video search. Video sharing sites also feature search engines as well as recommendation capabilities. All news sites provide multimedia content with links between related items. News sites also implement content aggregation, enriching proprietary content with user-generated content and reactions from social networks. Most public search engines and Internet service providers offer news aggregation portals.
The convergence between television and the Internet has accelerated significantly over the past few years, with the democratization of TV on-demand and replay services and the emergence of social TV services and multiscreen applications. These evolutions, and the ever-growing number of innovative applications they bring, offer a unique playground for multimedia technologies. Recommendation plays a major role in connected TV. Enriching multimedia content with explicit links, targeting either multimedia material or knowledge databases, appears as a key feature in this context, at the core of rich TV and second-screen applications.
On-line courses are rapidly gaining interest with the recent movement for massive open on-line courses (MOOCs). Such courses usually aggregate multimedia material, such as a video of the course along with handouts and potentially textbooks, exercises and other related resources. This setting is very similar to that of media aggregation sites, though in a different domain. Automatically analyzing and describing video and textual content, synchronizing all available material across modalities, and creating and characterizing links between related material or between different courses are all necessary features for on-line course authoring.
TermEx is a domain-independent terminology extraction system based on natural language processing and information retrieval concepts. This year, a new version (2.0) was implemented, corresponding to a major rewrite in Python 3, with support for English (in addition to French) and faster batch processing of documents.
In 2015, TermEx was licensed to a large company as a key component of its archiving process.
The experimental multimedia indexing platform (PIM) consists of dedicated equipment for experimenting on very large collections of multimedia data. In 2015, no major evolution of PIM occurred and activities on the platform mainly consisted of maintenance. Following the departure of Sébastien Campion, our former PIM manager, we have also initiated a reorganization of responsibilities, in collaboration with SED.
Available at http://
In collaboration with Michael Houle, National Institute for Informatics (Japan).
Some of our research work was concerned with the estimation of continuous intrinsic dimension (ID), a measure of intrinsic dimensionality recently proposed by Houle. Continuous ID can be regarded as an extension of Karger and Ruhl's expansion dimension to a statistical setting in which the distribution of distances to a query point is modeled in terms of a continuous random variable. This form of intrinsic dimensionality can be particularly useful in search, classification, outlier detection, and other contexts in machine learning, databases, and data mining, as it has been shown to be equivalent to a measure of the discriminative power of similarity functions. In , we proposed several estimators of continuous ID that we analyzed based on extreme value theory, using maximum likelihood estimation, the method of moments, probability weighted moments, and regularly varying functions. Experimental evaluation was performed using both real and artificial data.
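Among the estimators mentioned, the maximum likelihood one takes a particularly compact, Hill-type form: given the distances from a query to its k nearest neighbors, the local ID estimate is minus k divided by the sum of log-ratios of each distance to the largest one. A minimal sketch (variable names ours, not a reproduction of the paper's code):

```python
import math

def mle_id(dists):
    """Maximum-likelihood (Hill-type) estimate of local intrinsic
    dimensionality from a query point's distances to its k nearest
    neighbors. The largest distance acts as the tail threshold; terms
    equal to it contribute log(1) = 0 and are skipped."""
    k = len(dists)
    w = max(dists)
    s = sum(math.log(r / w) for r in dists if r < w)
    return -k / s
```

As a sanity check, for points uniformly distributed in a d-dimensional ball the sorted neighbor distances behave like (i/k)^(1/d), and the estimator indeed recovers a value close to d.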
LSH is a popular framework for generating compact representations of multimedia data, which can be used for content-based search. However, the performance of LSH is limited by its unsupervised nature and by the underlying feature scale. In , we proposed to improve LSH by incorporating two elements: supervised hash bit selection and multi-scale feature representation. First, a feature vector is represented at multiple scales. At each scale, the feature vector is divided into segments, whose size decreases gradually so that the representation corresponds to a coarse-to-fine view of the feature. Each segment is then hashed to generate more bits than the target hash length. Finally, the best bits are selected from the hash bit pool according to a notion of bit reliability, estimated by bit-level hypothesis testing. Extensive experiments validate the proposal in two applications: near-duplicate image detection and approximate feature distance estimation. We first demonstrate that the feature scale can influence performance, a factor that is often neglected. We then show that the proposed supervision method is effective; in particular, performance increases with the size of the hash bit pool. Finally, the two elements are put together, and the integrated scheme exhibits further improved performance.
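The supervised bit-selection idea can be illustrated with sign-of-projection LSH bits ranked by a naive reliability score: how often a bit agrees on matching pairs minus how often it agrees on non-matching pairs. This is a deliberately simple stand-in for the bit-level hypothesis testing of the paper, with function names of our choosing:

```python
import numpy as np

def hash_bits(X, W):
    """Sign-of-random-projection LSH: one bit per hash function (row of W)."""
    return (X @ W.T > 0).astype(np.int8)

def select_bits(bits, match_pairs, nonmatch_pairs, n_keep):
    """Rank hash bits by agreement on matching pairs minus agreement on
    non-matching pairs, and keep the n_keep most reliable bits."""
    agree = lambda pairs: np.mean(
        [bits[i] == bits[j] for i, j in pairs], axis=0)
    score = agree(match_pairs) - agree(nonmatch_pairs)
    return np.argsort(-score)[:n_keep]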
Most image encodings achieve orientation invariance by aligning the patches to their dominant orientations and translation invariance by completely ignoring patch position or by max-pooling. Albeit successful, such choices introduce too much invariance because they do not guarantee that the patches are rotated or translated consistently. In this work, we propose a geometric-aware aggregation strategy, which jointly encodes the local descriptors together with their patch dominant angle and/or location . The geometric attributes are encoded in a continuous manner by leveraging explicit feature maps. Our technique is compatible with generic match kernel formulation and can be employed along with several popular encoding methods, in particular bag of words, VLAD and the Fisher vector. The method is further combined with an efficient monomial embedding to provide a codebook-free method aggregating local descriptors into a single vector representation. Invariance is achieved by efficient similarity estimation of multiple rotations or translations, offered by a simple trigonometric polynomial. This strategy is effective for image search, as shown by experiments performed on standard benchmarks for image and particular object retrieval, namely Holidays and Oxford buildings.
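The continuous encoding of an angle by an explicit feature map can be made concrete with a truncated Fourier embedding: the inner product of two embedded angles is then exactly a trigonometric polynomial in their difference, peaking when the angles agree. A minimal sketch (the normalization and names are ours; the paper's kernels are more general):

```python
import numpy as np

def angle_embed(theta, n):
    """Truncated Fourier feature map for an angle. The inner product of
    two embeddings equals 1/2 + sum_{k=1..n} cos(k * (t1 - t2)), i.e.,
    a trigonometric polynomial in the angle difference."""
    parts = [np.array([1.0 / np.sqrt(2.0)])]
    for k in range(1, n + 1):
        parts.append(np.array([np.cos(k * theta), np.sin(k * theta)]))
    return np.concatenate(parts)
```

Because similarity under a rotation only shifts the angle difference, scoring multiple rotations amounts to evaluating this polynomial at shifted arguments rather than re-encoding the descriptors.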
M. Sc. Internship of Corentin Hardy, in collaboration with René Quiniou, Inria Rennes, DREAM research team, within the framework of the STIC AmSud Maximum project and of the MOTIF Inria Associate Team.
Analyzing multimedia data is challenging due to the quantity and complexity of such data. Mining for frequently recurring patterns is a task often run to help discover the underlying structure hidden in the data. This year, we explored how data symbolization and sequential pattern mining techniques can help mine recurring patterns in multimedia data. In , we showed that even though sequential pattern mining techniques are very helpful in terms of computational efficiency, the data symbolization step is crucial for extracting relevant audio patterns.
M. Sc. Internship of Amélie Royer, ENS Rennes.
Clustering algorithms exploit an input similarity measure on the samples, which should be tailored to the data format and the application at hand. However, manually defining a suitable similarity measure is difficult, for example in the case of limited prior knowledge or complex data structures. While supervised classification systems require a set of samples annotated with their ground-truth classes, recent studies have shown that it is possible to exploit classifiers trained on an artificial annotation of the data to induce a similarity measure. In this work, we proposed a unified framework, named similarity by iterative classifications (SIC), which explores the idea of diverting supervised learning for automatic similarity inference, and studied several of its theoretical and practical aspects. We also implemented and evaluated SIC on three knowledge discovery tasks on multimedia content. Results show that, in most situations, the proposed approach indeed benefits from the underlying classifier's properties and outperforms usual similarity measures for clustering applications.
Work in collaboration with Cassio Elias dos Santos Jr. and William Robson Schwartz, in the framework of the Inria Associate Team MOTIF and of the STIC AmSud project Maximum.
Taking advantage of recent results on large-scale face comparison with partial least squares, we developed various approaches for multimodal person discovery in TV broadcasts within the MediaEval 2015 international benchmark . The task consists in naming the persons on screen who are speaking, with no prior information, leveraging text overlays and speech transcripts as well as face and voice comparison. We investigated two distinct aspects of multimodal person discovery. The first relies on face clusters, used to propagate names associated with faces in one shot to other faces that probably belong to the same person; this face clustering approach calculates face similarities using partial least squares and a simple hierarchical scheme. The second is tag propagation in a graph-based approach where nodes are speaking faces and edges link similar faces/speakers. The advantage of graph-based tag propagation is that it does not rely on face/speaker clustering, which we believe can be error-prone. The face clustering approach ranked among the top results in the international benchmark.
In collaboration with Jean Carrive and Félicien Vallet, Institut National de l'Audiovisuel.
In , we addressed the problem of unsupervised program structuring with minimal prior knowledge about the program. We extended previous work to propose an approach able to identify multiple structures and infer structural grammars for recurrent TV programs of different types. The approach involves three sub-problems: i) determining the structural elements contained in programs with minimal knowledge about which types of elements may be present; ii) identifying multiple structures for the programs, if any, and modeling them; iii) generating the structural grammar for each corresponding structure. Finally, we conducted use-case-based evaluations on real recurrent programs of three different types to demonstrate the effectiveness of the proposed approach.
Distributional thesauri are useful in many natural language processing tasks. In , , we address the problem of building and evaluating such thesauri with the help of information retrieval (IR) concepts. Two main contributions are proposed. First, continuing previous work, we show how IR tools and concepts can be used successfully to build thesauri. Through several experiments, evaluating the results directly against reference lexicons, we show that some IR models outperform state-of-the-art systems. Second, we use IR as an application framework to indirectly evaluate the generated thesaurus. Here again, this task-based evaluation validates the IR approach used to build the thesaurus. Moreover, it allows us to compare these results with those from the direct evaluation framework used in the literature; the observed differences call these evaluation habits into question.
In collaboration with Sébastien Lefèvre from Obelix Team (IRISA).
In this work, we explored the application of a tree-based feature extraction algorithm to the widely used MSER features and proposed a tree-of-shapes-based detector of maximally stable regions. Changing the underlying component tree in the algorithm makes it possible to consider alternative properties and pixel orderings when extracting maximally stable regions. Performance evaluation was carried out on a standard benchmark in terms of repeatability and matching score under different image transformations, as well as in a large-scale image retrieval setup measuring mean average precision. The detector outperformed the baseline MSER in the retrieval experiments .
We also proposed a local region descriptor based on 2D shape-size pattern spectra, calculated on arbitrary connected regions and combined with normalized central moments. We addressed the challenges of transitioning from global pattern spectra to local ones and conducted an exhaustive study of the parameters and properties of the newly constructed descriptor. The descriptors were calculated on MSER regions and evaluated in a simple retrieval system, achieving performance competitive with SIFT descriptors. An additional advantage of the proposed descriptors is their size, which is less than half that of SIFT , .
In collaboration with Mihir Jain (University of Amsterdam, The Netherlands) and Patrick Bouthemy (Team-project SERPICO, Inria Rennes, France).
Even though the importance of explicitly integrating motion characteristics in video descriptions has been demonstrated by several recent papers on action classification, our current work shows that adequately decomposing visual motion into dominant and residual motions, i.e., camera and scene motion, significantly improves action recognition. This holds true both for the extraction of space-time trajectories and for the computation of descriptors. In , we designed a new motion descriptor—the DCS descriptor—that captures additional information on local motion patterns, enhancing results based on differential motion scalar quantities: divergence, curl and shear features. Finally, applying the VLAD coding technique recently proposed in image retrieval provides a substantial improvement for action recognition. These findings are complementary to each other and together outperformed all previously reported results by a significant margin on three challenging datasets: Hollywood 2, HMDB51 and Olympic Sports, as reported in (Jain et al. (2013)).
Recently, word embedding representations have been investigated for slot filling in spoken language understanding (SLU), along with the use of neural networks as classifiers. Neural networks, especially recurrent neural networks, which are suited to sequence labeling problems, have been applied successfully on the popular ATIS database. In , we compare such models with the previously state-of-the-art conditional random field (CRF) classifier on a more challenging SLU database. We show that, despite the efficient word representations used within these neural networks, their ability to process sequences is still significantly lower than that of CRFs, while also incurring higher computational costs, and that the ability of CRFs to model output label dependencies is crucial for SLU.
Topic segmentation traditionally relies on lexical cohesion measured through word re-occurrences to output a dense segmentation, either linear or hierarchical. We have proposed a novel organization of the topical structure of textual content . Rather than searching for topic shifts to yield dense segmentation, our algorithm extracts topically focused fragments organized in a hierarchical manner. This is achieved by leveraging the temporal distribution of word re-occurrences, searching for bursts, to skirt the limits imposed by a global counting of lexical re-occurrences within segments. Comparison to a reference dense segmentation on varied datasets indicates that we can achieve a better topic focus while retrieving all of the important aspects of a text.
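The burst analysis underlying this approach can be illustrated with a deliberately simple grouping rule: given the sorted occurrence positions of a word in a token stream, a burst is a maximal run in which consecutive occurrences stay within a gap threshold (a toy stand-in for the paper's analysis; names and the threshold rule are ours):

```python
def bursts(positions, max_gap):
    """Group the sorted occurrence positions of a word into bursts:
    maximal runs where consecutive occurrences are at most max_gap
    tokens apart. Returns (start, end) position pairs, one per burst."""
    out, start, prev = [], None, None
    for p in positions:
        if start is None:
            start = prev = p          # open the first burst
        elif p - prev <= max_gap:
            prev = p                  # extend the current burst
        else:
            out.append((start, prev)) # close it and open a new one
            start = prev = p
    if start is not None:
        out.append((start, prev))
    return out
```

Topically focused fragments then correspond to regions where the bursts of several words overlap, rather than to boundaries between dense segments.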
Work performed with Cassio Elias dos Santos Jr. during his 3 months visit, in collaboration with William Robson Schwartz (UFMG, Brasil), in the framework of the Inria Associate Team MOTIF.
Face recognition has been widely studied in past years. However, most related work focuses on increasing accuracy and/or speed for testing a single probe-subject pair. In , we introduced a novel method inspired by the success of locality-sensitive hashing applied to large general-purpose datasets and by the robustness provided by partial least squares analysis when applied to large sets of feature vectors for face recognition. The result is a robust hashing method, compatible with feature combination, for fast computation of a short list of candidates in a large gallery of subjects. We provided theoretical support and practical principles for the proposed hashing method that may be reused in further development of hash functions applied to face galleries. Comparative evaluations on the FERET and FRGCv1 datasets demonstrate a 16-fold speedup compared to scanning all subjects in the face gallery.
Nowadays, many NLP problems are modeled as supervised machine learning tasks, especially when it comes to information extraction. Consequently, the cost of the expertise needed to annotate the examples is a widespread issue. Active learning offers a framework to address that issue, making it possible to control the annotation cost while maximizing classifier performance, but it relies on the key step of choosing which example will be submitted to the expert. In , we examined and proposed such selection strategies in the specific case of conditional random fields (CRF), which are widely used in NLP. On the one hand, we proposed a simple method to correct a bias of certain state-of-the-art selection techniques. On the other hand, we detailed an original approach to selecting examples, based on respecting proportions in the datasets. These contributions are validated over a large range of experiments involving several tasks and datasets, including named entity recognition, chunking, phonetization and word sense disambiguation.
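The selection step at the heart of active learning can be illustrated with least-confidence sampling, one of the standard strategies such corrections apply to (a generic sketch, not the exact criterion of the paper; names are ours):

```python
def least_confidence_pick(posteriors, n):
    """Pick the n unlabeled examples the model is least sure about:
    those whose most probable label has the lowest posterior. Each
    entry of posteriors is the label distribution predicted for one
    example (for CRFs, typically the best sequence's probability)."""
    conf = [(max(p), i) for i, p in enumerate(posteriors)]
    conf.sort()                       # least confident first
    return [i for _, i in conf[:n]]
```

On posteriors `[[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]`, picking two examples selects indices 1 and 2, the two least confident predictions.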
When real applications work with automatic speech transcription, the first source of error is not incoherence in the application's analysis but noise in the automatic transcriptions. In , we present a simple but effective method to generate a new transcription of better quality by combining utterances from competing transcriptions. We extended a structured named entity (NE) recognizer submitted to the ETAPE challenge. Working on French TV and radio programs, our system revises the transcriptions provided by making use of the NEs it has detected. Our results suggest that combining the transcribed utterances so as to optimize the F-measure, rather than minimize WER, yields a better transcription for NE extraction. The results show a small but significant improvement of 0.9 % SER over the baseline system on the ROVER transcription. These are the best performances reported to date on this corpus.
In collaboration with Philippe-Henri Gosselin (ETIS team, ENSEA, Cergy, France).
We consider a pipeline for image classification or search based on coding approaches such as bag of words or Fisher vectors. In this context, the most common approach is to extract image patches densely and regularly at several scales. In , we propose and evaluate alternative ways to extract patches densely. Beyond simple strategies derived from regular interest region detectors, we propose approaches based on super-pixels, edges, and a bank of Zernike filters used as detectors. The different approaches are evaluated on recent image retrieval and fine-grained classification benchmarks. Our results show that the regular dense detector is outperformed by other methods in most situations, leading us to improve the state of the art in comparable setups on standard retrieval and fine-grained classification benchmarks. As a byproduct of our study, we show that existing methods for blob and super-pixel extraction achieve high accuracy when the patches are extracted along edges rather than around the detected regions.
In collaboration with Michael Rabbat (McGill University, Montréal, Canada).
We considered the image retrieval problem of finding the images in a dataset that are most similar to a query image. Our goal is to reduce the number of vector operations and the memory needed to perform a search without sacrificing the accuracy of the returned images. We adopt a group testing formulation and design the decoding architecture using either dictionary learning or eigendecomposition. The latter is a plausible option for small-to-medium-sized problems with high-dimensional global image descriptors, whereas dictionary learning is applicable in large-scale scenarios. We evaluate our approach for global descriptors obtained from both SIFT and CNN features. Experiments with standard image search benchmarks, including the Yahoo100M dataset comprising 100 million images, show that our method gives comparable (and sometimes superior) accuracy compared to exhaustive search while requiring only 10 % of the vector operations and memory. Moreover, for the same search complexity, our method gives significantly better accuracy than approaches based on dimensionality reduction or locality-sensitive hashing .
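The group testing idea can be sketched in a few lines: each group of database vectors is summarized by the sum of its members, the query is scored against these few group vectors, and an exhaustive comparison is run only inside the best-scoring groups. The decoding below is a naive re-ranking, not the dictionary-learning or eigendecomposition decoders of the paper, and all names are ours:

```python
import numpy as np

def group_search(X, groups, q, n_probe):
    """Group-testing search sketch over row vectors X.
    groups: a partition of the row indices; q: the query vector;
    n_probe: how many best-scoring groups to examine exhaustively.
    Returns the index of the best-matching database vector found."""
    reps = np.stack([X[g].sum(axis=0) for g in groups])  # group summaries
    best = np.argsort(-(reps @ q))[:n_probe]             # top groups
    cands = [i for b in best for i in groups[b]]         # their members
    return max(cands, key=lambda i: X[i] @ q)            # naive decoding
```

With two groups of similar vectors, the query is compared to 2 group vectors plus the members of one group instead of all 4 database vectors, which is where the savings in vector operations come from at scale.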
In collaboration with Anthony Bourrier and Patrick Pérez (Technicolor, Rennes, France), Florent Perronnin (Xerox, Grenoble, France) and Rémi Gribonval (Team-project PANAMA, Inria Rennes, France).
Many approximate nearest neighbor search algorithms operate under memory constraints by computing short signatures for database vectors while roughly preserving neighborhoods for the distance of interest. Encoding procedures designed for the Euclidean distance have attracted much attention in the last decade. In the case where the distance of interest is based on a Mercer kernel, we propose a simple yet effective two-step encoding scheme: first, compute an explicit embedding mapping the initial space into a Euclidean space; second, apply an encoding step designed to work with the Euclidean distance. Comparing this simple baseline with existing methods relying on implicit encoding, we demonstrate better search recall for similar code sizes with the chi-square kernel in databases of visual descriptors, outperforming concurrent state-of-the-art techniques by a large margin .
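The two-step scheme can be sketched as follows. As the explicit embedding we use random Fourier features for the Gaussian kernel, only because they are the simplest map to write down; the paper's embedding targets the chi-square kernel instead. The second step is any Euclidean encoder, here sign-of-value binarization. All function names and constants are ours:

```python
import numpy as np

def make_rff(d, dim_out, sigma, seed=0):
    """Step 1 (stand-in): random Fourier features, an explicit embedding
    whose Euclidean inner products approximate the Gaussian kernel
    exp(-|x - y|^2 / (2 sigma^2)). One fixed map must be shared by the
    database and the queries, hence the closure over W and b."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / sigma, size=(dim_out, d))
    b = rng.uniform(0, 2 * np.pi, size=dim_out)
    return lambda X: np.sqrt(2.0 / dim_out) * np.cos(X @ W.T + b)

def binarize(Z):
    """Step 2: any encoder designed for the Euclidean distance applies;
    sign bits are the simplest choice."""
    return (Z > 0).astype(np.uint8)
```

After step 1 the kernel similarity is an ordinary dot product, so the decade of Euclidean encoding techniques mentioned above applies unchanged.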
In collaboration with Yannis Avrithis (National Technical University of Athens, Greece)
Our work considers a family of metrics to compare images based on their local descriptors. It encompasses the VLAD descriptor and matching techniques such as Hamming Embedding. Making the bridge between these approaches leads us to propose a match kernel that takes the best of existing techniques by combining an aggregation procedure with a selective match kernel. The representation underpinning this kernel is approximated, providing large-scale image search that is both precise and scalable, as shown by our experiments on several benchmarks. We show that the same aggregation procedure, originally applied per image, can effectively operate on groups of similar features found across multiple images. This method implicitly performs feature set augmentation, while enjoying savings in memory requirements at the same time. Finally, the proposed method is shown effective for place recognition, outperforming state-of-the-art methods on a large-scale landmark recognition benchmark.
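A minimal sketch of combining per-word aggregation with a selective match kernel, on toy data with an arbitrary random codebook (the actual representation is additionally approximated with compact codes, which this sketch omits):

```python
import numpy as np

rng = np.random.default_rng(2)
d, k_words = 16, 8

# Toy visual codebook (in practice learned, e.g., by k-means).
C = rng.normal(size=(k_words, d))
C /= np.linalg.norm(C, axis=1, keepdims=True)

def aggregate(desc):
    """Aggregation step: one normalized residual per visual word."""
    words = np.argmax(desc @ C.T, axis=1)          # hard assignment
    agg = {}
    for w in np.unique(words):
        r = (desc[words == w] - C[w]).sum(axis=0)
        agg[int(w)] = r / (np.linalg.norm(r) + 1e-12)
    return agg

def selective(u, alpha=3.0, tau=0.1):
    """Selective function: discards weak word-level matches and
    emphasizes strong ones (alpha and tau are illustrative)."""
    return np.sign(u) * abs(u) ** alpha if u > tau else 0.0

def similarity(agg_a, agg_b):
    """Match kernel: sum of selectively weighted per-word similarities."""
    return sum(selective(float(agg_a[w] @ agg_b[w]))
               for w in agg_a.keys() & agg_b.keys())

image = rng.normal(size=(30, d))
agg = aggregate(image)
```

The self-similarity of an image equals its number of occupied visual words, while weak cross-image matches are suppressed by the selective function.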
In collaboration with Miaojing Shi, visiting Ph. D. student from Peking University, and Yannis Avrithis (National Technical University of Athens, Greece)
Recent works show that image comparison based on local descriptors is corrupted by visual bursts, which tend to dominate the image similarity. Existing strategies, such as power-law normalization, improve the results by discounting the contribution of visual bursts to the image similarity. We proposed to explicitly detect the visual bursts in an image at an early stage. We compared several detection strategies jointly taking into account feature similarity and geometrical quantities. The bursty groups are merged into meta-features, which are used as input to state-of-the-art image search systems such as VLAD or the selective match kernel. We then showed the interest of using this strategy in an asymmetrical manner, aggregating only the database features but not those of the query. Extensive experiments performed on public benchmarks for visual retrieval show the benefits of our method, which achieves performance on par with the state of the art but with a significantly reduced complexity, thanks to the lower number of features fed to the indexing system.
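The early detection of bursts can be sketched as a greedy grouping that jointly thresholds descriptor similarity and spatial distance, merging each group into a single normalized meta-feature. Thresholds and data below are illustrative, not those of the actual experiments.

```python
import numpy as np

def merge_bursts(desc, pos, sim_thr=0.95, dist_thr=20.0):
    """Greedily group bursty features: descriptors that are both visually
    similar and spatially close are merged into a single meta-feature."""
    sims = desc @ desc.T
    dists = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
    assigned = np.full(len(desc), -1)
    group_list = []
    for i in range(len(desc)):
        if assigned[i] >= 0:
            continue
        members = np.where((sims[i] > sim_thr) & (dists[i] < dist_thr)
                           & (assigned < 0))[0]
        assigned[members] = len(group_list)
        group_list.append(members)
    metas = np.stack([desc[g].sum(axis=0) for g in group_list])
    return metas / np.linalg.norm(metas, axis=1, keepdims=True)

# Toy image: five near-copies of one pattern (a burst) plus three
# distinct features placed far apart.
rng = np.random.default_rng(3)
base = rng.normal(size=8)
desc = np.vstack([base + 0.001 * rng.normal(size=(5, 8)),
                  rng.normal(size=(3, 8))])
desc /= np.linalg.norm(desc, axis=1, keepdims=True)
pos = np.array([[0, 0], [1, 1], [2, 0], [0, 2], [1, 0],
                [100, 100], [200, 200], [300, 300]], dtype=float)
metas = merge_bursts(desc, pos)
```

The eight input features collapse into four meta-features: one for the burst and one per distinct feature, so fewer features reach the indexing system.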
In collaboration with N. Grabar (STL), T. Hamon (LIMSI), and S. Le Maguer (Univ. Saarland).
The right of patients to access their clinical health record is granted by the French Code de la Santé Publique. Yet, such records remain difficult for patients to understand. We propose different IR experiments in which we use queries defined by patients in order to find relevant documents. We use the Indri search engine, based on statistical language modeling, as well as semantic resources. More precisely, our approaches chiefly rely on terminological variation (e.g., synonyms, abbreviations) to bridge expert and patient languages. Various combinations of resources and Indri settings are explored, mostly based on query expansion.
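A minimal sketch of the idea, with a hypothetical two-entry variation table and a Jelinek-Mercer-smoothed query-likelihood score in the spirit of Indri's language model; the actual experiments use real terminological resources and the Indri engine itself.

```python
import math
from collections import Counter

# Hypothetical lay-to-expert variation table; the actual work relies on
# terminological resources (synonyms, abbreviations, etc.).
synonyms = {"heart": ["cardiac"], "attack": ["infarction"]}

def expand(query_terms):
    """Query expansion: add expert-language variants of patient terms."""
    out = list(query_terms)
    for t in query_terms:
        out.extend(synonyms.get(t, []))
    return out

def lm_score(query_terms, doc_tokens, collection, lam=0.5):
    """Jelinek-Mercer-smoothed query likelihood (Indri-style LM scoring)."""
    tf, cf = Counter(doc_tokens), Counter(collection)
    score = 0.0
    for t in query_terms:
        p = lam * tf[t] / max(len(doc_tokens), 1) \
            + (1 - lam) * cf[t] / max(len(collection), 1)
        score += math.log(p + 1e-12)
    return score

docs = {"d1": "cardiac infarction observed on admission".split(),
        "d2": "fracture of the left radius".split()}
collection = [t for toks in docs.values() for t in toks]
query = expand(["heart", "attack"])
```

The expanded query matches the expert-language document even though no original query term occurs in it.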
In the framework of our participation in the DeFT 2015 text-mining challenge, we developed sentiment-analysis methods for tweets. Several sub-tasks were considered: i) valence classification of tweets and ii) fine-grained classification of tweets (which itself includes two sub-tasks: detection of the generic class of the information expressed in a tweet and detection of the specific class of the opinion/sentiment/emotion). For all three problems, we adopted a standard machine learning framework. More precisely, three main methods were proposed and their suitability for the tasks analyzed: i) decision trees with boosting (bonzaiboost), ii) naive Bayes with Okapi and iii) convolutional neural networks (CNNs). Our approaches are voluntarily knowledge-free and text-based only: we exploit neither external resources (lexicons, corpora) nor tweet metadata. This allows us to evaluate the interest of each method and of traditional bag-of-words representations vs. word embeddings. Methods using simple ML frameworks and IR-based similarity metrics were shown to yield the best results.
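The naive Bayes variant over bag-of-words representations can be sketched as follows, with toy English tweets for illustration (the challenge data is in French, and the actual runs combine naive Bayes with Okapi weighting, which this sketch omits):

```python
import math
from collections import Counter, defaultdict

def train_nb(examples):
    """Train a multinomial naive Bayes classifier over bags of words."""
    word_counts, class_counts, vocab = defaultdict(Counter), Counter(), set()
    for tokens, label in examples:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab

def predict(model, tokens):
    """Classify a tokenized tweet, with add-one (Laplace) smoothing."""
    word_counts, class_counts, vocab = model
    n = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for c in class_counts:
        lp = math.log(class_counts[c] / n)
        total = sum(word_counts[c].values())
        for t in tokens:
            lp += math.log((word_counts[c][t] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

# Hypothetical toy training tweets (the real task uses the DeFT 2015 data).
train = [(["great", "movie"], "positive"), (["love", "this"], "positive"),
         (["awful", "movie"], "negative"), (["hate", "this"], "negative")]
model = train_nb(train)
```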
Work performed in the framework of the CNRS PICS MM-Analytics project, in collaboration with Marcel Worring, University of Amsterdam (The Netherlands)
Digital photo collections—personal, professional, or social—have been growing ever larger, leaving users overwhelmed. It is therefore increasingly important to provide effective browsing tools for photo collections. Learning from the resounding success of multi-dimensional analysis (MDA) in the business intelligence community for on-line analytical processing (OLAP) applications, we proposed a multi-dimensional model for media browsing, called M
In collaboration with Sien Moens (Katholieke Universiteit Leuven, Belgium), Éric Jamet and Martin Ragot (Univ. Rennes 2, France).
In the context of the CominLabs project "Linking media in acceptable hypergraphs", dedicated to the creation of explicit and meaningful links between multimedia documents or fragments of documents, we introduced a typology of possible links between contents of a multimedia news corpus. While several typologies have been proposed and used by the community, we argue that they are not adapted to rich and large corpora which can contain texts, videos, or radio station recordings. We defined a new typology as a first step towards automatically creating and categorizing links between document fragments, in order to create new ways to navigate, explore, and extract knowledge from large collections.
We also investigated video hyperlinking based on speech transcripts, leveraging a hierarchical topical structure to address two essential aspects of hyperlinking, namely serendipity control and link justification. We proposed and compared different approaches exploiting a hierarchy of topic models as an intermediate representation to compare the transcripts of video segments. These hierarchical representations offer a basis to characterize the hyperlinks, thanks to the knowledge of the topics which contributed to the creation of the links, and to control serendipity by choosing to give more weight to either general or specific topics. Experiments were performed on BBC videos from the Search and Hyperlinking task at MediaEval. Link precisions similar to those of direct text comparison were achieved, however with different link targets and a potential control of serendipity.
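The serendipity control offered by the hierarchy can be sketched as a level-weighted similarity between per-level topic distributions (the topic vectors below are hypothetical toy values):

```python
import numpy as np

def cos(a, b):
    """Cosine similarity between two topic distributions."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def link_score(seg_a, seg_b, level_weights):
    """Weighted sum of per-level topic similarities. seg_* are lists of
    topic distributions ordered from general (shallow) to specific (deep);
    shifting weight toward deep levels favors precise links, toward
    shallow levels serendipitous ones."""
    return sum(w * cos(a, b) for w, a, b in zip(level_weights, seg_a, seg_b))

# Hypothetical two-level topic representations for three video segments.
anchor = [np.array([0.9, 0.1]), np.array([0.8, 0.1, 0.1])]
precise = [np.array([0.9, 0.1]), np.array([0.8, 0.1, 0.1])]       # same specific topic
serendipitous = [np.array([0.9, 0.1]), np.array([0.1, 0.1, 0.8])]  # general topic only

specific_w, general_w = [0.2, 0.8], [0.8, 0.2]
```

Under specific-topic weights, the precise target clearly dominates; moving weight to the general level narrows the gap, letting serendipitous targets surface.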
The Search and Anchoring in Video Archives task at MediaEval addresses two issues: the Search part aims at returning a ranked list of video segments that are relevant to a textual user query; the Anchoring part focuses on identifying video segments that would encourage further exploration within the archive. Capitalizing on the experience acquired in previous participations, we implemented a two-step approach for both sub-tasks. The first step, common to both, consists in generating a list of potential anchor segments and query-response segments relying on a hierarchical topical structuring technique. In the second step, for each query, the best 20 segments are selected according to content-based comparisons, while for the anchor detection sub-task the segments are ranked based on a cohesion measure. The use of a hierarchical topical structure helps to propose segments of variable length at different levels of detail, with precise jump-in points. Moreover, the algorithm deriving the structure relies on the burstiness phenomenon in word occurrences, which gives it an advantage over the classical bag-of-words model.
In collaboration with X. Tannier (LIMSI), A. Vilnat (LIMSI) and B. Arnulphy (ANR).
Identifying events in texts is an information extraction task necessary for many NLP applications. Through the TimeML specifications and TempEval challenges, it has received some attention in recent years; yet, no reference result is available for French. We try to fill this gap by proposing several event extraction systems, combining for instance conditional random fields, language modeling and k-nearest neighbors. These systems are evaluated on French corpora and compared with state-of-the-art methods on English. The very good results obtained on both languages validate our whole approach.
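Among the combined methods, the k-nearest-neighbor component can be sketched as a feature-overlap vote over annotated tokens. The feature set and the tiny English corpus below are hypothetical, for illustration only; the actual systems use richer features and French corpora.

```python
from collections import Counter

def features(tokens, i):
    """A hypothetical contextual feature set for token i."""
    return {("word", tokens[i].lower()),
            ("prev", tokens[i - 1].lower() if i > 0 else "<s>"),
            ("next", tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>"),
            ("suffix", tokens[i].lower()[-2:])}

def knn_tag(train, tokens, i, k=3):
    """Tag token i as EVENT or O by majority vote among the k training
    tokens with the highest Jaccard overlap on feature sets."""
    f = features(tokens, i)
    ranked = sorted(train, key=lambda ex: -len(f & ex[0]) / len(f | ex[0]))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

# Tiny annotated corpus: event triggers are marked EVENT.
sents = [(["The", "bomb", "exploded", "yesterday"], ["O", "O", "EVENT", "O"]),
         (["They", "announced", "a", "deal"], ["O", "EVENT", "O", "O"])]
train = [(features(toks, i), lab)
         for toks, labs in sents for i, lab in enumerate(labs)]
```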
Video hyperlinking, TRECVid
Search and anchoring, Mediaeval Multimedia International Benchmark
Multimodal person discovery in broadcast TV, Mediaeval Multimedia International Benchmark
DeFT 2015 text-mining challenge
Teddy Furon spent 20 % of his time during 6 months transferring research results to IRT B-com
CIFRE Ph. D. contract with Institut National de l'Audiovisuel (Bingqing Qu)
CIFRE Ph. D. contract with Technicolor (Himalaya Jain)
Ph. D. contract with Alcatel-Lucent Bell Labs (Raghavendran Balu) in the framework of the joint Inria-Alcatel Lucent lab.
Duration: 4 years, started in April 2014
Partners: Telecom Bretagne (IODE), Univ. Rennes II (CRPCC, PREFics), Univ. Nantes (LINA/TAL)
LIMAH aims at exploring hypergraph structures for multimedia collections, instantiating actual links reflecting particular content-based proximities—similar content, thematic proximity, opinion expressed, answer to a question, etc. Exploiting and further developing techniques targeting pairwise comparison of multimedia contents from an NLP perspective, LIMAH addresses two key issues: How to automatically build, from a collection of documents, a hypergraph, i.e., a graph combining edges of different natures, which provides exploitable links in selected use cases? How do collections with explicit links modify the usage of multimedia data in all aspects, from a technology point of view as well as from a user point of view? LIMAH studies hypergraph authoring and acceptability, taking a multidisciplinary approach mixing ICT, law, information and communication science, as well as cognitive and ergonomic psychology.
Duration: 3 years, started in May 2012
Partner: Xerox Research Center Europe
The FIRE-ID project considers the semantic annotation of visual content, such as photos or videos shared on social networks, images captured by video surveillance devices, or scanned documents. More specifically, the project considers the fine-grained recognition problem, where the number of classes is large and where classes are visually similar, for instance animals, products, vehicles or document forms. We also assume that the amount of annotated data available per class for the learning stage is limited.
Duration: 3 years, started in September 2012
Partners: Morpho, Univ. Caen GREYC, Telecom ParisTech
Content-based retrieval systems (CBRS) are becoming the main multimedia security technology to enforce copyright laws or to spot illegal content over the Internet. However, CBRS were not designed with privacy, confidentiality and security in mind, which comes in serious conflict with their use in these new security-oriented applications. Privacy is endangered by information leaks when correlating users, queries and the contents stored in the clear in the database. This is especially the case for images containing faces, which are so popular in social networks. Biometric systems have long relied on protection techniques and anonymization processes that have never been used in the context of CBRS. The project seeks a better understanding of how biometrics-related techniques can help increase the security level of CBRS without degrading their performance.
Duration: 3 years, started in Feb. 2015
Partners: AriadNext, IRCGN, École Nationale Supérieure de Police
The IDFRAud project consists in proposing an automatic solution for ID analysis and integrity verification. Our ID analysis goes through three processes: classification, text extraction and ID verification. The three processes rely on a set of rules that are externalized in a formal manner in order to allow easy management and evolution, which leads us to the ID knowledge management module. Finally, IDFRAud addresses the forensic link detection problem and proposes an automatic analysis engine that can be continuously applied on the detected fraud ID database. Cluster analysis methods are used to discover relations between false IDs in their multidimensional feature space. This pattern extraction module will be coupled with a suitable visualization mechanism in order to facilitate the comprehension and analysis of extracted groups of inter-linked fraud cases.
Duration: 2.5 years, started in May 2015
Partners: Eurecom, Avisto Telecom, Wildmoka, Envivio
Television is undergoing a revolution, moving from the TV screen to multiple screens. Today's user watches TV and, at the same time, browses the web on a tablet, sends SMS, posts comments on social networks, searches for complementary information on the program, etc. Facing this situation, NexGen-TV aims at developing a generic solution for the enrichment, linking and retrieval of video content, targeting the low-cost edition of second-screen and multiscreen applications for broadcast TV. The main outcome of the project will be a software platform to aggregate and distribute video content via a second-screen edition interface connected to social media. The curation interface will primarily make use of multimedia and social media content segmentation, description, linking and retrieval. Multiscreen applications will be developed in various domains, e.g., sports and news.
Title: Unsupervised motif discovery in multimedia content
International Partner (Institution - Laboratory - Researcher):
Pontifícia Universidade Católica de Minas Gerais, Brasil - VIPLAB - Silvio Jamil F. Guimarães
Universidade Federal de Minas Gerais, Brasil - NPDI - Arnaldo Albuquerque de Araújo
Duration: 2014 - 2017
See also: http://
Motif aims at studying various approaches to unsupervised motif discovery in multimedia sequences, i.e., the discovery of repeated sequences with no prior knowledge of the sequences. On the one hand, we will develop symbolic approaches to motif discovery inspired by work in bioinformatics, investigating symbolic representations of multimedia data and the adaptation of existing symbolic motif discovery algorithms to the multimedia context. On the other hand, we will further develop cross-modal clustering approaches to repeated sequence discovery in video data, building upon previous work.
National Institute for Informatics, Japan
University of Amsterdam, The Netherlands
Katholieke Universiteit Leuven, Belgium
National Technical University of Athens, Greece
PICS CNRS MM-Analytics
Title: Fouille, visualisation et exploration multidimensionnelle de contenus multimédia (Multi-Dimensional Multimedia Browsing, Mining, Analytics), num. 6382.
International Partner (Institution - Laboratory - Researcher):
Reykjavík University, Iceland - Björn Þór Jónsson
Jan. 2014 – Dec. 2016
STIC AmSud MAXIMUM Unsupervised Multimedia Content Mining
International coordinator: Guillaume Gravier, CNRS – IRISA, France
Scientific coordinators: Arnaldo de Albuquerque Araújo (Universidade Federal de Minas Gerais, Computer Science Department, NPDI); Benjamin Bustos (Universidad de Chile, Department of Computer Science, PRISMA); Silvio Jamil F. Guimarães (Pontifícia Universidade Católica de Minas Gerais, VIPLAB)
Jan. 2014 - Dec. 2015
France Berkeley Fund Graph-NN: Computing and Manipulating Very Large Graphs of Nearest Neighbors
International coordinator: Laurent Amsaleg, CNRS – IRISA, France
Scientific coordinator: Michael Franklin (AMPLab, UC Berkeley)
Jun. 2015 - Dec. 2015
Bùi Văn Thạch (Ph.D. Student)
Date: Oct 2015 - Nov 2015
Institution: National University of Sokendai, Japan
Ahmet Iscen
Date: Apr 2015 - Jun 2015
Institution: McGill University, Montreal, Canada
Balu Raghavendran
Date: Jul 2015 - Sep 2015
Institution: University of California Berkeley (United States of America)
Teddy Furon co-organized a GdR-ISIS workshop on Biometrics, Multimedia Indexing and Privacy.
Guillaume Gravier was general chair of the ACM Multimedia Third Workshop on Speech, Language and Audio in Multimedia (SLAM 2015) and is president of the steering committee of the workshop.
Pascale Sébillot was a member of the acting presidency of Conf. Francophone en Traitement Automatique des Langues Naturelles.
Pascale Sébillot is a member of the permanent steering committee of Conf. Francophone en Traitement Automatique des Langues Naturelles.
Vincent Claveau was area chair of Conf. Francophone en Traitement Automatique des Langues Naturelles.
Guillaume Gravier was chair of the program committee of the ACM Multimedia Third Workshop on Speech, Language and Audio in Multimedia (SLAM 2015).
Pascale Sébillot was area chair of Conf. Francophone en Traitement Automatique des Langues Naturelles.
Laurent Amsaleg was a PC member of: ACM Intl. Conf. on Multimedia; VISI, ACM Intl. Conf. on Multimedia Retrieval; IEEE Intl. Conf. on Multimedia and Expo; Intl. Conf. on Multimedia Modeling; Intl. Workshop on Content-Based Multimedia Indexing; Intl. Conf. on Similarity Search and Applications; Intl. Conf. on Signal Image Technology & Internet-based Systems.
Vincent Claveau was a PC member of: Conf. en Recherche d’Information et Applications; Intl. Conf. on Web Intelligence.
Teddy Furon was a PC member of: IEEE Work. on Information Forensics and Security; Intl. Conf. on Acoustics, Speech and Signal Processing; Intl. Conf. on Multimedia, Communication and Computing; ACM Intl. Conf. on Multimedia; European Signal Processing Conf.
Guillaume Gravier was a PC member of: ACM Intl. Conf. on Multimedia; IEEE Intl. Conf. on Multimedia and Expo; Annual Conf. of the Intl. Speech Communication Association; IEEE Intl. Workshop on Multimedia Signal Processing; Intl. Workshop on Content-Based Multimedia Indexing; Intl. Conf. on Knowledge and Systems Engineering; Intl. Conf. on Statistical Language and Speech Processing.
Ewa Kijak was a PC member of Intl. Workshop on Content-Based Multimedia Indexing.
Christian Raymond was a PC member of: Annual Conf. of the Intl. Speech Communication Association; Conf. Francophone en Traitement Automatique des Langues Naturelles.
Pascale Sébillot was a PC member of Intl. Conf. on Terminology and Artificial Intelligence.
Vincent Claveau was a reviewer of: Intl. Conf. on Machine Learning.
Vincent Claveau is member of the editorial board of the journal Traitement Automatique des Langues.
Teddy Furon was member of the editorial board of IEEE Trans. on Information Forensics and Security (up to March 2015).
Christian Raymond is member of the editorial board of the online journal Discours.
Pascale Sébillot is editor of the journal Traitement Automatique des Langues and a member of its editorial committee.
Laurent Amsaleg reviewed for Knowledge and Information Systems.
Vincent Claveau reviewed for: Multimedia Tools and Applications, Traitement Automatique des Langues.
Teddy Furon reviewed for: IEEE Trans. on Information Forensics and Security; IEEE Trans. on Multimedia, Data Mining and Knowledge Discovery; ACM Trans. on Information Systems; IEEE Trans. on Circuits and Systems for Video Technology; Digital Signal Processing Journal; Applied and Computational Harmonic Analysis Journal; IET Information Security Journal.
Guillaume Gravier reviewed for: IEEE Trans. on Audio Speech and Language; IEEE Trans. on Image Processing; EURASIP Journal on Audio, Speech, and Music Processing; Journal of Computer Science and Technology; Multimedia Tools and Applications.
Christian Raymond reviewed for Computer Speech and Language.
Pascale Sébillot was member of the reading committee for several issues of the Journal Traitement Automatique des Langues.
Vincent Claveau gave an invited talk about biomedical NLP in Rennes University Hospital's computer science department.
Vincent Claveau is finance head of the Association pour la Recherche d'Informations et ses Applications (ARIA).
Vincent Claveau is deputy head of the GdR MaDICS, a CNRS inter-lab initiative to promote research about Big Data and Data Science.
Guillaume Gravier is president of the Association Francophone de la Communication Parlée (AFPC), French-speaking branch of the Intl. Speech Communication Association.
Guillaume Gravier is co-founder and general chair of the ISCA SIG Speech, Language and Audio in Multimedia.
Guillaume Gravier is member of the Community Council of the Mediaeval Multimedia Evaluation series.
Guillaume Gravier is the technical representative of Inria in the cPPP Big Data Value Association, actively working on technical aspects of data analytics.
Vincent Claveau served as expert for the ERC Consolidator grant programme, for the FNRS (Belgian Funding agency), for the Programme Hubert Curien.
Teddy Furon reviewed projects for Alpes Grenoble Innovation and French National Research Agency (ANR).
Teddy Furon is the scientific adviser for startup company Lamark (20 % of his time since July 2015).
Guillaume Gravier was vice-president of the Scientific Evaluation Committee of the National Research Agency for the theme 'HCI, Content, Knowledge, Big Data, Simulation, HPC'.
Pascale Sébillot reviewed projects for the Natural Sciences and Engineering Research Council of Canada.
Guillaume Gravier is a member of the Board of the technology cluster Images & Réseaux.
Guillaume Gravier is a member of the Board of the Comité des Projets of Inria - Rennes Bretagne Atlantique.
Pascale Sébillot is a member of the Conseil National des Universités.
Pascale Sébillot is a member of the theses advisory committee of the Matisse doctoral school.
For researchers, all activities are given. For professors and assistant professors, only courses at the M. Sc. level are listed.
Master: Laurent Amsaleg, Multidimensional indexing, 13h, M2R, University Rennes 1, France
Master: Vincent Claveau, Data-Based Knowledge Acquisition: Symbolic Methods, 20h, M1, INSA de Rennes, France
Master: Vincent Claveau, Text Mining, 36h, M2, Univ. Rennes 1, France
Master: Vincent Claveau, Machine Learning for symbolic and sequential data, 7h, M2, Univ. Rennes 1, France
Master: Vincent Claveau, Information Retrieval, 15h, M2, ENSSAT, France
Master: Vincent Claveau, Information Retrieval, 13h, M2, Univ. Rennes 1, France
Licence: Teddy Furon, Probabilities, 40h, L1, Agrocampus Rennes, France
Licence: Guillaume Gravier, Databases, 30h, L2, INSA Rennes, France
Licence: Guillaume Gravier, Probability and statistics, 10h, L3, INSA Rennes, France
Master: Guillaume Gravier, Data analysis and probabilistic modeling, 30h, M2, University Rennes 1, France
Master: Ewa Kijak, Image processing, 64h, M1, ESIR, France
Master: Ewa Kijak, Supervised learning, 15h, M2R, University Rennes 1, France
Master: Ewa Kijak, Statistical data mining, 13h, M2, University Rennes 1, France
Master: Ewa Kijak, Indexing and multimedia databases, 15h, M2, ENSSAT, France
Master: Ewa Kijak, Computer vision, 15h, M2, ESIR, France
Master: Simon Malinowski, Short-term time series prediction, 29h, M1, Univ. Rennes 1
Master: Simon Malinowski, Supervised Learning, 10h, M2, Univ. Rennes 1
Master: Pascale Sébillot, Advanced Databases and Modern Information Systems, 70h, M2, INSA Rennes, France
Master: Pascale Sébillot, Data-Based Knowledge Acquisition: Symbolic Methods, 18h, M1, INSA Rennes, France
Master: Pascale Sébillot, Logic Programming, 12h, M1, INSA Rennes, France
PhD: Mohammed-Haykel Boukadida, Video summarization based on constraint programming, defended Dec. 2015, Patrick Gros
PhD: Bingqing Qu, Structure discovery in collections of recurrent TV programs, defended Dec. 2015, Guillaume Gravier
PhD: Anca Roxana Simon, Semantic structuring of video collections from speech: segmentation and hyperlinking, defended Dec. 2015, Guillaume Gravier and Pascale Sébillot
PhD in progress: Raghavendran Balu, Privacy-preserving data aggregation and service personalization using highly-scalable data indexing techniques, started Oct. 2013, Teddy Furon and Laurent Amsaleg
PhD in progress: Rémi Bois, Navigable directed multimedia hypergraphs: construction and exploitation, started October 2014, Guillaume Gravier and Pascale Sébillot
PhD in progress: Petra Bosilj, Content based image indexing and retrieval using hierarchical image representations, started October 2012, Ewa Kijak and Sebastien Lefèvre (with OBELIX, IRISA team)
PhD in progress: Ricardo Carlini Sperandio, Unsupervised motif mining in multimedia time series, started August 2015, Laurent Amsaleg and Guillaume Gravier
PhD in progress: Ahmet Iscen, Continuous memories for representing sets of vectors and image collections, started September 2014, Hervé Jégou and Teddy Furon
PhD in progress: Grégoire Jadi, Opinion mining in multimedia data, started October 2014, Vincent Claveau, Béatrice Daille (LINA, Nantes) and Laura Monceaux (LINA, Nantes)
PhD in progress: Raheel Kareem Qader, Phonology modeling for emotional speech synthesis, started January 2014, Gwénolé Lecorvé and Pascale Sébillot (with EXPRESSION, IRISA Team)
PhD in progress: Cédric Maigrot, Detecting fake information on social networks, started October 2015, Laurent Amsaleg, Vincent Claveau and Ewa Kijak
PhD in progress: Vedran Vukotič, Deep neural architectures for automatic representation learning from multimedia multimodal data, started October 2014, Guillaume Gravier and Christian Raymond
Laurent Amsaleg
PhD, Herwig Lejsek, Reykjavík University
Teddy Furon
PhD, Wei Fan, University of Grenoble
Guillaume Gravier
HDR, reviewer, Slim Essid, Telecom ParisTech
PhD, president, Grégor Dupuis, Université du Maine
Ewa Kijak
PhD, Cyrille Beaudry, Université de la Rochelle
Simon Malinowski
PhD, Racha Khelif, Université de Franche-Comté
Pascale Sébillot
PhD, reviewer, Munshi Asadullah, Univ. Paris-Sud
PhD, reviewer, Sondes Bannour, Univ. Paris-Nord
PhD, president, Valéria Lelli Leitão Dantas, Univ. Rennes 1
PhD, president, Bingqing Qu, Univ. Rennes 1
PhD, Mohamed Haykel Boukadida, Univ. Rennes 1
Pascale Sébillot: Invited speaker (6h) Introduction to issues and solutions in NLP, URFIST Bretagne et Pays de la Loire seminar, June 2015.