Information available online is increasingly complex, distributed, heterogeneous, replicated, and changing. Web services, such as SOAP services, should also be viewed as information to be exploited. The goal of Gemo is to study fundamental problems raised by modern information and knowledge management systems and to propose novel solutions to these problems.
A large part of our work has been devoted to the ANR project WebContent.
Serge Abiteboul received the EADS Award in Computer Science, which is awarded by the French Academy of Sciences.
A main theme of the team is the integration of information, seen as a general concept, including the discovery of meaningful information sources or services, the understanding of their content or goal, their integration and the monitoring of their evolution over time.
Gemo works on environments that are both powerful and flexible to simplify the development and deployment of applications providing fast access to meaningful data. In particular, content warehouses and mediators offering a wide access to multiple heterogeneous sources provide a good means of achieving these goals.
Gemo is a project born from the merging of the INRIA-Rocquencourt project Verso with members of the IASI group of LRI. It is located in Orsay-Saclay. A particularity of the group is that it addresses data and knowledge management issues by combining techniques from artificial intelligence (such as classification) and databases (such as indexing).
Some prospective work is presented in . The goal is to enable non-experts, such as scientists, to build content-sharing communities in a true database fashion: declaratively. The proposed infrastructure is called a data ring.
Databases do not have specific application fields. As a matter of fact, most human activities lead today to some form of data management. In particular, all applications involving the processing of large amounts of data require the use of databases.
Technologies recently developed within the group focus on novel applications in the context of the Web, telecom, multimedia, enterprise portals, or information systems open to the Web. For instance, in the setting of the EDOS EC project, we are developing software for the P2P management of the data and metadata of the Mandriva Linux distribution.
Some recent software developed in Gemo:
ActiveXML: a language and system based on XML documents containing Web service calls. ActiveXML is now available as open source on the ObjectWeb Forge.
SomeWhere: a P2P infrastructure for semantic mediation.
SomeWhere+: a P2P infrastructure tolerant to inconsistency.
KadoP: a peer-to-peer platform for warehousing of Web resources.
OptimAX: an algebraic cost-based optimizer for ActiveXML.
TaxoMap: a prototype to automate semantic mappings between taxonomies.
XTAB2SML: an automatic ontology-based tool to enrich tables semantically.
WebQueL: a multi-criteria filtering tool for Web documents, developed in the setting of the e.dot project.
ULoad: a tool for creating and storing XML materialized views, and using them to answer XQuery queries.
GUNSAT: a greedy local search algorithm for propositional unsatisfiability testing.
LN2R: a logical and numerical tool for reference reconciliation.
One of the reasons for the success of the relational data model was probably its clean theoretical foundations. Obtaining such a clean foundation for the semistructured data model and XML is still an on-going research task.
With XML documents, data may be extracted, queried, or used in navigation because of its position in a document rather than because of its actual content. It is thus believed that those foundations will be based on tree automata and on Monadic Second-Order (MSO) logic, making use of the tree structure of XML data. In this direction, we studied in some complexity issues related to a sequential family of tree automata, which has the same expressive power as unary Transitive-Closure logic. But of course data values cannot be completely ignored. In we show how to use decidable logics over infinite alphabets in the XML context for deciding XML-schema validation and XPath query inclusion.
By essence, XML is used in an Internet-based environment. On the Web, one may have to process heavy streams of information on the fly, to support the surveillance of rapidly changing data sources. Also, by the nature of the Web, information is imprecise, incomplete, inconsistent, and of uneven quality. The answer to a Web query may include a huge number of results (consider a Google search), and it is typically as important to rank these results as to obtain them.
We have considered streaming XML data with limited memory resources. In this context, we considered in the DTD validation problem: checking whether an XML document conforms to a DTD.
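Streaming DTD validation can be illustrated with a small sketch. The code below is a toy illustration, not our actual algorithm: it encodes each element's content model as a regular expression over child-element names (the DTD and element names are hypothetical) and validates a SAX-like event stream in a single pass, using memory proportional to the document depth only.

```python
import re

# Toy DTD (hypothetical): element name -> regular expression over the
# sequence of child element names, each child name followed by ";".
DTD = {
    "bib":    r"(book;)*",
    "book":   r"title;(author;)+",
    "title":  r"",
    "author": r"",
}

def validate_stream(events, dtd):
    """One-pass validation of a SAX-like event stream against the toy DTD.
    Memory use is proportional to the document depth, not its size."""
    stack = []  # one (tag, children seen so far) frame per open element
    for kind, tag in events:
        if kind == "start":
            if tag not in dtd:
                return False
            if stack:
                stack[-1][1].append(tag)
            stack.append((tag, []))
        else:  # "end"
            open_tag, children = stack.pop()
            if open_tag != tag:
                return False
            # Check the word of child names against the content model.
            word = "".join(c + ";" for c in children)
            if not re.fullmatch(dtd[tag], word):
                return False
    return not stack
```

Determinism of DTD content models is what makes such one-pass processing possible; only the stack of open elements and their child words is retained.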
It is often desirable that a user only has partial access to a database, and that several users see different parts of the database. The subpart seen by a user is called a view. When a user specifies a query, this query has to be rewritten according to the real database and then evaluated. In we study the language necessary for rewriting Conjunctive Queries.
We continued our work on probabilistic semi-structured models. We give complexity results about the probabilistic tree model (based on trees whose nodes are annotated with conjunctions of probabilistic event variables) that was previously introduced. We identify a very large class of queries for which querying and elementary updates are tractable. We also consider other theoretical issues, such as the equivalence of probabilistic trees or the validation of a probabilistic tree against a DTD.
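To illustrate the probabilistic tree model, the following sketch (with a hypothetical document and made-up event probabilities) computes the probability that a given node exists, namely the product of the probabilities of the distinct event variables conjoined along its root path, assuming independent events:

```python
# Hypothetical event probabilities (independent probabilistic variables).
P = {"e1": 0.9, "e2": 0.5, "e3": 0.8}

# A probabilistic tree: (label, events, children); a node (and its subtree)
# is present iff all event variables on its root path are true.
tree = ("bib", set(), [
    ("book", {"e1"}, [
        ("title", set(), []),
        ("author", {"e2"}, []),
    ]),
    ("book", {"e1", "e3"}, []),
])

def node_probability(node, path, inherited=frozenset()):
    """Probability that the node reached by `path` (a sequence of child
    indices) exists: the product over the *union* of the events seen from
    the root down, so that a shared event is not counted twice."""
    label, events, children = node
    events = set(inherited) | set(events)
    if not path:
        p = 1.0
        for e in events:
            p *= P[e]
        return p
    return node_probability(children[path[0]], path[1:], events)
```

For instance, the first book's author node depends on events e1 and e2, so it exists with probability 0.9 × 0.5 = 0.45. Query answering over such trees is more involved (it requires inclusion-exclusion across worlds), which is where the tractability results above come in.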
A new challenge is the study of XML when used in the dynamic environment of the Web. As XML is used as an exchange format for data over the Web, systems using XML, such as Web services, must manipulate highly heterogeneous data formats. In order to reduce the risk of failure, it is therefore important to be able to perform offline static analysis of the programs developed in such systems. Gemo has started studying problems related to verification of systems for XML.
In order to improve Web information retrieval using ontologies, we proposed an extension and an implementation of OWL for Web query enrichment. This work has been done in the setting of the O3 approach designed by Cedric Pruski in Luxembourg during his Master of Science. O3 uses the WordNet linguistic resource in order to optimize, in terms of relevance, the documents returned when searching the Web. Its main idea consists in enriching, following well-defined rules, the query constructed by users, by extracting from WordNet the appropriate vocabulary that best characterizes the search domain. O3 has been formalized using first-order logic and graph theory. This formal framework permitted the rigorous definition of query expansion rules. In parallel, the standardization of OWL has led to the quick and massive development of OWL ontologies across the Web. This is why, to benefit from both O3 and OWL ontologies, we decided to make O3 compatible with OWL. We studied the possibilities offered by OWL that cope with O3, as well as an extension of the language. We implemented the so-called extended OWL through query expansion rules and made an experimental validation using the TARGET tool .
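The query enrichment idea can be sketched as follows. The synonym table is a hypothetical stand-in for WordNet, and the rule shown (add only synonyms tagged with the search domain) is a simplification of O3's expansion rules:

```python
# Toy stand-in for WordNet: term -> list of (synonym, domain) pairs.
# (Hypothetical data; O3 extracts such vocabulary from WordNet itself.)
SYNONYMS = {
    "car":   [("automobile", "transport"), ("railcar", "rail")],
    "plane": [("aircraft", "transport"), ("plane", "geometry")],
}

def expand_query(terms, domain):
    """Enrich a keyword query with synonyms restricted to the search
    domain, mirroring the rule-based expansion idea of O3 (simplified)."""
    expanded = []
    for t in terms:
        expanded.append(t)
        for syn, dom in SYNONYMS.get(t, []):
            # Only vocabulary characterizing the search domain is added.
            if dom == domain and syn not in expanded:
                expanded.append(syn)
    return expanded
```

Restricting expansion to the search domain is what keeps the enriched query relevant: expanding "plane" in a transport search should add "aircraft" but not geometry vocabulary.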
First, we surveyed techniques for ontology evolution . After identifying the different kinds of evolution the Web is confronted with, we detailed the various existing languages and techniques devoted to Web data evolution, paying particular attention to Semantic Web concepts and to how these languages and techniques can be adapted to evolving data in order to improve the quality of Web information systems. Second, we proposed a set of modelling features for ontology evolution . These features have been defined after a rigorous study of the evolution of a particular domain (the domain defined by the WWW series of conference topics) over a ten-year period. The results of this study led directly to the definition of the various kinds of evolution that can appear. They allowed the proposition of modelling features that aim at designing evolving ontologies. Indeed, these features will allow us to understand the evolution of ontologies and will help predict future versions of ontologies. We highlighted the contribution of such ontologies through an example implementing ontology-based query expansion techniques to improve the relevance of documents when searching the Web.
When data sources are numerous and heterogeneous, a data integration system needs automatic tools to annotate and query semi-structured documents. We propose an automatic approach for the semantic annotation of HTML or XML documents . It relies on a model describing the domain of interest. The difficulty lies in the heterogeneous structure of the documents and in the fact that a document contains both structured and unstructured parts. To overcome this problem, we have defined a first set of annotation rules using SWRL. These rules take into account both the semantic relations defined in the model and the heterogeneity of document structures. The resulting annotated documents are represented in RDF according to the semantic RDF Schema model, which is extended from the domain description for the annotation task. Since October 2007, this work has been done in the setting of the SHIRI project, which is supported by the DIGITEO Foundation.
Peer-to-Peer Inference Systems (P2PISs) are made of autonomous peers (i.e., built and managed independently) that can communicate in order to perform an inference task at the P2PIS level (e.g., consequence finding or query answering). For that purpose, communication rules between peers are modeled by mappings that define semantic relationships between their knowledge.
A crucial aspect of this new setting is that peers are equivalent in functionality and no actor has a global view of a P2PIS, i.e., there is no centralized control or hierarchical organization in the system. Each peer only knows the knowledge it manages and its mappings with some other peers. This raises exciting, non-trivial algorithmic issues since, in the literature, reasoning algorithms have been designed under the assumption that the knowledge on which inferences have to be performed is given as an input. New decentralized algorithms have to be designed with the idea that only a subset of the global knowledge is available to a peer as an input (i.e., the peer's knowledge and mappings), while the algorithms must still be sound and complete for the inference task w.r.t. the global knowledge of the P2PIS (i.e., the knowledge and mappings of all the peers). The SomeWhere platform has been developed for experimenting with such distributed reasoning tasks. It is a building block of the MEDIAD project with France Telecom R&D, as well as one of the components being integrated in the platform to be produced by the WebContent project.
Many challenging Artificial Intelligence (AI) tasks like common sense reasoning, diagnosis, or knowledge compilation can be stated in terms of consequence finding. That key inference basically consists in deriving theorems of interest that are intensionally characterized within a logical theory. Such theorems can be those expressed in a fixed language, those resulting from some incoming knowledge in the theory, etc.
Recently, we have designed the first peer-to-peer inference systems for consequence finding, in which each peer manages a clausal theory of propositional logic over its own set of propositional variables. A peer establishes (or suppresses) a mapping by adding to (or removing from) its theory a clause made of some of its variables and some variables from other peers, those peers being notified of the operation. For those systems, we have proposed the Decentralized Consequence finding Algorithm (DeCA), which performs a decentralized resolution procedure in order to compute the clausal implicates (i.e., consequences) of a clause submitted to a peer w.r.t. a P2PIS, including all the proper prime ones (i.e., the strongest consequences).
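The core inference that DeCA decentralizes can be sketched in a centralized form: saturate the submitted clause against the theory by propositional resolution and keep the subsumption-minimal derived clauses. This is only an illustration of consequence finding, not the distributed algorithm itself; clauses are represented as sets of integer literals (negative integers for negated variables):

```python
def resolve(c1, c2):
    """All non-tautological resolvents of two clauses (sets of literals)."""
    out = []
    for lit in c1:
        if -lit in c2:
            r = (c1 - {lit}) | (c2 - {-lit})
            if not any(-l in r for l in r):  # drop tautologies
                out.append(frozenset(r))
    return out

def consequences(clause, theory):
    """Centralized sketch of the inference DeCA performs peer-to-peer:
    saturate the input clause against the theory by resolution, then keep
    only the subsumption-minimal (prime) implicates."""
    derived = {frozenset(clause)}
    frontier = [frozenset(clause)]
    while frontier:
        c = frontier.pop()
        for t in theory:
            for r in resolve(c, frozenset(t)):
                if r not in derived:
                    derived.add(r)
                    frontier.append(r)
    # A clause is prime if no strictly stronger derived clause subsumes it.
    return {c for c in derived if not any(d < c for d in derived)}
```

In the P2P setting, the theory (and hence the resolution steps) is split across peers linked by mapping clauses, and propagation follows the mappings.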
A key point in the design of the above P2PISs is that mappings are undirected, i.e., any peer involved in a mapping can use it to propagate knowledge to the other peers participating in the mapping. Therefore, such systems model autonomous components that communicate through interfaces that are both input and output.
We have recently proposed an alternative design of P2PISs in which mappings are directed. A mapping is stated between two peers, but only one of them can use the mapping to propagate knowledge to the other. From a practical viewpoint, a mapping from a peer to another specifies some knowledge that the former peer has to observe and the knowledge it must notify to the latter peer if the observed knowledge holds. Such new P2PISs are of great interest for applying AI reasoning because they can model many real applications in which autonomous components communicate through interfaces that are either input or output, like distributed functions in Automotive Engineering, distributed control systems for industrial machinery, or processes in Automation. For those systems, we have proposed a new Decentralized Consequence finding Algorithm for directed mappings (DeCAK) that computes the clausal implicates of a clause submitted to a peer w.r.t. a P2PIS, including all the proper prime ones.
In P2PISs retaining the classical semantics for mappings (i.e., the undirected view), the ability of each peer to freely add new mappings with other peers may affect the consistency of the resulting global theory. This cannot be avoided because of the decentralized nature of the architecture. In order to prevent the trivialization of reasoning in such cases, we have designed a method able to detect incrementally all possible minimal causes of inconsistency and to store them in a distributed way in the P2PIS. Furthermore, we have proposed a new distributed consequence finding algorithm (WFDeCA) able to perform well-founded reasoning despite the presence of possible inconsistencies. These algorithms have been implemented and an experimental evaluation is underway. One noticeable feature of WFDeCA is that different consequences, though all well founded, may have different supports that are not necessarily consistent with each other. In such cases, it is up to the user to choose between consequences having incompatible supports. One possible criterion is to prefer the most trustable consequences. We are currently investigating different trust models that have been proposed for P2P file sharing systems and considering their possible adaptation to the task of distributed consequence finding.
The logical theory of consistency-based diagnosis was worked out in the eighties for the centralized case. It starts from a model, assumed to be given, of the behavior of the (component-based) system under consideration (correct behavior and possibly some faulty behaviors, if known in advance) and aims at maintaining consistency between the current hypotheses about the behavioral modes of the components (correct or faulty) and the observations (e.g., sensor measurements). It is stated in a logical framework, where the model SD (for System Description) and the observations OBS are expressed in first-order logic, the mode of each element in COMPS being explicitly represented thanks to the predefined Ab (for Abnormal) predicate (so ¬Ab(c) means that component c is correct and Ab(c) that component c is faulty). Diagnostic reasoning is a typical example of non-monotonic reasoning: initially all components are assumed to be correct, up to the moment this becomes inconsistent with the observations. Then consistency between the model and the observations is restored by changing some component mode assignments from correct to faulty (in general, a principle of parsimony is applied and one is interested only in minimal (for cardinality or for set inclusion) sets of faulty components). Technically, from a logical inference point of view, the computation of the diagnoses (complete component mode assignments consistent with the observations) relies on the computation of (prime) implicates and implicants of SD ∧ OBS in terms of the target language built from the Ab(c), for c in COMPS. This diagnostic activity can be done off-line, from a given set of observations, or on-line in a general monitoring framework where new observations occur along time, and where the real (unknown) mode of each component can itself vary along time (from correct to faulty, but also from faulty to correct in the case of transient faults).
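The characterization of minimal diagnoses can be illustrated by the classical reduction to minimal hitting sets of conflict sets (a conflict being a set of components that cannot all be correct given SD and OBS). The brute-force sketch below is for illustration only; real diagnosis engines derive the conflicts by logical inference and use far more efficient hitting-set computations:

```python
from itertools import combinations

def minimal_diagnoses(components, conflicts):
    """Reiter-style sketch: a diagnosis is a minimal set of components
    whose assumed abnormality (Ab) hits every conflict set."""
    diagnoses = []
    for size in range(len(components) + 1):
        for cand in combinations(sorted(components), size):
            s = set(cand)
            # Keep candidates that hit every conflict and are not
            # supersets of an already-found (hence smaller) diagnosis.
            if all(s & c for c in conflicts) and \
               not any(set(d) <= s for d in diagnoses):
                diagnoses.append(cand)
    return diagnoses
```

For instance, with conflicts {A, B} and {A, C}, the minimal diagnoses are {A} alone or {B, C} together, matching the parsimony principle described above.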
Assuming a centralized system, a centralized model, and a centralized diagnostic algorithm is a severe restriction for several real applications: the system can be "naturally" distributed (telecommunication networks, Web services, etc.); it can be too huge or complex for a unique storable and accessible global model to exist; privacy issues can prevent the existence of such a global model; and the diagnostic algorithm can benefit from performing decentralized local diagnoses and from being implemented in a decentralized way over several control units. This is why decentralized diagnosis has received growing interest in recent years. The work that has been initiated is an attempt to design, implement, and test distributed consistency-based diagnosis algorithms in a logical framework, relying on previous work conducted inside Gemo on P2PISs, in particular concerning consequence finding and the handling of inconsistencies. In this P2P framework, each peer represents a subsystem and its local theory is the propositionalized subsystem description, with the mappings (shared variables) expressing the connections between subsystems. Observation peers (sensors) have a local theory limited to a propositional symbol expressing the measurement's value. The algorithm currently developed for generating minimal diagnoses relies on a distributed computation of (restrictions to the target language of) implicants of the global (unknown as a whole) theory . Several problems will have to be addressed in the future: incrementality w.r.t. increasing asynchronous observations; characterization of all diagnoses in the presence of fault models; on-line monitoring and diagnosis with observations varying asynchronously along time; repair by reconfiguring the system (changing mappings); open world (addition or suppression of peers); etc.
In the setting of the MediaD project, we address the problem of discovering mappings between distributed ontologies in SomeRDFS, a peer data management system (PDMS) derived from SomeWhere. Since the PDMS setting is particular, we proposed techniques that take advantage of SomeRDFS reasoning to help discover mappings between the knowledge of the peers, i.e., their ontologies; these mappings can be mapping shortcuts or new mappings . The aim of the proposed techniques is to discover elements that are relevant candidates for mapping. These elements will then be aligned by applying usual alignment techniques. The implementation of this work is in progress.
We are working on the reference reconciliation problem. It consists in deciding whether different identifiers refer to the same data, i.e., correspond to the same real-world entity (the same hotel, the same person, ...). We have developed a logical and numerical approach named LN2R (L2R + N2R) which is automatic and guided by the semantics of an RDFS+ schema. L2R is logic-based , . In the N2R method, the semantics of the schema is exploited by an informed similarity measure which is used in a numerical computation of the similarity of reference pairs. This numerical computation is expressed as a nonlinear equation system. We have shown on one benchmark dataset that we can obtain better results than a supervised approach. We have also studied the scalability of such approaches , . This work is done in the setting of the Picsel3 project.
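The numerical side can be sketched as a fixpoint iteration on an equation system (nonlinear because of the max): each pair's similarity combines its own label similarity with the similarity of related pairs. This toy version, with hypothetical data and difflib as the base similarity measure, only illustrates the idea behind N2R, not its actual equations:

```python
import difflib

def label_sim(a, b):
    """Base string similarity between two labels."""
    return difflib.SequenceMatcher(None, a, b).ratio()

def reconcile(pairs, related, labels, rounds=20, alpha=0.5):
    """Fixpoint sketch in the spirit of N2R: each pair's similarity mixes
    its own label similarity with the best similarity among related pairs
    (`related[p]` lists pairs whose similarity reinforces p's)."""
    sim = {p: label_sim(labels[p[0]], labels[p[1]]) for p in pairs}
    for _ in range(rounds):
        new = {}
        for p in pairs:
            neigh = max((sim[q] for q in related.get(p, [])), default=0.0)
            new[p] = alpha * label_sim(labels[p[0]], labels[p[1]]) \
                     + (1 - alpha) * neigh
        sim = new
    return sim
```

In LN2R proper, the weights and the propagation structure are derived from the schema semantics (functional properties, disjointness, etc.) rather than fixed by hand.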
This work was initiated in the setting of the e.dot project. We worked on mappings between different taxonomies in order to access several sources from a unique querying system. We explored alignment techniques to generate semantic mappings automatically. The originality of the approach is that it combines terminological, structural, and semantic techniques well suited to the mapping of taxonomies, which are schemas with very poor definitions of concepts, mainly defined with reference to the terminology. A prototype, TaxoMap, finds mappings or suggests indicators to help users find mappings . We continue our work on TaxoMap in the setting of the WebContent project. First, we investigated techniques which rely on an additional source, called background knowledge. We made a comparative analysis of works using background knowledge . We studied the difficulties encountered when using WordNet and showed how the TaxoMap system can avoid these difficulties . Further work has been done on adapting TaxoMap for the Ontology Alignment Evaluation Initiative (OAEI 2007) campaign. We thus participated in the OAEI 2007 campaign , which consists of applying matching systems to ontology pairs and evaluating their results. Moreover, TaxoMap has been tested and evaluated together with OLA, jointly developed by teams at DIRO, University of Montreal, and at INRIA Rhône-Alpes (EXMO group), in the setting of the WebContent project. The following corpora have been chosen: a corpus delivered by EADS in the aeronautics field, the OAEI benchmark test, and AGROVOC-NAL, two very rich thesauri used in the "food" corpus of the OAEI 2007 campaign. These experiments have shown the complementary nature of the two tools and have emphasized two main difficulties: the alignment of very large ontologies and the evaluation of results when no reference mappings are provided.
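A purely terminological matcher, i.e., only one of the ingredients combined in TaxoMap, can be sketched as follows (the concept labels are hypothetical and the threshold is arbitrary):

```python
import difflib

def map_taxonomies(source, target, threshold=0.7):
    """Terminological sketch of taxonomy alignment: propose, for each
    source concept, the most label-similar target concept. TaxoMap also
    exploits structural and semantic techniques; this toy version uses
    labels only."""
    mappings = {}
    for s in source:
        best, score = None, 0.0
        for t in target:
            r = difflib.SequenceMatcher(None, s.lower(), t.lower()).ratio()
            if r > score:
                best, score = t, r
        if score >= threshold:  # only keep sufficiently confident mappings
            mappings[s] = best
    return mappings
```

Concepts whose best score falls below the threshold are left unmapped; in TaxoMap such cases are where indicators are suggested to the user instead.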
The problem of XML query evaluation still poses significant challenges. In particular, the complexity of the XQuery language, standardized by the W3C, makes it very difficult to devise efficient storage and optimization strategies. We have proposed a new language for describing materialized XML views, which can be used to speed up the processing of XML queries. We have devised associated algorithms for rewriting XQuery queries based on this rich view language .
While materialized views can speed up query processing, their practical applicability requires several developments. First, they have to be maintained in the event of updates applied to the underlying documents. The internship of Abhipreet Das (IIT Bombay) focused on proposing algorithms for incrementally propagating updates to the materialized views. Second, view selection may be cumbersome for the user, so automated view selection mechanisms are needed. The internship of Nikhil Pandey (IIT Bombay) led to some work in this area; however, the problem was not fully solved.
The ActiveXML language (AXML in short) allows describing complex distributed data manipulation tasks. Each such task could be executed in many ways producing the same results but with very different performance. We have made important progress in laying out an algebraic formalism for optimizing AXML document evaluation, more precisely on specifying a small set of special Web services dedicated to distributed evaluation and on their usage within the optimizer. The first prototype of an AXML optimizer, OptimAX, has been developed and demonstrated . The optimizer is integrated with a new version of an AXML peer, developed mostly this year by E. Taroza.
Performance evaluation is a natural component in many data-oriented works such as those carried out in Gemo. However, the complexity of the languages we target, such as XQuery, and the complexity of the settings in which our techniques are deployed, such as peer-to-peer systems, make the task of performance evaluation very complex. For instance, in a peer-to-peer XML data management setting, one has to distinguish the impact of the underlying peer network from that of data indexing, of query evaluation algorithms, and finally of the optimizer quality. Benchmarks are essential tools for performance evaluation. We have proposed a benchmark for XML data management in P2P, named P2PTester , designed to ease and systematize the task of performance evaluation. Performance evaluation in the large raises lively discussion; a panel organized at the VLDB conference on this topic received significant attention . Participants agreed on the need for a more thorough procedure both for performing performance evaluations and for ensuring that such evaluations are repeatable.
We started some collaborative work with UCSC and U. Tel Aviv on a framework for Continuous On-Line Tuning (Colt) , a novel self-tuning framework that continuously monitors the incoming queries and adjusts the system configuration in order to maximize query performance. The key idea behind Colt is to gather performance statistics at different levels of detail and to carefully allocate profiling resources to the most promising candidate configurations. Moreover, Colt uses effective heuristics to self-regulate its own performance, lowering its overhead when the system is well tuned and being more aggressive when the workload shifts and it becomes necessary to re-tune the system. We considered the design of the generic Colt system, and its specialization to the important problem of selecting an effective set of indices for a relational query load. We developed an implementation of the proposed framework in the PostgreSQL database system and evaluated its performance experimentally. Our results validate the effectiveness of Colt in self-tuning a relational database, demonstrating its ability to modify the system configuration in response to changes in the query load. Moreover, Colt achieves performance improvements that are comparable to those of more expensive off-line techniques, thus verifying the potential of the on-line approach in the design of self-tuning systems.
We have worked on the optimization of KadoP, a peer-to-peer platform for building and managing warehouses of Web resources. KadoP relies on a Distributed Hash Table implementation (namely, FreePastry) to keep the network of peers connected and to build a shared global resource index, and on the ActiveXML platform to store, query, and maintain the index. Furthermore, KadoP is able to process simple queries over resources distributed in the whole network. A main goal is to be able to index not only extensional XML data but also intensional data, and in particular Web services.
A recent development of the system includes two techniques meant to efficiently handle the long posting lists exchanged during query processing. The first technique relies on a distributed search structure that parallelizes the transfer of long posting lists, while the second makes it possible to reduce the transferred lists at the expense of some precision. These techniques are described in .
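The second technique (trading precision for bandwidth) can be illustrated with a Bloom filter, a standard lossy set summary: transferring the bit array instead of the full posting list may introduce false positives but never false negatives. This illustrates the general idea only, not necessarily the exact structure used in KadoP:

```python
import hashlib

class BloomFilter:
    """Lossy summary of a posting list. Sending the fixed-size bit array
    instead of the full list of document ids saves bandwidth; membership
    tests may report false positives (the precision loss mentioned above)
    but never miss an inserted item."""
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = bytearray(size)

    def _positions(self, item):
        # Derive `hashes` independent positions from salted SHA-256.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))
```

A peer can thus filter its local candidates against a remote peer's summarized posting list, sending only probable matches for exact verification.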
We have also participated in the development of a prototype for measuring the performance of P2P queries .
In the context of XML data warehousing, it often happens that different XML representations of the same object appear in the sources. In this context, it becomes necessary to identify common entities in the XML sources and to propose a consolidated version thereof. We have proposed the XClean framework for declaratively specifying data cleaning processes, which are then compiled into XQuery queries . M. Weis has developed a prototype implementing this framework, which has been demonstrated .
This work, which began at the end of 2005, is carried out in the framework of the European project WS-DIAMOND, running until mid-2008. It is well known that self-healing software is one of the challenges for IST research. This project aims to take a step in this direction by developing a framework for self-healing Web services. The goal is to produce:
an operational framework for the self-healing execution of conversationally complex Web services, where monitoring, detection, and diagnosis of anomalous situations, due to functional (in particular semantic) or non-functional errors (e.g., Quality of Service), is carried out and repair/reconfiguration is performed, thus guaranteeing the reliability and availability of Web services;
a methodology and tools for service design that guarantee effective and efficient diagnosability/repairability during execution;
demonstration of these results on real applications.
Our main involvement in this project concerns model-based diagnosis of cooperative Web services, i.e., applying to P2P distributed software systems the techniques developed in Artificial Intelligence and successfully applied to engineered centralized hardware systems. Our two other contributions concern formal models for Web services, since the method rests entirely on the existence of adequate behavioral models against which actual observations are compared, and the study of diagnosability at the design stage, which is the common trend in diagnosis activities across all branches of industry.
During the first two years, the following work has been achieved:
Developing an observation and data log platform for basic Web services.
An extension of the Web service deployment specification (WSDD file) has been defined, allowing the developer to specify, for each operation, which pieces of information to log and the privacy policy governing their accessibility. The standard AXIS deployment platform is enriched with an observation handler generator and an information Web service generator. Each time a basic WS is invoked, its associated information WS is invoked too and records in databases (via an interface with MySQL) all the inputs, outputs, and error messages specified in the WSDD extension, with the given privacy policy. This can be applied to the information WS itself, which is thus self-observed. All these extensions and log capabilities have been implemented in Java. The logged information will be used by the diagnosis algorithm to identify the primary cause(s) of a detected symptom.
Modeling BPEL Web services for diagnosis.
A method has been developed to automatically generate a diagnosis model, in the form of data dependency relations (analogous to dynamic slicing methods in software debugging), for orchestrated complex Web services. BPEL (Business Process Execution Language) basic and structured activities are first modeled with Petri nets, places being used to represent data and transitions to represent activities. For that purpose, control places, in charge of transmitting activation, are added to the data places (in particular, input and output activation places), and read arcs (along which tokens are not propagated) are added to the normal arcs. The operational dependency between transition executions is thus captured. In order to capture data dependency (which is essential for the diagnosis of semantic faults), each transition of the Petri net is enriched with a set of basic data dependency relations expressing that an input is simply forwarded to an output, that an output is created by the operation, or that an output is elaborated from one or several inputs. In order to aggregate such enriched Petri nets, composition rules for these basic relations are defined for different modes (sequential, alternative, hierarchical through data structures). Based on these rules, an algorithm is designed that builds the data dependency model of an orchestrated BPEL service from the analysis of its BPEL code and the models exposed by the private services it invokes. This data dependency model is expressed as a set of propositional Horn clauses that will be used by the diagnosis algorithm.
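The resulting dependency model can be illustrated with a toy sketch: each activity contributes Horn-style rules "output depends on inputs", and diagnosis then asks which data a faulty output transitively depends on. The activity and variable names below are hypothetical, loosely inspired by the Foodshop example:

```python
# Toy data-dependency model: one rule (output, inputs) per activity,
# standing in for the Horn clauses compiled from the enriched Petri net.
# All activity and variable names are hypothetical.
RULES = [
    ("price",    ["order", "catalog"]),    # pricing activity
    ("invoice",  ["order", "price"]),      # invoicing activity
    ("shipment", ["invoice", "address"]),  # shipping activity
]

def dependency_closure(var, rules):
    """All variables the given variable transitively depends on; during
    diagnosis these are the candidate primary causes of a faulty value."""
    deps, frontier = set(), [var]
    while frontier:
        v = frontier.pop()
        for out, ins in rules:
            if out == v:
                for i in ins:
                    if i not in deps:
                        deps.add(i)
                        frontier.append(i)
    return deps
```

A faulty shipment can thus be traced back through the invoice and price to the original order, catalog, and address data, which is exactly the information the diagnoser needs to suspect primary causes.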
The enriched Petri net generator, which takes as input a BPEL code and produces as output its enriched Petri net model in the form of an XML file, and the diagnostic model compiler, which takes as input an enriched Petri net model and produces as output its associated diagnostic knowledge base as a set of causal rules expressed as logical Horn clauses, both in the form of XML files, have been implemented in Java and tested on examples (in particular, the Foodshop service used in the WS-DIAMOND project).
Developing a decentralized diagnostic algorithm
A decentralized on-line diagnostic algorithm for BPEL orchestrated Web services has been designed. It relies on the local diagnostic models of each Web service, built off line as explained above, and on the observations stored on line by the data log platform. Each BPEL service is provided with a local diagnoser that performs local consistency-based diagnosis thanks to the local diagnostic model of the service (initially, a diagnostic session is triggered when a local diagnoser is awakened by an exception raised in its associated Web service). The local output diagnosis is made up of possible local faults, such as input data from users or faulty internal basic Web services (among those invoked by the BPEL service), or of input variables coming from shared variables in another composite Web service. These local diagnosers communicate (in both directions) with a coordinator, in charge of building global diagnoses by merging local ones. The coordinator does not initially have any information about the individual Web services except the variables shared between them, which are obtained off line and are at interface level, thus satisfying privacy requirements. The coordinator tries to prolong each local diagnosis containing a suspected input variable coming from another service by invoking the local diagnoser of that service. In the end, the global diagnoses thus generated are made up of faulty input data from users, faulty internal basic Web services, or faulty interfaces between two Web services (the latter can often be checked for confirmation against logged observations). In fact, the local diagnosers and the coordinator are themselves regarded as Web services communicating via WSDL messages, so the WSDL standard can be used to describe the diagnosis operation offered by a diagnostic Web service.
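The coordination loop can be sketched as follows, in a toy model with invented service and variable names: real diagnosers exchange WSDL messages, plain Java functions stand in for them here, and a suspected shared variable owned by another service is written "service:variable".

```java
import java.util.*;
import java.util.function.Function;

/** Toy coordinator: each local diagnoser maps a suspected variable to a
 *  set of local explanations; explanations of the form "svc:var" point
 *  at a shared variable owned by another service, which the coordinator
 *  follows by invoking that service's diagnoser. */
public class Coordinator {
    private final Map<String, Function<String, Set<String>>> diagnosers = new HashMap<>();

    public void register(String service, Function<String, Set<String>> d) {
        diagnosers.put(service, d);
    }

    /** Prolong local diagnoses across services until every explanation is
     *  an elementary fault (user input, internal service, or interface). */
    public Set<String> globalDiagnosis(String service, String faultyVar) {
        Set<String> global = new TreeSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String[]> todo = new ArrayDeque<>();
        todo.push(new String[]{service, faultyVar});
        while (!todo.isEmpty()) {
            String[] t = todo.pop();
            if (!visited.add(t[0] + ":" + t[1])) continue;
            for (String expl : diagnosers.get(t[0]).apply(t[1])) {
                int sep = expl.indexOf(':');
                if (sep >= 0 && diagnosers.containsKey(expl.substring(0, sep))) {
                    // shared variable: ask the owning service's diagnoser
                    todo.push(new String[]{expl.substring(0, sep), expl.substring(sep + 1)});
                    global.add("interface(" + expl + ")");
                } else {
                    global.add(expl); // elementary fault
                }
            }
        }
        return global;
    }

    public static void main(String[] args) {
        Coordinator c = new Coordinator();
        c.register("shop", v -> v.equals("invoice")
                ? Set.of("billActivity", "warehouse:stock") : Set.of());
        c.register("warehouse", v -> v.equals("stock")
                ? Set.of("userInput") : Set.of());
        System.out.println(c.globalDiagnosis("shop", "invoice"));
    }
}
```

Note how the coordinator only ever sees shared variables at interface level, mirroring the privacy property of the real architecture.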
Up to now, the local diagnosers and the coordinator have been implemented as Java objects, i.e., as basic Web services, interfaced with the data log platform; they are currently being tested on applications such as the Foodshop service.
In 2007, this work has been published in , , . Direct continuations of this work will include implementing the diagnostic coordinator as a BPEL Web service, extending the diagnostic architecture to the case of choreographed Web services, and testing the whole on real examples. Notice that the thesis work just begun by Vincent Armant on distributed diagnosis in a peer-to-peer framework is expected to be tested later with the local diagnostic knowledge bases of Web services produced here, in order to provide a completely distributed monitoring and diagnosis platform for Web services. Another related line of work that has just begun is the study of diagnosability (and recoverability) properties at design stage. The aim is to define formal properties of a discrete-event model, together with a predefined set of non-observable faulty events and a predefined set of observable events, expressing that a given fault will always be detectable or that two given faults will always be discriminable, and then to design algorithms to check these properties off line on the model. These criteria and checking methods will be adapted to the study of Web services diagnosability and recoverability, and a methodology for designing Web services applications that respect these criteria will be developed.
We have worked on the design and implementation of tools for monitoring peer-to-peer systems.
A system named P2PMonitor has been developed for this purpose. It is a P2P system itself, with peers exchanging messages through Web service calls. The system is based on alerters: software modules placed on monitored peers, in charge of the surveillance of particular types of events (e.g., Web service calls, database updates). They produce streams of (Active)XML data. Our system implements an algebra over data streams. A declarative language allows the user to specify the complex events of interest and the way notifications about these events should be created and sent to her. The system is in charge of choosing the best execution plan and of placing the processors on peers. This work has been published in and .
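As a rough, invented illustration of the flavor of such an algebra (the actual P2PMonitor operators work over streams of Active XML data), here are merge and selection operators over simple event streams:

```java
import java.util.*;
import java.util.function.Predicate;
import java.util.stream.*;

/** Toy stream algebra: monitoring queries are composed from operators
 *  over event streams, e.g. merging the streams produced by several
 *  alerters and selecting events by a condition. Names are invented. */
public class StreamAlgebra {
    public record Event(String peer, String type, String payload) {}

    /** Union of several alerter streams. */
    public static Stream<Event> merge(List<Stream<Event>> inputs) {
        return inputs.stream().flatMap(s -> s);
    }

    /** Selection: keep only events matching a condition. */
    public static Stream<Event> filter(Stream<Event> in, Predicate<Event> cond) {
        return in.filter(cond);
    }

    public static void main(String[] args) {
        // two alerters, one watching Web service calls, one database updates
        var out = filter(
            merge(List.of(
                Stream.of(new Event("p1", "wscall", "a")),
                Stream.of(new Event("p2", "update", "b")))),
            e -> e.type().equals("update")).toList();
        System.out.println(out);
    }
}
```

In the real system such operator trees are compiled from the declarative language, and the operators themselves are placed on peers by the optimizer.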
A subject related to monitoring is view maintenance over active documents. Indeed, the monitoring problem can be seen as aggregating streams into an active document and incrementally evaluating a tree-pattern query over this document. We have developed algorithmic, datalog-based foundations for such incremental query processing; this work has been published in .
A paper presenting a demonstration scenario for the monitoring system, integrating view maintenance for active documents as a way of defining complex monitoring tasks, has been published in .
We introduce in a new method, named the Green method, for finding nodes semantically related to a given node in a hyperlinked graph, based on classical Markov chains. It is generic, adjustment-free and easy to implement. We test it on the hyperlink structure of the English version of an on-line encyclopedia, namely Wikipedia, and present an extensive study comparing its performance with that of several other classical methods. The Green method is found to have both the best average results and the best robustness.
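The following toy sketch (graph, damping value and iteration count are all invented for the example) illustrates the underlying idea of ranking nodes by the mass of a Markov-chain walk repeatedly restarted at the seed node; the actual Green method is defined via Green measures of the chain and involves a normalization not shown here.

```java
import java.util.*;

/** Toy seed-rooted random walk on a directed graph given as an edge
 *  list: at each step a damping fraction of the mass follows out-links
 *  uniformly, the rest is re-injected at the seed. Nodes with more
 *  accumulated mass are considered more related to the seed. */
public class GreenSketch {
    public static Map<Integer, Double> score(List<int[]> edges, int n, int seed,
                                             double damping, int iterations) {
        List<List<Integer>> out = new ArrayList<>();
        for (int i = 0; i < n; i++) out.add(new ArrayList<>());
        for (int[] e : edges) out.get(e[0]).add(e[1]);
        double[] p = new double[n];
        p[seed] = 1.0;
        for (int it = 0; it < iterations; it++) {
            double[] q = new double[n];
            q[seed] = 1.0 - damping; // restart mass at the seed
            for (int u = 0; u < n; u++)
                for (int v : out.get(u))
                    q[v] += damping * p[u] / out.get(u).size();
            p = q;
        }
        Map<Integer, Double> res = new HashMap<>();
        for (int i = 0; i < n; i++) res.put(i, p[i]);
        return res;
    }

    public static void main(String[] args) {
        // tiny chain 0 <-> 1 <-> 2, seeded at node 0
        var edges = List.of(new int[]{0, 1}, new int[]{1, 0},
                            new int[]{1, 2}, new int[]{2, 1});
        System.out.println(score(edges, 3, 0, 0.5, 30));
    }
}
```

On this chain, node 1 (a direct neighbor of the seed) accumulates more mass than node 2, which is reachable only through node 1.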
In , we review a number of classical text mining approaches to synonym extraction over different kinds of corpora. We also introduce a graph mining technique, closely inspired by Kleinberg's hubs and authorities, that discovers related words in monolingual dictionaries, and we discuss the deeper relations between classical text mining problems and graph mining.
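A schematic version of the mutually reinforcing hubs-and-authorities iteration, on a tiny invented link matrix (in the dictionary setting, word i would link to word j when j occurs in the definition of i):

```java
import java.util.*;

/** Kleinberg-style iteration: an authority is pointed to by good hubs,
 *  a hub points to good authorities; both score vectors are iterated
 *  and renormalized until they stabilize. */
public class HitsSketch {
    /** Returns {hub, authority} score vectors after the given iterations. */
    public static double[][] hits(boolean[][] link, int iterations) {
        int n = link.length;
        double[] hub = new double[n], auth = new double[n];
        Arrays.fill(hub, 1.0);
        for (int it = 0; it < iterations; it++) {
            for (int j = 0; j < n; j++) {   // authority update
                auth[j] = 0;
                for (int i = 0; i < n; i++) if (link[i][j]) auth[j] += hub[i];
            }
            for (int i = 0; i < n; i++) {   // hub update
                hub[i] = 0;
                for (int j = 0; j < n; j++) if (link[i][j]) hub[i] += auth[j];
            }
            normalize(auth);
            normalize(hub);
        }
        return new double[][]{hub, auth};
    }

    private static void normalize(double[] v) {
        double s = 0;
        for (double x : v) s += x * x;
        s = Math.sqrt(s);
        if (s > 0) for (int i = 0; i < v.length; i++) v[i] /= s;
    }

    public static void main(String[] args) {
        // words 0 and 1 both use word 2 in their definitions; 2 uses 0
        boolean[][] link = {{false, false, true},
                            {false, false, true},
                            {true,  false, false}};
        double[][] ha = hits(link, 10);
        System.out.println(Arrays.toString(ha[1])); // authority scores
    }
}
```

Word 2, used in the definitions of both other words, comes out as the strongest authority, which is the intuition behind using this iteration to surface related words.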
Gemo had technical meetings in 2006 with many industrial partners, in particular France Telecom R&D, Xyleme and Mandrakesoft, as well as with national organizations, in particular the Institut National de la Recherche Agronomique (INRA).
The MediaD project aims at designing a declarative environment, SomeWhere, for building peer-to-peer data management systems based on a simple data model: propositional logic. A peer-to-peer data management system is a valuable alternative to a centralized information integration system such as a mediator when the number of sources to be integrated becomes huge: building a global mediated schema coping with all the sources' peculiarities is hardly possible and inefficient.
The goal of the MediaD project is to deploy very large applications that scale to thousands of peers. It is organized in two tracks. The first is to study query answering, possibly in the presence of inconsistency. The second is to develop techniques for the cooperative statement of mappings that relate the knowledge of the different peers within the peer-to-peer data management system.
This project is the continuation of PICSEL2 on scaling up to the Web the mediator approach that has been implemented in PICSEL1.
The goal is twofold. First, it aims at automating the construction of wrappers, which translate user queries into the query language accepted by each source and return answers from the sources in the language of the mediator; this work is concerned with the mediation of ontologies. Second, we are interested in reference reconciliation, i.e., identifying when different references in a data set correspond to the same real-world entity.
EDOS is a research project funded by the European Commission as a STREP project under the IST activities of the 6th Framework Programme. The project involves universities (Paris 7, Tel Aviv, Geneva), INRIA (Gemo and Cristal teams), research centers (CSP Torino) and private companies (Mandriva, Caixa Magica, Nexedi, Nuxeo, Edge-IT). It is centered on software management, and more particularly on the Mandriva Linux distribution.
In the EDOS project, the Gemo group focuses on improving the process of data distribution of open source software, a challenging issue because of the scale of the distribution (large number and size of files), its dynamicity, the need for replication for better performance, and the autonomy of actors.
The goal is to build a P2P distribution system that improves the classical approach based on hierarchies of mirrors, by providing a better sharing of resources. The system combines the functionalities of content (software) distribution with the idea of exchanging XML data in a P2P environment, in our case metadata about the software modules to be distributed. Metadata includes identifiers (name, version), static properties (size, license, summary, etc.) and dynamic properties of software modules (composition, replica locations, statistics about the distribution process, etc.).
We defined the P2P system architecture, based on three categories of actors: Publishers (which introduce new content into the system), Mirrors (trusted peers) and Clients (end users). Peers are organized in two sub-networks: the indexing network, composed of trusted peers (Publishers and Mirrors), which stores the distributed index over metadata, and the distribution network, composed of all peers, which stores content replicas. The system's software architecture is based on a Java API implementing content distribution functionalities at several abstraction levels: publishing of new content, metadata indexing and querying, subscription to thematic distribution channels and event notification, and download in flash-crowd (one source, many simultaneous requests) and off-peak situations (many sources, content updates).
The project ended successfully in September 2007. The effort in the last period was directed to the consolidation of the system, several optimizations, the integration of security mechanisms, the development of an advanced GUI, and an evaluation on the Grid'5000 platform. The EDOS content distribution system has been published as an open source project on the INRIA Gforge site (
http://
The WebContent project (
http://
WS-DIAMOND (“Web Services - DIAgnosability, MONitoring and Diagnosis”) is a FP6 European project (FET Open STREP) which started on Sept. 1st, 2005 and will last until Feb. 29th, 2008. EU funding for University Paris-Sud is 188 kEuros. The project is coordinated by the University of Turin, and involves the Polytechnic University of Milan, the Vrije University of Amsterdam, the University of Vienna, the University of Klagenfurt, and, from France, the LAAS-CNRS, the University of Rennes 1, and the University of Paris-Sud. Participants from Gemo are Philippe Dague (site leader for U. Paris-Sud), Tarek Melliti (post-doc from Oct. 1st, 2005 to Aug. 31st, 2006, assistant professor at U. of Evry from Sep. 1st, 2006), Yingmin Li (master internship from April 1st, 2006 to Sept. 30th, 2006, Ph.D. student from Oct. 2006), Lina Ye (master internship from March 19th, 2007 to Sept. 18th, 2007, Ph.D. student from the end of Sept. 2007), Laura Brandan Briones (post-doc from May 2007) and Omar Aaouatif (engineer internship from March 5th, 2007 to June 4th, 2007).
In France, close links exist with groups at Orsay (databases, V. Benzaken and N. Bidoit; bio-informatics, C. Froidevaux; machine learning, M. Sebag), with the Cedric Group at CNAM-Paris; some INRIA groups (Atlas, P. Valduriez, DistribCom, A. Benveniste, at INRIA-Bretagne, Exmo, J. Euzenat, at INRIA Rhone-Alpes, Mostrare at INRIA Futurs Lille); the BIA group at INRA (P. Buche, C. Dervin), the GRIMM of the University of Toulouse Le Mirail (O. Haemmerlé), the LIRIS of the University of Lyon 1 (M. Hacid), the LIRMM of the University of Montpellier (M. Chein, M-L. Mugnier), the LI of the University of Tours (G. Venturini), and the UMPA at École normale supérieure de Lyon (Y. Ollivier).
DocFlow is a research project supported by the ANR Masses de données (2007-2009) with the Distribcom team at INRIA-Rennes (Albert Benveniste) and the Méthodes Formelles group at Labri-Bordeaux (Anca Muscholl). The topic is the analysis, monitoring, and optimization of Web documents and services. It builds on Active XML, a formalism for data exchange across peers developed by Gemo. The project aims at achieving a convergence of data and workflow management over the Web through the concept of active peer-to-peer documents.
TraLaLa stands for XML Transformation Languages: logic and applications. It is funded by the ACI (Action Concertée Incitative) Masses de Données; it started in September 2004 and ended during the summer of 2007. The setting is the integration and manipulation of massive data in XML format. We are interested more specifically in programming and querying language aspects: expressivity, typing, optimization. We are also interested in studying how this can be done in a context where documents are compressed, or in a streaming scenario. The home page of the project can be found at:
http://
This ACI started in 2005 and was planned to last three years. It is a collaboration between Benjamin Nguyen (University of Versailles) and François-Xavier Dudouet (CNRS, Laboratoire IRISES). The project has completed this year, but the work carried out has been merged into (and continues through) the WebStand project (see below).
The objective of this ANR, which started in 2006, is to analyze the problems surrounding the use of semi-structured databases in the social sciences. This ANR brings together both computer science and sociology laboratories. Work done in Gemo contributing to WebStand includes XML data cleaning , and work on the automatic selection and maintenance of materialized XML views. The joint work of the consortium has led to a publication in a social sciences conference .
SHIRI is a research project funded by the Ile de France region as a Digiteo project which started on Oct. 1st 2007 and will last until Sept. 30th, 2011. It involves two partners of Digiteo, Supelec and the University of Paris-Sud. The aim of SHIRI is to design an annotation system to improve the relevance of the search on the Web when resources contain both semi-structured and textual data.
In Europe, close links exist with University of Dortmund (T. Schwentick), University of Athens (M. Vazirgiannis), University of Madrid (A. Gomez-Perez), University of Manchester (I. Horrocks), University of Rome (M. Lenzerini).
Particular projects that we conduct are detailed next.
NGWeMiS (Next Generation Web Mining and Searching) is a project led by M. Vazirgiannis (U. Athens). The project lies in the area of knowledge extraction and management over the massive and heterogeneous document collections of the World Wide Web. Its main objective is the design of guidelines and the development of prototypes for next generation web mining and searching techniques based on the P2P paradigm. The innovation lies in the use of the P2P paradigm at the various levels of web content management and searching; the study and development of novel similarity measures among web documents that take into account multiple facets, including structure and semantics; and the clustering of web data and metadata taking into account their P2P organization.
Procope
Gemo has a PHC-Procope project with the database group of Thomas Schwentick at Dortmund University, Germany. The project will end in 2008. Its goal is to work on verification and queries in the presence of data values. It has already produced several joint papers between the two groups.
Polonium
Gemo has a PHC-Polonium project with the group of Slawomir Lasota at Warsaw University, Poland. The project will stop at the end of 2007. Its goal is to work on verification and queries in the presence of data values. It has already produced several joint papers between the two groups.
Van-Gogh
Gemo has a PHC-Van-Gogh project with the group of Maarten Marx at Amsterdam University, The Netherlands. The project will stop at the end of 2007. Its goal is to work on the expressive power and performance of XML query languages.
TARGET
Gemo started a cooperation with the University of Luxembourg in November 2005, which led to a PhD in co-tutelle with Paris-Sud university. The PhD project is TARGET, for opTimal Adaptive infoRmation manaGemEnT over the web. It aims at improving web information retrieval by integrating web data evolution, users' knowledge evolution and search domain evolution. The PhD student is Cedric Pruski.
University of Oxford
Gemo started a collaboration with Georg Gottlob from the University of Oxford on the definition of the Match operator in data exchange. This collaboration led to a three-month stay of Pierre Senellart at the University of Oxford.
Gemo started a cooperation with Gaston Berger University last year: a PhD in co-tutelle with Paris-Sud university started in December 2006. The subject of the thesis is the integration of semi-structured data for information retrieval. The PhD student is Mouhamadou Thiam.
Close links exist with University of Tel-Aviv (T. Milo).
Close links also exist with UC Santa Cruz (N. Polyzotis), Rutgers University (A. Borgida), and Google Research (O. Benjelloun).
Since 2003, Gemo and the data management group at the University of California at San Diego (V. Vianu, A. Deutsch, Y. Papakonstantinou) have formed an associated team funded by INRIA International. This association is expected to last until the end of 2008. Victor Vianu and Ravi Vijay, a PhD student from UCSD, spent 3 months in Gemo this summer. Bogdan Cautis spent 1 week in San Diego. The home page of GemSaD can be found at
http://
This year the following professors visited Gemo:
Tova Milo, professor at the University of Tel-Aviv (in February)
Neoklis Polyzotis, professor at the University of California, Santa Cruz (in September)
Victor Vianu, professor, UC San Diego (July to September)
Yuhong Yan, research officer, NRC Canada, IIT, Fredericton (in December)
The following PhD students came for internships in the group: Ravi Vijay [UCSD, USA; 2 months, PhD internship].
The following PhD theses were defended in 2007:
Andrei Arion, XML Access Modules: Towards Physical Data Independence in XML Databases.
Bogdan Cautis, Signing and Reasoning about Tree Updates.
Antonella Poggi (with Universita' degli Studi di Roma "La Sapienza"), Structured and Semi-structured Data Integration.
Fatiha Saïs, Semantic Data Integration guided by an Ontology.
Mathias Samuelides, Tree walking automata with pebbles.
Pierre Senellart, Understanding the Hidden Web.
Gemo project members have co-chaired scientific events:
S. Abiteboul has co-chaired the International Workshop on Data and Service Integration (SDIS'07), in cooperation with VLDB.
I. Manolescu has co-chaired the 10th International Workshop on the Web and Databases (WebDB 2007), in cooperation with ACM SIGMOD.
Members of the project have participated in program committees:
S. Abiteboul
World Wide Web Conference (WWW07)
International Conference on Very Large Databases (VLDB'07)
International Workshop on Web Information and Data Management (WIDM'07)
World Wide Web Conference (WWW08)
ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2008)
Journées Francophones de Bases de Données Avancées 2007
P. Chatalic
Journées Francophones de Programmation par Contraintes (JFPC 2007)
Ph. Dague
20th International Joint Conference on Artificial Intelligence (IJCAI) 2007
18th International Workshop on Principles of Diagnosis (DX) 2007
21st International Workshop on Qualitative Reasoning (QR) 2007
I. Manolescu
33rd Very Large Databases Conference (VLDB 2007)
Conference on Information and Knowledge Management (CIKM 2007)
Web Information and Data Management workshop (WIDM 2007), in cooperation with the CIKM conference
Experimental Evaluation in Databases (ExpDB 2007) workshop, in cooperation with the ACM SIGMOD conference
Journées Francophones de Bases de Données Avancées 2007
F. Goasdoué
16èmes congrès francophone Reconnaissance des Formes et Intelligence Artificielle (RFIA08)
C. Reynaud
Third workshop on Context and Ontology Representation and Reasoning (C&O:RR-2007)
16èmes congrès francophone Reconnaissance des Formes et Intelligence Artificielle, member of the editorial board (RFIA08)
Conférence Extraction et Gestion des Connaissances (EGC07)
17èmes Journées Francophones d'Ingenierie des connaissances (IC07)
1ères Journées Francophones sur les ontologies (JFO2007)
Atelier Modélisation des connaissances (EGC07)
Atelier Ontologies et Gestion de l'Hétérogénéité Sémantique (OGHS'07)
Atelier Ontologies et Textes (TIA07)
M-C. Rousset
International Joint Conference on Artificial Intelligence 2007
International Semantic Web Conference 2007
Atelier Modélisation des connaissances (EGC07)
Atelier Modélisation des connaissances (EGC08)
European Semantic Web Conference 2008
Fatiha Sais
Manifestation des Jeunes Chercheurs en Sciences et Technologies de l'Information et de la Communication (MajecSTIC 2007)
L. Segoufin
ACM Symposium on Principles of Database Systems (PODS'07)
EACSL Conference for Computer Science Logic (CSL'07)
P. Senellart
Text Mining Workshop 2007
L. Simon
International Conference on Theory and Applications of Satisfiability Testing (SAT 2007)
Journées Francophones de Programmation par Contraintes (JFPC 2007)
M. Vazirgiannis
International Conference on User Modeling (UM 2007)
D. Vodislav
Journées Francophones de Bases de Données Avancées 2007
Serge Abiteboul was an invited speaker at the Symposium on Theoretical Aspects of Computer Science, STACS 2007 . He was an invited speaker at the PhD Student Workshop of SIGMOD 2007, where he spoke on “Life in Academia”. He was also invited to the Dagstuhl Seminar on Programming Paradigms for the Web (2007).
Marie-Christine Rousset gave a tutorial at BDA 2007 on “Building scalable semantic peer-to-peer data management systems: the SomeWhere approach”. She gave an invited lecture at the ACAI 2007 Summer School on “Logic-based techniques for information integration”.
Editors
F. Goasdoué
Guest editor of a special issue of Technique et Science Informatiques (TSI) on the Semantic Web, Hermès-Lavoisier.
Member of the reading committee of the book Semantic Web Methodologies for E-Business Applications: Ontologies, Processes and Management Practices, Idea Group Publishing (scheduled for publication in 2008).
I. Manolescu
Guest editor of a special issue of the Elsevier Journal of Information Systems on Performance Evaluation in Database Systems.
C. Reynaud
Journal Electronique d'IA de l'AFIA (JEDAI).
Revue Information - Interaction - Intelligence (RI3).
Revue des Nouvelles Technologies de l'Information, Special issue "Fouille du Web" (RNTI).
M-C. Rousset
Interstices (an on-line popular-science journal on computer science research):
http://
AI Communications (AICOM)
Electronic Transactions on Artificial Intelligence (ETAI) (for the areas: Concept-based Knowledge Representation and Semantic Web).
Revue Information - Interaction - Intelligence (I3)
L. Simon
Member of the Editorial Board of JSAT (the Journal on Satisfiability, Boolean Modeling and Computation)
Guest Editor of a Special Issue of JSAT on SAT 2006 Competitions and Evaluations.