Section: Research Program

Querying Heterogeneous Linked Data

Our main objective is to query collections of linked datasets. In the static setting, we consider two kinds of links: explicit links between elements of the datasets, such as equalities or pointers, and logical links between relations of different datasets, such as schema mappings. In the dynamic setting, we permit a third kind of link that points to “intentional” relations computable from a description, such as the application of a Web service or the application of a schema mapping.

We believe that collections of linked datasets are usually too large for anyone to have global knowledge of all datasets. Therefore, schema mappings and constraints should remain between pairs of datasets. Our main goal is to be able to pose a query on a collection of datasets, while accounting for the possible recursive effects of schema mappings. For illustration, consider a ring of datasets D1, D2, D3 linked by schema mappings M1, M2, M3 that tell us how to complete a database Di with new elements from the next database in the cycle.

The mappings Mi induce three intentional datasets I1, I2, and I3, such that Ii contains all elements from Di and all elements implied by Mi from the next intentional dataset in the ring:

$$ I_1 = D_1 \cup M_1(I_2), \qquad I_2 = D_2 \cup M_2(I_3), \qquad I_3 = D_3 \cup M_3(I_1) $$

Clearly, the global information collected by the intentional datasets depends recursively on all three original datasets Di. Queries to the global information can now be specified as standard queries to the intentional databases Ii. However, we will never materialize the intentional databases Ii. Instead, we can rewrite queries on one of the intentional datasets Ii into recursive queries on the union of the original datasets D1, D2, and D3 with their links and relations. Therefore, we need a query answering algorithm for recursive queries that chases the “links” between the Di in order to compute only the part of Ii needed to answer the query.
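To make the recursive dependency concrete, the following minimal sketch computes the three intentional datasets of the ring as a least fixpoint on toy data. It is an illustration of the semantics only: it materializes the Ii, which is exactly what the approach described above avoids on real data. The fact representation, the `rename` mapping, and the function `intentional_datasets` are all hypothetical names introduced for this example.

```python
from typing import Callable, List, Set, Tuple

Fact = Tuple[str, str]                 # (relation name, value): a toy stand-in for a tuple
Dataset = Set[Fact]
Mapping = Callable[[Dataset], Dataset]


def rename(src: str, dst: str) -> Mapping:
    """Hypothetical schema mapping: copy every `src` fact into relation `dst`."""
    def apply(data: Dataset) -> Dataset:
        return {(dst, value) for (rel, value) in data if rel == src}
    return apply


def intentional_datasets(d: List[Dataset], m: List[Mapping]) -> List[Dataset]:
    """Naive fixpoint of I_k = D_k ∪ M_k(I_{k+1}), indices taken modulo the ring size."""
    n = len(d)
    i = [set(dk) for dk in d]          # start from the original datasets
    changed = True
    while changed:                     # iterate until no dataset grows any more
        changed = False
        for k in range(n):
            new = d[k] | m[k](i[(k + 1) % n])
            if not new <= i[k]:
                i[k] |= new
                changed = True
    return i


# Toy ring: each mapping imports the facts of the next dataset under the local relation name.
D = [{("a", "x")}, {("b", "y")}, {("c", "z")}]
M = [rename("b", "a"), rename("c", "b"), rename("a", "c")]
I1, I2, I3 = intentional_datasets(D, M)
```

In this toy ring, the value `z` stored only in D3 ends up in I1, having travelled through M2 and then M1. The loop terminates here because the mappings are monotone and only finitely many facts can be derived; mappings that invent new values would need a different treatment.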

This illustrates that we must account for graph data models when dealing with collections of linked data, and that query languages for such graphs must provide recursion in order to chase links. Therefore, we will have to study graph databases with recursive queries, such as RDF graphs with SPARQL queries, but also other classes of graph databases and queries.
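As a small illustration of why recursion is needed to chase links, the sketch below evaluates a reachability query over a set of triples: it returns every node reachable from a start node through zero or more edges with a given label. This is the kind of query that a SPARQL 1.1 property path with the `*` operator expresses. The triples, the `link` label, and the function `reachable` are assumptions made for the example, not part of any particular system.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]          # (subject, predicate, object)


def reachable(triples: Iterable[Triple], start: str, label: str) -> Set[str]:
    """Nodes reachable from `start` via zero or more `label` edges (breadth-first)."""
    # Index the outgoing edges that carry the requested label.
    successors: Dict[str, List[str]] = {}
    for subj, pred, obj in triples:
        if pred == label:
            successors.setdefault(subj, []).append(obj)

    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, ()):
            if nxt not in seen:        # visit each node once, so cycles terminate
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Toy graph whose links cross three datasets and form a cycle.
triples = [
    ("d1:a", "link", "d2:b"),
    ("d2:b", "link", "d3:c"),
    ("d3:c", "link", "d1:a"),
    ("d3:c", "label", "C"),
    ("d2:x", "link", "d2:y"),
]
result = reachable(triples, "d1:a", "link")
```

No query with a fixed number of joins can express this for paths of arbitrary length, which is why the query language itself must provide recursion.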

We study schemas and mappings between datasets with different kinds of data models, and the complexity of evaluating recursive queries over graphs. In order to use schema mappings for efficiently querying the different datasets, we need to optimize the queries by taking the mappings into account. Therefore, we will study static analysis of schema mappings and recursive queries. Finally, we develop concrete applications in which our fundamental techniques can be applied.