Section: Research Program
Querying Heterogeneous Linked Data
Our main objective is to query collections of linked datasets. In the static setting, we consider two kinds of links: explicit links between elements of the datasets, such as equalities or pointers, and logical links between relations of different datasets, such as schema mappings. In the dynamic setting, we permit a third kind of link, which points to “intentional” relations computable from a description, such as the application of a Web service or of a schema mapping.
We believe that collections of linked datasets are usually too big to ensure a global knowledge of all datasets. Therefore, schema mappings and constraints should remain between pairs of datasets. Our main goal is to be able to pose a query on a collection of datasets, while accounting for the possible recursive effects of schema mappings. For illustration, consider a ring of datasets $D_1$, $D_2$, $D_3$ linked by schema mappings $M_1$, $M_2$, $M_3$ that tell us how to complete a database $D_i$ with new elements from the next database in the cycle.
The mappings $M_i$ induce three intentional datasets $I_1$, $I_2$, and $I_3$, such that $I_i$ contains all elements from $D_i$ and all elements implied by $M_i$ from the next intentional dataset in the ring:
$$I_1 = D_1 \cup M_1(I_2), \qquad I_2 = D_2 \cup M_2(I_3), \qquad I_3 = D_3 \cup M_3(I_1).$$
Clearly, the global information collected by the intentional datasets depends recursively on all three original datasets $D_i$. Queries to the global information can now be specified as standard queries to the intentional databases $I_i$. However, we will never materialize the intentional databases $I_i$. Instead, we can rewrite queries on one of the intentional datasets $I_k$ to recursive queries on the union of the original datasets $D_1$, $D_2$, and $D_3$ with their links and relations. Therefore, a query answering algorithm is needed for recursive queries that chases the “links” between the $D_i$ in order to compute the part of $I_k$ needed for the purpose of query answering.
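The recursive dependency in the ring can be made concrete on a toy instance. The following is a minimal sketch, not our query answering algorithm: it writes D1, D2, D3 for the original datasets, models each mapping as a hypothetical monotone function on sets of facts, and computes the intentional datasets by naive fixpoint iteration. It materializes them purely for illustration, which is exactly what rewriting to recursive queries is meant to avoid. The fact encoding, the helper `rename`, and the function `solve_ring` are assumptions introduced for this example.

```python
from typing import Callable, List, Set, Tuple

Fact = Tuple[str, str]                     # (relation name, value); an assumed encoding
Dataset = Set[Fact]
Mapping = Callable[[Dataset], Dataset]     # assumed to be monotone


def rename(source: str, target: str) -> Mapping:
    """Hypothetical schema mapping: every `source` fact implies a `target` fact."""
    return lambda facts: {(target, v) for (rel, v) in facts if rel == source}


def solve_ring(datasets: List[Dataset], mappings: List[Mapping]) -> List[Dataset]:
    """Least solution of I_i = D_i ∪ M_i(I_{i+1}), indices taken modulo the ring size."""
    n = len(datasets)
    intentional = [set(d) for d in datasets]
    changed = True
    while changed:                          # terminates: sets only grow and are finite
        changed = False
        for i in range(n):
            derived = datasets[i] | mappings[i](intentional[(i + 1) % n])
            if not derived <= intentional[i]:
                intentional[i] |= derived
                changed = True
    return intentional


d1 = {("person", "alice")}
d2 = {("employee", "bob")}
d3 = {("member", "carol")}
ring_mappings = [
    rename("employee", "person"),   # M1: from I2 into I1
    rename("member", "employee"),   # M2: from I3 into I2
    rename("person", "member"),     # M3: from I1 into I3
]
i1, i2, i3 = solve_ring([d1, d2, d3], ring_mappings)
```

In this toy instance each intentional dataset ends up mentioning all three individuals, although each original dataset mentions only one: a fact from any dataset travels around the whole ring.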
This illustrates that we must account for graph data models when dealing with collections of linked data, and that query languages for such graphs must provide recursion in order to chase links. Therefore, we will have to study graph databases with recursive queries, such as RDF graphs with SPARQL queries, but also other classes of graph databases and queries.
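To illustrate what chasing links by recursion means on a graph, the sketch below evaluates a simple recursive query over a set of triples: all nodes reachable from a start node by one or more edges with a given label, in the spirit of a SPARQL 1.1 property path of the form `p+`. The triple encoding, the sample data, and the function `reachable` are assumptions made for this example, not part of any RDF library.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object); an assumed encoding


def reachable(triples: Iterable[Triple], start: str, predicate: str) -> Set[str]:
    """Nodes reachable from `start` via one or more `predicate` edges."""
    successors: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == predicate:
            successors.setdefault(s, set()).add(o)

    seen: Set[str] = set()
    queue = deque(successors.get(start, ()))
    while queue:                     # breadth-first traversal; each node expanded once
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(successors.get(node, ()))
    return seen


graph = [
    ("a", "sameAs", "b"),
    ("b", "sameAs", "c"),
    ("c", "sameAs", "a"),
    ("a", "label", "x"),
    ("d", "sameAs", "e"),
]
linked_to_a = reachable(graph, "a", "sameAs")
```

Because the `sameAs` edges form a cycle, the start node `a` is itself among the results, while `x`, `d`, and `e` are not: no fixed-length join could express this query for graphs of arbitrary size, which is why recursion is needed.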
We study schemas and mappings between datasets with different kinds of data models, and the complexity of evaluating recursive queries over graphs. In order to use schema mappings to query the different datasets efficiently, we need to optimize the queries by taking the mappings into account. Therefore, we will study static analysis of schema mappings and recursive queries. Finally, we develop concrete applications in which our fundamental techniques can be applied.