Section: Research Program

Data interlinking

Links are important for the publication of RDF data on the web. We call data interlinking the process of generating links that identify the same resource described in two data sets. Data interlinking parallels ontology matching: from two data sets (d and d'), it generates a set of links (also called a linkset, L).

We have extended the notion of database keys in a way better suited to the context of description logics and the openness of the semantic web [11]. Like alignments, link keys [3] are assertions across ontologies and are not part of a single ontology. We have introduced the notion of a link key, which is a combination of such keys with alignments. More precisely, a link key is an expression ⟨Keq, Kin, C⟩ such that:

  • Keq is a set of pairs of property expressions whose values are compared for equality;

  • Kin is a set of pairs of property expressions whose values are compared for intersection;

  • C is a correspondence between classes.

Such a link key holds if and only if, for any pair of resources belonging to the classes in correspondence, whenever the values of each pair of properties in Keq are equal and the values of each pair of properties in Kin intersect, the two resources are the same.
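As an illustration only, the following sketch applies a link key ⟨Keq, Kin, C⟩ to two toy data sets and returns the pairs of resources it identifies. The data encoding (a dict from resource IRI to a dict from property to a set of values, with classes stored under a hypothetical "type" key), the function names and the example IRIs are all assumptions made for this sketch. It also assumes, as a design choice, that equal values must be non-empty, so that two resources lacking a property are not linked.

```python
from itertools import product

# Hypothetical toy encoding (not from the original work): a data set maps each
# resource IRI to {property: set of values}; classes are stored under "type".
Dataset = dict[str, dict[str, set[str]]]


def values(d: Dataset, r: str, p: str) -> set[str]:
    """Values of property p for resource r (empty set if absent)."""
    return d[r].get(p, set())


def link_key_links(d1: Dataset, d2: Dataset, keq, kin, c) -> set[tuple[str, str]]:
    """Pairs (a, b), a in d1 and b in d2, identified by the link key <keq, kin, c>."""
    c1, c2 = c
    links = set()
    for a, b in product(d1, d2):
        # Only resources belonging to the classes in correspondence are compared.
        if c1 not in values(d1, a, "type") or c2 not in values(d2, b, "type"):
            continue
        # Keq: values must be equal; we additionally require them to be
        # non-empty (assumption) so that two missing values do not create a link.
        eq_ok = all(bool(values(d1, a, p)) and values(d1, a, p) == values(d2, b, q)
                    for p, q in keq)
        # Kin: values must share at least one element.
        in_ok = all(values(d1, a, p) & values(d2, b, q) for p, q in kin)
        if eq_ok and in_ok:
            links.add((a, b))
    return links


d1 = {
    "ex1:b1": {"type": {"ex1:Book"}, "ex1:isbn": {"978-0"}, "ex1:title": {"Dune", "Dune (novel)"}},
    "ex1:b2": {"type": {"ex1:Book"}, "ex1:isbn": {"978-1"}, "ex1:title": {"Emma"}},
}
d2 = {
    "ex2:x": {"type": {"ex2:Livre"}, "ex2:isbn": {"978-0"}, "ex2:titre": {"Dune"}},
    "ex2:y": {"type": {"ex2:Livre"}, "ex2:isbn": {"978-1"}, "ex2:titre": {"Persuasion"}},
}
links = link_key_links(
    d1, d2,
    keq=[("ex1:isbn", "ex2:isbn")],
    kin=[("ex1:title", "ex2:titre")],
    c=("ex1:Book", "ex2:Livre"),
)
print(links)  # {('ex1:b1', 'ex2:x')}
```

Here ex1:b2 and ex2:y share an ISBN but their titles do not intersect, so the Kin condition prevents a link. The exhaustive pairwise comparison is kept for readability; a practical interlinking tool would index values rather than enumerate all pairs.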

As can be seen, link key validity relies only on pairs of objects from two different data sets. We further qualify link keys as weak, plain or strong, depending on which additional constraints they satisfy:

  • a weak link key is only valid on pairs of individuals from different data sets;

  • a plain link key must, in addition, apply to pairs of individuals of the same data set as soon as one of them is identified with an individual of the other data set;

  • a strong link key is a link key which is also a key for each data set; it can be thought of as a link key made of two keys.
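The following is a minimal sketch of the extra condition that makes a link key strong: the properties on each side must form a key within their own data set. It does not check that the link key itself holds, nor the plain condition. It reuses the same hypothetical toy encoding (classes under a "type" key, property values as sets), and it assumes that distinct IRIs within one data set denote distinct resources; the function names are illustrative only.

```python
from itertools import combinations


def values(d, r, p):
    """Values of property p for resource r (empty set if absent)."""
    return d[r].get(p, set())


def matches(d, a, b, eq_props, in_props):
    """Equal/intersect test of a link key, applied to two resources of the same data set."""
    eq_ok = all(bool(values(d, a, p)) and values(d, a, p) == values(d, b, p) for p in eq_props)
    in_ok = all(values(d, a, p) & values(d, b, p) for p in in_props)
    return eq_ok and in_ok


def is_key(d, eq_props, in_props, cls):
    """True if no two distinct instances of cls in d agree on the key properties.

    Assumes distinct IRIs in one data set denote distinct resources."""
    members = [r for r in d if cls in values(d, r, "type")]
    return not any(matches(d, a, b, eq_props, in_props) for a, b in combinations(members, 2))


def strong_condition(d1, d2, keq, kin, c):
    """Extra condition of a strong link key: each side's properties are a key of its data set."""
    c1, c2 = c
    return (is_key(d1, [p for p, _ in keq], [p for p, _ in kin], c1)
            and is_key(d2, [q for _, q in keq], [q for _, q in kin], c2))


keq = [("ex1:isbn", "ex2:isbn")]
kin = [("ex1:title", "ex2:titre")]
c = ("ex1:Book", "ex2:Livre")
d1 = {
    "ex1:b1": {"type": {"ex1:Book"}, "ex1:isbn": {"978-0"}, "ex1:title": {"Dune"}},
    "ex1:b2": {"type": {"ex1:Book"}, "ex1:isbn": {"978-1"}, "ex1:title": {"Emma"}},
}
d2_ok = {
    "ex2:x": {"type": {"ex2:Livre"}, "ex2:isbn": {"978-0"}, "ex2:titre": {"Dune"}},
    "ex2:y": {"type": {"ex2:Livre"}, "ex2:isbn": {"978-1"}, "ex2:titre": {"Persuasion"}},
}
# Two distinct resources sharing an ISBN and a title: the properties are not a key of d2.
d2_dup = {
    "ex2:x": {"type": {"ex2:Livre"}, "ex2:isbn": {"978-0"}, "ex2:titre": {"Dune"}},
    "ex2:z": {"type": {"ex2:Livre"}, "ex2:isbn": {"978-0"}, "ex2:titre": {"Dune", "Dune I"}},
}
print(strong_condition(d1, d2_ok, keq, kin, c))   # True
print(strong_condition(d1, d2_dup, keq, kin, c))  # False
```

In d2_dup, the two resources would be identified with each other by the data set's own properties, so the link key cannot be read as "made of two keys".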

Link keys can then be used for finding equal individuals across the two data sets and generating the corresponding owl:sameAs links.
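Once pairs of equal individuals have been found, they can be published as owl:sameAs statements. The sketch below serializes a set of links in the N-Triples syntax using only the standard owl:sameAs IRI; the example resource IRIs under example.org are hypothetical, and the function assumes its inputs are already valid absolute IRIs (no escaping is performed).

```python
# Standard IRI of the OWL identity property.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def to_ntriples(links):
    """One N-Triples statement per link; absolute IRIs are written in angle brackets."""
    return "\n".join(f"<{a}> <{OWL_SAME_AS}> <{b}> ." for a, b in sorted(links))


links = {("http://example.org/d1/b1", "http://example.org/d2/x")}
print(to_ntriples(links))
# <http://example.org/d1/b1> <http://www.w3.org/2002/07/owl#sameAs> <http://example.org/d2/x> .
```

Sorting the links makes the output deterministic, which eases comparing linksets produced by different link keys.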