

Section: Research Program

Data interlinking

Vast amounts of RDF data are made available on the web by various institutions providing overlapping information. To be fully exploited, the different representations of the same object across these data sets have to be identified. Data interlinking is the process of generating links identifying the same resource described in two data sets.

We have introduced link keys [4], [1], which extend database keys in a way better suited to RDF and which deal with two data sets instead of a single relation. More precisely, a link key is a structure ⟨Keq, Kin, C⟩ such that:

  • Keq and Kin are sets of pairs of property expressions;

  • C is a pair of class expressions (or a correspondence).

Such a link key holds if and only if, for any pair of resources belonging to the classes in correspondence, whenever the values of their properties in Keq are pairwise equal and the values of those in Kin pairwise intersect, the resources are the same. Link keys can then be used for finding equal individuals across two data sets and generating the corresponding owl:sameAs links. Link keys take into account the non-functionality of RDF data (a property may have several values) and have to deal with non-literal values. In particular, they may use arbitrary property and class expressions. This makes their discovery and use difficult.
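
As an illustration of how a link key can be used to generate links, here is a minimal sketch in Python. It is not the implementation used in [4] or [1]. Everything in it is an assumption made for the example: data sets are toy dictionaries mapping each resource to its properties and their sets of values, property and class expressions are reduced to plain names, and the identifiers (LinkKey, generate_links, the d1: and d2: names) are hypothetical. The sketch also assumes that an empty set of values never satisfies the equality condition, so that resources lacking a property are not linked by accident.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple

# Toy data set (assumption): resource -> property -> set of values.
# Sets are used because RDF properties are not functional.
Dataset = Dict[str, Dict[str, Set[str]]]


@dataclass(frozen=True)
class LinkKey:
    """Hypothetical encoding of a link key <Keq, Kin, C>."""
    k_eq: FrozenSet[Tuple[str, str]]  # pairs of properties whose values must be equal
    k_in: FrozenSet[Tuple[str, str]]  # pairs of properties whose values must intersect
    classes: Tuple[str, str]          # pair of classes in correspondence


def values(ds: Dataset, resource: str, prop: str) -> Set[str]:
    """Values of a property for a resource (empty set if absent)."""
    return ds[resource].get(prop, set())


def instances(ds: Dataset, cls: str) -> List[str]:
    """Resources of the data set typed with the given class."""
    return [r for r in ds if cls in values(ds, r, "rdf:type")]


def generate_links(lk: LinkKey, ds1: Dataset, ds2: Dataset) -> List[Tuple[str, str, str]]:
    """Return owl:sameAs triples for every pair of resources satisfying the link key."""
    c, d = lk.classes
    links = []
    for a in instances(ds1, c):
        for b in instances(ds2, d):
            # Keq: value sets are equal (and, by assumption, non-empty).
            eq_ok = all(
                values(ds1, a, p) and values(ds1, a, p) == values(ds2, b, q)
                for p, q in lk.k_eq
            )
            # Kin: value sets share at least one value.
            in_ok = all(values(ds1, a, p) & values(ds2, b, q) for p, q in lk.k_in)
            if eq_ok and in_ok:
                links.append((a, "owl:sameAs", b))
    return links


ds1: Dataset = {
    "d1:b1": {"rdf:type": {"d1:Book"}, "d1:title": {"Les Misérables"},
              "d1:author": {"Victor Hugo", "V. Hugo"}},
    "d1:b2": {"rdf:type": {"d1:Book"}, "d1:title": {"Notre-Dame de Paris"},
              "d1:author": {"Victor Hugo"}},
}
ds2: Dataset = {
    "d2:x": {"rdf:type": {"d2:Livre"}, "d2:titre": {"Les Misérables"},
             "d2:auteur": {"Victor Hugo"}},
    "d2:y": {"rdf:type": {"d2:Livre"}, "d2:titre": {"Germinal"},
             "d2:auteur": {"Émile Zola"}},
}

link_key = LinkKey(
    k_eq=frozenset({("d1:title", "d2:titre")}),
    k_in=frozenset({("d1:author", "d2:auteur")}),
    classes=("d1:Book", "d2:Livre"),
)

links = generate_links(link_key, ds1, ds2)
print(links)  # [('d1:b1', 'owl:sameAs', 'd2:x')]
```

In this toy example, d1:b1 and d2:x are linked because their titles are equal and their author sets share a value, even though the author sets themselves differ. The nested loop compares every pair of instances, which is only reasonable for small data sets; it is meant to make the condition explicit, not to be efficient.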