

Section: New Results

Models and Algorithms for Fact-Checking and Data Journalism

We have advanced toward a generic definition of a computational fact-checking platform and identified the core functionalities it should support: (i) extracting a claim from a larger document (typically a text published online in a media outlet, a social network, etc.), which may require identifying the time and space context in which the claim is supposed to hold; (ii) checking the accuracy of the claim against a set of reference data sources; (iii) putting the claim into perspective by assessing its significance in a broader context, for instance by checking whether the claim still holds after minor modifications of its temporal, spatial or numeric parameters. Checking a claim is impossible without a set of reference sources containing data we consider to be true; thus, reference source construction, refinement and selection are also central tasks in such an architecture. We have carried out this work as part of the ANR ContentCheck project (Section 8.1.1) and within our associated team with AIST Japan (Section 8.2.1.1). The architecture of the generic platform we envision has been presented at the Paris DB Day event in May 2017, in an ERCIM News article [21] and in a keynote [24].
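
To fix ideas, the three functionalities can be read as stages of a processing pipeline. The sketch below is only an illustration in Python, under our own assumptions: the class and function names (Claim, CheckResult, extract_claims, check_claim, put_in_perspective) are hypothetical and do not correspond to a specific implementation from the project; the sketch only records the interfaces between the stages.

    from dataclasses import dataclass, field
    from typing import Iterable, List

    @dataclass
    class Claim:
        text: str             # the statement to verify
        time_context: str     # period in which the claim is supposed to hold
        space_context: str    # geographic scope of the claim

    @dataclass
    class CheckResult:
        claim: Claim
        accurate: bool
        evidence: List[str] = field(default_factory=list)  # supporting or contradicting reference facts

    def extract_claims(document: str) -> Iterable[Claim]:
        """Step (i): isolate checkable claims and their time/space context."""
        raise NotImplementedError  # e.g. NLP-based claim detection

    def check_claim(claim: Claim, reference_sources) -> CheckResult:
        """Step (ii): confront the claim with trusted reference data."""
        raise NotImplementedError  # e.g. query reference databases or RDF graphs

    def put_in_perspective(result: CheckResult, reference_sources) -> dict:
        """Step (iii): test variants of the claim (shifted dates, places, numbers)
        to assess whether it still holds in a broader context."""
        raise NotImplementedError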

Within this architecture, an important task is to construct reference data sources and make them more accessible. Toward this goal, we have devised an approach to extract Linked Open Data (RDF graphs) from Excel tables published by INSEE, the French national statistics institute [14]; the resulting data has been published online. Another ongoing line of work, explored within the PhD of Ludivine Duroyon, concerns new models for temporal beliefs and statements, allowing journalists to enrich the reference sources against which one can check who said what, and when.
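
As an illustration of the kind of transformation involved, the Python sketch below converts a simple two-column spreadsheet into RDF triples using openpyxl and rdflib. The file name, sheet layout and vocabulary (the example.org namespace and the indicator and value predicates) are hypothetical placeholders, not the actual extraction rules of [14].

    # A minimal sketch, assuming a simple two-column sheet (indicator name, value);
    # file names, URIs and predicates below are hypothetical placeholders.
    from openpyxl import load_workbook
    from rdflib import Graph, Literal, Namespace, URIRef

    EX = Namespace("http://example.org/insee/")   # placeholder vocabulary

    g = Graph()
    wb = load_workbook("insee_table.xlsx", data_only=True)
    ws = wb.active

    for row_idx, (indicator, value) in enumerate(
            ws.iter_rows(min_row=2, max_col=2, values_only=True), start=1):
        if indicator is None or value is None:
            continue                               # skip empty or layout-only rows
        obs = URIRef(EX[f"observation/{row_idx}"])
        g.add((obs, EX.indicator, Literal(str(indicator))))
        g.add((obs, EX.value, Literal(value)))

    g.serialize(destination="insee_table.ttl", format="turtle")

Real INSEE spreadsheets typically have more complex layouts (multi-row headers, merged cells, several dimensions), which such a sketch deliberately ignores.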