Section: New Results

Causality, Explainability, and Reliability

As stated earlier, the fairness, accountability and transparency of AI/ML need to be assessed, measured and enforced to address the ethical impact of data science on industry and society. TAU has started working toward improving confidence in ML algorithms through three research directions.


Links between quality of life at work and company performance Within the Amiqap project, a new approach to functional causal modeling from observational data, called Causal Generative Neural Networks (CGNN), has been developed [45]. CGNN learns a generative model of the joint distribution of the observed variables by minimizing the Maximum Mean Discrepancy between generated and observed data. An approximate learning criterion reduces the computational cost of the approach to linear complexity in the number of observations. Extensions of CGNN, motivated by the redundancy of real-world variables, are underway, with the goal of obtaining a causal model of the corporate and human-resource variables at the firm and economic-sector levels.
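As an illustration of the criterion mentioned above, the following minimal sketch computes a (biased) empirical estimate of the squared Maximum Mean Discrepancy between two samples. It is not the CGNN implementation: the Gaussian kernel, the `bandwidth` parameter and the function names are assumptions made for the example, and this plain estimator costs quadratic time, whereas the report refers to an approximate criterion with linear cost.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as tuples of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples xs and ys.

    Cost is quadratic in the sample sizes; this is the plain estimator,
    not the linear-time approximation mentioned in the text.
    """
    n, m = len(xs), len(ys)
    k_xx = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in xs) / (n * n)
    k_yy = sum(gaussian_kernel(a, b, bandwidth) for a in ys for b in ys) / (m * m)
    k_xy = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys) / (n * m)
    return k_xx + k_yy - 2.0 * k_xy


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic "observed" sample and two synthetic "generated" samples.
    observed = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
    good_fit = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
    poor_fit = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(50)]
    print("MMD^2, matching generator:", mmd_squared(observed, good_fit))
    print("MMD^2, shifted generator: ", mmd_squared(observed, poor_fit))
```

A generator whose samples follow the observed distribution yields a value close to zero, which is why the quantity can serve as a training loss for a generative model.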

Generating Medical Data This project, in collaboration with RPI (New York), aims to provide medical students with case studies generated using CGNN while fully preserving patient confidentiality. We are exploring the benefits of using CGNN-generated data in place of real data. Such data preserve the structure of the original data, but the patient records do not correspond to real patients.

Missing Data Missing and corrupted data are a pervasive problem in data modeling. Our interest in this problem stems from two applications: epidemiology (in collaboration with Alain-Jacques Valleron, INSERM, and RPI New York) and computer vision (in collaboration with Aix-Marseille University and University of Barcelona). As it turns out, missing data is a causality problem [80]. In a paper under review, we outline the danger of imputing values in risk factor analysis in the presence of missing data. We are also preparing a challenge on the problem of “inpainting” to restore images with occlusions and to eliminate captions in movies.

Power Networks Berna Batu (post-doc Inria) explores causal modeling in time series to explain cascades of events. Other applications (e.g., in epidemiology) may develop from this approach.


Explainable Machine Learning for Video Interviews [21]. The challenge consisted in analyzing 15 s videos annotated by humans with the Big Five personality traits (Openness to experience, Conscientiousness, Extroversion, Agreeableness, and Neuroticism, sometimes referred to as OCEAN features). Human annotators also voted on whether a given candidate should be invited for an interview. As organizers, we provided a strong baseline system based on deep learning methods that had won past challenges. Only the winners quantitatively outperformed the baseline method.

The winner of the prediction challenge (BU-NKU) performed a sophisticated analysis combining face analysis (from the entire video) and scene analysis (from the first image), with both analyses contributing to the final decision. Face analysis extracted spatio-temporal features using a pre-trained convolutional neural network (CNN) and Gabor filters. Scene analysis features were also extracted with a pre-trained CNN. Acoustic features were extracted with the OpenSMILE tool. From this feature set, the personality traits were predicted with kernel ridge regression, and the “invite for interview” decision was then predicted from them using Random Forests.

For the explainability challenge, the BU-NKU team made its final predictions with a decision tree that maps the binarized predicted OCEAN scores to the binarized ground truth. A decision tree is a self-explanatory model: it can be converted into an explicit recommender algorithm by tracing each decision from the root of the tree to the leaf. The resulting verbal explanations are accompanied by the aligned image from the first face-detected frame and by bar graphs of the corresponding mean normalized scores. Trained on the predicted OCEAN dimensions, this classifier achieved over 90% classification accuracy.
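The root-to-leaf tracing idea can be sketched as follows. This is a hypothetical toy example, not the BU-NKU system: the tree structure, the 0.5 binarization threshold, the trait names used as keys and the wording of the explanation are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    trait: Optional[str] = None      # trait tested at an internal node
    high: Optional["Node"] = None    # branch taken when the binarized trait is high
    low: Optional["Node"] = None     # branch taken when the binarized trait is low
    decision: Optional[bool] = None  # set only on leaves: invite or not


def binarize(scores: Dict[str, float], threshold: float = 0.5) -> Dict[str, bool]:
    """Turn predicted trait scores in [0, 1] into high/low flags (threshold is assumed)."""
    return {trait: score >= threshold for trait, score in scores.items()}


def explain(tree: Node, bits: Dict[str, bool]) -> Tuple[bool, str]:
    """Walk from the root to a leaf and turn the visited tests into a sentence."""
    trace = []
    node = tree
    while node.decision is None:
        is_high = bits[node.trait]
        trace.append(f"{node.trait} is {'high' if is_high else 'low'}")
        node = node.high if is_high else node.low
    verdict = "invite" if node.decision else "do not invite"
    return node.decision, f"Because {' and '.join(trace)}: {verdict}."


# Hypothetical hand-built tree; a real one would be learned from data.
TREE = Node(
    "conscientiousness",
    high=Node("agreeableness", high=Node(decision=True), low=Node(decision=False)),
    low=Node(decision=False),
)

if __name__ == "__main__":
    predicted = {"conscientiousness": 0.71, "agreeableness": 0.64}  # made-up scores
    print(explain(TREE, binarize(predicted))[1])
```

Because every prediction corresponds to exactly one path in the tree, the explanation is faithful to the model by construction, which is the property that makes such a classifier self-explanatory.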

Note that another team (TUD), which did not enter the quantitative competition, nevertheless won first place ex aequo with the BU-NKU team in the explainability challenge. Interestingly, they added facial features (using OpenFace) and text features (using published “Readability” features) in an effort to capture the level of education from the sophistication of language, which was not captured by personality traits. They then used PCA to reduce dimensionality, and fed the coefficients of a linear regression model back into the PCA model to generate explanations.
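One plausible reading of "feeding regression coefficients back into the PCA model" is sketched below: a linear regression is fitted on the PCA scores, and its coefficients are projected back through the PCA components to obtain one weight per original feature, from which per-feature contributions to a prediction can be listed. This is an assumption about the general technique, not a description of the TUD system; the function names and the synthetic data are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def fit_pca_regression(X, y, n_components):
    """Fit PCA on X, regress y on the PCA scores, and map weights back to features."""
    mean_x = X.mean(axis=0)
    Xc = X - mean_x
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    components = vt[:n_components]            # shape (k, d)
    Z = Xc @ components.T                     # PCA scores, shape (n, k)
    mean_y = y.mean()
    w, *_ = np.linalg.lstsq(Z, y - mean_y, rcond=None)
    beta = components.T @ w                   # one weight per original feature
    return mean_x, mean_y, beta


def explain_sample(x, mean_x, mean_y, beta, names):
    """Return the prediction and the per-feature contributions, largest first."""
    contributions = (x - mean_x) * beta
    prediction = mean_y + contributions.sum()
    ranked = sorted(zip(names, contributions), key=lambda pair: -abs(pair[1]))
    return prediction, ranked


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))             # synthetic features
    y = 2.0 * X[:, 0] - 1.0 * X[:, 2]         # synthetic target
    mean_x, mean_y, beta = fit_pca_regression(X, y, n_components=4)
    prediction, ranked = explain_sample(X[0], mean_x, mean_y, beta, ["f0", "f1", "f2", "f3"])
    print("prediction:", prediction)
    for name, contribution in ranked:
        print(f"{name}: {contribution:+.3f}")
```

With fewer components than features, the back-projected weights describe the reduced model rather than an ordinary least-squares fit, so the explanation stays consistent with what the model actually uses.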

Skin image classification The ongoing collaboration with Roman Hossein Khonsari, surgeon at Necker hospital, continues on the topic of skin disease image classification, with the goal of explaining how the trained neural networks produce their predictions so that users can trust them. To this end, we analyse the features that are learned and show which ones are found in each example image.

Model systematic bias and reliability

A related problem is the reliability of models and their robustness to bias. We initiated research on this topic in the context of eliminating bias in High Energy Physics simulators. Discovering new particles relies on accurate simulations of particle traces in detectors to diagnose collision events in high energy experiments. We are working on data from the ATLAS experiment at CERN, in collaboration with David Rousseau at the Laboratoire de l'Accelerateur Lineaire (LAL). We produced two preliminary studies on this topic: Adversarial learning to eliminate systematic errors: a case study in High Energy Physics [32] and Robust deep learning: A case study [43].

This line of research will extend to the calibration of other simulators, particularly energy transport and distribution simulators and medical data simulators, which we are working on in the context of other projects.

Beyond the calibration of simulators, we are also interested in using such approaches to foster fairness and de-bias data. For instance, in the “personality trait” data mentioned in the previous section, our analysis shows that labelers are biased favorably towards females (vs. males) and unfavorably towards African-Americans (vs. Caucasians or Asians).
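A first-pass check for this kind of labeling bias is to compare the mean label across groups. The sketch below is not the analysis performed on the challenge data; the function names, the group codes and the toy labels are hypothetical, and a raw gap of this kind does not by itself establish bias, since it ignores confounding variables.

```python
from collections import defaultdict


def mean_label_by_group(labels, groups):
    """Average label (e.g. 1 = invite, 0 = do not invite) within each group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for label, group in zip(labels, groups):
        totals[group] += label
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


def disparity(labels, groups, reference):
    """Difference between each group's mean label and that of the reference group."""
    means = mean_label_by_group(labels, groups)
    return {group: mean - means[reference] for group, mean in means.items()}


if __name__ == "__main__":
    # Toy, made-up annotations for two anonymous groups "A" and "B".
    labels = [1, 1, 0, 1, 0, 0]
    groups = ["A", "A", "A", "B", "B", "B"]
    print(mean_label_by_group(labels, groups))
    print(disparity(labels, groups, reference="A"))
```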