Section: Research Program
Big Data-Driven Design
Big data-driven modelling/assimilation/simulation/design (BD3) is concerned with the calibration and extension of first-principles-based models and equations using data (also known as data assimilation), and with the use of such models for optimal design. BD3 can significantly decrease time-to-design through fast interactions between the modelling, prediction, optimization, control and design stages, which share their advances (in particular, by coupling first principles and data [63], or by repairing/extending closed-form models). Beyond predictive modelling, TAU more specifically investigates generative and adversarial modelling [68], aimed at data assimilation from biased data.
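As a minimal sketch of calibration in the sense used above (fitting a parameter of a first-principles model to observations), the Python snippet below estimates the rate constant `k` of Newton's law of cooling from noisy synthetic measurements. The cooling law, the fixed ambient and initial temperatures, the synthetic data and the golden-section search are all illustrative assumptions, not TAU's models or methods; realistic assimilation settings involve far richer models and estimators.

```python
import math
import random


def cooling_model(t, k, t_env=20.0, t0=90.0):
    """Hypothetical first-principles model: Newton's law of cooling."""
    return t_env + (t0 - t_env) * math.exp(-k * t)


def sse(k, times, observed):
    """Sum of squared residuals between model predictions and observations."""
    return sum((cooling_model(t, k) - y) ** 2 for t, y in zip(times, observed))


def calibrate_k(times, observed, lo=1e-6, hi=2.0, tol=1e-8):
    """Estimate k by golden-section search on the SSE over [lo, hi].

    Assumes the SSE is unimodal in k on that interval.
    """
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        # Keep the sub-interval that contains the smaller residual.
        if sse(c, times, observed) < sse(d, times, observed):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2


# Synthetic "measurements": the true model plus Gaussian sensor noise.
rng = random.Random(0)
true_k = 0.3
times = [0.5 * i for i in range(20)]
observed = [cooling_model(t, true_k) + rng.gauss(0, 0.5) for t in times]

k_hat = calibrate_k(times, observed)
print(f"estimated k = {k_hat:.3f} (true value {true_k})")
```

The same pattern (a parametric physical model, a data-misfit criterion, and an optimizer) extends to several parameters by swapping the one-dimensional search for a multivariate optimizer.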
A first challenge is to find an operational umbrella for handling noisy, sparse, unstructured, or missing data, possibly drawn from different distributions (e.g., simulated vs. real-world data). Collaborative filtering, deep learning, and their hybrids can be used to forge scalable, unified intermediate representations, with applications in energy and computational social sciences (involving time series, documents, and/or graphs). Related issues concern the interpretation of such latent representations and of the decisions based on them. Another challenge is to deliver guarantees for data-driven models and designs: as Léon Bottou has pointed out, the more intelligence is put into the modelling, the more intelligence must be put into the validation. Along these lines, generative models will be used to support the design of "what if" scenarios and to enhance anomaly detection and monitoring through refined likelihood criteria.
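To illustrate what a likelihood criterion for anomaly detection can look like, here is a minimal sketch assuming the simplest possible generative model, a Gaussian with diagonal covariance fitted to "normal" data: any point whose log-likelihood falls below the 1% quantile of the training scores is flagged. The model choice, the quantile and the synthetic data are hypothetical; the refined criteria mentioned above would rely on richer generative models.

```python
import math
import random


def fit_diag_gaussian(rows):
    """Fit a diagonal-covariance Gaussian: per-feature mean and variance."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    variances = [sum((r[j] - means[j]) ** 2 for r in rows) / n for j in range(dim)]
    return means, variances


def log_likelihood(x, means, variances):
    """Log-density of x under the fitted diagonal Gaussian."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, means, variances)
    )


def fit_threshold(rows, means, variances, quantile=0.01):
    """Return the log-likelihood at the given low quantile of the training data."""
    scores = sorted(log_likelihood(r, means, variances) for r in rows)
    return scores[int(quantile * len(scores))]


def is_anomaly(x, means, variances, threshold):
    """Flag x if it is less likely than the threshold under the model."""
    return log_likelihood(x, means, variances) < threshold


# Synthetic "normal" operating data with two features.
rng = random.Random(42)
train = [[rng.gauss(0, 1), rng.gauss(5, 2)] for _ in range(1000)]
means, variances = fit_diag_gaussian(train)
threshold = fit_threshold(train, means, variances)

print(is_anomaly([0.1, 5.3], means, variances, threshold))    # close to the bulk of the data
print(is_anomaly([6.0, -10.0], means, variances, threshold))  # far from the training data
```

Replacing the Gaussian with a learned generative model would change only `fit_diag_gaussian` and `log_likelihood`; the thresholding logic stays the same.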
Several recent, ongoing, or submitted projects attest to the links between TAU members and experts from application domains: in High Energy Physics (LAL, CERN), in space weather (CWI), in anomaly detection (Thalès ThereSIS), and, within the ADAMME project (FUI 2016), in automatic image labelling (Armadillo) and in yield management (VoyagesSncf.com Technologies).