

Section: New Results

Interactive Data Exploration at Scale

To meet growing user information needs in the era of Big Data, we aim to build interactive data exploration as a new database service, using an approach called “explore-by-example”. We cast the explore-by-example problem in a principled active learning framework and bring the properties of important classes of database queries to bear on the design of new algorithms and optimizations for active-learning-based database exploration. Specifically, we introduce a dual-space (data space and version space) model for convex pattern queries, leverage a factorized dual-space model and online feature selection to handle high-dimensional exploration, and design a new active learning algorithm based on version space reduction. These techniques allow the database system not only to improve accuracy but also to overcome fundamental limitations of traditional active learning, in particular its slow convergence. Evaluation results using real-world datasets and user interest patterns show that our new system significantly outperforms state-of-the-art active learning techniques and data exploration systems in accuracy while achieving the efficiency required for interactive performance. In addition, we will extend the current data exploration system to handle more complex inputs, such as pictures, by adding an active representation learning phase based on neural networks to the existing system. Part of this work was explored during the M2 internship of Alexandre Sevin [25].
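To give an intuition for how convexity can speed up explore-by-example, the following is a minimal sketch in one dimension, not the system described above. It assumes that the user's interest region is an interval (the 1-D case of a convex pattern), that one relevant seed example is known, and that the user answers consistently. Under these assumptions, every point between two relevant examples must be relevant, and every point lying beyond an irrelevant example (on the side away from the relevant ones) must be irrelevant, so only the remaining "uncertain" points are worth showing to the user. The function names `partition` and `explore`, the `oracle` callback standing in for the user, and the median-based choice of the next point to label are all hypothetical illustrations; the selection rule is a simple bisection heuristic and not the version-space algorithm of this work.

```python
import math


def partition(points, labeled):
    """Split points into certainly-relevant, certainly-irrelevant and uncertain.

    `labeled` maps a point to True (relevant) or False (irrelevant) and is
    assumed to contain at least one relevant point. The interest region is
    assumed to be an interval, i.e. convex in one dimension.
    """
    pos = [x for x, relevant in labeled.items() if relevant]
    neg = [x for x, relevant in labeled.items() if not relevant]
    lo, hi = min(pos), max(pos)
    # Nearest irrelevant example on each side of the relevant interval.
    left_cut = max((n for n in neg if n < lo), default=-math.inf)
    right_cut = min((n for n in neg if n > hi), default=math.inf)
    positive = [x for x in points if lo <= x <= hi]
    negative = [x for x in points if x <= left_cut or x >= right_cut]
    left_uncertain = sorted(x for x in points if left_cut < x < lo)
    right_uncertain = sorted(x for x in points if hi < x < right_cut)
    return positive, negative, left_uncertain, right_uncertain


def explore(points, oracle, seed):
    """Ask the oracle for labels until no point is uncertain.

    Returns the sorted relevant points and the number of labels used,
    counting the seed. `oracle(x)` plays the role of the user.
    """
    labeled = {seed: True}  # assumption: the seed is a relevant example
    while True:
        positive, _, left_unc, right_unc = partition(points, labeled)
        side = left_unc if len(left_unc) >= len(right_unc) else right_unc
        if not side:
            return sorted(positive), len(labeled)
        # Bisection heuristic: label the median of the larger uncertain side.
        query = side[len(side) // 2]
        labeled[query] = oracle(query)


if __name__ == "__main__":
    data = list(range(100))
    found, n_labels = explore(data, lambda x: 30 <= x <= 60, seed=45)
    print(len(found), "relevant points recovered with", n_labels, "labels")
```

In this toy setting each answer resolves many unlabeled points at once, which is the kind of effect that exploiting convexity is meant to provide over labeling points one by one. In higher dimensions the certain regions would be built from convex hulls rather than intervals, which this sketch does not attempt.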