Section: New Results
Introduction
In this section, we organize the bulk of our contributions this year along two of our research axes, namely Pattern Mining and Decision Support. Some other contributions lie within the domains of query optimization and machine learning.
Pattern Mining
In the domain of pattern mining we can categorize our contributions along the following lines:
- Mining of novel types of patterns. This includes mining of negative patterns [24], [14] and periodic patterns [18].
- Data Mining for the masses. In [11], we propose a communication model that bridges knowledge delivery between data miners and domain users in the field of library science. The model defines a five-step process for effective knowledge synthesis and delivery of insights to the domain users.
- Efficient pattern mining. In [10], we propose a method to sample itemsets efficiently on streaming data. This contribution tackles two limitations of the state of the art in pattern mining: (1) the so-called pattern explosion, where the user is confronted with too many patterns, and (2) the assumption of static data.
- Data Mining for Data Science. One of the most basic questions about a dataset is whether it forms a single group, i.e., is unimodal, or can be clustered into several groups. In [13], we propose a new statistical test of unimodality that is both independent of the input distribution and computationally efficient.
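To make the streaming setting mentioned under efficient pattern mining more concrete, the following is a minimal illustrative sketch. It is not the method of [10]. It assumes a swapped-in, textbook technique: reservoir sampling (Algorithm R) keeps a bounded, uniform sample of the transactions seen so far, and one random non-empty itemset is then drawn from each retained transaction. The function names (`reservoir_sample`, `random_itemset`, `sample_itemsets`) and the parameters `k` and `seed` are hypothetical and chosen only for illustration.

```python
import random
from typing import FrozenSet, Hashable, Iterable, List


def reservoir_sample(stream: Iterable[Iterable[Hashable]], k: int,
                     rng: random.Random) -> List[FrozenSet[Hashable]]:
    """Keep a uniform sample of at most k non-empty transactions (Algorithm R)."""
    reservoir: List[FrozenSet[Hashable]] = []
    seen = 0  # number of non-empty transactions observed so far
    for transaction in stream:
        items = frozenset(transaction)
        if not items:
            continue  # assumption: empty transactions carry no itemset
        if seen < k:
            reservoir.append(items)
        else:
            # Replace a random slot with probability k / (seen + 1).
            j = rng.randint(0, seen)  # bounds are inclusive
            if j < k:
                reservoir[j] = items
        seen += 1
    return reservoir


def random_itemset(transaction: FrozenSet[Hashable],
                   rng: random.Random) -> FrozenSet[Hashable]:
    """Draw a non-empty subset of a non-empty transaction by rejection."""
    if not transaction:
        raise ValueError("transaction must be non-empty")
    items = sorted(transaction)  # assumption: items are mutually comparable
    while True:
        subset = frozenset(x for x in items if rng.random() < 0.5)
        if subset:
            return subset


def sample_itemsets(stream: Iterable[Iterable[Hashable]], k: int,
                    seed: int = 0) -> List[FrozenSet[Hashable]]:
    """Return at most k itemsets, using memory bounded by k transactions."""
    rng = random.Random(seed)
    return [random_itemset(t, rng) for t in reservoir_sample(stream, k, rng)]


if __name__ == "__main__":
    toy_stream = [{"a", "b"}, {"b", "c", "d"}, {"a"}, {"c", "d"}, {"a", "b", "c"}]
    for itemset in sample_itemsets(toy_stream, k=3, seed=0):
        print(sorted(itemset))
```

In this sketch the output size is capped at `k` regardless of the stream length, which addresses the pattern explosion in the simplest possible way, and the stream is read in a single pass, so no static dataset is assumed. How the sampling distribution should be biased toward interesting itemsets is left open here.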
Decision Support
Regarding the axis of decision support, our contributions fall into two categories: forecasting & prediction, and anomaly detection.
- Forecasting & prediction. In [15], [12], we propose solutions to automate the task of capacity planning in the context of a large data network such as the one available at Orange. The work in [19] offers a tool to predict the nutritional needs of sows in lactation.
- Anomaly Detection. The work in [20] tackles the problem of fraud detection under imbalanced data.
Others
- Machine Learning. The work in [16] proposes a novel algorithm to weight the importance of classification errors when training a classifier. The work in [8] proposes a classification algorithm optimized for highly imbalanced data.
- Query optimization. In [9], we propose a query-load-agnostic caching approach to speed up provenance-aware queries in RDF data cubes.