CEDAR - 2019 - Annual activity report

Section: New Results

Scalable storage for polystores

Big data applications routinely involve diverse datasets: flat or nested relations, graphs with complex structure, documents, poorly structured logs, or even text data. To handle this data, application designers usually rely on several data stores used side by side, each capable of handling one or a few data models (e.g., many relational stores can also handle JSON data), and each very efficient for some, but not all, kinds of processing on the data.

A current limitation is that applications are written with explicit knowledge of which part of the data is stored in which store, and how. This fails to take advantage of (i) possible redundancy, when the same data may be accessible (with different performance) from distinct data stores; and (ii) partial query results (in the style of materialized views) which may be available in the stores. If data migrates to another store to take advantage of its performance for a specific task, applications must be rewritten; this is tedious and error-prone.

In [11], we present Estocada, a novel approach connecting applications to the potentially heterogeneous systems where their input data resides. Estocada can be used in a polystore setting to transparently enable each query to benefit from the best combination of stored data and available processing capabilities. Estocada leverages recent advances in the area of view-based query rewriting under constraints, which we use to describe the various data models and stored data. Our experiments illustrate the significant performance gains achieved by Estocada.
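
To make the redundancy argument concrete, here is a toy sketch of selecting, for a given query, the cheapest stored fragment that exposes all the attributes the query needs. It is not Estocada's algorithm, which relies on view-based query rewriting under constraints; it only illustrates why decoupling applications from the physical placement of data can pay off. The `Fragment` class, the store labels, the attribute names, and the cost figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A stored piece of data, described as a view over the application data."""
    name: str
    store: str             # hypothetical store label, e.g. "relational"
    attributes: frozenset  # attributes this fragment exposes
    cost: float            # assumed access-cost estimate (arbitrary units)


def choose_fragment(needed, catalog):
    """Return the cheapest fragment exposing every needed attribute, or None."""
    needed = frozenset(needed)
    candidates = [f for f in catalog if needed <= f.attributes]
    return min(candidates, key=lambda f: f.cost, default=None)


# The same order data is available twice: in full in one store, and as a
# narrower, cheaper-to-access fragment (akin to a materialized view) in another.
catalog = [
    Fragment("orders_full", "relational",
             frozenset({"order_id", "customer", "total", "items"}), 10.0),
    Fragment("order_totals", "key-value",
             frozenset({"order_id", "total"}), 1.0),
]

best = choose_fragment({"order_id", "total"}, catalog)
print(best.name, best.store)
```

In this sketch the application names only the attributes it needs; if a fragment later moves to a different store, only the catalog entry changes, not the application code.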
