Section: New Results

Data and Process Sharing

Social-based P2P Data Sharing

Participants : Hinde Bouziane, Michèle Cart, Esther Pacitti, Didier Parigot, Guillaume Verger.

This work focuses on P2P content recommendation for online communities. In [20], we propose P2Prec, a recommendation service for P2P content sharing systems that exploits users' social data. Given a query, P2Prec finds peers that can recommend high-quality documents relevant to the query. A document is relevant to a query if it covers the same topics; it is of high quality if relevant peers have rated it highly. P2Prec finds relevant peers through a variety of mechanisms, including advanced content-based and collaborative filtering. The topics each peer is interested in are automatically computed by analyzing the documents the peer holds. A peer becomes relevant for a topic if it holds a certain number of highly rated documents on that topic. To efficiently disseminate information about peers' topics and relevant peers, we propose new semantic-based gossip protocols. In our experimental evaluation, using the TREC09 dataset, we show that semantic gossip increases recall by a factor of 1.6 compared with well-known random gossiping. Furthermore, P2Prec achieves reasonable recall with acceptable query processing load and network traffic. P2Prec was demonstrated in [31] and [47].
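The contrast between semantic and random gossip can be sketched as follows. Each peer keeps a bounded view of other peers and, at every round, merges views with a gossip partner before truncating. This is a simplified illustration, not the protocol of [20]: the `Peer` class, the topic-vector representation and the use of cosine similarity are our assumptions.

```python
import random

def cosine(a, b):
    """Cosine similarity between two sparse topic vectors (dicts topic -> weight)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sum(w * w for w in a.values()) ** 0.5
    nb = sum(w * w for w in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

class Peer:
    def __init__(self, pid, topics, view_size=5):
        self.pid = pid
        self.topics = topics     # topic -> weight, derived from local documents
        self.view = {}           # pid -> topic vector of known peers
        self.view_size = view_size

    def gossip_with(self, other, semantic=True):
        """One gossip round: merge both views, then truncate to view_size."""
        merged = {**self.view, **other.view, other.pid: other.topics}
        merged.pop(self.pid, None)
        if semantic:
            # semantic bias: keep the peers most similar to our own topics
            keep = sorted(merged, key=lambda p: cosine(self.topics, merged[p]),
                          reverse=True)[:self.view_size]
        else:
            # random gossip baseline: keep an arbitrary subset
            keep = random.sample(list(merged), min(self.view_size, len(merged)))
        self.view = {p: merged[p] for p in keep}
```

After a few rounds, a query on some topic can be routed to the peers in the local view whose vectors weight that topic most highly, which is what lets the semantic variant reach relevant peers faster than random gossip.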

In [30], we exploit social relationships between users to increase the trustworthiness of recommendations. We propose a novel P2P recommendation approach, called F2Frec, that combines content-based and social-based recommendation by maintaining a P2P, friend-to-friend network. This network serves as a basis for providing useful, high-quality recommendations. Based on F2Frec, we propose new metrics, such as usefulness and similarity (among users and their respective friend networks). We define these metrics based on users' topics of interest and relevant topics that are automatically extracted from the contents stored by each user. Our experimental evaluation, using the TREC09 dataset and the Wiki vote social network, shows the benefits of our approach compared with anonymous recommendation. In addition, we show that F2Frec increases recall by a factor of 8.8 compared with centralized collaborative filtering.
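To give an intuition of such topic-based metrics, the sketch below scores users by overlap of their topic sets. Jaccard similarity and the equal weighting in `usefulness` are our illustrative assumptions, not necessarily the metrics defined in [30].

```python
def topic_similarity(topics_a, topics_b):
    """Jaccard similarity between two users' sets of topics of interest."""
    a, b = set(topics_a), set(topics_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def usefulness(candidate_topics, query_topics, friend_topics):
    """Toy usefulness score: how well a candidate covers the query topics,
    combined with overlap with the requester's friend network's topics."""
    coverage = topic_similarity(candidate_topics, query_topics)
    social = topic_similarity(candidate_topics, friend_topics)
    return 0.5 * coverage + 0.5 * social
```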

Satisfaction-based Query Replication

Participant : Patrick Valduriez.

In a large-scale Internet-based distributed system, participants (consumers and providers) who are willing to share data are typically autonomous, i.e., they may have special interests in some queries and in other participants' data. In this context, one way to keep a participant from voluntarily leaving the system is to satisfy its interests when allocating queries. However, participants' satisfaction may also be negatively affected by the failures of other participants. Query replication can deal with provider failures, but it is challenging because of autonomy: it can not only quickly overload the system, but also dissatisfy participants with uninteresting queries. Thus, natural questions arise: should queries be replicated? If so, which ones, and how many times?

In [25], we answer these questions by revisiting query replication from a satisfaction and probabilistic point of view. We propose a new algorithm, called SbQR, that decides on-the-fly whether a query should be replicated and at which rate. As replicating a large number of queries might overload the system, we propose a variant of our algorithm, called SbQR+. The idea is to voluntarily fail to allocate as many replicas as required by consumers for low-critical queries, so as to keep resources for high-critical queries during query-intensive periods. Our experimental results demonstrate that our algorithms significantly outperform the baseline algorithms from both the performance and satisfaction points of view. We also show that our algorithms automatically adapt to the criticality of queries and to different rates of participant failures.
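The flavor of such a probabilistic, criticality-aware decision can be sketched as follows. This is a toy rule, not the SbQR algorithm of [25]: the threshold formula, parameter names and the shedding condition are our assumptions.

```python
def replication_factor(criticality, failure_prob, max_replicas=4):
    """Choose how many replicas of a query to allocate (illustrative rule).

    criticality in [0, 1]: how much the consumer cares about the query.
    failure_prob in [0, 1): probability that a single provider fails.
    Replicate until the probability that *all* chosen providers fail
    drops below a criticality-dependent threshold.
    """
    threshold = (1.0 - criticality) + 1e-9
    replicas = 1
    while failure_prob ** replicas > threshold and replicas < max_replicas:
        replicas += 1
    return replicas

def should_shed(criticality, load, capacity):
    """SbQR+-flavored shedding (sketch): during query-intensive periods,
    skip replicas of low-critical queries to keep resources for critical ones."""
    overload = load / capacity
    return overload > 1.0 and criticality < (1.0 - 1.0 / overload)
```

With this rule, an uncritical query gets a single copy regardless of failures, while a highly critical query facing unreliable providers is replicated until the residual risk is acceptable.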

View Selection in Scientific Data Warehousing

Participants : Zohra Bellahsène, Rémi Coletta, Imen Mami.

Scientific applications generate large amounts of data that have to be collected and stored for analytical purposes. One way to help manage and analyze such data is data warehousing, whereby views over the data are materialized. However, view selection is an NP-hard problem because of its many parameters: query cost, view maintenance cost and storage space. In [36], we propose a new solution based on constraint programming, which has proven efficient at solving combinatorial problems. This allows using a constraint programming solver to explore the search space and identify a set of views that minimizes the total query cost. We address view selection in two settings: (1) only the total view maintenance cost needs to be minimized, assuming unlimited storage space (i.e., storage is no longer a critical resource); (2) both storage space and maintenance cost must be minimized. We implemented our approach and compared it with a randomized method (i.e., a genetic algorithm). We show experimentally that our approach provides better performance, as measured by the quality of the solutions in terms of cost savings. Furthermore, our approach scales well with the query workload.
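The underlying optimization problem can be made concrete with a small sketch. Exhaustive enumeration below stands in for the constraint-programming solver of [36], and the cost model (each query takes its cheapest option among base tables and materialized views) is our simplifying assumption.

```python
from itertools import combinations

def select_views(views, queries, storage_limit):
    """Pick the subset of candidate views minimizing total cost.

    views: dict name -> (maintenance_cost, size)
    queries: list of (base_cost, {view_name: cost_if_that_view_is_materialized})
    Objective: view maintenance cost + total query cost,
    subject to the combined view size fitting in storage_limit.
    """
    best, best_cost = frozenset(), float("inf")
    names = list(views)
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            size = sum(views[v][1] for v in subset)
            if size > storage_limit:        # storage constraint
                continue
            maint = sum(views[v][0] for v in subset)
            qcost = sum(min([base] + [alt[v] for v in subset if v in alt])
                        for base, alt in queries)
            if maint + qcost < best_cost:
                best, best_cost = frozenset(subset), maint + qcost
    return best, best_cost
```

Enumeration is exponential in the number of candidate views, which is exactly why a constraint solver (or a randomized method such as a genetic algorithm) is needed at realistic scale.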

Scientific Workflow Management

Participants : Ayoub Ait Lahcen, Eduardo Ogasawara, Didier Parigot, Patrick Valduriez.

Scientific workflows have emerged as a basic abstraction for structuring and executing scientific experiments in computational environments. In many situations, these workflows are computationally and data-intensive, thus requiring execution in large-scale parallel computers. However, parallelization of scientific workflows remains low-level, ad-hoc and labor-intensive, which makes it hard to exploit optimization opportunities.

To address this problem, we propose in [23] an algebraic approach (inspired by relational algebra) and a parallel execution model that enable automatic optimization of scientific workflows. With our scientific workflow algebra, data is uniformly represented by relations and workflow activities are mapped to operators that have data-aware semantics. Our workflow execution model is based on the concept of activity activation, which enables transparent distribution and parallelization of activities.
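The algebraic view can be illustrated by representing relations as lists of tuples (dicts) and activities as operators whose activations are independent units of work. The operator names and signatures below are illustrative, not Chiron's API or the exact operator set of [23].

```python
def map_op(activity, relation):
    """Map operator: one activation per input tuple, one output tuple each.
    Activations are independent, so they can run in parallel."""
    return [activity(t) for t in relation]

def filter_op(predicate, relation):
    """Filter operator: keeps the tuples satisfying the predicate."""
    return [t for t in relation if predicate(t)]

def reduce_op(aggregate, key, relation):
    """Reduce operator: one activation per group of tuples sharing a key."""
    groups = {}
    for t in relation:
        groups.setdefault(t[key], []).append(t)
    return [aggregate(k, ts) for k, ts in groups.items()]

# Example workflow: derive a value per tuple, filter, then aggregate per run.
inputs = [{"run": "a", "x": 2}, {"run": "a", "x": 5}, {"run": "b", "x": 7}]
step1 = map_op(lambda t: {**t, "y": t["x"] * t["x"]}, inputs)
step2 = filter_op(lambda t: t["y"] > 4, step1)
step3 = reduce_op(lambda k, ts: {"run": k, "total": sum(t["y"] for t in ts)},
                  "run", step2)
```

Because each operator declares how it consumes and produces tuples, an engine can reorder or fuse operators and schedule activations across nodes without any change to the activity code, which is the optimization opportunity the algebra exposes.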

We conducted a thorough validation of our approach using both a real oil exploitation application and synthetic data scenarios. The experiments were run in Chiron, a data-centric scientific workflow engine implemented at UFRJ to support our algebraic approach. Our experiments demonstrate performance improvements of up to 226% compared to an ad-hoc workflow implementation. This work was done in the context of the Equipe Associée Sarava and the CNPq-INRIA project DatLuge.

In the context of SON, we also proposed a declarative workflow language based on service/activity rules [41] . This language makes it possible to infer a dependency graph for SON applications that provides for automatic parallelization.
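Such dependency inference can be sketched by representing each rule by the services it consumes and produces, and drawing an edge wherever one rule's output feeds another's input. The rule representation below is our illustrative encoding, not the language of [41].

```python
def dependency_graph(rules):
    """Infer activity dependencies from declarative service/activity rules.

    rules: dict rule_name -> (consumed_services, produced_services).
    Returns edges (producer_rule, consumer_rule). Rules with no path
    between them are independent and may be executed in parallel.
    """
    edges = []
    for r1, (_, produced) in rules.items():
        for r2, (consumed, _) in rules.items():
            if r1 != r2 and set(produced) & set(consumed):
                edges.append((r1, r2))
    return edges
```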