## Section: Overall Objectives

### Structure of the team

Our ambitious goal is to provide the algorithmic foundations of large-scale dynamic distributed systems, ranging from abstractions to real deployment. This is reflected in the following objectives:

#### Objective 1: Decentralized personalization

Our first objective is to offer full-fledged personalization in notification systems. Today, almost every user suffers from information overload, which hurts both users and content providers. This suggests not only that notification systems will take a prominent role, but also that, to be useful, they should be personalized to each and every user depending on her activity, operations, posts, interests, etc. In the Gossple implicit instant item recommender, users are automatically notified, through a simple interface, of items of interest to them, without explicitly subscribing to feeds or interests. They simply let the system know whether they like the items they receive (typically through a like/dislike button). Throughout the system's operation, the personal data of users is stored on their own machines, which makes it possible to provide a wide spectrum of privacy guarantees while enabling cross-application benefits.

Our goal here is to provide a fully decentralized solution without ever requiring users to reveal their private preferences.
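To make the idea concrete, here is a minimal sketch of such an implicit like/dislike recommender; the binary set profiles, the Jaccard similarity, and all names are illustrative choices of ours, not Gossple's actual design:

```python
# Toy sketch of an implicit item recommender in the spirit of Gossple.
# Profiles are sets of liked item ids; similarity and names are our own
# illustrative choices, not the actual system design.

def jaccard(a, b):
    """Similarity between two binary 'like' profiles (sets of item ids)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(my_likes, peer_profiles, k=2):
    """Return items liked by the k peers most similar to us."""
    ranked = sorted(peer_profiles.items(),
                    key=lambda kv: jaccard(my_likes, kv[1]),
                    reverse=True)
    suggestions = set()
    for _, likes in ranked[:k]:
        suggestions |= likes - my_likes   # only items we have not yet rated
    return suggestions

peers = {"alice": {"a", "b", "c"}, "bob": {"a", "d"}, "carol": {"x", "y"}}
print(recommend({"a", "b"}, peers))  # items from the two closest peers
```

In a decentralized deployment, each user would keep her profile locally and learn `peer_profiles` only through gossip exchanges with (possibly obfuscated) neighbour profiles, which is where the privacy trade-off discussed above enters.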

#### Objective 2: Personalization: Cloud computing meets p2p

Our second objective is to move forward in the area of
**personalization**, one of the biggest challenges faced by
today's large online stakeholders.

**Hybrid infrastructures for personalization.** So far, social
filtering techniques have mainly been implemented on centralized
architectures, relying on smart heuristics to cope with an ever-increasing
load of information. We argue, however, that no matter how smart these
heuristics and how powerful the underlying machines running them, a
fully centralized approach might not be able to cope with the
exponential growth of the Internet and, even if it does, the price to
be paid might simply not be acceptable for its users (privacy,
ecological footprint, etc.).

At the other end of the spectrum lie fully decentralized systems, where the collaborative filtering system is implemented by the machines of the users themselves. Such approaches are appealing for both scalability and privacy reasons. With respect to scalability, storage and computational units naturally grow with the number of users. Furthermore, a p2p system provides an energy-friendly environment where every user can feel responsible for the ecological footprint of her exploration of the Internet. With respect to privacy, users are responsible for the management of their own profiles. Potential privacy threats therefore do not come from a Big Brother, but may still arise due to the presence of other users.

We have strong experience in devising and experimenting with such p2p systems for various forms of personalization. More specifically, we have shown that personalization can be effective while maintaining a reasonable level of privacy. Nevertheless, frequent connections and disconnections of users make such systems difficult to maintain while addressing privacy attacks. For this reason, we also plan to explore hybrid approaches where the social filtering is performed by the users themselves, in a p2p manner, whereas the management of connections and disconnections, including authentication, is handled by a server-based architecture. In particular, we plan to explore the trade-offs between the quality of the personalization process, its efficiency, and its privacy guarantees.

#### Objective 3: Privacy-aware decentralized computations

Gossip algorithms have also been studied for more complex global tasks, such as computation of network statistics or, more generally, aggregation functions of input values of the nodes (e.g., sum, average, or max). We plan to pursue this research direction both from a theoretical and from a practical perspective. We provide two examples of these directions below.
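As a concrete illustration of such decentralized aggregation, the sketch below implements push-sum-style gossip averaging, in which every node repeatedly splits its (sum, weight) pair with a randomly chosen peer; the node count, round budget, and function names are our own illustrative choices:

```python
import random

def gossip_average(values, rounds=50, seed=0):
    """Push-sum gossip averaging: every node repeatedly keeps half of
    its (sum, weight) mass and pushes the other half to a uniformly
    random node. All local estimates s/w converge to the global
    average without any central coordinator."""
    rng = random.Random(seed)
    n = len(values)
    s = [float(v) for v in values]  # running sums
    w = [1.0] * n                   # running weights
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n)    # contact a uniformly random node
            s[i] *= 0.5             # keep half of the mass locally...
            w[i] *= 0.5
            s[j] += s[i]            # ...and push the other half to j
            w[j] += w[i]
    return [si / wi for si, wi in zip(s, w)]  # local estimates of the mean

print(gossip_average([10, 20, 30, 40]))  # every estimate approaches 25.0
```

Mass conservation (the halves removed from `s[i]` and `w[i]` are exactly what is added at `j`) is what makes the estimates converge to the true average rather than drift; the same pattern extends to other aggregation functions such as sums or counts.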

**Computational capabilities of gossip**. On the theoretical
side, we have recently started to study gossip protocols for
assigning unique IDs from a small range to all nodes (known as the
*renaming* problem) and for computing the rank of the input value of
each node. We plan to further investigate the class of global tasks
that can be solved efficiently by gossip protocols.

**Private computations on decentralized data.** On a more
practical track, we aim to explore the use of gossip protocols for
decentralized computations on privacy-sensitive data. Recent research
on private databases and on homomorphic encryption has demonstrated
the possibility of performing complex operations on encrypted data. Yet
existing systems have concentrated on relatively small-scale
applications. In the coming years, we instead plan to investigate the
possibility of building a framework for querying and performing
operations on large-scale decentralized data stores. To achieve this,
we plan to disseminate queries in an epidemic fashion through a network
of data sources distributed on a large scale, while combining privacy-preserving
techniques with decentralized computations. This would, for
example, enable the computation of statistical measures on large
quantities of data without needing to access and disclose each single
data item.
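As a toy illustration of the kind of computation we have in mind, the sketch below computes a global sum without any node disclosing its individual input. It uses additive secret sharing rather than homomorphic encryption, to keep the example short, and all names are ours:

```python
import random

def share(value, n_parties, modulus=2**31, rng=None):
    """Split a private integer into n additive shares modulo `modulus`;
    any n-1 shares together reveal nothing about the value."""
    rng = rng or random.Random(0)
    shares = [rng.randrange(modulus) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % modulus)  # shares sum to value
    return shares

def secure_sum(private_values, modulus=2**31):
    """Each node splits its input into one share per party; each party
    sums the shares it holds, and combining the partial sums yields
    the total without exposing any individual input."""
    n = len(private_values)
    all_shares = [share(v, n, modulus) for v in private_values]
    partials = [sum(col) % modulus for col in zip(*all_shares)]  # one per party
    return sum(partials) % modulus

print(secure_sum([12, 7, 30]))  # 49, as long as the true sum < modulus
```

In the decentralized setting sketched in the text, the shares would be gossiped through the network rather than collected by fixed parties, and each aggregator would only ever see masked values.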

#### Objective 4: Information dissemination over social networks

While we have been studying information dissemination in practical settings (such as WhatsUp in Gossple), modelling such dynamic systems is still in its infancy. We plan to complement our practical work on gossip algorithms and information dissemination along the following axes:

**Rumour spreading** is a family of simple randomized algorithms
for information dissemination, in which nodes contact (uniformly)
random neighbours to exchange information with them. Despite their
simplicity, these protocols have proved very efficient for various
network topologies. We are interested in studying their properties in
specific topologies such as social networks, be they implicit
(interest-based, as in Gossple) or explicit (where users choose their
friends, as in Facebook). Recently, there has been some work on
bounding the speed of rumour spreading in terms of abstract properties
of the network graph, especially the graph's expansion properties of
conductance and vertex expansion. It has been shown that a high value
for either of these properties guarantees fast rumour spreading; this
should be related to empirical observations that social networks have
high expansion. Some works have established increasingly tighter upper
bounds for rumour spreading in terms of conductance or vertex
expansion, but these bounds are not tight.
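A minimal simulation of the PUSH variant makes the connection between expansion and speed tangible; the graph, parameters, and names below are our own illustrative choices:

```python
import random

def push_rounds(adj, source=0, seed=1):
    """Synchronous PUSH rumour spreading: in each round, every informed
    node tells the rumour to a uniformly random neighbour. Returns the
    number of rounds until all nodes are informed; `adj` maps each
    node to its list of neighbours."""
    rng = random.Random(seed)
    informed = {source}
    rounds = 0
    while len(informed) < len(adj):
        for u in list(informed):
            informed.add(rng.choice(adj[u]))
        rounds += 1
    return rounds

# On a clique (maximal expansion) the informed set can at most double
# per round, so dissemination completes in O(log n) rounds.
n = 16
clique = {u: [v for v in range(n) if v != u] for u in range(n)}
print(push_rounds(clique))
```

Replacing the clique with a poorly expanding graph (e.g., two cliques joined by a single edge) makes the same protocol dramatically slower, which is precisely the dependence on expansion that the bounds above try to capture.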

Our objective is to prove the missing tight upper bound for rumour spreading in terms of vertex expansion. It is known that neither conductance nor vertex expansion is enough by itself to completely characterize the speed of rumour spreading: there are graphs with bad expansion in which rumours nevertheless spread fast. We plan to explore more refined notions of expansion, and possibly other abstract graph properties, to establish more accurate bounds. Another interesting and challenging problem is the derivation of general lower bounds for rumour spreading as a function of abstract properties of graphs; no such bounds are currently known.

**Overcoming the dependence on expansion**:
Rumour spreading algorithms have attractive properties, such as their
simplicity and their good performance on many networks, but they may
perform very poorly on some networks even though these networks have
small diameter, so that fast information dissemination is possible
with more sophisticated protocols. Typically, nodes may choose the
neighbours to contact with some non-uniform probabilities, determined
from information accumulated by each node during the run of the
algorithm. Such algorithms achieve information dissemination in time
that is close to the diameter of the network. They do not, however,
retain some of the other nice properties of rumour spreading, most
importantly robustness against failures. We are investigating
algorithms that combine the good runtime of these latest protocols
with the robustness of rumour spreading. Moreover, these algorithms
assume that the network topology does not change during their run. In
view of the dynamism of real networks, in which nodes join and leave
and connections between nodes change constantly, we have to address
dynamic network models. We plan to investigate how the classic rumour
spreading algorithms perform in the face of such changes. In this
area, we also plan to reduce the size of the messages these algorithms
use, which can be large even if the amount of useful information that
must be disseminated is small.

**Competing rumours**: Suppose now that two or more conflicting
rumours (or opinions) spread in the network, and whenever a node
receives different rumours it keeps only one of them. Which rumour
prevails, and how long does it take until this happens? Similar
questions have been studied in other contexts, but not in the context
of rumour spreading. The *voter* model is a well-studied graph
process that can be viewed as a competing rumour process following
the classic PULL rumour spreading algorithm. However, research has
only recently started to address the question of how long it takes
until a rumour prevails. An interesting variant of the problem that
has not been considered before is when different rumours are
associated with different weights (some rumours are more convincing
than others). We plan to study the above models and variations of
them, and to investigate their connection to the standard rumour
spreading algorithms. This is clearly related to the dissemination of
news and personalization in social networks.
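The PULL-based voter dynamics described above can be sketched as follows; the synchronous schedule, the graph, and all names are illustrative choices of ours, and the unweighted adoption step is where rumour weights could be introduced:

```python
import random

def competing_rumours(adj, opinions, seed=3):
    """Voter-model dynamics via PULL: in each round, every node adopts
    the rumour of a uniformly random neighbour. Returns the rumour
    that prevails and the number of rounds until consensus."""
    rng = random.Random(seed)
    state = dict(opinions)
    rounds = 0
    while len(set(state.values())) > 1:
        # synchronous update: all nodes pull from the previous state
        state = {u: state[rng.choice(adj[u])] for u in state}
        rounds += 1
    return next(iter(state.values())), rounds

# Two competing rumours on a clique of 8 nodes, 4 initial holders each.
n = 8
clique = {u: [v for v in range(n) if v != u] for u in range(n)}
winner, t = competing_rumours(clique, {u: "red" if u < 4 else "blue"
                                       for u in range(n)})
print(winner, t)
```

A weighted variant would bias the adoption step (e.g., keeping the current rumour with a probability depending on its weight), which is exactly the unexplored direction mentioned above.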

#### Objective 5: Computability and efficiency of distributed systems

A very relevant challenge (maybe a Holy Grail) lies in the definition of a computation model appropriate to dynamic systems. This is a fundamental question. As an example, there are many peer-to-peer protocols, but none of them is formally defined with respect to an underlying computing model. Similarly to the work of Lamport on "static" systems, a model has to be defined for dynamic systems. This theoretical research is a necessary condition if one wants to understand the behavior of these systems. As the aim of a theory is to codify knowledge so that it can be transmitted, the definition of a realistic model for dynamic systems is unavoidable whatever the aim we have in mind, be it teaching, research, or engineering.

**Distributed computability**:
Among the fundamental theoretical results of distributed computing
is a list of problems (e.g., consensus or non-blocking atomic
commit) that have been proved to have no deterministic solution in
asynchronous distributed systems prone to failures.
For such a problem to become solvable in an asynchronous
distributed system, the system has to be enriched with an appropriate
oracle (also called a failure detector). We have
been deeply involved in this research and have designed optimal
consensus algorithms suited to different kinds of oracles.
This line of research paves the way to ranking distributed computing
problems according to the "power" of the additional oracle they require
(think of "additional oracle" as "additional assumptions"). The
ultimate goal would be the statement of a distributed computing
hierarchy, ranking problems according to the minimal assumptions
needed to solve them (similarly to the Chomsky hierarchy, which ranks
languages according to the type of automaton needed to recognize
them).

**Distributed computing abstractions**: Major advances in
sequential computing came from machine-independent data abstractions
such as sets, records, etc., control abstractions such as while, if,
etc., and modular constructs such as functions and procedures. Today,
we can no longer imagine doing without these abstractions. In the
"static" distributed computing field, some abstractions have been
promoted and proved useful: reliable broadcast, consensus, and
interactive consistency are examples of such abstractions.
These abstractions have well-defined specifications, and there are
both many theoretical results on them (mainly decidability and lower
bounds) and numerous implementations. There is no such equivalent
for dynamic distributed systems, i.e., for systems characterized by
nodes that may join and leave, or that may change their
characteristics at runtime. Our goal is to define such novel
abstractions, thereby extending the theory of distributed systems to
the dynamic case.