GANG - 2016 - Annual activity report

Section: New Results

Models and Algorithms for Networks

Beyond Highway Dimension: Small Distance Labels Using Tree Skeletons

The goal of a hub-based distance labeling scheme for a network $G = (V, E)$ is to assign a small subset $S(u) \subseteq V$ to each node $u \in V$, in such a way that for any pair of nodes $u, v$, the intersection of hub sets $S(u) \cap S(v)$ contains a node on the shortest $uv$-path. The existence of small hub sets, and consequently efficient shortest path processing algorithms, for road networks is an empirical observation. A theoretical explanation for this phenomenon was proposed by Abraham et al. (SODA 2010) through a network parameter they called highway dimension, which captures the size of a hitting set for a collection of shortest paths of length at least $r$ intersecting a given ball of radius $2r$. In [38], we revisit this explanation, introducing a more tractable (and directly comparable) parameter based solely on the structure of shortest-path spanning trees, which we call skeleton dimension. We show that skeleton dimension admits an intuitive definition for both directed and undirected graphs, provides a way of computing labels more efficiently than by using highway dimension, and leads to comparable or stronger theoretical bounds on hub set size.
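As a minimal sketch of how a hub labeling answers distance queries, the Python snippet below stores, for each node, its distance to every hub in its hub set and decodes a distance as the minimum over common hubs. The graph `PATH`, the hand-picked hub sets `HUBS`, and all function names are illustrative assumptions; the sketch does not reproduce the skeleton-dimension construction of [38].

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source; graph is {u: {v: weight}}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def build_labels(graph, hubs):
    """Label of u maps each hub h in S(u) to the distance d(u, h)."""
    labels = {}
    for u in graph:
        dist = dijkstra(graph, u)
        labels[u] = {h: dist[h] for h in hubs[u]}
    return labels

def query(labels, u, v):
    """Decode d(u, v) from the two labels only, via common hubs."""
    common = labels[u].keys() & labels[v].keys()
    return min((labels[u][h] + labels[v][h] for h in common),
               default=float("inf"))

# Hypothetical example: the unit-weight path 0 - 1 - 2 - 3 - 4.
PATH = {0: {1: 1}, 1: {0: 1, 2: 1}, 2: {1: 1, 3: 1}, 3: {2: 1, 4: 1}, 4: {3: 1}}
# Hand-picked hub sets: every pair shares a hub lying on its shortest path.
HUBS = {0: {0, 1, 2}, 1: {1, 2}, 2: {2}, 3: {2, 3}, 4: {2, 3, 4}}
labels = build_labels(PATH, HUBS)
```

The query is exact precisely because the hub sets satisfy the covering property stated above; with arbitrary hub sets, `query` would only return an upper bound on the distance.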

Sublinear-Space Distance Labeling using Hubs

Continuing work in the previously discussed framework of hub-based distance labeling schemes, in [36], [39], we present a hub labeling which allows us to decode exact distances in sparse graphs using labels of size sublinear in the number of nodes. For graphs with at most $n$ nodes and average degree $\Delta$, the tradeoff between label bit size $L$ and query decoding time $T$ for our approach is given by $L = O(n \log \log_{\Delta} T / \log_{\Delta} T)$, for any $T \leq n$. Our simple approach is thus the first sublinear-space distance labeling for sparse graphs that simultaneously admits small decoding time (for constant $\Delta$, we can achieve any $T = \omega(1)$ while maintaining $L = o(n)$), and it also provides an improvement in terms of label size with respect to previous slower approaches.

By using similar techniques, we then present a 2-additive labeling scheme for general graphs, i.e., one in which the decoder provides a 2-additive approximation of the distance between any pair of nodes. We achieve almost the same label size-time tradeoff $L = O(n \log^{2} \log T / \log T)$, for any $T \leq n$. To our knowledge, this is the first additive scheme with constant absolute error to use labels of sublinear size. The corresponding decoding time is then small (any $T = \omega(1)$ is sufficient).

We believe all of our techniques are of independent value and provide a desirable simplification of previous approaches.

Labeling Schemes for Ancestry Relation

In [17], we solve the ancestry-labeling scheme problem, which aims at assigning the shortest possible labels (bit strings) to nodes of rooted trees, so that ancestry queries between any two nodes can be answered by inspecting their assigned labels only. This problem was introduced more than twenty years ago by Kannan et al. [STOC '88], and is among the most well-studied problems in the field of informative labeling schemes. We construct an ancestry-labeling scheme for $n$-node trees with label size $\log_{2} n + O(\log \log n)$ bits, thus matching the $\log_{2} n + \Omega(\log \log n)$ bits lower bound given by Alstrup et al. [SODA '03]. Our scheme is based on a simplified ancestry scheme that operates extremely well on a restricted set of trees. In particular, for the set of $n$-node trees with depth at most $d$, the simplified ancestry scheme enjoys label size of $\log_{2} n + 2 \log_{2} d + O(1)$ bits. Since the depth of most XML trees is at most some small constant, such an ancestry scheme may be of practical use. In addition, we also obtain an adjacency-labeling scheme that labels $n$-node trees of depth $d$ with labels of size $\log_{2} n + 3 \log_{2} d + O(1)$ bits. All our schemes assign the labels in linear time, and guarantee that any query can be answered in constant time. Finally, our ancestry scheme finds applications to the construction of small universal partially ordered sets (posets). Specifically, for any fixed integer $k$, it enables the construction of a universal poset of size $O(n^{k})$ for the family of $n$-element posets with tree-dimension at most $k$. Up to lower order terms, this bound is tight thanks to a lower bound of $n^{k - o(1)}$ due to Alon and Scheinerman [Order '88].
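To make the notion of an ancestry-labeling scheme concrete, here is a minimal sketch of the classical interval-based baseline, which uses two DFS numbers per node (about $2 \log_{2} n$ bits) rather than the $\log_{2} n + O(\log \log n)$ bits of the scheme in [17]. It is not the construction of [17]; the tree `TREE` and the function names are assumptions made for illustration.

```python
def interval_labels(children, root):
    """Label each node with (pre, last): its preorder number and the
    largest preorder number found in its subtree (iterative DFS)."""
    labels, pre, counter = {}, {}, 0
    stack = [(root, False)]
    while stack:
        node, finished = stack.pop()
        if finished:
            # All descendants have been numbered by now.
            labels[node] = (pre[node], counter - 1)
            continue
        pre[node] = counter
        counter += 1
        stack.append((node, True))
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return labels

def is_ancestor(label_u, label_v):
    """True iff u is an ancestor of v (every node is its own ancestor here).
    The decision uses the two labels only, never the tree."""
    return label_u[0] <= label_v[0] <= label_u[1]

# Hypothetical example tree: r has children a and b; a has children c and d.
TREE = {"r": ["a", "b"], "a": ["c", "d"]}
LABELS = interval_labels(TREE, "r")
```

Labels are assigned in linear time and each query takes constant time, which matches the operational guarantees mentioned above; the difference with [17] lies in the label size.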

Independent Lazy Better-Response Dynamics on Network Games

In [43], we study an independent best-response dynamics on network games in which the nodes (players) decide to revise their strategies independently with some probability. We are interested in the convergence time to the equilibrium as a function of this probability, the degree of the network, and the potential of the underlying games.

Forwarding Tables Verification through Representative Header Sets

Forwarding table verification consists in checking the distributed data structure resulting from the forwarding tables of a network. A classical concern is the detection of loops. In [42], we study this problem in the context of software-defined networking (SDN), where forwarding rules can be arbitrary bitmasks (generalizing prefix matching) and where tables are updated by a centralized controller. Basic verification problems such as loop detection are NP-hard, and most previous work solves them with heuristics or SAT solvers. We follow a different approach based on computing a representation of the header classes, i.e., the sets of headers that match the same rules. This representation consists of a collection of representative header sets, at least one for each class, and can be computed centrally in time polynomial in the number of classes. Classical verification tasks can then be trivially solved by checking each representative header set. In general, the number of header classes can increase exponentially with header length, but it remains polynomial in the number of rules in the practical case where rules are built from predefined fields, with exact, prefix, or range matching applied in each field (e.g., IP/MAC addresses, TCP/UDP ports). We propose general techniques that work in polynomial time as long as the number of classes of headers is polynomial and that do not make specific assumptions about the structure of the sets associated with rules. The efficiency of our method relies on the fact that the data structure representing rules allows efficient computation of intersection, cardinality, and inclusion. Finally, we propose an algorithm to maintain such a representation in the presence of updates (i.e., rule insertion, update, or removal). We also provide a local distributed algorithm for checking the absence of black holes and a proof labeling scheme for locally checking the absence of loops.
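The notion of header class can be illustrated by a brute-force sketch that enumerates every header of a short, fixed length and groups headers by the exact set of rules they match. This enumeration is exponential in the header length and is therefore not the method of [42], which avoids it; the ternary rule syntax (`0`, `1`, `*`) and all names below are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product

def matches(rule, header):
    """A ternary rule matches a header if every non-wildcard bit agrees."""
    return all(r in ("*", h) for r, h in zip(rule, header))

def header_classes(rules, length):
    """Group all headers of the given length by the set of rule indices
    they match; each group is one header class."""
    classes = defaultdict(list)
    for bits in product("01", repeat=length):
        header = "".join(bits)
        key = frozenset(i for i, rule in enumerate(rules) if matches(rule, header))
        classes[key].append(header)
    return dict(classes)

# Hypothetical rule set over 3-bit headers.
RULES = ["1**", "*1*", "11*"]
CLASSES = header_classes(RULES, 3)
```

Once the classes are known, a verification task such as loop detection only needs to follow one representative header per class through the network, since all headers of a class are handled identically by every rule.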

A Locally-Blazed Ant Trail Achieves Efficient Collective Navigation Despite Limited Information

This work fits into the framework of computationally-inspired analysis of biological systems. Any organism faces sensory and cognitive limitations which may result in maladaptive decisions. Such limitations are prominent in the context of groups where the relevant information at the individual level may not coincide with collective requirements. In [14], we study the navigational decisions exhibited by Paratrechina longicornis ants as they cooperatively transport a large food item. These decisions hinge on the perception of individuals which often restricts them from providing the group with reliable directional information. We find that, to achieve efficient navigation despite partial and even misleading information, these ants employ a locally-blazed trail. This trail significantly deviates from the classical notion of an ant trail: First, instead of systematically marking the full path, ants mark short segments originating at the load. Second, the carrying team constantly loses the guiding trail. We experimentally and theoretically show that the locally-blazed trail optimally and robustly exploits useful knowledge while avoiding the pitfalls of misleading information.

Parallel Exhaustive Search without Coordination

In [31], we analyze parallel algorithms in the context of exhaustive search over totally ordered sets. Imagine an infinite list of “boxes”, with a “treasure” hidden in one of them, where the boxes' order reflects the importance of finding the treasure in a given box. At each time step, a search protocol executed by a searcher has the ability to peek into one box, and see whether the treasure is present or not. Clearly, the best strategy of a single searcher would be to open the boxes one by one, in increasing order. Moreover, by equally dividing the workload between them, $k$ searchers can trivially find the treasure $k$ times faster than one searcher. However, this straightforward strategy is very sensitive to failures (e.g., crashes of processors), and overcoming this issue seems to require a large amount of communication. We therefore address the question of designing parallel search algorithms maximizing their speed-up and maintaining high levels of robustness, while minimizing the amount of resources for coordination. Based on the observation that algorithms that avoid communication are inherently robust, we focus our attention on identifying the best running time performance of non-coordinating algorithms. Specifically, we devise non-coordinating algorithms that achieve a speed-up of $9/8$ for two searchers, a speed-up of $4/3$ for three searchers, and in general, a speed-up of $\frac{k}{4} (1 + 1/k)^{2}$ for any $k \geq 1$ searchers. Thus, asymptotically, the speed-up is only four times worse compared to the case of full coordination. Moreover, these bounds are tight in a strong sense, as no non-coordinating search algorithm can achieve better speed-ups. Furthermore, our algorithms are surprisingly simple and hence applicable. Overall, we highlight that, in faulty contexts in which coordination between the searchers is technically difficult to implement, intrusive with respect to privacy, and/or costly in terms of resources, it might well be worth giving up on coordination and simply running our non-coordinating exhaustive search algorithms.
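As a quick consistency check of the stated speed-ups, the general formula can be rewritten and evaluated for small $k$, and compared with the speed-up $k$ of full coordination:

$$
\frac{k}{4}\left(1 + \frac{1}{k}\right)^{2} = \frac{(k+1)^{2}}{4k},
\qquad
k = 1:\ 1,
\qquad
k = 2:\ \frac{9}{8},
\qquad
k = 3:\ \frac{16}{12} = \frac{4}{3},
\qquad
\lim_{k \to \infty} \frac{1}{k} \cdot \frac{(k+1)^{2}}{4k} = \frac{1}{4}.
$$

The last limit is what the factor of four in the asymptotic comparison refers to.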

Rumor Spreading in Random Evolving Graphs

Randomized gossip is one of the most popular ways of disseminating information in large scale networks. This method is appreciated for its simplicity, robustness, and efficiency. In the Push protocol, every informed node selects, at every time step (a.k.a. round), one of its neighboring nodes uniformly at random and forwards the information to this node. This protocol is known to complete information spreading in $O(\log n)$ time steps with high probability (w.h.p.) in several families of $n$-node static networks. The Push protocol has also been empirically shown to perform well in practice and, specifically, to be robust against dynamic topological changes. In [15], we aim at analyzing the Push protocol in dynamic networks. We consider the edge-Markovian evolving graph model, which captures natural temporal dependencies between the structure of the network at time $t$ and the one at time $t + 1$. Precisely, a non-edge appears with probability $p$, while an existing edge dies with probability $q$. In order to fit with real-world traces, we mostly concentrate our study on the case where $p = \Omega(\frac{1}{n})$ and $q$ is constant. We prove that, in this realistic scenario, the Push protocol does perform well, completing information spreading in $O(\log n)$ time steps w.h.p. Note that this performance holds even when the network is, w.h.p., disconnected at every time step (e.g., when $p \ll \frac{\log n}{n}$). Our result provides the first formal argument demonstrating the robustness of the Push protocol against network changes. We also address another range of parameters $p$ and $q$, namely $p + q = 1$ with arbitrary $p$ and $q$. Although this latter range does not precisely fit with the measurements performed on real-world traces, it can be of independent interest for other settings. The result in this case confirms the positive impact of dynamism.
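The model described above can be explored with a small simulation. The following Python sketch runs the Push protocol on an edge-Markovian evolving graph and returns the number of rounds until every node is informed. It is only an experimental aid, not part of the analysis in [15]; the function name, the choice of node 0 as the source, the stationary initial graph, and the `max_rounds` cut-off are all assumptions.

```python
import random
from itertools import combinations

def push_on_edge_markovian(n, p, q, rng, max_rounds=10_000):
    """Simulate Push on an edge-Markovian evolving graph with birth rate p
    and death rate q (p + q > 0). Returns the number of rounds needed to
    inform all n nodes, or None if max_rounds is exceeded."""
    pairs = list(combinations(range(n), 2))
    # Assumption: the initial graph is drawn from the stationary
    # distribution, where each edge is present with probability p / (p + q).
    stationary = p / (p + q)
    edges = {e for e in pairs if rng.random() < stationary}
    informed = {0}  # assumption: node 0 is the source of the rumor
    for rnd in range(1, max_rounds + 1):
        adj = {u: [] for u in range(n)}
        for u, v in edges:
            adj[u].append(v)
            adj[v].append(u)
        # Every informed node pushes to one uniformly random current neighbor.
        newly = {rng.choice(adj[u]) for u in sorted(informed) if adj[u]}
        informed |= newly
        if len(informed) == n:
            return rnd
        # Edge-Markovian step: edges die w.p. q, non-edges appear w.p. p.
        edges = {e for e in pairs
                 if (rng.random() >= q if e in edges else rng.random() < p)}
    return None
```

For instance, `push_on_edge_markovian(200, 2 / 200, 0.5, random.Random(0))` exercises the regime $p = \Omega(1/n)$ with constant $q$; averaging the result over several seeds and values of $n$ gives an empirical view of how the spreading time grows.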

Sparsifying Congested Cliques and Core-Periphery Networks

The core-periphery network architecture proposed by Avin et al. [ICALP 2014] was shown to support fast computation for many distributed algorithms, while being much sparser than the congested clique. To be efficient, however, the core-periphery architecture is bound to satisfy three axioms, among which is the capability of the core to emulate the clique, i.e., to implement the all-to-all communication pattern, in $O(1)$ rounds in the CONGEST model. In [26], we show that implementing all-to-all communication in $k$ rounds can be done in $n$-node networks with roughly $n^{2} / k$ edges, and this bound is tight. Hence, sparsifying the core beyond just saving a fraction of the edges requires relaxing the constraint on the time to simulate the congested clique. We show that, for $p \gg \sqrt{\log n / n}$, a random graph in $\mathcal{G}_{n,p}$ can, w.h.p., perform the all-to-all communication pattern in $O(\min\{\frac{1}{p^{2}}, np\})$ rounds. Finally, we show that if the core can emulate the congested clique in $t$ rounds, then there exists a distributed MST construction algorithm performing in $O(t \log n)$ rounds. Hence, for $t = O(1)$, our (deterministic) algorithm improves the best known (randomized) algorithm for constructing MST in core-periphery networks by a factor $\Theta(\log n)$.

Core-periphery Clustering and Collaboration Networks

In [28], we analyse the core-periphery clustering properties of collaboration networks, where the core of a network is formed by the nodes with highest degree. In particular, we first observe that random graph models, even those designed to match the degree distribution and/or the clustering coefficient of real networks, produce synthetic graphs whose spatial distribution of triangles with respect to the core and the periphery does not match that of real networks. We therefore propose a new model, called CPCL, which aims to distribute triangles in a way that fits their real core-periphery distribution, thus producing graphs that match the core-periphery clustering of real networks.
