Section: New Results

Applications in Telecommunications

Participants : Zaid Allybokus, Sara Alouf, Eitan Altman, Konstantin Avrachenkov, Swapnil Dhamal, Alain Jean-Marie, Giovanni Neglia, Dimitra Politaki.

Caching

A fundamental building block of the information-centric architectures proposed for the evolution of the Internet is in-network caching, i.e., the possibility for routers to store contents locally and directly serve future requests. This has renewed interest in the performance of networks of caches, and since 2012 there has been significant research activity in Neo on this topic. Our work attracted the attention of researchers at Akamai Technologies (the world leader in Content Delivery Networks). In real caching systems the hit rate is often limited by the speed at which contents can be retrieved from the Hard-Disk Drive (HDD); this is the so-called "spurious misses" problem. Akamai researchers asked us to design an algorithm to solve this problem. In [43] G. Neglia and D. Tsigkari, together with D. Carra (Univ. of Verona, Italy), M. Feng, V. Janardhan (Akamai Technologies, USA), and P. Michiardi (EURECOM), have proposed a simple randomized caching policy that makes optimal use of the RAM to minimize the load on the HDD, and hence the number of spurious misses. Moreover, experiments in the Akamai CDN have shown that our policy reduces the HDD load by an additional 10% in comparison to the (highly optimized) baseline policy currently employed by Akamai. In [15] a subset of the same authors (G. Neglia, D. Carra, P. Michiardi) have shown that the same approach can be adapted to minimize any miss cost function, as long as the cost is additive over misses.
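
To illustrate the general idea of randomized RAM admission in front of an HDD (a minimal sketch under our own assumptions, with illustrative parameter names, not the optimized policy of [43]):

```python
import random
from collections import OrderedDict

class RandomizedRamCache:
    """Toy two-layer cache: a RAM layer (LRU order) in front of an HDD.

    On an HDD hit, the object is copied into RAM only with probability
    `admit_p`, so rarely requested objects seldom pollute the RAM and
    the HDD serves fewer repeated reads. `admit_p` is illustrative.
    """
    def __init__(self, ram_slots, admit_p=0.1):
        self.ram = OrderedDict()   # object id -> None, in LRU order
        self.ram_slots = ram_slots
        self.admit_p = admit_p
        self.hdd_load = 0          # number of reads served by the HDD

    def request(self, obj):
        if obj in self.ram:        # RAM hit: no HDD work
            self.ram.move_to_end(obj)
            return "ram"
        self.hdd_load += 1         # RAM miss: the HDD serves the read
        if random.random() < self.admit_p:
            self.ram[obj] = None   # probabilistic admission to RAM
            if len(self.ram) > self.ram_slots:
                self.ram.popitem(last=False)  # evict LRU object
        return "hdd"
```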

More recently, we have considered the problem of cache coordination in a dense cellular network scenario, where caches are deployed at base stations (BSs) and a user can potentially retrieve the content from multiple BSs. In this setting, the optimal content placement problem is NP-hard even when the goal is simply to maximize the hit ratio. Most of the existing literature has proposed heuristics assuming that content popularities are static and known, but in reality their estimation can be very difficult at the scale of the geographical area covered by a BS. In [14] E. Leonardi (Politecnico di Torino, Italy) and G. Neglia have introduced a class of simple and fully distributed caching policies, which require neither direct communication among BSs nor a priori knowledge of content popularity (strongly deviating from the assumptions of the existing literature). They have shown that optimal coordination can be achieved by applying minor changes to existing policies and piggybacking an additional information bit onto each content request. How to achieve coordination for more complex performance metrics (e.g., retrieval time or fairness) is still an open research problem, now the PhD subject of G. Iecker, co-supervised by G. Neglia and T. Spyropoulos (EURECOM).
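
A rough sketch of how a piggybacked bit can coordinate caches without direct communication (our own toy construction on top of a probabilistic-insertion LRU variant; class and parameter names are illustrative, not the exact policies of [14]):

```python
import random
from collections import OrderedDict

class BaseStationCache:
    """Probabilistic-insertion LRU: on a miss, insert with probability q,
    and only when the piggybacked bit says the content was found at no
    other covering base station."""
    def __init__(self, slots, q=0.1):
        self.cache, self.slots, self.q = OrderedDict(), slots, q

    def lookup(self, content):
        if content in self.cache:
            self.cache.move_to_end(content)   # refresh LRU position
            return True
        return False

    def maybe_insert(self, content, found_elsewhere):
        if not found_elsewhere and random.random() < self.q:
            self.cache[content] = None
            if len(self.cache) > self.slots:
                self.cache.popitem(last=False)  # evict LRU content

def user_request(content, covering_bss):
    hits = [bs for bs in covering_bss if bs.lookup(content)]
    found = bool(hits)                # the extra information bit
    for bs in covering_bss:
        if bs not in hits:
            bs.maybe_insert(content, found)  # coordination via the bit
    return found
```

The single bit prevents nearby base stations from all storing the same content, implicitly diversifying the cached catalog.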

Modeling and workload characterization of data center clusters

There are many challenges faced when modeling computing clusters. In such systems, jobs to be executed are submitted by users. These jobs may generate a large number of tasks. Some tasks may be executed more than once, while others may abandon before execution. D. Politaki, S. Alouf, F. Hermenier (Nutanix), and A. Jean-Marie have developed a multi-server queueing system with abandonments and resubmissions to model computing clusters. To capture the correlations observed in real workload submissions, a Batch Markovian Arrival Process is considered. The service time is assumed to have a phase-type distribution. This model has not been analyzed in the literature. The distributions of the interarrival and service times found in the Google Cluster Data have been characterized and compared with fitted distributions. The authors' findings support the model assumptions. Ongoing work investigates approaches to overcome the technical challenges arising in the performance evaluation of computing clusters. In particular, the developed tool marmoteCore-Q (see §7.2.2) will be used.
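
One building block of such a model is sampling phase-type service times, i.e., absorption times of a Markov chain on transient phases. A minimal sketch (the matrices below are illustrative examples, not the fitted parameters of the study):

```python
import numpy as np

def sample_phase_type(alpha, S, rng):
    """Sample a phase-type distributed service time.

    alpha : initial probability vector over the transient phases
    S     : sub-generator matrix; per-phase exit rates are -S.sum(axis=1)
    """
    n = len(alpha)
    exit_rates = -S.sum(axis=1)
    phase = rng.choice(n, p=alpha)
    t = 0.0
    while True:
        total = -S[phase, phase]                 # rate of leaving the phase
        t += rng.exponential(1.0 / total)
        probs = np.append(S[phase].copy(), exit_rates[phase])
        probs[phase] = 0.0
        probs /= total                           # jump probabilities
        nxt = rng.choice(n + 1, p=probs)
        if nxt == n:                             # absorption: service ends
            return t
        phase = nxt

rng = np.random.default_rng(0)
alpha = np.array([1.0, 0.0])
S = np.array([[-3.0, 1.0],                       # example sub-generator
              [ 0.0, -2.0]])
print([round(sample_phase_type(alpha, S, rng), 3) for _ in range(5)])
```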

To understand the essential characteristics of a computing cluster for modeling purposes, the same authors have looked into two datasets consisting of job scheduler logs. The first dataset comes from a Google cluster and is publicly available (https://github.com/google/cluster-data). The second dataset has been collected from the internal computing cluster of Inria Sophia Antipolis - Méditerranée. After a preliminary analysis and sanitization of each dataset, a numerical analysis is performed to characterize the different stochastic processes taking place in the computing cluster. In particular, the authors characterize the impatience process, the resubmission process, the arrival process (batch sizes and correlations) and the service times, considering the impact of the scheduling class and of the execution type.
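
A toy version of such an arrival characterization (our own sketch: the grouping rule and window parameter are assumptions, not the methodology of the study) could group near-simultaneous submissions into batches and measure correlation between successive interarrivals:

```python
import numpy as np

def characterize_arrivals(timestamps, batch_window=0.0):
    """Group jobs closer than `batch_window` seconds into one batch;
    return batch sizes and the lag-1 autocorrelation of batch
    interarrival times (a simple signature of arrival correlations)."""
    ts = np.sort(np.asarray(timestamps))
    gaps = np.diff(ts)
    cuts = np.where(gaps > batch_window)[0]      # batch boundaries
    batch_sizes = np.diff(np.concatenate(([0], cuts + 1, [len(ts)])))
    batch_starts = ts[np.concatenate(([0], cuts + 1))]
    inter = np.diff(batch_starts)
    acf1 = np.corrcoef(inter[:-1], inter[1:])[0, 1] if len(inter) > 2 else np.nan
    return batch_sizes, acf1
```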

Software Defined Networks (SDN)

The performance of computer networks relies on how bandwidth is shared among different flows. Fair resource allocation is a challenging problem, particularly when the flows evolve over time. To address this issue, bandwidth-sharing techniques that quickly react to traffic fluctuations are of interest, especially in large-scale settings with hundreds of nodes and thousands of flows. In this context, K. Avrachenkov and Z. Allybokus, together with J. Leguay (Huawei Research) and L. Maggi (Nokia Bell Labs), propose in [1] a distributed algorithm based on the Alternating Direction Method of Multipliers (ADMM) that tackles the multi-path fair resource allocation problem in a distributed SDN control architecture. Their ADMM-based algorithm continuously generates a sequence of resource allocation solutions that converges to the fair allocation while always remaining feasible, a property that standard primal-dual decomposition methods often lack. By distributing all compute-intensive operations, they demonstrate that large instances can be handled at scale.
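
To make the "always feasible" property concrete, here is a minimal ADMM sketch for the simplest instance we can think of: proportional fairness on a single shared link (our own toy construction, not the multi-path algorithm of [1]). The z-update is a projection onto the capacity set, so every iterate z is a feasible allocation:

```python
import numpy as np

def project_capacity(v, C):
    """Euclidean projection onto {z >= 0, sum(z) <= C} by bisection
    on the dual variable of the capacity constraint."""
    v = np.maximum(v, 0.0)
    if v.sum() <= C:
        return v
    lo, hi = 0.0, v.max()
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if np.maximum(v - mid, 0.0).sum() > C:
            lo = mid
        else:
            hi = mid
    return np.maximum(v - hi, 0.0)

def admm_proportional_fair(n, C, rho=1.0, iters=200):
    """Toy ADMM for: max sum(log x_i)  s.t.  sum x_i <= C, x >= 0."""
    x = np.full(n, C / n); z = x.copy(); u = np.zeros(n)
    for _ in range(iters):
        v = z - u
        x = 0.5 * (v + np.sqrt(v * v + 4.0 / rho))  # prox of -log(x)
        z = project_capacity(x + u, C)              # always feasible
        u += x - z                                  # dual update
    return z

print(admm_proportional_fair(4, 10.0))  # converges to [2.5, 2.5, 2.5, 2.5]
```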

Impulsive control of G-AIMD dynamics

Motivated by various applications, from Internet congestion control to power control in smart grids and electric vehicle charging, in [20] K. Avrachenkov, together with A. Piunovskiy and Y. Zhang (Univ. of Liverpool, UK), study Generalized Additive Increase Multiplicative Decrease (G-AIMD) dynamics under impulsive control in continuous time with a time-average alpha-fairness criterion. They first show that the optimal control under relaxed constraints can be described by a threshold. Then, they propose a Whittle-type index heuristic for the hard-constraint problem, and prove that in the homogeneous case the index policy is asymptotically optimal when the number of users is large.
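
A scalar toy version of such a threshold impulsive control (parameters are illustrative; the actual model of [20] is multi-user and continuous-time): the state grows additively and receives a multiplicative-decrease impulse whenever it crosses the threshold.

```python
import numpy as np

def gaimd_threshold(eta=1.0, beta=0.5, theta=8.0, dt=0.01, T=50.0):
    """Simulate one G-AIMD user under a threshold impulsive control:
    additive increase at rate eta; impulse x -> beta * x whenever the
    state reaches the threshold theta."""
    steps = int(T / dt)
    x = np.empty(steps); x[0] = 1.0
    for k in range(1, steps):
        x[k] = x[k - 1] + eta * dt   # additive increase
        if x[k] >= theta:            # impulse: multiplicative decrease
            x[k] *= beta
    return x
```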

Application of Machine Learning to optimal resource allocation in cellular networks

In [9], E. Altman, in collaboration with A. Chattopadhyay and B. Błaszczyszyn (from the Inria Dyogene team), considers location-dependent opportunistic bandwidth sharing between static and mobile downlink users in a cellular network. In order to provide a higher data rate to mobile users, the authors propose to give them higher bandwidth at favourable times and locations, and to give higher bandwidth to the static users at other times. They formulate the problem as a Markov decision process (MDP) where the per-step reward is a linear combination of the instantaneous data volumes received by static and mobile users. Since the transition structure of this MDP is not known in general, they propose learning algorithms based on stochastic approximation, with one and with two time scales. The results are extended to address the issue of fair bandwidth sharing between the two classes of users.
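
The two-timescale idea can be sketched generically (this is an illustrative pattern, not the exact algorithm of [9]): a fast iterate tracks a reward estimate for the current parameter, while a slow iterate updates the parameter, seeing the fast one as quasi-static.

```python
import numpy as np

def two_timescale_sa(grad_sample, steps=50000, seed=0):
    """Generic two-timescale stochastic approximation sketch.

    q tracks the average reward at the current theta (fast timescale);
    theta ascends a noisy gradient (slow timescale). Step sizes obey
    a_n >> b_n, as required for timescale separation.
    """
    rng = np.random.default_rng(seed)
    theta, q = 0.0, 0.0
    for n in range(1, steps + 1):
        a_n = 1.0 / n ** 0.6              # fast step size
        b_n = 1.0 / n                     # slow step size
        r, g = grad_sample(theta, rng)    # noisy reward and gradient
        q += a_n * (r - q)                # fast: reward estimate
        theta += b_n * g                  # slow: parameter update
    return theta, q

# Toy reward r(theta) = -(theta - 2)^2 plus noise; theta converges to 2.
theta, q = two_timescale_sa(lambda t, rng: (-(t - 2) ** 2 + rng.normal(),
                                            -2 * (t - 2) + rng.normal()))
print(round(theta, 2))
```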

To optimize the routing of flows in datacenters, SDN controllers receive a packet-in message whenever a new flow appears in the network. Unfortunately, flow arrival rates can peak at millions per second, impairing the ability of controllers to treat them on time. Flow scheduling copes with this by segmenting the traffic between elephant and mice flows and by treating elephant flows with priority, as they disrupt short-lived TCP flows and create bottlenecks. In [21], E. Altman, in collaboration with F. De Pellegrini (UAPV), L. Maggi (Huawei), A. Massaro (FBK Trento), D. Saucez (Inria Diana team) and J. Leguay (Huawei Research), proposes a stochastic-approximation-based learning algorithm called SOFIA that performs optimal online flow segmentation. Extensive numerical experiments characterize the performance of SOFIA.
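
In the same spirit (a toy sketch of online segmentation by stochastic approximation, not SOFIA itself; the budget parameter is our own assumption), one can adapt a size threshold so that only a target fraction of flows is escalated to the controller as elephants:

```python
def learn_split_threshold(flow_sizes, budget=0.05):
    """Robbins-Monro style threshold learning: drive the empirical
    probability P(flow size > theta) toward `budget`, i.e. theta
    converges to the (1 - budget) quantile of the size distribution."""
    theta = 1.0
    for n, s in enumerate(flow_sizes, start=1):
        is_elephant = float(s > theta)
        theta += (1.0 / n ** 0.7) * (is_elephant - budget)
    return theta
```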

Forecast Scheduling

With the age of big data and geolocation measurements widely available, the precision of user mobility prediction increases, and hence also that of channel-condition prediction. In [35], E. Altman, in collaboration with H. Zaaraoui, S. Jema, Z. Altman (Orange Labs) and T. Jimenez (UAPV), proposes a convex optimization approach to Forecast Scheduling, which makes use of current and future predicted channel conditions to obtain an optimal alpha-fair schedule. The model is further extended in [34] to take into account different types of random events, such as arrivals and departures of users and uncertainties in the mobile trajectories. Simulation results illustrate the significant performance gain achieved by the Forecast Scheduling algorithms in the presence of random events.
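
A minimal proportional-fair instance (alpha = 1) shows why this is a convex program: with predicted rates in hand, the scheduler can give each user its slots when its channel is best. This is our own sketch using the cvxpy modeling library, not the full model of [35]:

```python
import cvxpy as cp
import numpy as np

def forecast_schedule(rates):
    """Given predicted rates[u, t] for each user u and future slot t,
    split each slot among users to maximize sum of log throughputs
    (proportional fairness). Concave objective, linear constraints."""
    U, T = rates.shape
    x = cp.Variable((U, T), nonneg=True)   # fraction of slot t to user u
    throughput = cp.sum(cp.multiply(rates, x), axis=1)
    problem = cp.Problem(cp.Maximize(cp.sum(cp.log(throughput))),
                         [cp.sum(x, axis=0) <= 1])
    problem.solve()
    return x.value

rates = np.array([[5.0, 1.0, 1.0],   # user 0: good channel now
                  [1.0, 1.0, 5.0]])  # user 1: good channel later
print(forecast_schedule(rates).round(2))
```

As expected, the optimal schedule serves each user mostly in the slots where its predicted rate is highest, which is exactly the gain that channel forecasts unlock.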

Fairness in allocation to users with different time constraints

E. Altman and S. Ramanath (IIT Bombay, India) study in [31] how to allocate resources fairly when users have different time constraints on their use of the resources. They formulate the problem as a Markov Decision Process (MDP) for a two-user case and provide a Dynamic Programming (DP) solution. Simulation results in an LTE framework are provided to support the theoretical claims.
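
A toy finite-horizon DP conveys the flavor of the problem (our own illustrative construction, not the exact MDP of [31]): one resource unit per slot, two users with different deadlines, and fairness via log utilities at the horizon.

```python
import numpy as np
from functools import lru_cache

def two_user_dp(T=6, deadlines=(3, 6)):
    """Finite-horizon DP: user i can only be served in slots t < deadlines[i];
    the terminal reward log(1+s1) + log(1+s2) rewards fair allocations."""
    @lru_cache(maxsize=None)
    def value(t, s1, s2):
        if t == T:                           # horizon: fairness utility
            return np.log(1 + s1) + np.log(1 + s2), ()
        options = [u for u in (0, 1) if t < deadlines[u]] or [None]
        best = (-np.inf, ())
        for user in options:                 # None = stay idle
            v, plan = value(t + 1, s1 + (user == 0), s2 + (user == 1))
            if v > best[0]:
                best = (v, (user,) + plan)
        return best

    return value(0, 0, 0)

val, schedule = two_user_dp()
print(round(val, 3), schedule)  # serves user 0 before its deadline expires
```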