Section: New Results

High Speed Network's traffic metrology and statistical analysis

A long-range dependent model for network traffic with flow-scale correlations

Participant : Paulo Gonçalves.

This is a joint work with Patrick Loiseau (Eurecom) and Pascale Vicat-Blanc (Lyatiss).

For more than a decade, it has been observed that network traffic exhibits long-range dependence and many models have been proposed relating this property to heavy-tailed flow durations. However, none of these models consider correlations at flow scale. Such correlations exist and will become more prominent in the future Internet with the emergence of flow-aware control mechanisms correlating a flow’s transmission to its characteristics (size, duration, etc.). In our present work, we study the impact of the correlation between flow rates and durations on the long-range dependence of aggregate traffic. Our results extend those of existing models by showing that two possible regimes of long-range dependence exist at different time scales. The long-range dependence in each regime can be stronger or weaker than standard predictions, depending on the conditional statistics between the flow rates and durations. In the independent case, our proposed model consistently reduces to former approaches. The pertinence of our model is validated on real web traffic traces, and its ability to accurately explain the Hurst parameter is validated on both web traces and numerical simulations.
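For reference, the "standard predictions" mentioned above usually mean the classical relation between the Hurst parameter H and the tail index of heavy-tailed flow durations when rates and durations are independent. The minimal sketch below encodes only that classical relation, H = (3 - alpha) / 2 for 1 < alpha < 2. The function name is hypothetical, and the two correlated regimes of the proposed model are not reproduced here, since this section does not give their formulas.

```python
def standard_hurst(alpha: float) -> float:
    """Classical prediction H = (3 - alpha) / 2 for heavy-tailed flow
    durations with tail index alpha, assuming rates independent of durations."""
    if not 1.0 < alpha < 2.0:
        # Outside this range the durations have infinite mean (alpha <= 1)
        # or finite variance (alpha >= 2), and the relation does not apply.
        raise ValueError("tail index must satisfy 1 < alpha < 2")
    return (3.0 - alpha) / 2.0


if __name__ == "__main__":
    for a in (1.2, 1.5, 1.8):
        print(f"alpha = {a}: H = {standard_hurst(a):.2f}")
```

Heavier tails (alpha closer to 1) give H closer to 1, that is, stronger long-range dependence in the independent case.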

A recurrent solution of Ph/M/c/N-like and Ph/M/c-like queues

Participant : Thomas Begin.

This work has been accepted for publication by the Journal of Applied Probability [50] and was performed in collaboration with Prof. Brandwajn (UCSC).

We propose an efficient semi-numerical approach to compute the steady-state probability distribution of the number of requests, both at arbitrary instants and at arrival instants, in Ph/M/c-like systems in which the inter-arrival time distribution is represented by an acyclic set of memoryless phases. Our method is based on conditional probabilities and results in a simple, computationally stable recurrence. It avoids the explicit manipulation of potentially large matrices and involves no iteration. Due to the use of conditional probabilities, it delays the onset of numerical issues related to floating-point underflow as the number of servers and/or phases increases. For generalized Coxian distributions, the computational complexity of the proposed approach grows linearly with the number of phases in the distribution.
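To give a feel for what a matrix-free recurrence looks like, the sketch below treats only the simplest special case, a single exponential arrival phase, that is, the textbook M/M/c/N queue. It is not the method of the paper: the multi-phase conditional-probability recurrence is not described in this section. The function name and parameters are illustrative assumptions.

```python
def mmcn_distribution(lam: float, mu: float, c: int, N: int) -> list[float]:
    """Steady-state P(n requests in system), n = 0..N, for an M/M/c/N queue.

    lam: arrival rate, mu: per-server service rate,
    c: number of servers, N: maximum number of requests in the system.
    """
    # Birth-death balance gives the ratio of successive probabilities:
    #   p[n + 1] / p[n] = lam / (min(n + 1, c) * mu)
    weights = [1.0]
    for n in range(N):
        weights.append(weights[-1] * lam / (min(n + 1, c) * mu))
    total = sum(weights)
    # Normalize once at the end; no matrix is ever built or inverted.
    return [w / total for w in weights]


if __name__ == "__main__":
    probs = mmcn_distribution(lam=3.0, mu=1.0, c=4, N=10)
    print([round(p, 4) for p in probs])
```

The cost is one pass over the N + 1 states, which is the flavor of "simple recurrence, no iteration" that the paragraph describes for the more general phase-type case.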

A Markovian model based on SIR epidemic classification to reproduce the workload dynamics of a VoD server

Participants : Shubhabrata Roy, Thomas Begin, Paulo Gonçalves.

We have devised a Markovian model, based on the SIR epidemic classification, to reproduce the workload dynamics that can be observed on a VoD (Video on Demand) server. This model relies on the dynamics between three distinct populations (i.e., current watchers, past watchers and potential watchers). It also embeds events with very low probability but high impact on its overall behavior, corresponding to the occurrence of a flash crowd or the buzz effect on a VoD server. The steady-state solution to this model has shown that it exhibits a behavior qualitatively close to what can be expected from a real-life VoD server. We have also shown that the workload process delivered by this model satisfies a large deviation principle. Our future work aims at taking advantage of this information to devise a new scheme for allocating available resources in a VoD server.
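The three-population dynamics can be sketched as a continuous-time Markov chain simulated event by event. This is a minimal illustration under stated assumptions, not the model of the report: the transition rates (`beta`, `spontaneous`, `gamma`), the closed population, and the function name are all hypothetical, and the rare flash-crowd or buzz events are deliberately left out.

```python
import random


def simulate_vod(potential, watching, past, beta, spontaneous, gamma,
                 horizon, seed=0):
    """Simulate (time, potential, watching, past) until `horizon`.

    Assumed rates (illustrative only):
      potential -> watching at rate potential * (spontaneous + beta * watching)
      watching  -> past     at rate gamma * watching
    """
    rng = random.Random(seed)
    t = 0.0
    path = [(t, potential, watching, past)]
    while True:
        rate_start = potential * (spontaneous + beta * watching)
        rate_stop = gamma * watching
        total = rate_start + rate_stop
        if total == 0.0:
            break  # nobody left to start or stop watching
        t += rng.expovariate(total)  # exponential time to the next event
        if t > horizon:
            break
        # Choose which event fires, proportionally to its rate.
        if watching == 0 or rng.random() * total < rate_start:
            potential -= 1
            watching += 1
        else:
            watching -= 1
            past += 1
        path.append((t, potential, watching, past))
    return path


if __name__ == "__main__":
    trace = simulate_vod(50, 5, 5, beta=0.01, spontaneous=0.05,
                         gamma=0.2, horizon=20.0)
    print(trace[-1])
```

The number of current watchers along the path plays the role of the server workload; the `beta * watching` term is what gives the epidemic, word-of-mouth flavor.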

A comparative study of existing MBAC using real network traces

Participants : Doreid Ammar, Thomas Begin, Isabelle Guérin-Lassous.

We have evaluated the respective performance of several MBAC (Measurement-Based Admission Control) schemes using a realistic framework in which the pattern of the background traffic follows experimental traces collected on real-life networks. This study has allowed us to highlight the differences between MBACs in terms of ease of implementation and attained performance. This work will now focus on the design of a new MBAC based on an iteratively learned model.

Graph Based Classification of Content and Users in BitTorrent

Participants : Paulo Gonçalves, Marina Sokol.

This is a joint work with Konstantin Avrachenkov (INRIA Maestro) and Arnaud Legout (INRIA Planete).

P2P downloads still represent a large portion of today's Internet traffic. More than 100 million users operate BitTorrent and generate more than 30% of the total Internet traffic. Recently, a significant research effort has been made to develop tools for the automatic classification of Internet traffic by application. The purpose of our present work is to provide a framework for the sub-classification of P2P traffic generated by the BitTorrent protocol. Unlike previous works, we cannot rely on packet-level characteristics and standard supervised machine learning methods, whose application requires a large set of discriminating parameters (packet size, packet inter-arrival time, etc.). Since all the P2P transfers considered here use the same BitTorrent protocol, we cannot use this set of parameters to classify P2P content and users. Instead, we make use of the bipartite user-content graph, formed by two sets of nodes: the set of users (peers) and the set of contents (downloaded files). From this basic bipartite graph we also construct the user graph, where two users are connected if they download the same content, and the content graph, where two files are connected if they are both downloaded by at least one common user. The general intuition is that users with similar interests download similar contents. This intuition can be rigorously formalized with the help of a graph-based semi-supervised learning approach.
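The construction of the user graph and the content graph from the bipartite user-content graph can be sketched as follows. The input format (a list of `(user, content)` download pairs) and the function name are assumptions made for illustration; the edges are unweighted, whereas a real study might weight them by the number of shared items.

```python
from collections import defaultdict
from itertools import combinations


def project_bipartite(downloads):
    """Return (user_edges, content_edges) from (user, content) pairs.

    Two users are linked if they downloaded at least one common content;
    two contents are linked if at least one user downloaded both.
    """
    contents_of = defaultdict(set)  # user -> contents downloaded
    users_of = defaultdict(set)     # content -> users who downloaded it
    for user, content in downloads:
        contents_of[user].add(content)
        users_of[content].add(user)

    user_edges = {
        frozenset(pair)
        for users in users_of.values()
        for pair in combinations(sorted(users), 2)
    }
    content_edges = {
        frozenset(pair)
        for contents in contents_of.values()
        for pair in combinations(sorted(contents), 2)
    }
    return user_edges, content_edges


if __name__ == "__main__":
    pairs = [("u1", "a"), ("u2", "a"), ("u2", "b"), ("u3", "c")]
    users, contents = project_bipartite(pairs)
    print(users)     # u1 and u2 share content "a"
    print(contents)  # "a" and "b" share user u2
```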

Generalized Optimization Framework for Graph-based Semi-supervised Learning

Participants : Paulo Gonçalves, Marina Sokol.

This is a joint work with Konstantin Avrachenkov (INRIA Maestro).

We develop a generalized optimization framework for graph-based semi-supervised learning. The framework gives as particular cases the Standard Laplacian, Normalized Laplacian and PageRank based methods. We have also provided a new probabilistic interpretation based on random walks and characterized the limiting behavior of the methods. The random walk based interpretation allows us to explain the differences in performance between methods with different smoothing kernels. It appears that the PageRank based method is robust with respect to the choice of the regularization parameter and of the labelled data. We illustrate our theoretical results with two realistic datasets that pose different challenges: the Les Miserables characters social network and the Wikipedia hyper-link graph. Graph-based semi-supervised learning classifies the Wikipedia articles with very good precision and perfect recall, employing only the information about the hyper-text links.
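One way to picture how a single framework can cover the three methods is a one-parameter family of smoothing kernels. The sketch below assumes the classification functions solve F = alpha * D^(-sigma) W D^(sigma-1) F + (1 - alpha) Y, with sigma = 1, 1/2 and 0 corresponding to the Standard Laplacian, Normalized Laplacian and PageRank based methods. This parametrization is not stated in this section and should be checked against the paper; `alpha` stands in for the regularization parameter, and all names are hypothetical.

```python
def graph_ssl(W, Y, sigma, alpha=0.9, iterations=200):
    """Label nodes by iterating F <- alpha * D^-sigma W D^(sigma-1) F + (1-alpha) Y.

    W: symmetric adjacency matrix (list of lists), every node of degree > 0.
    Y: n x k matrix, Y[i][c] = 1 if node i is a labelled example of class c.
    Returns the predicted class index of every node.
    """
    n, k = len(W), len(Y[0])
    d = [sum(row) for row in W]  # node degrees
    F = [row[:] for row in Y]
    for _ in range(iterations):
        new_F = []
        for i in range(n):
            row = []
            for c in range(k):
                s = sum(W[i][j] * d[j] ** (sigma - 1.0) * F[j][c]
                        for j in range(n))
                row.append(alpha * d[i] ** (-sigma) * s
                           + (1.0 - alpha) * Y[i][c])
            new_F.append(row)
        F = new_F
    return [max(range(k), key=lambda c: F[i][c]) for i in range(n)]


if __name__ == "__main__":
    # Two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
    W = [[0.0] * 6 for _ in range(6)]
    for a, b in edges:
        W[a][b] = W[b][a] = 1.0
    Y = [[0.0, 0.0] for _ in range(6)]
    Y[0][0] = 1.0  # node 0 is a labelled example of class 0
    Y[5][1] = 1.0  # node 5 is a labelled example of class 1
    for sigma in (1.0, 0.5, 0.0):
        print(sigma, graph_ssl(W, Y, sigma))
```

The fixed-point iteration is used here only to stay dependency-free; on larger graphs a sparse linear solver would be the more natural choice.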

On the estimation of the large deviations spectrum

Participant : Paulo Gonçalves.

This is a joint work with Julien Barral (Univ. Paris 13).

We propose an estimation algorithm for large deviations spectra of measures and functions. The algorithm converges for natural examples of multifractals.

Adaptive Multiscale Complexity Analysis of Fetal Heart Rate

Participant : Paulo Gonçalves.

This is a joint work with Patrice Abry (ENS Lyon, CNRS) and Muriel Doret (Hospices Civils de Lyon, Univ. Lyon 1).

Per partum fetal asphyxia is a major cause of neonatal morbidity and mortality. Fetal heart rate monitoring plays an important role in early detection of acidosis, an indicator for asphyxia. This problem is addressed in this paper by introducing a novel complexity analysis of fetal heart rate data, based on producing a collection of piecewise linear approximations of varying dimensions from which a measure of complexity is extracted. This procedure specifically accounts for the highly non-stationary context of labor by being adaptive and multiscale. Using a reference dataset, made of real per partum fetal heart rate data, collected in situ and carefully constituted by obstetricians, the behavior of the proposed approach is analyzed and illustrated. Its performance is evaluated in terms of the rate of correct acidosis detection versus the rate of false detection, as well as how early the detection is made. Computational cost is also discussed. The results are shown to be extremely promising and further potential uses of the tool are discussed.