Section: New Results
Machine Learning for an efficient and dynamic management of network resources and services
Machine Learning in Networks
Participants : Nesrine Ben Hassine, Dana Marinca, Pascale Minet.
This work was done in collaboration with Dominique Barth (UVSQ).
Content Delivery Networks (CDNs) face an increasing and time-varying demand for video contents. Their ability to react promptly to this demand is a success factor. Caching helps, but the question is: which contents should be cached? We need to know which resources are needed before they are requested. This anticipation is made possible by predictions computed with learning techniques.
Machine learning techniques can be used to improve the quality of experience for the end users of Content Delivery Networks (CDNs). In a CDN, the most popular video contents are cached near the end users in order to minimize the content delivery latency. Classically, machine learning techniques are classified as supervised or unsupervised. In 2017, we addressed two challenges:
- as supervised learning, the use of regression-based prediction techniques to estimate the future popularity of video contents in order to decide which ones should be cached. The popularity of a video content is measured by the number of daily requests for this content.
- as unsupervised learning, the use of clustering techniques to group videos with similar features. This clustering reduces the number of prediction methods, called experts, needed to provide an accurate prediction.
Prediction of video content popularity
Participants : Nesrine Ben Hassine, Dana Marinca, Pascale Minet.
This work was done in collaboration with Dominique Barth (UVSQ).
We consider various experts, coming from different fields (e.g. statistics, control theory). To evaluate the accuracy of the experts' popularity predictions, we assess these experts according to three criteria: cumulated loss, maximum instantaneous loss and best ranking. The loss function expresses the discrepancy between the predicted value and the real number of requests. We use real traces extracted from YouTube to compare different prediction methods and determine the best tuning of their parameters. The goal is to find the best trade-off between the complexity and the accuracy of the prediction methods used.
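The three criteria can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two toy experts (`last_value`, `moving_average`), the absolute-error loss, and the reading of "best ranking" as the number of days on which an expert achieves the smallest loss. It does not reproduce the experts or the loss function used in the study.

```python
from typing import Callable, Dict, List, Sequence

Expert = Callable[[Sequence[float]], float]

def last_value(history: Sequence[float]) -> float:
    """Toy expert (assumption): predict the same number of requests as the previous day."""
    return history[-1]

def moving_average(window: int) -> Expert:
    """Toy expert (assumption): predict the mean of the last `window` days."""
    def predict(history: Sequence[float]) -> float:
        recent = history[-window:]
        return sum(recent) / len(recent)
    return predict

def absolute_loss(prediction: float, actual: float) -> float:
    """Assumed loss: absolute gap between predicted and real number of requests."""
    return abs(prediction - actual)

def evaluate_experts(requests: Sequence[float],
                     experts: Dict[str, Expert],
                     loss=absolute_loss) -> Dict[str, Dict[str, float]]:
    """Score each expert on one trace of daily requests."""
    if len(requests) < 2:
        raise ValueError("need at least two days of requests")
    names = list(experts)
    losses: Dict[str, List[float]] = {name: [] for name in names}
    # Each expert predicts day d from days 0..d-1 only.
    for day in range(1, len(requests)):
        history = requests[:day]
        for name in names:
            losses[name].append(loss(experts[name](history), requests[day]))
    # Count the days on which each expert has the smallest loss (ties count for all).
    times_best = {name: 0 for name in names}
    for step in range(len(requests) - 1):
        lowest = min(losses[name][step] for name in names)
        for name in names:
            if losses[name][step] == lowest:
                times_best[name] += 1
    return {
        name: {
            "cumulated_loss": sum(losses[name]),
            "max_instantaneous_loss": max(losses[name]),
            "times_best": times_best[name],
        }
        for name in names
    }

scores = evaluate_experts([10, 12, 14, 16],
                          {"last": last_value, "ma2": moving_average(2)})
```

On this steadily growing toy trace, the `last` expert has a lower cumulated loss than the two-day moving average, which lags further behind the trend.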
We also show the importance of a decision maker, called forecaster, that predicts the popularity based on the predictions of a selection of several experts. In terms of cumulated loss, the forecaster based on the best K experts outperforms both the individual experts and the forecaster based on only one expert, even if this expert varies over time.
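A forecaster built on the best K experts could look like the following sketch. The selection rule (lowest cumulated absolute loss so far) and the combination rule (plain average of the selected predictions) are assumptions; the text above does not specify how the selected experts are combined. The two experts are hypothetical placeholders.

```python
from typing import Callable, Dict, List, Sequence

Expert = Callable[[Sequence[float]], float]

def best_k_forecaster(requests: Sequence[float],
                      experts: Dict[str, Expert],
                      k: int) -> List[float]:
    """Return one forecast per day, starting at day 1.

    Assumptions: experts are ranked by cumulated absolute loss so far,
    and the forecast is the average of the k best-ranked predictions.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    names = list(experts)
    cumulated = {name: 0.0 for name in names}
    forecasts: List[float] = []
    for day in range(1, len(requests)):
        history = requests[:day]
        predictions = {name: experts[name](history) for name in names}
        # sorted() is stable, so ties keep the insertion order of `experts`.
        selected = sorted(names, key=lambda name: cumulated[name])[:k]
        forecasts.append(sum(predictions[name] for name in selected) / len(selected))
        # Update the losses only after the forecast is issued.
        for name in names:
            cumulated[name] += abs(predictions[name] - requests[day])
    return forecasts

# Hypothetical toy experts used only for this example.
toy_experts = {
    "last": lambda history: history[-1],
    "mean": lambda history: sum(history) / len(history),
}
trace = [10, 12, 14, 16]
forecast_k1 = best_k_forecaster(trace, toy_experts, k=1)
forecast_k2 = best_k_forecaster(trace, toy_experts, k=2)
```

With `k=1` the forecaster follows a single expert that may change from day to day; with a larger `k` it averages over several of them.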
The paper presented at the Wireless Days 2017 conference ([29]) is the result of joint work with Ruben Milocco (Universidad Nacional Comahue, Buenos Aires, Argentina) and Selma Boumerdassi (CNAM, Paris). We focused on predicting the popularity of video contents using Auto-Regressive Moving Average (ARMA) methods applied on a sliding window. These predictions are used to put the most popular video contents into caches. After having identified the parameters of the ARMA experts, we compare them with an expert predicting the same number of requests as the previous day. Results show that ARMA experts improve the accuracy of the predictions. Nevertheless, no single ARMA model provides the best prediction for all the video contents over their whole lifetime. We therefore combine these statistical experts with a higher level of experts, called forecasters. By combining the experts' predictions, some forecasters succeed in predicting more accurate values, which helps to increase the hit ratio while keeping an acceptable update ratio. Hence, improving the accuracy of the predictions improves the hit ratio. To summarize, we proposed an original solution combining the predictions of several ARMA models. This solution achieves a better Hit Ratio and a smaller Update Ratio than the classical Least Frequently Used (LFU) caching technique.
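How predictions drive the cache can be sketched as follows. The definitions used here are assumptions, since the text does not define the two metrics: the hit ratio is taken as the share of requests served from the cache, and the update ratio as the share of cache slots whose content changes from one day to the next. The function names and the toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

def top_contents(scores: Dict[str, float], cache_size: int) -> Set[str]:
    """Pick the `cache_size` contents with the highest score (ties broken by name)."""
    ranked = sorted(scores, key=lambda content: (-scores[content], content))
    return set(ranked[:cache_size])

def simulate_cache(predicted: List[Dict[str, float]],
                   actual: List[Dict[str, float]],
                   cache_size: int) -> Tuple[float, float]:
    """Fill the cache each day from predictions and measure two ratios.

    predicted[d][c]: predicted number of requests for content c on day d.
    actual[d][c]:    observed number of requests for content c on day d.
    """
    hits = total = replaced = slots = 0
    previous: Set[str] = set()
    for day, (day_predicted, day_actual) in enumerate(zip(predicted, actual)):
        cache = top_contents(day_predicted, cache_size)
        hits += sum(count for content, count in day_actual.items() if content in cache)
        total += sum(day_actual.values())
        if day > 0:
            # Contents that were not cached the day before had to be fetched.
            replaced += len(cache - previous)
            slots += len(cache)
        previous = cache
    hit_ratio = hits / total if total else 0.0
    update_ratio = replaced / slots if slots else 0.0
    return hit_ratio, update_ratio

predicted = [{"a": 5, "b": 3, "c": 1}, {"a": 2, "b": 4, "c": 6}]
actual = [{"a": 6, "b": 2, "c": 2}, {"a": 1, "b": 5, "c": 4}]
hit_ratio, update_ratio = simulate_cache(predicted, actual, cache_size=2)
```

Under the same assumptions, a frequency-based baseline in the spirit of LFU could be approximated by passing cumulative past request counts as `predicted`.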
Clustering of video contents
Participants : Nesrine Ben Hassine, Pascale Minet.
With regard to video content clustering, we proposed an original solution based on game theory that was presented at the CCNC 2017 conference ([30]). This is joint work with Mohammed-Amine Koulali (Mohammed I University Oujda, Morocco), Mohammed Erradi (Mohammed I University Rabat, Morocco), Dana Marinca (University of Versailles Saint-Quentin) and Dominique Barth (University of Versailles Saint-Quentin). Game theory is a powerful tool that has recently been used in networks to improve the end users' quality of experience (e.g. decreased response time, higher delivery rate). In this paper, the original idea consists in using game theory in the context of Content Delivery Networks (CDNs) to organize video contents into clusters having similar request profiles. The popularity of each content in a cluster can be determined from the popularity of the representative of the cluster and used to store the most popular contents close to end users. A group of experts and a decision maker predict the popularity of the representative of the cluster only, which considerably reduces the number of experts used. More precisely, we model the clustering problem as a hedonic coalition formation game where the players are the video contents. We proved that this game always converges to a stable partition consisting of different clusters. We determined the best size of the observation window and showed that the play order minimizing the maximum distance to the representative of the cluster is the Rich-to-Poor order, for any number of video contents in the interval [20; 200]. The complexity of the coalition game remains low: convergence is obtained in a small number of rounds (i.e. fewer than 35 rounds for 200 video contents). We compare the results of this approach with the clustering obtained by the K-means algorithm, using real traces extracted from YouTube. We also evaluate the complexity of the proposed algorithm.
The coalition game outperforms K-means in terms of the average and maximum distances to the representative of the cluster. The execution time is also in favor of the coalition game when the number of contents is higher than or equal to 50. Furthermore, the coalition game can be used to quickly determine the best value of K, which is required as an input parameter of the K-means algorithm. Overall, simulation results show that the coalition game performs very well.
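The general mechanics of a hedonic coalition formation game applied to clustering can be sketched as below. This is not the game of [30]: the utility function (symmetric and additively separable, with a hypothetical `threshold` parameter), the L1 distance between request profiles, the reading of Rich-to-Poor as "decreasing total number of requests", and the choice of the medoid as representative are all assumptions. This utility was chosen because each move strictly increases the total utility, so the better-response dynamics stop at a stable partition.

```python
from typing import Dict, List, Sequence, Set, Tuple

def distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Assumed distance between two request profiles (L1 norm)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def utility(player: str, coalition: Set[str],
            profiles: Dict[str, Sequence[float]], threshold: float) -> float:
    """Assumed utility: each co-member closer than `threshold` is a gain, others a cost."""
    return sum(threshold - distance(profiles[player], profiles[other])
               for other in coalition if other != player)

def coalition_game(profiles: Dict[str, Sequence[float]], threshold: float,
                   max_rounds: int = 100) -> Tuple[List[Set[str]], int]:
    """Let each video, in turn, join the coalition it strictly prefers."""
    # Assumed Rich-to-Poor order: most requested videos play first.
    order = sorted(profiles, key=lambda v: (-sum(profiles[v]), v))
    coalitions: List[Set[str]] = [{video} for video in order]
    for round_number in range(1, max_rounds + 1):
        moved = False
        for player in order:
            current = next(c for c in coalitions if player in c)
            best = current
            best_utility = utility(player, current, profiles, threshold)
            # The empty set models the option of leaving to play alone.
            candidates = [c for c in coalitions if c is not current] + [set()]
            for candidate in candidates:
                value = utility(player, candidate, profiles, threshold)
                if value > best_utility:
                    best, best_utility = candidate, value
            if best is not current:
                current.remove(player)
                best.add(player)
                if all(best is not c for c in coalitions):
                    coalitions.append(best)
                coalitions = [c for c in coalitions if c]
                moved = True
        if not moved:  # nobody wants to deviate: the partition is stable
            return coalitions, round_number
    return coalitions, max_rounds

def representative(coalition: Set[str], profiles: Dict[str, Sequence[float]]) -> str:
    """Assumed representative: the member closest to all the others (medoid)."""
    return min(sorted(coalition),
               key=lambda v: sum(distance(profiles[v], profiles[o]) for o in coalition))

profiles = {"a": [10, 10], "b": [10, 11], "c": [1, 1], "d": [1, 2]}
clusters, rounds = coalition_game(profiles, threshold=3)
```

In this toy run, the two highly requested videos end up in one cluster and the two rarely requested ones in another; the last round only checks that no video wants to move.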