Neo is an Inria project-team whose members are located in Sophia Antipolis (S. Alouf, K. Avrachenkov,
G. Neglia, S. Perlaza), in Avignon (E. Altman) at Lia (Lab. of Informatics of Avignon) and in Montpellier (A. Jean-Marie)
at Lirmm (Lab. of Informatics, Robotics and Microelectronics of Montpellier).
The team is positioned at the intersection of Operations Research and Network Science. Using the tools of Stochastic Operations Research, we model situations arising in several application domains that involve networking in one way or another. The aim is to understand the rules governing these situations and their effects, in order to influence and control them, and thereby engineer the creation and the evolution of complex networks.

Stochastic Operations Research is a collection of modeling, optimization and numerical computation techniques, aimed at assessing the behavior of man-made systems driven by random phenomena, and at helping to make decisions in such a context.

The discipline is based on applied probability and focuses on
effective computations and algorithms. Its core theory is that of
Markov chains over discrete state spaces. This family of stochastic
processes has, at the same time, a very large modeling capability and
the potential for efficient solution. By “solution” we mean the
calculation of some performance metric, usually the
distribution of some random variable of interest, or its average,
variance, etc. This solution is obtained either through exact
“analytic” formulas, or numerically through linear algebra
methods. Even when they are neither analytically nor numerically tractable,
Markovian models are always amenable to “Monte Carlo” simulation,
with which the metrics can be estimated statistically.
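To make the two routes concrete, here is a minimal NumPy sketch that computes the stationary distribution of a discrete-time chain by solving the linear balance equations, then estimates the same quantity by Monte Carlo simulation. The three-state transition matrix is an arbitrary example, not taken from any Neo model.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 for an irreducible finite DTMC."""
    n = P.shape[0]
    # Balance equations (P^T - I) pi = 0; one is redundant, so replace it by normalisation.
    A = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

def monte_carlo_occupancy(P, steps, seed=0):
    """Estimate the long-run fraction of time spent in each state by simulating the chain."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    counts = np.zeros(n)
    state = 0
    for _ in range(steps):
        counts[state] += 1
        state = rng.choice(n, p=P[state])
    return counts / steps

# Hypothetical three-state chain.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])
pi = stationary_distribution(P)        # exact: [8/43, 20/43, 15/43]
est = monte_carlo_occupancy(P, 50_000)
print(pi, est)
```

The linear solve is exact up to rounding, while the simulated occupancies only approach it at the usual statistical rate, which is why simulation is kept as the fallback when the state space is too large to solve.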

An example of this is the success of classical Queueing Theory,
with its numerous analytical formulas. Another important derived
theory is that of Markov Decision Processes, which makes it possible to
formalize optimal decision problems in a random environment.
This theory characterizes the optimal decisions and provides
algorithms for computing them.
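As an illustration of such algorithms, the following minimal sketch runs value iteration on a hypothetical two-state, two-action discounted MDP. The rewards, transitions and discount factor 0.9 are made up for the example.

```python
import numpy as np

def value_iteration(P, R, gamma, tol=1e-10, max_iter=10_000):
    """P[a] is the |S|x|S| transition matrix of action a; R[s, a] is the expected reward."""
    n_actions = len(P)
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        # Q[s, a] = R[s, a] + gamma * sum_s' P[a][s, s'] V[s']
        Q = R + gamma * np.stack([P[a] @ V for a in range(n_actions)], axis=1)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V, Q.argmax(axis=1)

# Action 0 = stay in the current state, action 1 = switch to the other state.
P = [np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])]
R = np.array([[0.0, 0.0],     # state 0 earns nothing
              [1.0, 0.0]])    # state 1 earns 1 per step if it stays
V, policy = value_iteration(P, R, gamma=0.9)
print(V, policy)
```

In state 1 the optimal action is to stay, with value 1/(1 - 0.9) = 10; in state 0 it is to switch, with value 0.9 × 10 = 9.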

Strong trends in Operations Research are: a) the increasing importance of multi-criteria, multi-agent optimization, and the correlated introduction of Game Theory into the standard methodology; b) the increasing concern of (deterministic) Operations Research with randomness and risk, and the consequent introduction of topics like Chance-Constrained Programming and Stochastic Optimization. Data analysis is also increasingly present in Operations Research: techniques from statistics, such as filtering and estimation, and from Artificial Intelligence, such as clustering, are coupled with modeling in Machine Learning techniques such as Q-Learning.

Network Science is a multidisciplinary body of knowledge, principally concerned with the emergence of global properties in a network of individual agents. These global properties emerge from “local” properties of the network, namely, the way agents interact with each other. The central model of “networks” is the graph (of Graph Theory/Operations Research). Nodes represent the different entities that manage information and take decisions, and links represent whether entities interact. Links are usually equipped with a “weight” that measures the intensity of the interaction. Adding evolution rules to this rather elementary representation leads to dynamic network models, whose properties Network Science tries to analyze.

A classical example of properties sought in networks is the famous “six degrees of separation” (or “small world”) property: how and why does it occur so frequently? Another ubiquitous property of real-life networks is the Zipf or “scale-free” distribution of degrees. Some of these properties, when properly exploited, lead to successful business opportunities: consider the PageRank algorithm of Google, which remarkably connects the relevance of a piece of Web information to the relevance of the other information that points to it.
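PageRank is the stationary distribution of a damped random surfer. Below is a minimal power-iteration sketch, assuming the usual damping factor 0.85 and uniform teleportation (pages without out-links are treated as linking to every page), on a hypothetical four-page web.

```python
import numpy as np

def pagerank(adj, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank; adj[i, j] = 1 if page i links to page j.

    Pages without out-links (dangling) are treated as linking to every page."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=1)
    P = np.where(out_deg[:, None] > 0, adj / np.maximum(out_deg, 1)[:, None], 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = alpha * (r @ P) + (1 - alpha) / n     # follow a link or teleport
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r

adj = np.array([[0, 1, 0, 0],    # 0 -> 1
                [0, 0, 1, 0],    # 1 -> 2
                [1, 0, 0, 0],    # 2 -> 0
                [0, 0, 1, 0]])   # 3 -> 2
r = pagerank(adj)
print(r.round(4))
```

Page 2 ends up with the highest score because two pages point to it, while page 3, which nobody links to, only keeps its teleportation share 0.15/4.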

In its primary sense, Network Science involves little or no
engineering: phenomena are assumed to be “natural” and to emerge
without external intervention. However, the idea of intervening in
order to modify the outcome of the phenomena quickly arises.
This is where Neo is positioned.
Beyond the mostly descriptive approach of Network Science, we aim to
use the techniques of Operations Research to engineer complex
networks.

To quote two examples: controlling the spread of diseases through a “network” of people is of primary interest to mankind. Similarly, controlling the spread of information or reputation through a social network is of great interest on the Internet. In particular, given the impact of web visibility on business income, it is tempting (and quite common) to manipulate the graph of the web by adding links so as to drive the PageRank algorithm to a desired outcome.

Another interesting example is the engineering of community structures.
Recently, thousands of papers have been written on the community
detection problem.
In most of these works, the researchers propose methods,
most of the time heuristics, for detecting communities or dense subgraphs
inside a large network. Much less effort has been put into understanding
the community formation process, and even less effort has been
dedicated to the question of how one can influence the process of community
formation, e.g., in order to increase overlap among communities and reverse
the fragmentation of society.

Our ambition for the medium term is to reach an understanding of the behavior of complex networks that will make us capable of influencing or producing a certain property in a given network. For this purpose, we will develop families of models to capture the essential structure, dynamics, and uncertainty of complex networks. The “solution” of these models will provide the correspondence between metrics of interest and model parameters, thus opening the way to the synthesis of effective control techniques.

In the process of tackling real, very large networks, we increasingly deal with large graph data analysis and with the development of decision techniques of low algorithmic complexity, able to provide answers from large datasets in reasonable time.

marmoteCore is a C++ environment for modeling with Markov chains. It consists of a reduced set of high-level abstractions for constructing state spaces, transition structures and Markov chains (discrete-time and continuous-time). It makes it possible to construct hierarchies of Markov models, from the most general to the most particular, and to equip each level with specifically optimized solution methods.
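The idea of a hierarchy whose specialised levels come with dedicated solvers can be sketched as follows. This is not marmoteCore's API: the Python classes `MarkovChain` and `BirthDeathChain` are hypothetical. They only show a general level solved by a linear system and a specialised birth-death level that overrides it with the closed-form detailed-balance solution.

```python
import numpy as np

class MarkovChain:
    """Hypothetical general level: any finite irreducible DTMC, solved numerically."""

    def __init__(self, P):
        self.P = np.asarray(P, dtype=float)

    def stationary_distribution(self):
        n = self.P.shape[0]
        A = np.vstack([(self.P.T - np.eye(n))[:-1], np.ones(n)])
        b = np.zeros(n)
        b[-1] = 1.0
        return np.linalg.solve(A, b)


class BirthDeathChain(MarkovChain):
    """Hypothetical specialised level: birth-death chain with a closed-form solution."""

    def __init__(self, up, down):
        # up[i] = P(i -> i+1) and down[i] = P(i+1 -> i), for i = 0..n-2.
        self.up = np.asarray(up, dtype=float)
        self.down = np.asarray(down, dtype=float)
        n = len(up) + 1
        P = np.zeros((n, n))
        for i in range(n - 1):
            P[i, i + 1] = self.up[i]
            P[i + 1, i] = self.down[i]
        P[np.arange(n), np.arange(n)] = 1.0 - P.sum(axis=1)   # remaining mass stays put
        super().__init__(P)

    def stationary_distribution(self):
        # Detailed balance: pi[i+1] = pi[i] * up[i] / down[i]; no linear solve needed.
        weights = np.concatenate([[1.0], np.cumprod(self.up / self.down)])
        return weights / weights.sum()


chain = BirthDeathChain(up=[0.5, 0.3], down=[0.2, 0.4])
print(chain.stationary_distribution())            # specialised method
print(MarkovChain.stationary_distribution(chain)) # generic method, same result
```

The design point is that a user who builds the more specific object automatically gets the cheaper solver, while the generic one remains available for cross-checking.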

This software was started within the ANR MARMOTE project: ANR-12-MONU-00019.

Prefetching is a basic technique used to reduce the latency of diverse computer services. Deciding what to prefetch amounts to making a compromise between latency and the waste of resources (network bandwidth, storage, energy) if contents are mistakenly prefetched. We have pursued the analysis of the situation where the graph representing the different documents and their links is not completely known in advance. K. Keshava, under the supervision of S. Alouf and A. Jean-Marie, has studied a model where a tree, of depth one or two, is completed randomly after each movement by a uniform branching process. He has identified the optimal prefetching policy for a finite-horizon situation, in the case where the prefetching budget is one or two documents at each round. This optimal policy appears to be nontrivial.

State estimation enables efficient, scalable, and secure operation of power systems. Monitoring and control processes are supported by supervisory control and data acquisition (SCADA) systems and, more recently, by advanced communication systems that acquire and transmit observations to a state estimator. This cyber-layer exposes the system to malicious attacks that exploit the vulnerabilities of the sensing and communication infrastructure. One of the main threats faced by modern power systems is data injection attacks (DIAs), which alter the operator's state estimate by compromising the system observations. In 41, S. M. Perlaza, X. Ye, I. Esnaola, and Robert F. Harrison (all from U. of Sheffield, UK) presented sparse attacks that simultaneously minimize the information obtained by the operator and the probability of detection. S. M. Perlaza, S. Ke and I. Esnaola have studied these attacks without the sparsity assumption in 20. An overview of data injection attacks is presented by the same authors in a book chapter 44.
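Why observations can be altered without raising an alarm is easiest to see in a linearised model z = Hx + noise, with a least-squares estimator and a residual-based detector. The sketch below illustrates the classical fact that any attack vector of the form a = Hc shifts the estimate by exactly c while leaving the residual unchanged. H, the sizes and c are arbitrary assumptions, and this is not the information-theoretic sparse attack construction of 41.

```python
import numpy as np

def ls_estimate(H, z):
    """Least-squares state estimate and norm of the measurement residual."""
    x_hat, *_ = np.linalg.lstsq(H, z, rcond=None)
    return x_hat, np.linalg.norm(z - H @ x_hat)

rng = np.random.default_rng(1)
m, n = 8, 3                          # hypothetical numbers of measurements and state variables
H = rng.normal(size=(m, n))          # hypothetical linearised observation matrix
x_true = rng.normal(size=n)
z = H @ x_true + 0.01 * rng.normal(size=m)

c = np.array([0.5, 0.0, -0.2])       # shift the attacker wants to induce in the estimate
a = H @ c                            # attack vector in the column space of H

x_clean, r_clean = ls_estimate(H, z)
x_attacked, r_attacked = ls_estimate(H, z + a)
print(x_attacked - x_clean, r_clean, r_attacked)
```

A residual test therefore cannot distinguish z + a from a genuine state change, which is what makes the trade-off between stealth and the amount of information left to the operator non-trivial.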

In the absence of vaccination or effective medical treatment against pandemics such as SARS-CoV-2, the global population must coexist with these viruses. To succeed in this task, different strategies to slow down the outbreak can be implemented, for example, encouraging social distancing, isolation of infected individuals, mobility restrictions, lockdowns, and contact tracing. The main objective is to guarantee that the number of infected individuals who develop critical symptoms does not exceed the capacity of local health care systems. Nonetheless, most strategies to slow down the outbreak have dramatic economic consequences, and thus public health policies must be designed based on reliable predictions of the evolution of the pandemic to minimize undesired effects on the global economy. To do so, estimating variables such as the proportions of susceptible, infected and recovered individuals in the population is of paramount importance. In 46, S. M. Perlaza, E. Altman, I. Mounir (CHU de Nice), and F.Z. Najid (CHU d'Amiens) presented a formula for estimating the prevalence ratio of a disease in a population that is tested with imperfect tests. The formula is expressed in terms of the fraction of positive test results and the test parameters, i.e., the probability of true positives (sensitivity) and the probability of true negatives (specificity).
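One way to see how such a formula arises: if p is the prevalence, Se the sensitivity and Sp the specificity, a randomly chosen test is positive with probability f = Se·p + (1 - Sp)(1 - p), which can be solved for p = (f + Sp - 1)/(Se + Sp - 1) whenever Se + Sp > 1. This is the classical Rogan-Gladen correction; we have not checked that it coincides term by term with the formula of 46, so the sketch below is illustrative only.

```python
def prevalence_from_tests(frac_positive, sensitivity, specificity):
    """Invert f = Se*p + (1 - Sp)*(1 - p) for the prevalence p.

    Requires Se + Sp > 1 (a test better than a coin flip). The result is clipped
    to [0, 1] because sampling noise can push the raw value outside that range."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    p = (frac_positive + specificity - 1.0) / denom
    return min(1.0, max(0.0, p))

# Hypothetical numbers: 12% positive results with Se = 0.85 and Sp = 0.97.
print(prevalence_from_tests(0.12, 0.85, 0.97))
```

With these values the raw positive fraction (12%) overestimates the corrected prevalence (about 11%), because the 3% false-positive rate applies to the large uninfected majority.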

The marmoteCore platform has been upgraded and moved to a new GitLab development environment,
with the help of Inria's Experimentation and Development Service. It is currently being ported
from C++ to Python. It now hosts the new RLGL method for Markov chains developed by the team. Its use for the analysis of
protein-protein interaction networks, via Markov chains with restart,
has continued in collaboration with the project-team ABS.

Random geometric graphs have now become a popular object of research. Although they are defined rather simply, these graphs describe real networks much better than classical Erdős–Rényi graphs, owing to their ability to produce tightly connected communities. The
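A quick way to see the “tightly connected” effect is to compare the average clustering coefficient of a random geometric graph in the unit square with that of an Erdős–Rényi graph of the same mean degree. The sketch uses only the Python standard library; n = 500 and the connection radius 0.08 are arbitrary choices.

```python
import itertools
import math
import random

def random_geometric_graph(n, radius, rng):
    """Points uniform in the unit square, linked when closer than `radius`."""
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    adj = [set() for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        if math.dist(pts[i], pts[j]) < radius:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def erdos_renyi_graph(n, p, rng):
    adj = [set() for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def average_clustering(adj):
    """Mean local clustering coefficient (nodes of degree < 2 count as 0)."""
    total = 0.0
    for nbrs in adj:
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in itertools.combinations(nbrs, 2) if v in adj[u])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)

rng = random.Random(42)
n, radius = 500, 0.08
rgg = random_geometric_graph(n, radius, rng)
mean_degree = sum(len(s) for s in rgg) / n
er = erdos_renyi_graph(n, mean_degree / (n - 1), rng)   # same expected mean degree
c_rgg, c_er = average_clustering(rgg), average_clustering(er)
print(c_rgg, c_er)
```

Two neighbours of a node in the geometric graph are themselves close to each other, so they are likely to be linked; in the Erdős–Rényi graph the clustering stays of the order of the edge probability.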

We have pursued our work on epidemiology in two application areas. The first is fighting malware, e-viruses and e-worms, which has been one of our central research themes over the last ten years; the second is developing tools to fight COVID-19. Herd immunity, one of the most fundamental concepts in network epidemics, occurs when a large fraction of the population of devices is immune to a virus or malware. The few individuals who have not taken countermeasures against the threat are assumed to have very low chances of infection, as they are indirectly protected by the rest of the devices in the network. Although very fundamental, herd immunity does not account for strategic attackers scanning the network for vulnerable nodes. Faced with such attackers, nodes that remain vulnerable in the network become easy targets, compromising cybersecurity. In 19, V. Rufino, D. Menasche and C. Lima from UFRJ, in collaboration with I. Cunha from UFMG, E. Altman, R. El-Azouzi and F. de Pellegrini from Avignon Univ., L. de Aguiar and A. Avritzer from Siemens, and M. Grottke from Friedrich-Alexander U., propose an analytical model that captures the impact of countermeasures against such attackers. Their model suggests that nodes should adopt countermeasures even when the remainder of the nodes has already decided to do so.

In 23, K. Avrachenkov together with M. Mironov (MIPT, Russia) consider a graph clustering problem with a given number of clusters and approximate desired cluster sizes. One possible motivation for such a task is the allocation of databases or servers within several given large computational clusters, where one wants related objects to share the same cluster in order to minimize latency and transaction costs. This task differs from the original community detection problem. To solve it, the authors adopt some ideas from Glauber dynamics and the Label Propagation Algorithm. They assume no additional information about node labels, so the task is of an unsupervised learning nature. They propose an algorithm for this problem, show that it works well for a large set of parameters of the Stochastic Block Model (SBM), and theoretically show that its running time complexity for achieving almost exact recovery is of
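To make the combination of label propagation, Glauber-type updates and target sizes concrete, here is a hypothetical sketch, not the algorithm of 23: a zero-temperature (greedy) Glauber dynamics on the energy E = -(number of intra-cluster edges) + λ Σ_c (n_c - s_c)², where s_c are the desired sizes and λ = 0.05 is an arbitrary penalty weight. A node changes label only if this strictly lowers E, so the procedure always terminates.

```python
import random

def sbm(sizes, p_in, p_out, rng):
    """Stochastic Block Model graph as adjacency sets, plus the planted block of each node."""
    blocks = [b for b, s in enumerate(sizes) for _ in range(s)]
    n = len(blocks)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < (p_in if blocks[i] == blocks[j] else p_out):
                adj[i].add(j)
                adj[j].add(i)
    return adj, blocks

def energy(adj, labels, targets, lam):
    """E = -(number of intra-cluster edges) + lam * sum_c (n_c - s_c)^2."""
    inside = sum(1 for u in range(len(adj)) for v in adj[u] if u < v and labels[u] == labels[v])
    sizes = [labels.count(c) for c in range(len(targets))]
    return -inside + lam * sum((n_c - s_c) ** 2 for n_c, s_c in zip(sizes, targets))

def size_constrained_label_propagation(adj, targets, lam=0.05, max_sweeps=50, seed=0):
    """Hypothetical zero-temperature Glauber dynamics on `energy`."""
    rng = random.Random(seed)
    k, n = len(targets), len(adj)
    labels = [rng.randrange(k) for _ in range(n)]
    sizes = [labels.count(c) for c in range(k)]
    for _ in range(max_sweeps):
        moved = False
        for v in rng.sample(range(n), n):              # random sweep order
            a = labels[v]
            counts = [0] * k                           # neighbours of v in each cluster
            for u in adj[v]:
                counts[labels[u]] += 1
            best_b, best_delta = a, 0.0
            for b in range(k):
                if b == a:
                    continue
                # Energy change if v moves from cluster a to cluster b.
                delta = -(counts[b] - counts[a]) + lam * (
                    2 * (sizes[b] - targets[b]) + 1 - 2 * (sizes[a] - targets[a]) + 1)
                if delta < best_delta:
                    best_b, best_delta = b, delta
            if best_b != a:
                labels[v] = best_b
                sizes[a] -= 1
                sizes[best_b] += 1
                moved = True
        if not moved:
            break
    return labels

rng = random.Random(7)
adj, blocks = sbm([40, 60], p_in=0.3, p_out=0.02, rng=rng)
targets = [40, 60]
labels = size_constrained_label_propagation(adj, targets)
```

Setting λ = 0 recovers plain greedy label propagation, which tends to let one label absorb the whole graph; the size penalty is what keeps clusters near their desired sizes.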

Due to high interest in many applications, from social networks to blockchain to power grids, deep learning on non-Euclidean objects such as graphs and manifolds, termed Geometric Deep Learning (GDL), continues to attract ever-increasing interest. In 26, K. Avrachenkov together with Y. Chen (SMU, USA) and Y. Gel (U. Texas at Dallas, USA) propose a new Lévy Flights Graph Convolutional Networks (LFGCN) method for semi-supervised learning, which casts Lévy Flights into random walks on graphs and, as a result, makes it possible both to account accurately for the intrinsic graph topology and to substantially improve classification performance, especially for heterogeneous graphs. Furthermore, they propose a new preferential P-DropEdge method based on the Girvan-Newman argument. That is, in contrast to the uniform removal of edges in DropEdge, and following the Girvan-Newman algorithm, they detect network periphery structures using edge betweenness and then remove edges according to their betweenness centrality. Their experimental results on semi-supervised node classification tasks demonstrate that LFGCN coupled with P-DropEdge accelerates training, increases stability and further improves the predictive accuracy of the learned graph topology structure.
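The ingredients of a betweenness-based edge dropping step can be sketched as follows: Brandes' algorithm computes edge betweenness on an unweighted graph, and edges are then dropped at random with probability proportional to it. Whether P-DropEdge favours high- or low-betweenness edges, and how many edges it drops, are details of 26 that we do not reproduce; the proportional rule and the function names are assumptions.

```python
import random
from collections import deque

def edge_betweenness(adj):
    """Brandes' algorithm for edge betweenness on an unweighted, undirected graph.

    adj maps each node to a set of neighbours; result keys are sorted node pairs."""
    bc = {tuple(sorted((u, v))): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                bc[tuple(sorted((v, w)))] += c
                delta[v] += c
    # Each unordered pair of endpoints was counted once from each side.
    return {e: val / 2 for e, val in bc.items()}

def preferential_drop_edge(adj, n_drop, seed=0):
    """Hypothetical sketch: drop n_drop distinct edges, sampled proportionally to betweenness."""
    rng = random.Random(seed)
    bc = edge_betweenness(adj)
    edges, weights = list(bc), list(bc.values())
    n_drop = min(n_drop, len(edges))
    dropped = set()
    while len(dropped) < n_drop:
        dropped.add(rng.choices(edges, weights=weights)[0])
    return [e for e in edges if e not in dropped]

# Two triangles joined by the bridge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
bc = edge_betweenness(adj)
print(bc[(2, 3)])   # the bridge carries all 3 x 3 cross-triangle shortest paths
remaining = preferential_drop_edge(adj, n_drop=2)
```

Every edge has betweenness at least 1 (it is the shortest path between its own endpoints), so all weights are positive and the sampling loop terminates.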

The most popular framework for parallel training of machine learning models in a cluster is the (synchronous) parameter server (PS). This paradigm consists of

The client-server architecture is also adopted in federated learning (that is, distributed training at the scale of the Internet), where the parameter server is often called the orchestrator. This approach may be inefficient in cross-silo settings, as close-by data silos with high-speed access links may exchange information with each other faster than with the orchestrator, and the orchestrator may become a communication bottleneck. In 33, O. Marfoq, C. Xu, and G. Neglia, together with R. Vidal (Accenture Labs, France), define the problem of topology design for cross-silo federated learning, using the theory of max-plus linear systems to compute the system throughput (the number of communication rounds per time unit). They also propose practical algorithms that, given measurable network characteristics, find a topology with the largest throughput or with provable throughput guarantees. In realistic Internet networks with 10 Gbps access links for silos, these algorithms speed up training by factors of 9 and 1.5 compared to the client-server architecture and to the state-of-the-art MATCHA, respectively. Speedups are even larger with slower access links.
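For a max-plus linear system x(k+1) = A ⊗ x(k) whose precedence graph is strongly connected, the asymptotic time per round equals the maximum cycle mean of the graph, and the throughput is its inverse. The sketch below computes it with Karp's algorithm for a hypothetical three-silo delay graph (delays in an arbitrary time unit, self-loops standing for local computation time). It is not the topology design algorithm of 33.

```python
import math

def max_cycle_mean(delay):
    """Karp's algorithm: maximum mean weight of a cycle in a weighted digraph.

    delay[u][v] is the delay of link u -> v, or None if the link does not exist."""
    n = len(delay)
    NEG = -math.inf
    # D[k][v]: maximum weight of a walk with exactly k edges ending at v (any start node).
    D = [[0.0] * n] + [[NEG] * n for _ in range(n)]
    for k in range(1, n + 1):
        for v in range(n):
            for u in range(n):
                if delay[u][v] is not None and D[k - 1][u] > NEG:
                    D[k][v] = max(D[k][v], D[k - 1][u] + delay[u][v])
    best = NEG
    for v in range(n):
        if D[n][v] == NEG:
            continue                      # no walk of length n ends at v
        best = max(best, min((D[n][v] - D[k][v]) / (n - k) for k in range(n)))
    return best

delay = [[2.5, 2.0, None],
         [4.0, None, 1.0],
         [None, 1.0, None]]
cycle_time = max_cycle_mean(delay)   # time per communication round
throughput = 1.0 / cycle_time        # rounds per time unit
print(cycle_time, throughput)
```

Here the cycle 0 -> 1 -> 0 has mean (2 + 4)/2 = 3, larger than the self-loop at silo 0 (2.5) and the cycle 1 -> 2 -> 1 (mean 1), so one round completes every 3 time units. A topology design procedure would search over overlays to make this bottleneck cycle as light as possible.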

The population dynamics of the replicator equation are well studied in continuous time, but there is less work that explicitly considers the evolution in discrete time. The discrete-time dynamics can often be justified indirectly, by establishing the relevant evolutionary dynamics for the corresponding continuous-time system and then appealing to an appropriate approximation property. In 9, K. Avrachenkov together with A. Albrecht, P. Howlett and G. Verma (U. South Australia, Australia) study the discrete-time system directly, and establish basic stability results for the evolution of a population defined by a positive definite system matrix, where the population is disrupted by random perturbations to the genotype distribution, through either migration or mutation, in each successive generation. One very interesting conclusion is that the replicator dynamics are much more stable in discrete time than in continuous time.
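A minimal sketch of the discrete-time replicator map x_i <- x_i (Ax)_i / (xᵀAx), with an optional perturbation that mixes the population with a migrant distribution. The matrix and the mixing rule are hypothetical. For a symmetric A with positive entries, the unperturbed map never decreases the mean fitness xᵀAx (the Baum-Eagon inequality), which the example tracks; this is a simpler property than the stability results of 9.

```python
import numpy as np

def replicator_step(x, A, migrants=None, eps=0.0):
    """One generation of x_i <- x_i (A x)_i / (x^T A x), optionally followed by mixing
    a fraction eps of the population with a migrant distribution (hypothetical perturbation)."""
    fitness = A @ x
    x_new = x * fitness / (x @ fitness)
    if migrants is not None:
        x_new = (1.0 - eps) * x_new + eps * migrants
    return x_new

# Hypothetical symmetric, positive definite fitness matrix with positive entries.
A = np.array([[2.0, 1.0, 0.5],
              [1.0, 3.0, 1.0],
              [0.5, 1.0, 2.5]])
x = np.array([0.6, 0.3, 0.1])
mean_fitness = []
for _ in range(50):
    mean_fitness.append(x @ A @ x)
    x = replicator_step(x, A)
perturbed = replicator_step(x, A, migrants=np.full(3, 1 / 3), eps=0.05)
print(mean_fitness[-1], x, perturbed)
```

Both the selection step and the migration mixture map the probability simplex into itself, so the iterate remains a valid genotype distribution with or without perturbations.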

S. Dhamal (Chalmers U. of Technology, Sweden) and E. Altman, in collaboration with W. Ben-Ameur and T. Chahed from Institut Polytechnique de Paris, propose in 15 a setting for two-phase investment in opinion dynamics in social networks, where the final opinion of a node in the first phase acts as its initial biased opinion in the second phase. In this setting, we study the problem of competing camps aiming to maximize the adoption of their respective opinions by strategically investing in nodes, where the effectiveness of a camp's investment in a node depends on the node's initial bias. We propose an extension of the Friedkin-Johnsen model to this setting, and hence formulate the utility functions of the camps. We show the existence of Nash equilibria under reasonable assumptions, and that they can be computed in polynomial time. Our main conclusion is that, if nodes attribute high weight to their initial biases, it is advantageous to invest heavily in the first phase, so as to exploit the manipulated biases in the second phase.
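In the Friedkin-Johnsen model, opinions evolve as x(t+1) = ΛWx(t) + (I - Λ)u, where W is a row-stochastic influence matrix, Λ a diagonal matrix of susceptibilities and u the initial biases; when all susceptibilities are below 1, opinions converge to x* = (I - ΛW)⁻¹(I - Λ)u. The sketch chains two phases, using the phase-one limit as the phase-two bias. Modelling a camp's investment as an additive shift of the bias is purely an assumption for illustration; the extension of 15 is not reproduced.

```python
import numpy as np

def fj_equilibrium(W, susceptibility, bias):
    """Steady state of x(t+1) = L W x(t) + (I - L) u with L = diag(susceptibility) < 1."""
    n = len(bias)
    L = np.diag(susceptibility)
    I = np.eye(n)
    return np.linalg.solve(I - L @ W, (I - L) @ bias)

# Hypothetical three-node network with uniform influence among neighbours.
W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
susceptibility = np.array([0.8, 0.6, 0.7])
u0 = np.array([0.0, 0.5, 1.0])        # initial biases
invest = np.array([0.2, 0.0, 0.0])    # hypothetical phase-one investment, added to the bias
x1 = fj_equilibrium(W, susceptibility, u0 + invest)   # end of phase one
x2 = fj_equilibrium(W, susceptibility, x1)            # phase-one opinions become phase-two biases
print(x1, x2)
```

Because the phase-two anchor is the phase-one outcome, any shift obtained in phase one is carried into phase two through the (I - Λ) term, more strongly for nodes with low susceptibility, that is, for nodes that weigh their own bias heavily.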

During their third year of study at CentraleSupélec, B. Toure and S. Paturel followed a 6-hour course of preparation for research. They published their research project 39 on routing games with losses at an IEEE-sponsored international conference on wireless communications (WINCOM). Their work, under the guidance of E. Altman, received the best paper award of the conference. The loss criterion studied here is quite challenging, as it is not additive and flow is not conserved. Further results on the structure of the equilibrium are obtained in 28 by E. Altman, M. Datar and G. Ferrat for the parallel link topology.

Cloud operators now offer data storage that can be dynamically configured over short timescales (minutes). In 14, G. Neglia, D. Carra (U. of Verona, Italy), and P. Michiardi (EURECOM, France) continue their work on elastic resource provisioning in the cloud, focusing on in-memory key-value stores used as caches. The goal is to dynamically scale resources to the traffic pattern so as to minimize the overall cost, which includes not only the storage cost but also the cost due to misses. Indeed, a small variation of the cache miss ratio may have a significant impact on user-perceived performance in modern web services, which in turn affects the overall revenues of the content provider using such services. They propose and study a dynamic algorithm for TTL (Time To Live) caches that obtains close-to-minimal costs. On real-world traces collected from Akamai, they show that the TTL approach is able to track the optimal cache configuration and achieve significant cost savings, especially in highly dynamic settings that are likely to require elastic cloud services.
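To see the trade-off between storage and miss costs, consider a TTL cache whose timer is reset on every hit, fed by independent Poisson request streams. Object i is then in the cache with probability 1 - exp(-λᵢT) and misses at rate λᵢexp(-λᵢT). The sketch picks a single static TTL by grid search; the popularity profile, cost coefficients and grid are assumptions, and this static optimisation is only a baseline for the dynamic algorithm of 14.

```python
import math

def ttl_cost(T, rates, storage_cost, miss_cost):
    """Expected cost per unit time of a TTL cache whose timer is reset on every hit,
    assuming independent Poisson requests with the given rates."""
    cost = 0.0
    for lam in rates:
        absent = math.exp(-lam * T)             # P(last request is older than T)
        cost += storage_cost * (1.0 - absent)   # the object occupies the cache
        cost += miss_cost * lam * absent        # requests arriving while it is absent
    return cost

rates = [1.0 / (i + 1) for i in range(1000)]    # hypothetical Zipf-like popularity
grid = [0.1 * k for k in range(501)]            # candidate TTLs from 0 to 50 time units

def total_cost(T):
    return ttl_cost(T, rates, storage_cost=0.05, miss_cost=1.0)

best_T = min(grid, key=total_cost)
print(best_T, total_cost(best_T))
```

Each object taken alone would prefer either an infinite TTL (when its miss cost times request rate exceeds the storage cost) or a zero TTL, so a common T is a compromise; when rates drift over time, the best T drifts with them, which is what a dynamic algorithm has to track.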

The Miss Ratio Curve (MRC) is a fundamental tool for cache performance profiling. Approximate methods based on sampling provide a low-complexity solution for MRC construction. In 25, G. Neglia and D. Carra (U. of Verona, Italy) show that, in the case of content with a large variance in popularity, the approximate MRC may be highly sensitive to the set of sampled content. They study in detail the impact of content popularity heterogeneity on the accuracy of the approximate MRC and observe that a few highly popular items may cause large errors at the head of the reconstructed MRC. From these observations, they design a new approach for building an approximate MRC, which combines an exact portion of the MRC with an approximate one built from samples. This new algorithm computes the MRC with an error up to 10 times smaller than state-of-the-art sampling-based methods, with similar computational and space overhead.
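For LRU, an exact MRC follows from Mattson's stack distances, and a common sampling-based approximation keeps only the requests whose key hashes below a rate R and scales distances by 1/R (spatial sampling, in the style of SHARDS). The sketch shows both. The list-based stack is quadratic in the worst case, and the hybrid exact-plus-sampled construction of 25 is not reproduced.

```python
import zlib

def lru_stack_distances(trace):
    """Mattson's stack algorithm: LRU stack distance of each request (inf on first access)."""
    stack, dists = [], []
    for item in trace:
        if item in stack:
            pos = stack.index(item)     # 0 = most recently used
            dists.append(pos + 1)
            stack.pop(pos)
        else:
            dists.append(float("inf"))
        stack.insert(0, item)
    return dists

def mrc(dists, cache_sizes, scale=1.0):
    """Miss ratio per cache size: a request hits iff scale * distance <= size."""
    if not dists:
        return [float("nan")] * len(cache_sizes)
    return [sum(1 for d in dists if scale * d > c) / len(dists) for c in cache_sizes]

def sampled_mrc(trace, cache_sizes, rate):
    """Spatial sampling: keep keys whose hash is below rate * 2^32, rescale distances by 1/rate."""
    threshold = rate * 2**32
    kept = [x for x in trace if zlib.crc32(str(x).encode()) < threshold]
    return mrc(lru_stack_distances(kept), cache_sizes, scale=1.0 / rate)

trace = list("abcabc")
print(mrc(lru_stack_distances(trace), [1, 2, 3]))   # -> [1.0, 1.0, 0.5]
```

Sampling is by key rather than by request, so a sampled key keeps all of its requests; with a skewed popularity, whether one very popular key happens to be sampled can swing the head of the curve, which is the sensitivity discussed above.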

Many cache eviction policies have been proposed and implemented to improve the hit probability. In 37, G. Neglia, P. Nain (Inria Dante), N. Panigrahy and D. Towsley (UMass at Amherst, USA), propose a new method to compute an upper bound on hit probability for all non-anticipative caching policies, i.e., for policies that have no knowledge of future requests. At each object request arrival, they use hazard rate (HR) function based ordering to classify the request as a hit or not. Under some statistical assumptions, they prove that the proposed HR based ordering model computes the maximum achievable hit probability and serves as an upper bound for all non-anticipative caching policies. In simulation, they find this approach to be almost always tighter than Belady’s bound.
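Belady's bound is the hit count of the offline MIN policy, which on a miss evicts the item requested farthest in the future. Because it uses knowledge of future requests, it bounds every policy, which is why a tighter bound restricted to non-anticipative policies is of interest. A minimal sketch follows; the hazard-rate-based bound of 37 is not reproduced.

```python
def belady_hits(trace, capacity):
    """Hits of Belady's offline MIN policy (capacity >= 1): on a miss with a full cache,
    evict the cached item whose next request is farthest in the future."""
    INF = float("inf")
    # next_use[t] = position of the next request for trace[t] after position t.
    next_use, last_seen = [INF] * len(trace), {}
    for t in range(len(trace) - 1, -1, -1):
        next_use[t] = last_seen.get(trace[t], INF)
        last_seen[trace[t]] = t
    cache, hits = {}, 0                  # item -> position of its next request
    for t, item in enumerate(trace):
        if item in cache:
            hits += 1
        elif len(cache) >= capacity:
            victim = max(cache, key=cache.get)
            del cache[victim]
        cache[item] = next_use[t]
    return hits

print(belady_hits(list("abcabc"), 2))   # -> 2
```

On this cyclic trace LRU with two slots never hits, while MIN obtains two hits by keeping the item that is requested soonest.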

In 5G and beyond network architectures, operators and content providers base their content distribution strategies on Heterogeneous Networks, where macro and small(er) cells are combined to offer better Quality of Service (QoS) to wireless users. On top of such networks, edge caching and Coordinated Multi-Point (CoMP) transmissions are used to further improve performance. The problem of optimally utilizing the cache space in dense and heterogeneous cell networks has been extensively studied under the name of “FemtoCaching”. However, the literature usually assumes relatively simple physical layer (PHY) setups and known or stationary content popularity. In 43, G. I. Ricardo and G. Neglia, together with T. Spyropoulos (EURECOM, France), address these issues by proposing a class of fully distributed and dynamic caching algorithms that take advantage of CoMP capabilities towards minimizing PHY-aware metrics, such as end-to-end (E2E) delay. These policies outperform existing dynamic solutions that are PHY-unaware, under both synthetic and real (non-stationary) request processes, and converge to efficient centralized solutions, in static setups.

Battery dependency is a critical issue when communications systems are deployed in hard-to-reach locations, e.g., remote geographical areas, concrete structures, human bodies, or disaster/war zones. In this case, the lifetime of the electronic devices, or even of the whole communications system, is determined by the battery life. An effective remedy is to use energy harvesting technologies. Specifically, energy can be harvested from different ambient sources such as light, vibrations, heat, chemical reactions, physiological processes, or the radio frequency (RF) signals produced by other communications systems. This observation raises the idea of simultaneous information and energy transmission (SIET) via RF. S. U. Zuhra, S. M. Perlaza, and E. Altman have studied the fundamental limits on the rates at which information and energy can be simultaneously transmitted over an additive white Gaussian noise channel, under the assumption that the number of channel input symbols (constellation size) is finite. The main results are mathematical expressions of the achievable and converse information-energy rates as a function of the constellation size, the number of channel uses, the decoding error probability, and the energy-outage probability. As a by-product, guidelines for optimal constellation design for SIET are obtained in terms of all real-system implementation parameters.

Characterizing the minimum decoding error probability (DEP) in point-to-point memoryless channels at a fixed information rate and fixed transmission duration, i.e., number of channel uses, is central in the analysis of communications systems. Nonetheless, only lower and upper bounds on the DEP are available in the current literature. Moreover, these bounds are difficult to calculate, as they involve the tails of cumulative distribution functions (c.d.f.) of n-dimensional random vectors. In 12, S. M. Perlaza, D. Anade (Inria, Maracas), P. Mary (INSA de Rennes), and J.-M. Gorce (Inria, Maracas) have presented an upper bound on the absolute difference between: (a) the c.d.f. of the sum of a finite number of independent and identically distributed random variables; and (b) a saddlepoint approximation of this c.d.f. This upper bound is general and particularly precise in the regime of large deviations. This result is used to study the DEP in 22.
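For intuition on what a saddlepoint approximation of such a c.d.f. looks like, the sketch below implements the classical Lugannani-Rice formula for a sum of n i.i.d. random variables with cumulant generating function K, and compares it with the exact c.d.f. for exponential summands. The choice of Lugannani-Rice and of the exponential example is ours; the approximation studied in 12 and its error bound may take a different form.

```python
import math

def lugannani_rice_cdf(x, n, K, K2, K1_inv):
    """Saddlepoint approximation of P(X_1 + ... + X_n <= x) for i.i.d. X_i.

    K is the cumulant generating function, K2 its second derivative and K1_inv the
    inverse of K'. The formula is singular at the mean (saddlepoint t = 0)."""
    t = K1_inv(x / n)                                   # solves K'(t) = x / n
    w = math.copysign(math.sqrt(2.0 * (t * x - n * K(t))), t)
    u = t * math.sqrt(n * K2(t))
    Phi = 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))
    phi = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
    return Phi + phi * (1.0 / w - 1.0 / u)

def gamma_cdf_integer_shape(x, n):
    """Exact P(sum of n Exp(1) <= x) = 1 - exp(-x) * sum_{k<n} x^k / k!."""
    return 1.0 - math.exp(-x) * sum(x**k / math.factorial(k) for k in range(n))

# Exp(1) summands: K(t) = -log(1 - t), K'(t) = 1/(1 - t), K''(t) = 1/(1 - t)^2.
approx = lugannani_rice_cdf(
    15.0, 10,
    K=lambda t: -math.log(1.0 - t),
    K2=lambda t: 1.0 / (1.0 - t) ** 2,
    K1_inv=lambda y: 1.0 - 1.0 / y,
)
exact = gamma_cdf_integer_shape(15.0, 10)
print(approx, exact)
```

The approximation only needs the cumulant generating function of one summand, not the n-fold distribution, which is what makes it attractive for error probabilities over many channel uses.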

Macro cell densification with Small Cells (SCs) is an effective solution to cope with traffic increase. To fully benefit from the additional SC capacity, interference mitigation techniques are needed. Densification in 5G networks with Massive Multiple Input Multiple Output (M-MIMO) deployment requires rethinking interference mitigation to account for highly focused beams and MultiUser (MU) scheduling. M. Masson and E. Altman, in collaboration with Z. Altman (Orange Labs), present in 35 a low-complexity collaborative Proportional Fair (PF) based scheduling scheme that maximizes the throughput and improves the fairness of the heterogeneous network. The solution is based on the calculation of a loss factor indicator that each SC provides to the macro cell at each scheduling period. These indicators allow the macro cell MU scheduler to efficiently select the set of users for scheduling, leading to a significant improvement in performance. Numerical results illustrate the interest of the collaborative solution.
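As a reference point, the baseline single-cell Proportional Fair rule serves, in each slot, the user with the largest ratio of instantaneous rate to average throughput. The sketch implements only that baseline, with a hypothetical averaging step beta; the loss factor indicators exchanged between small cells and the macro cell in 35 are not modelled.

```python
def proportional_fair(rates, beta=0.01, eps=1e-6):
    """Return how many slots each user is served; rates[t][i] is user i's rate in slot t."""
    n = len(rates[0])
    avg = [eps] * n            # exponentially weighted average throughput per user
    served = [0] * n
    for r in rates:
        i = max(range(n), key=lambda j: r[j] / avg[j])   # PF metric
        served[i] += 1
        for j in range(n):
            avg[j] = (1 - beta) * avg[j] + beta * (r[j] if j == i else 0.0)
    return served

served = proportional_fair([[1.0, 2.0]] * 1000)
print(served)
```

With constant rates 1 and 2, both users end up served about half of the time, so the faster user gets twice the throughput: in this case PF equalises time shares rather than rates.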

F. de Pellegrini, F. Faticanti and
D. Siracusa from U. Avignon and FBK in collaboration with
E. Altman and M. Datar, study in
29 the tradeoff between running cost and
processing delay in order to optimally orchestrate multiple
fog applications. Fog applications process batches of objects'
data along chains of containerised microservice modules, which
can run either for free on a local fog server or in the
cloud at a cost. Processor sharing techniques, in turn, affect
the applications' processing delay on a local edge server
depending on the number of application modules running on the
same server. The fog orchestrator copes with local server
congestion by offloading part of computation to the cloud
trading off processing delay against a finite budget. Such a
problem can be described in a convex optimisation framework
valid for a large class of processor sharing techniques.
We show that the optimal solution is in
threshold form and depends
solely on the order induced by the marginal delays of

Slicing is emerging as a promising technique to support new differentiated services in 5G networks. It provides the flexibility and scalability required by future services. To satisfy service requirements and maintain high profit for service providers, a slice may be designed according to the varying demands and resource availability. In 27, M. Datar, C. Touati and E. Altman (Inria), in collaboration with F. de Pellegrini and R. El-Azouzi (LIA), develop a framework for resource allocation between the slicing and business layers for multi-tenant slicing, e.g., virtual wireless operators, service providers and smart-city services. They propose a flexible mechanism based on a bidding scheme for slice allocation, which achieves desirable fairness and efficiency among the network slices of the different tenants and their associated users. They then design practical algorithms to realise the proposed solution, and show through simulation the efficiency and fairness of the approach.

Advanced Sleep Modes (ASM) correspond to a gradual deactivation of the base station's components according to the time each of them needs to shut down and then reactivate. Each sleep level has a different power consumption and imposes an extra delay on arriving traffic, which has to wait for the components to wake up and serve it. F. Ezzahra Salem, A. Gati, Z. Altman (Orange Labs), in collaboration with T. Chahed (Institut Polytechnique de Paris) and E. Altman, present in 38 a scalable management strategy for this feature, based on Markov Decision Processes, which derives the optimal policy for choosing the best sleep level according to the traffic load and to the tradeoff between delay and energy consumption, while ensuring low complexity. The results show that this solution is very promising and achieves high energy savings (up to 91%) if there is no constraint on the delay. Even with a strict delay constraint, the energy reduction can reach 52% while the impact on the delay is negligible.

Mobile phones rely on batteries to provide the power needed for transmission and reception (uplink and downlink communications). Considering the uplink, E. Altman, M. Datar and G. Ferrat analyse in 28 how the characteristics of the battery affect the amount of information that can be extracted from the terminal. We focus in particular on the impact of the battery's charge on its internal resistance, which grows as the battery depletes.

Graphlet counting is a widely explored problem in network analysis and has been successfully applied to a variety of applications in many domains, most notably bioinformatics, social science and infrastructure network studies. Efficiently computing graphlet counts remains challenging due to the combinatorial explosion, where a naive enumeration algorithm needs convolutional neural network (CNN) models and a series of data preprocessing techniques to solve the GCL problem. Extensive experiments are conducted on three types of synthetic random graphs and three types of real-world graphs for all 3-, 4- and 5-node graphlets to demonstrate the accuracy, efficiency and generalizability of their framework.
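The combinatorial baseline is easy to state for 3-node graphlets: count triangles by checking every pair of neighbours of each node, and obtain open wedges (paths of length two) from the degree sequence. The sketch is only this naive enumeration, not the CNN-based framework described above; larger graphlets require enumerating many more patterns, which is where the combinatorial explosion bites.

```python
import itertools
from math import comb

def three_node_graphlets(adj):
    """Count the two connected 3-node graphlets: triangles and open wedges.

    adj maps each node to the set of its neighbours (undirected, no self-loops)."""
    triangles = 0
    for u in adj:
        for v, w in itertools.combinations(sorted(adj[u]), 2):
            if w in adj[v]:
                triangles += 1
    triangles //= 3                      # each triangle is seen once from each corner
    wedges = sum(comb(len(nbrs), 2) for nbrs in adj.values())   # all 2-paths, open or closed
    return {"triangle": triangles, "open_wedge": wedges - 3 * triangles}

k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
print(three_node_graphlets(k4))   # -> {'triangle': 4, 'open_wedge': 0}
```

Every triangle closes three 2-paths, which is why three wedges per triangle are subtracted to obtain the open ones.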

Semi-Supervised Learning (SSL) on citation graph data sets is a rapidly growing area of research. However, recently proposed graph-based SSL algorithms use a default adjacency matrix with binary weights on edges (citations), which causes a loss of node (paper) similarity information. In 31, M. Kamalov and K. Avrachenkov propose a framework focused on embedding PageRank SSL in a generative model. This framework allows joint training of the nodes' latent-space representation and of label spreading, through an adjacency matrix reweighted by node similarities in the latent space. They explain how a generative model can improve accuracy and reduce the number of iteration steps for PageRank SSL. Moreover, they show that their framework outperforms the best graph-based SSL algorithms on four public citation graph data sets and improves the interpretability of classification results.
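The PageRank SSL step that such a framework builds on can be written as a personalised PageRank per class: with symmetric weights W, degree matrix D and label matrix Y, the class scores are F = (1 - α)(I - αWD⁻¹)⁻¹Y, and each node takes the arg-max class. The sketch uses a fixed W and α = 0.85 as assumptions; the generative reweighting of 31 is not included.

```python
import numpy as np

def pagerank_ssl(W, labeled, n_classes, alpha=0.85):
    """PageRank-based label spreading: F = (1 - alpha) (I - alpha W D^-1)^-1 Y, then arg-max.

    W is a symmetric non-negative weight matrix without isolated nodes; labeled maps
    node index -> class index."""
    n = W.shape[0]
    D_inv = np.diag(1.0 / W.sum(axis=1))
    Y = np.zeros((n, n_classes))
    for node, cls in labeled.items():
        Y[node, cls] = 1.0
    F = (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * W @ D_inv, Y)
    return F.argmax(axis=1)

# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3; one labelled node in each.
W = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[u, v] = W[v, u] = 1.0
pred = pagerank_ssl(W, labeled={0: 0, 5: 1}, n_classes=2)
print(pred)   # -> [0 0 0 1 1 1]
```

Replacing the binary W by similarity-based weights, as the framework above does with latent-space similarities, changes only the matrix passed in; the spreading step itself stays the same.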

To provide quick and accurate results, a search engine maintains a local snapshot of the entire web. To keep this local cache fresh, it employs a crawler that tracks changes across web pages. However, finite bandwidth and server restrictions constrain the crawling frequency. Consequently, the ideal crawling rates are those that maximise the freshness of the local cache while respecting these constraints. Azar et al. 2018 recently proposed a tractable algorithm to solve this optimisation problem. However, they assume knowledge of the exact page change rates, which is unrealistic in practice. K. Avrachenkov, K. Patil and G. Thoppe (IISc Bangalore, India) address this issue in 24. Specifically, they provide two novel schemes for online estimation of page change rates. Both schemes only need partial information about the page change process, i.e., they only need to know whether the page has changed since the last crawl. For both schemes, the authors prove convergence and derive convergence rates. Finally, they provide numerical experiments comparing the performance of their estimators with existing ones (e.g., MLE).
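The simplest estimator that uses only the “changed / not changed” signal, assuming Poisson changes and a constant crawl interval τ, inverts P(changed) = 1 - exp(-Δτ) at the running frequency of observed changes, updated online with step 1/k. It is shown only to illustrate the setting: it is neither of the two schemes of 24, and it does not handle irregular crawl intervals.

```python
import math
import random

def online_change_rate(indicators, tau):
    """Estimate a Poisson page-change rate from binary 'changed since last crawl' signals.

    With crawls every tau time units, P(changed) = 1 - exp(-rate * tau). The running
    mean p_hat is a stochastic-approximation iterate with step 1/k; inverting the
    relation gives the rate estimate after each crawl."""
    p_hat, estimates = 0.0, []
    for k, changed in enumerate(indicators, start=1):
        p_hat += (float(changed) - p_hat) / k
        p = min(p_hat, 1.0 - 1e-12)          # avoid log(0) if every crawl saw a change
        estimates.append(-math.log(1.0 - p) / tau)
    return estimates

rng = random.Random(3)
true_rate, tau = 0.5, 1.0
# The first change after a crawl occurs after an Exp(true_rate) time (memorylessness);
# the page is seen as changed iff that time is shorter than the crawl interval.
obs = [rng.expovariate(true_rate) < tau for _ in range(20_000)]
est = online_change_rate(obs, tau)[-1]
print(est)
```

Note that several changes between two crawls are seen as a single one, which is why the frequency of changes must be inverted through the logarithm rather than divided by τ directly.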

In 10, S. Alouf and A. Jean-Marie have proposed an efficient forecasting method for solar irradiance. The model is a stochastic process at the minute scale, whose parameters depend on both the period within the day and a category of day (sunny, cloudy, changing, ...). The categories and the distributions of interest are identified for a given location by clustering observed data. Then, based on the weather forecast for the next day, the appropriate day category is selected and random trajectories can be generated. Experiments show that this new model outperforms several previous proposals when the solar input is fed into a datacenter model consisting of energy storage and an energy consumption profile corresponding to a real workload.
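
The generation step can be sketched as follows. Since the description above does not specify the minute-scale process, the sketch assumes, purely for illustration, that each (day category, period of the day) pair is described by a Markov chain over a few discretised irradiance levels, expressed as fractions of clear-sky irradiance. The level values, transition matrices and names are hypothetical; in practice they would be estimated from the clustered observations.

```python
import random

# Hypothetical discretised irradiance levels (fraction of clear-sky irradiance).
LEVELS = [0.1, 0.5, 0.9]


def generate_trajectory(model, category, minutes_per_period, seed=None):
    """Generate one minute-by-minute trajectory for a day of the given category.

    model[category] is a list with one transition matrix per period of the day;
    row s of a matrix gives the probabilities of moving from level s to each level.
    """
    rng = random.Random(seed)
    state = len(LEVELS) // 2  # arbitrary initial level
    trajectory = []
    for matrix in model[category]:
        for _ in range(minutes_per_period):
            state = rng.choices(range(len(LEVELS)), weights=matrix[state])[0]
            trajectory.append(LEVELS[state])
    return trajectory


# Toy model with two periods per day (e.g. morning and afternoon).
stay_clear = [[0.0, 0.0, 1.0]] * 3
mixed = [[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]]
model = {"sunny": [stay_clear, stay_clear], "changing": [mixed, mixed]}

# The category would be chosen from the next-day weather forecast.
day = generate_trajectory(model, "changing", minutes_per_period=360, seed=1)
print(len(day))  # 720
```

Generating many trajectories for the selected category yields a set of plausible scenarios for the next day rather than a single point forecast.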

Neo members are involved in the

Neo has contracts with Accenture (see §8.1.5), Azursoft (see §8.1.6), MyDataModels (see §8.1.7)
and Payback Network (see §8.1.8).

Over the last few years, research in computer science has shifted its focus to machine learning methods for the analysis of increasingly large amounts of user data. While the research community has sought to optimize these methods for sparse and high-dimensional data, new problems have recently emerged, particularly from a networking perspective, which had so far remained peripheral.

The technical program of this ADR consists of three parts: Distributed machine learning, Multiobjective optimisation as a lexicographic problem, and Use cases / Applications. We address the challenges related to the first part by developing distributed optimization tools that reduce communication overhead, improve the rate of convergence and are scalable. Graph-theoretic tools including spectral analysis, graph partitioning and clustering will be developed. Further, stochastic approximation methods and D-iterations or their combinations will be applied in designing fast online unsupervised, supervised and semi-supervised learning methods.

A growing number of network infrastructures are currently being considered for software-based replacement, ranging from fixed and wireless access functions to carrier-grade middle boxes and server functionalities. On the one hand, the performance requirements of such applications call for an increased level of software optimization and hardware acceleration. On the other hand, customization and modularity at all layers of the protocol stack are required to support such a wide range of functions. In this context, the ADR focuses on two specific research axes: (1) the design, implementation and evaluation of a modular NFV architecture, and (2) the modelling and management of applications as virtualized network functions. Our interest lies in low-latency machine learning prediction services and, in particular, in how the quality of the predictions can be traded off against latency.

We shall study asynchronous distributed methods for network centrality computation. Such methods are very useful because, on the one hand, they allow efficient and flexible use of computational resources (e.g., a cluster or a cloud) and, on the other hand, they allow quick local updates of centrality measures without recomputing them from scratch.
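
Why asynchrony is possible can be seen on a minimal sketch of a diffusion-style (D-iteration) computation of PageRank with damping factor d, assuming a graph without dangling nodes: every node holds a "fluid" (residual) that it adds to its own score and then pushes to its out-neighbours. Nodes can be processed in any order, here a random one to mimic uncoordinated workers, and the scores still converge to the PageRank vector. The function name, parameters and toy graph are illustrative.

```python
import random


def d_iteration_pagerank(out_links, d=0.85, tol=1e-10, seed=0):
    """PageRank by diffusion, processing nodes in random order (no dangling nodes assumed).

    out_links[i] lists the out-neighbours of node i.
    Converges to x = d * P^T x + (1 - d) / n, with P the random-walk matrix.
    """
    n = len(out_links)
    fluid = [(1.0 - d) / n] * n  # residual still to be diffused
    score = [0.0] * n            # accumulated PageRank estimate
    remaining = 1.0 - d          # total fluid; each diffusion removes (1 - d) * f of it
    rng = random.Random(seed)
    while remaining > tol:
        i = rng.randrange(n)     # any node, any order: no global synchronisation needed
        f = fluid[i]
        if f == 0.0:
            continue
        score[i] += f
        fluid[i] = 0.0
        share = d * f / len(out_links[i])
        for j in out_links[i]:
            fluid[j] += share
        remaining -= (1.0 - d) * f
    return score


links = [[1, 2], [2], [0]]
print(d_iteration_pagerank(links))
```

In the same spirit, after a local change of the graph one can recompute the residual only at the affected nodes and resume the diffusion instead of restarting from scratch; that extension, which may involve negative residuals, is not shown.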

The considerable complexity of 5G networks and of their operation contrasts with increasing demands for simplicity and efficiency. This tension highlights the critical importance of network management. Self-Organizing Networks (SON), which cover self-configuration, self-optimization and self-repair, play a central role in the 5G Radio Access Network (RAN).

This CIFRE thesis aims at innovating in the management of the 5G RAN, with a special focus on SON-5G features. Three objectives are identified: a) develop self-organizing features (SON in 5G-RAN), b) develop cognitive management mechanisms for the SON-5G features developed, and c) demonstrate how the self-organizing mechanisms fit in the virtual RAN.

Contractor: Accenture Labs

(https://

IoT applications will become one of the main sources of data for training data-greedy machine learning models. Until now, IoT applications were mostly about collecting data from the physical world and sending them to the Cloud. Google's federated learning already enables mobile phones, or other devices with limited computing capabilities, to collaboratively learn a machine learning model while keeping all training data locally, decoupling the ability to do machine learning from the need to store the data in the cloud. While Google envisions only users' devices, part of the computation could also be executed at other intermediate elements in the network. This new paradigm is sometimes referred to as Edge Computing or Fog Computing. Model training as well as serving (providing machine learning predictions) are going to be distributed between IoT devices, cloud services, and other intermediate computing elements such as servers close to base stations, as envisaged by the Multi-Access Edge Computing framework. The goal of this project is to propose distributed learning schemes for the IoT scenario, taking into account in particular its communication constraints. This 6-month contract prepares a CIFRE thesis.
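
The principle of federated learning can be illustrated with a minimal sketch of Federated Averaging (FedAvg), one well-known scheme in which each client runs a few steps of local stochastic gradient descent on its own data and a server averages the resulting models, weighted by the clients' data sizes. The one-dimensional linear model, the learning rate and the toy client data are assumptions for illustration; they do not represent the schemes this project aims to propose.

```python
def local_sgd(w, b, data, lr, epochs):
    """Plain SGD on squared loss for the model y ~ w * x + b, run on one client's data."""
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def fed_avg(client_data, rounds=100, lr=0.05, local_epochs=1):
    """Federated Averaging: only model parameters, never raw samples, reach the server."""
    w, b = 0.0, 0.0
    total = sum(len(data) for data in client_data)
    for _ in range(rounds):
        # Each client starts from the current global model and trains locally.
        updates = [local_sgd(w, b, data, lr, local_epochs) for data in client_data]
        # The server averages the local models, weighted by the number of local samples.
        w = sum(len(data) * u[0] for data, u in zip(client_data, updates)) / total
        b = sum(len(data) * u[1] for data, u in zip(client_data, updates)) / total
    return w, b


# Three clients holding disjoint ranges of x, all generated by y = 2x + 1.
clients = [[(x, 2 * x + 1) for x in (k / 10 + offset for k in range(10))]
           for offset in (-1.0, 0.0, 1.0)]
print(fed_avg(clients))  # approximately (2.0, 1.0)
```

Communication happens only once per round, so increasing `local_epochs` trades more local computation for fewer exchanges, which is the kind of communication constraint the IoT scenario emphasises.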

Intrusion detection and telesurveillance systems generate signals from sensors that raise alarms and trigger a checking procedure for a potential intrusion or anomaly. Typically, one telesurveillance system monitors many sites and is challenged by a stream of false alarms. In this project, we aim to reduce the rate of false alarms using various supervised and semi-supervised learning methods.

Variational autoencoders are highly flexible machine learning techniques for learning latent representations of data. This model can be used for denoising data as well as for classification. In this thesis we plan to add a semi-supervised component to variational autoencoder techniques. We plan to develop methods that are universally applicable to versatile data such as categorical data, images, texts, etc. Starting from static data, we aim to extend the methods to time-varying data such as audio, video, time series, etc. The proposed algorithms can be integrated into the internal engine of the MyDataModels company and tested on MyDataModels use cases.
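
For concreteness, two ingredients that make a variational autoencoder trainable, the reparameterisation trick and the closed-form Kullback-Leibler term for a diagonal Gaussian encoder, are sketched below, together with one common way of adding a semi-supervised term: a cross-entropy penalty applied only to labelled samples. The loss combination, the weight `alpha` and all function names are assumptions, not the method developed in this thesis; the reconstruction likelihood and the neural encoder/decoder are left abstract.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z ~ N(mu, diag(exp(log_var))) as z = mu + sigma * eps with eps ~ N(0, 1).

    Writing the sample this way keeps z a deterministic function of (mu, log_var),
    which is what lets gradients flow through the sampling step in an autodiff framework.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(sigma^2)) || N(0, I)) = 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def semi_supervised_loss(recon_nll, mu, log_var, class_probs=None, label=None, alpha=1.0):
    """Hypothetical per-sample loss: negative ELBO, plus a weighted cross-entropy term
    when the sample is labelled (class_probs would come from a classifier head)."""
    loss = recon_nll + kl_to_standard_normal(mu, log_var)
    if label is not None:
        loss += -alpha * math.log(class_probs[label])
    return loss


rng = random.Random(0)
z = reparameterize([0.0, 1.0], [0.0, -2.0], rng)
unlabelled = semi_supervised_loss(3.2, [0.0, 1.0], [0.0, -2.0])
labelled = semi_supervised_loss(3.2, [0.0, 1.0], [0.0, -2.0], class_probs=[0.7, 0.3], label=0)
```

Unlabelled samples contribute only through the reconstruction and KL terms, while labelled ones additionally shape the classifier, which is what lets a small labelled set benefit from a large unlabelled one.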

Consulting with the startup Payback Network on differential privacy techniques.

The Embassy of France in the United States, via the programme “make our planet great again”, has funded an initiative led by S. M. Perlaza and A. Tajer (RPI, USA) addressing foundational questions pertinent to two emerging wireless communication technologies: (i) energy harvesting (EH) systems, and (ii) ultra-low-latency systems for critical missions. This project explores two strongly symbiotic research directions for establishing the fundamental limits of (i) data transmission and (ii) simultaneous energy and data transmission in mission-critical systems powered by EH. The expected results have applications in, e.g., disaster relief, medical instruments, cyber-physical systems, and the Internet of Things. This programme was launched by the President of France, Emmanuel Macron, in June 2017 and is handled by the Embassy of France in the United States of America. The aim of such fellowships is to reinforce the international commitments of the 2015 Paris Agreement on Climate Change by fostering collaborations between scholars in the US and France.

ANSWER is a joint project between Qwant and Inria, funded by the French Government's initiative PIA “Programme d'Investissement d'Avenir”.

The aim of the ANSWER project is to develop the new version of the search engine http://

Of the five characteristics of big data, the ANSWER project will focus more particularly on Velocity, in terms of near real-time processing of results, and Variety, for the integration of new indicators (emotions, sociality, etc.) and metadata. The Volume, Value and Veracity aspects will necessarily be addressed jointly with the first two and will also raise scientific obstacles, especially on the topics of crawling and indexing.

The anchoring of the search engine in the Big Data domain will only be strengthened by developments of the Web such as the Web of data and, more generally, by the current trend of integrating increasingly diverse, rich and complex resources into the Web.

Neo members are reviewers for
Environmental Modeling & Assessment, IEEE/ACM Transactions on Networking, IEEE Transactions on Mobile Computing, IEEE Transactions on Network and Service Management, IEEE Transactions on Information Theory,
IEEE Transactions on Wireless Communications,
IEEE Transactions on Communications,
IEEE Journal on Selected Topics in Signal Processing,
IEEE Journal on Selected Areas in Communications,
and many other journals.

Note: UCA is Université Côte d'Azur.

Neo members participated in the Ph.D. committees of (in alphabetical order):