Large distributed infrastructures are ubiquitous in our society. Numerical simulations form the basis of computational sciences, and high performance computing infrastructures have become scientific instruments with roles similar to those of test tubes or telescopes. Cloud infrastructures are used by companies so intensively that even the shortest outage quickly incurs losses of several million dollars. But every citizen also relies on (and interacts with) such infrastructures via complex wireless mobile embedded devices whose nature is constantly evolving. In this way, the advent of digital miniaturization and interconnection has enabled our homes, power stations, cars and bikes to evolve into smart grids and smart transportation systems that should be optimized to fulfill societal expectations.
Our dependence on and intense usage of such gigantic systems obviously lead to very high expectations in terms of performance. Indeed, we strive for low-cost and energy-efficient systems that seamlessly adapt to changing environments that can only be accessed through uncertain measurements. Such digital systems also have to take into account the users' profiles and expectations to share resources efficiently and fairly in an online way. Analyzing, designing and provisioning such systems has thus become a real challenge.
Such systems are characterized by their ever-growing size, intrinsic heterogeneity and distributedness, user-driven requirements, and an unpredictable variability that renders them essentially stochastic. In such contexts, many of the former design and analysis hypotheses (homogeneity, limited hierarchy, omniscient view, optimization carried out by a single entity, open-loop optimization, user outside of the picture) have become obsolete, which calls for radically new approaches. Properly studying such systems requires a drastic rethinking of fundamental aspects regarding the system's observation (measure, trace, methodology, design of experiments), analysis (modeling, simulation, trace analysis and visualization), and optimization (distributed, online, stochastic).
The goal of the POLARIS project is to contribute to the understanding of the performance of very large scale distributed systems by applying ideas from diverse research fields and application domains. We believe that studying all these different aspects at once, without restricting ourselves to specific systems, is the key to pushing forward our understanding of such challenges and to proposing innovative solutions. This is why we intend to investigate problems arising from application domains as varied as large computing systems, wireless networks, smart grids and transportation systems.
The members of the POLARIS project cover a very wide spectrum of expertise in performance evaluation and models, distributed optimization, and analysis of HPC middleware. Specifically, POLARIS' members have worked extensively on:
Experimental methodology, measuring/monitoring/tracing tools, experiment control, design of experiments, and reproducible research, especially in the context of large computing infrastructures (such as computing grids, HPC, volunteer computing and embedded systems).
Parallel application visualization (paje, triva/viva, framesoc/ocelotl, ...), characterization of failures in large distributed systems, visualization and analysis for geographical information systems, spatio-temporal analysis of media events in RSS flows from newspapers, and others.
Emulation, discrete event simulation, perfect sampling, Markov chains, Monte Carlo methods, and others.
Stochastic approximation, mean field limits, game theory, discrete and continuous optimization, learning and information theory.
In the rest of this document, we describe in detail our new results in the above areas.
Experiments in large scale distributed systems are costly, difficult to control and therefore difficult to reproduce. Although many of these digital systems have been built by humans, they have reached such a level of complexity that we are no longer able to study them like artificial systems and have to deal with the same kind of experimental issues as the natural sciences. The development of a sound experimental methodology for the evaluation of resource management solutions is among the most important ways to cope with the growing complexity of computing environments. Although computing environments come with their own specific challenges, we believe such general observation problems should be addressed by borrowing good practices and techniques developed in many other domains of science.
This research theme builds on a transverse activity on Open science and reproducible research and is organized into the following two directions: (1) experimental design and (2) smart monitoring and tracing. As we will explain in more detail hereafter, this transverse activity and these research directions span several research areas, and our goal within the POLARIS project is foremost to transfer original ideas from other domains of science to the distributed and high performance computing community.
As explained in the previous section, the first difficulty encountered when modeling large scale computer systems is to observe these systems and extract information on the behavior of the architecture, the middleware, the applications, and the users. The second difficulty is to visualize and analyze such multi-level traces to understand how the performance of the application can be improved. While a lot of effort is put into visualizing scientific data, comparatively little effort has gone into developing techniques specifically tailored for understanding the behavior of distributed systems. Many visualization tools have been developed over decades by renowned HPC groups (e.g., BSC, Jülich and TU Dresden, UIUC, ANL, Inria Bordeaux and Grenoble, ...) but most of these tools build on the classical information visualization mantra that consists in first presenting an overview of the data, possibly by plotting everything if computing power allows, and then allowing users to zoom and filter, providing details on demand. However, in our context, the amount of data comprised in such traces is several orders of magnitude larger than the number of pixels on a screen, and displaying even a small fraction of the trace leads to harmful visualization artifacts. Such traces are typically made of events that occur at very different time and space scales, which unfortunately hinders classical approaches. Such visualization tools have focused on easing interaction and navigation in the trace (through Gantt charts, intuitive filters, pie charts and Kiviat diagrams) but they are very difficult to maintain and evolve, and they require significant experience to identify performance bottlenecks.
Therefore, many groups have more recently proposed, in combination with these tools, techniques to help identify the structure of the application or regions (applicative, spatial or temporal) of interest. For example, researchers from the SDSC propose segment-matching techniques based on clustering (with Euclidean or Manhattan distance) of the start and end dates of the segments, which makes it possible to reduce the amount of information to display. Researchers from the BSC use clustering, linear regression and Kriging techniques to identify and characterize (in terms of performance and resource usage) application phases and present aggregated representations of the trace. Researchers from Jülich and TU Darmstadt have proposed techniques to identify specific communication patterns that incur wait states.
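As an illustration of the segment-matching idea, here is a minimal sketch (not the SDSC tool itself) that clusters trace segments by their start and end dates with a plain Euclidean k-means; the trace data and the number of clusters are made up for the example:

```python
def kmeans_segments(segments, k, iters=20):
    """Cluster trace segments by (start, end) date with Euclidean k-means so
    that only one representative per cluster needs to be drawn on screen."""
    # Deterministic initialization: spread the initial centers over the input.
    centers = [segments[i * (len(segments) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for s in segments:
            j = min(range(k),
                    key=lambda i: (s[0] - centers[i][0]) ** 2
                                  + (s[1] - centers[i][1]) ** 2)
            clusters[j].append(s)
        centers = [(sum(s[0] for s in c) / len(c), sum(s[1] for s in c) / len(c))
                   if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical trace: short segments early on, long segments much later.
segs = [(t, t + 1.0) for t in range(10)] + [(100.0 + t, 120.0 + t) for t in range(10)]
centers, clusters = kmeans_segments(segs, k=2)
```

Displaying one representative per cluster instead of every segment is what reduces the information to a screen-compatible volume.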
Evaluating the scalability, robustness, energy consumption and performance of large infrastructures such as exascale platforms and clouds raises severe methodological challenges. The complexity of such platforms mandates empirical evaluation but direct experimentation via an application deployment on a real-world testbed is often limited by the few platforms available at hand and is even sometimes impossible (cost, access, early stages of the infrastructure design, ...). Unlike direct experimentation via an application deployment on a real-world testbed, simulation enables fully repeatable and configurable experiments that can often be conducted quickly for arbitrary hypothetical scenarios. In spite of these promises, current simulation practice is often not conducive to obtaining scientifically sound results. To date, most simulation results in the parallel and distributed computing literature are obtained with simulators that are ad hoc, unavailable, undocumented, and/or no longer maintained. For instance, Naicken et al. point out that out of 125 recent papers they surveyed that study peer-to-peer systems, 52% use simulation and mention a simulator, but 72% of them use a custom simulator. As a result, most published simulation results build on throw-away (short-lived and non validated) simulators that are specifically designed for a particular study, which prevents other researchers from building upon it. There is thus a strong need for recognized simulation frameworks by which simulation results can be reproduced, further analyzed and improved.
The SimGrid simulation toolkit, whose development is partially supported by POLARIS, is specifically designed for studying large scale distributed computing systems. It has already been successfully used for the simulation of grid, volunteer computing, HPC and cloud infrastructures, and we have constantly invested in the software quality, the scalability and the validity of the underlying network models. Many simulators of MPI applications have been developed by renowned HPC groups (e.g., at SDSC, BSC, UIUC, Sandia Nat. Lab., ORNL or ETH Zürich for the most prominent ones). Yet, to scale, most of them build on restrictive network and application modeling assumptions that make them difficult to extend to more complex architectures and to applications that do not solely build on the MPI API. Furthermore, simplistic modeling assumptions generally prevent faithful prediction of execution times, which limits the use of simulation to the indication of gross trends at best. Our goal is to improve the quality of SimGrid to the point where it can be used effectively on a daily basis by practitioners to reproduce the dynamics of real HPC systems.
We also develop another simulation software, PSI (Perfect SImulator), dedicated to the simulation of very large systems that can be modeled as Markov chains. PSI provides a set of simulation kernels for Markov chains specified by events. It allows one to sample stationary distributions through the perfect sampling method (pioneered by Propp and Wilson) or simply to generate trajectories with a forward Monte Carlo simulation leveraging time-parallel simulation (pioneered by Fujimoto, Lin and Lazowska). One of the strengths of the PSI framework is its expressiveness, which allows us to easily study networks with finite and infinite capacity queues. Although PSI already makes it possible to simulate very large and complex systems, our main objective is to push its scalability even further and improve its capabilities by one or several orders of magnitude.
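The perfect sampling method at PSI's core can be illustrated on a toy example; the monotone birth-death chain and its transition probabilities below are illustrative and do not reflect PSI's actual event formalism:

```python
import random

def update(x, u, n):
    """Monotone event-driven update of a birth-death chain on {0, ..., n}."""
    if u < 0.4:              # arrival event
        return min(x + 1, n)
    elif u < 0.8:            # departure event
        return max(x - 1, 0)
    return x                 # self-loop

def cftp(n=10, seed=42):
    """Coupling From The Past (Propp-Wilson): run coupled copies from the top
    and bottom states; by monotonicity, their coalescence point is an exact
    sample of the stationary distribution."""
    rng = random.Random(seed)
    us = []                               # innovations, reused on every restart
    t = 1
    while True:
        while len(us) < t:
            us.insert(0, rng.random())    # extend the innovation sequence into the past
        lo, hi = 0, n
        for u in us[-t:]:                 # run from time -t to 0 with fixed innovations
            lo, hi = update(lo, u, n), update(hi, u, n)
        if lo == hi:                      # coalescence: lo is a perfect sample
            return lo
        t *= 2                            # otherwise, go twice as far into the past

sample = cftp()
```

The crucial point, faithfully reproduced here, is that the same innovations must be reused at the same time steps when restarting further in the past.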
Many systems can be effectively described by stochastic population models. These systems are composed of a large set of similar interacting entities, and the exact analysis of their stochastic dynamics quickly becomes intractable as the population grows. This results in the need for approximation techniques. Mean field analysis offers a viable, and often very accurate, solution for large populations. Within the POLARIS project, we will continue developing both the theory behind these approximation techniques and their applications. Typically, these techniques require a homogeneous population of objects where the dynamics of the entities depend only on their state (the state space of each object must not scale with the population size).
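A minimal sketch of the idea (illustrative rates, not POLARIS code): an SIS-like population model in which the stochastic system concentrates around its deterministic mean-field recurrence as the population grows:

```python
import random

def simulate(n, steps, beta=0.3, gamma=0.1, seed=1):
    """Discrete-time SIS dynamics for n interacting objects: each susceptible
    node gets infected w.p. beta * (infected fraction); each infected node
    recovers w.p. gamma. Returns the final infected fraction."""
    rng = random.Random(seed)
    infected = n // 2
    for _ in range(steps):
        x = infected / n
        new_inf = sum(rng.random() < beta * x for _ in range(n - infected))
        rec = sum(rng.random() < gamma for _ in range(infected))
        infected += new_inf - rec
    return infected / n

def mean_field(steps, beta=0.3, gamma=0.1):
    """Deterministic mean-field recurrence for the infected fraction:
    x <- x + beta*x*(1-x) - gamma*x, with fixed point 1 - gamma/beta."""
    x = 0.5
    for _ in range(steps):
        x += beta * x * (1 - x) - gamma * x
    return x

x_mf = mean_field(200)        # converges to 1 - gamma/beta = 2/3
x_big = simulate(2000, 200)   # stochastic system, population of 2000
```

The state of each object is binary here, so the per-object state space indeed does not scale with the population, which is exactly the homogeneity requirement stated above.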
Game theory is a thriving interdisciplinary field that studies the interactions between competing optimizing agents, be they humans, firms, bacteria, or computers. As such, game-theoretic models have met with remarkable success when applied to complex systems consisting of interdependent components with vastly different (and often conflicting) objectives – ranging from latency minimization in packet-switched networks to throughput maximization and power control in mobile wireless networks.
In the context of large-scale, decentralized systems (the core focus of the POLARIS project), it is more relevant to take an inductive, “bottom-up” approach to game theory, because the components of a large system cannot be assumed to perform the numerical calculations required to solve a very-large-scale optimization problem. In view of this, POLARIS' overarching objective in this area is to develop novel algorithmic frameworks that offer robust performance guarantees when employed by all interacting decision-makers.
A key challenge here is that most of the literature on learning in games has focused on static games with a finite number of actions per player. While relatively tractable, such games are ill-suited to practical applications where players pick an action from a continuous space or when their payoff functions evolve over time – this being typically the case in our target applications (e.g., routing in packet-switched networks or energy-efficient throughput maximization in wireless networks). On the other hand, the framework of online convex optimization typically provides worst-case performance bounds on the learner's regret that the agents can attain irrespective of how their environment varies over time. However, if the agents' environment is determined chiefly by their interactions, these bounds are fairly loose, so more sophisticated convergence criteria should be applied.
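The kind of worst-case regret guarantee mentioned above can be sketched with projected online gradient descent on synthetic time-varying quadratic losses; all parameters below are illustrative:

```python
import math

def ogd_regret(T, radius=1.0):
    """Projected online gradient descent against time-varying losses
    f_t(x) = (x - c_t)^2, with step size eta_t = 1/(2*sqrt(t)).
    Returns the regret versus the best fixed action in hindsight."""
    x, total, targets = 0.0, 0.0, []
    for t in range(1, T + 1):
        c = 1.0 if t % 2 else -1.0               # synthetic alternating environment
        total += (x - c) ** 2                    # loss suffered at time t
        x -= 2 * (x - c) / (2 * math.sqrt(t))    # gradient step with eta_t
        x = max(-radius, min(radius, x))         # projection onto the feasible set
        targets.append(c)
    best = sum(targets) / T                      # best fixed action in hindsight
    best_loss = sum((best - c) ** 2 for c in targets)
    return total - best_loss

regret = ogd_regret(10000)   # grows like O(sqrt(T)), not linearly in T
```

Against this oscillating environment the cumulative loss exceeds that of the best fixed action by only a sublinear amount, which is exactly the (fairly loose, environment-agnostic) guarantee the text refers to.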
From an algorithmic standpoint, a further challenge occurs when players can only observe their own payoffs (or a perturbed version thereof). In this bandit-like setting regret-matching or trial-and-error procedures guarantee convergence to an equilibrium in a weak sense in certain classes of games. However, these results apply exclusively to static, finite games: learning in games with continuous action spaces and/or nonlinear payoff functions cannot be studied within this framework. Furthermore, even in the case of finite games, the complexity of the algorithms described above is not known, so it is impossible to decide a priori which algorithmic scheme can be applied to which application.
Supercomputers typically comprise thousands to millions of multi-core CPUs with GPU accelerators interconnected by complex networks that are typically structured as an intricate hierarchy of switches. Capacity planning and management of such systems raise challenges not only in terms of computing efficiency but also in terms of energy consumption. Most legacy (SPMD) applications struggle to benefit from such infrastructures since the slightest failure or load imbalance immediately causes the whole program to stop or, at best, to waste resources. To scale and handle the stochastic nature of resources, these applications have to rely on dynamic runtimes that schedule computations and communications in an opportunistic way. Such an evolution raises challenges not only in terms of programming but also in terms of observation (complexity and dynamicity prevent experiment reproducibility, intrusiveness hinders large scale data collection, ...) and analysis (dynamic and flexible application structures make classical visualization and simulation techniques totally ineffective and require building on ad hoc information about the application structure).
Considerable interest has arisen from the seminal prediction that the use of multiple-input, multiple-output (MIMO) technologies can lead to substantial gains in information throughput in wireless communications, especially when used at a massive level. In particular, by employing multiple inexpensive service antennas, it is possible to exploit spatial multiplexing in the transmission and reception of radio signals, the only physical limit being the number of antennas that can be deployed on a portable device. As a result, the wireless medium can accommodate greater volumes of data traffic without requiring the reallocation (and subsequent re-regulation) of additional frequency bands. In this context, throughput maximization in the presence of interference by neighboring transmitters leads to games with convex action sets (covariance matrices with trace constraints) and individually concave utility functions (each user's Shannon throughput); developing efficient and distributed optimization protocols for such systems is one of the core objectives of Theme 5.
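In the diagonal special case (parallel Gaussian channels), the throughput-optimal covariance under a trace constraint reduces to classical water-filling; the channel gains and power budget below are illustrative:

```python
def waterfill(gains, power):
    """Water-filling: maximize sum(log(1 + g_i * p_i)) subject to
    sum(p_i) = power, p_i >= 0, via bisection on the water level mu,
    with the optimal allocation p_i = max(0, mu - 1/g_i)."""
    lo, hi = 0.0, power + max(1.0 / g for g in gains)
    for _ in range(100):                      # bisection on the water level
        mu = (lo + hi) / 2
        used = sum(max(0.0, mu - 1.0 / g) for g in gains)
        if used > power:
            hi = mu
        else:
            lo = mu
    return [max(0.0, mu - 1.0 / g) for g in gains]

# Three subchannels: good, average, and very weak.
p = waterfill([2.0, 1.0, 0.1], power=1.0)
```

Note how the weak subchannel receives no power at all: the "water" fills the inverse-gain profile from the bottom up, which is the behavior a distributed protocol for the full covariance problem must reproduce.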
Another major challenge that occurs here is due to the fact that the efficient physical layer optimization of wireless networks relies on perfect (or close to perfect) channel state information (CSI), on both the uplink and the downlink. Due to the vastly increased computational overhead of this feedback – especially in decentralized, small-cell environments – the ongoing transition to fifth generation (5G) wireless networks is expected to go hand-in-hand with distributed learning and optimization methods that can operate reliably in feedback-starved environments. Accordingly, one of POLARIS' application-driven goals will be to leverage the algorithmic output of Theme 5 into a highly adaptive resource allocation framework for next-generation wireless systems that can effectively "learn in the dark", without requiring crippling amounts of feedback.
Smart urban transport systems and smart grids are two examples of collective adaptive systems. They consist of a large number of heterogeneous entities with decentralised control and varying degrees of complex autonomous behaviour. We develop analysis tools to help reason about such systems. Our work relies on tools from fluid and mean-field approximation to build decentralized algorithms that solve complex optimization problems. We focus on two problems: decentralized control of electric grids and capacity planning in vehicle-sharing systems to improve load balancing.
Social computing systems are online digital systems that rely on their users' personal data to deliver personalized services directly to those users. They are omnipresent and include, for instance, recommendation systems, social networks, online media, daily apps, etc. Despite their interest and utility for users, these systems pose critical challenges of privacy, security, transparency, and respect of certain ethical constraints such as fairness. Solving these challenges involves a mix of measurement and/or auditing to understand and assess issues, and of modeling and optimization to propose and calibrate solutions.
N. Gast received an ANR JCJC grant.
The team was highly involved in the 3IA institute MIAI @ Grenoble Alpes: P. Loiseau is co-holder of the chair “Explainable and Responsible AI” of which N. Gast and B. Pradelski are members; and P. Mertikopoulos is a member of the chair “Optimization & Learning”.
Arnaud Legrand participated in the writing of a book on Reproducible Research, which aims at helping students, engineers, and researchers find efficient and accessible ways to improve their reproducible research practices.
The paper “Privacy Risks with Facebook's PII-based Targeting: Auditing a Data Broker's Advertising Interface” by P. Loiseau and co-authors (IEEE S&P '18) was runner up for the 2019 Caspar Bowden Award for Outstanding Research in Privacy Enhancing Technologies.
The paper “Investigating ad transparency mechanisms in social media: A case study of Facebook's explanations” by P. Loiseau and co-authors (NDSS '18) was runner up for the 2019 CNIL-Inria Award for Privacy Protection.
Functional Description: Framesoc is the core software infrastructure of the SoC-Trace project. It provides a graphical user environment for execution-trace analysis, featuring interactive analysis views such as Gantt charts or statistics views. It also provides a software library to store generic trace data, manipulate them, and build other analysis tools (e.g., Ocelotl).
Participants: Arnaud Legrand and Jean-Marc Vincent
Contact: Guillaume Huard
Functional Description: GameSeer is a tool for students and researchers in game theory that uses Mathematica to generate phase portraits for normal form games under a variety of (user-customizable) evolutionary dynamics. The whole point behind GameSeer is to provide a dynamic graphical interface that allows the user to employ Mathematica's vast numerical capabilities from a simple and intuitive front-end. So, even if you've never used Mathematica before, you should be able to generate fully editable and customizable portraits quickly and painlessly.
Contact: Panayotis Mertikopoulos
Markov Modeling Tools and Environments - the Core
Keywords: Modeling - Stochastic models - Markov model
Functional Description: marmoteCore is a C++ environment for modeling with Markov chains. It consists of a reduced set of high-level abstractions for constructing state spaces, transition structures and Markov chains (discrete-time and continuous-time). It provides the ability to construct hierarchies of Markov models, from the most general to the most particular, and to equip each level with specifically optimized solution methods.
This software is developed within the ANR MARMOTE project: ANR-12-MONU-00019.
Participants: Alain Jean-Marie, Hlib Mykhailenko, Benjamin Briot, Franck Quessette, Issam Rabhi, Jean-Marc Vincent and Jean-Michel Fourneau
Partner: UVSQ
Contact: Alain Jean-Marie
Publications: marmoteCore: a Markov Modeling Platform - marmoteCore: a software platform for Markov modeling
Memory Organisation Cartography and Analysis
Keywords: High-Performance Computing - Performance analysis
Contact: David Beniamine
Multidimensional Overviews for Huge Trace Analysis
Functional Description: Ocelotl is an innovative visualization tool, which provides overviews for execution-trace analysis by using a data aggregation technique. This technique makes it possible to find anomalies in huge traces containing up to several billion events, while keeping a fast computation time and providing a simple representation that does not overload the user.
Participants: Arnaud Legrand and Jean-Marc Vincent
Contact: Jean-Marc Vincent
Perfect Simulator
Functional Description: Perfect Simulator is simulation software for Markovian models. It is able to simulate discrete- and continuous-time models to provide a perfect sample of the stationary distribution, or directly a sample of a functional of this distribution, by using coupling from the past. The simulation kernel is based on the CFTP algorithm, and the internal simulation of transitions on the aliasing method.
Contact: Jean-Marc Vincent
Keywords: Large-scale Emulators - Grid Computing - Distributed Applications
Scientific Description: SimGrid is a toolkit that provides core functionalities for the simulation of distributed applications in heterogeneous distributed environments. The simulation engine uses algorithmic and implementation techniques toward the fast simulation of large systems on a single machine. The models are theoretically grounded and experimentally validated. The results are reproducible, enabling better scientific practices.
Its models of networks, CPUs and disks are adapted to (Data)Grids, P2P, Clouds, Clusters and HPC, allowing multi-domain studies. It can be used either to simulate algorithms and prototypes of applications, to emulate real MPI applications through the virtualization of their communication, or to formally assess algorithms and applications that can run in the framework.
The formal verification module explores all possible message interleavings in the application, searching for states that violate the provided properties. We recently added the ability to assess liveness properties over arbitrary and legacy codes, thanks to a system-level introspection tool that provides the model checker with a finely detailed view of the running application. This can, for example, be leveraged to verify both safety and liveness properties on arbitrary MPI code written in C/C++/Fortran.
News Of The Year: There were 3 major releases in 2019: Python bindings were introduced, SMPI now partially supports some of the MPI/IO functions, a new model for Wifi networks was proposed, and the API for the simulation of storage resources was completely revisited. We also pursued our efforts to improve the documentation of the software, simplified the web site, and made a lot of bug fixing and code refactoring.
Participants: Adrien Lèbre, Arnaud Legrand, Augustin Degomme, Florence Perronnin, Frédéric Suter, Jean-Marc Vincent, Jonathan Pastor, Luka Stanisic and Martin Quinson
Partners: CNRS - ENS Rennes
Contact: Martin Quinson
URL: https://
Tool for Analyzing the Behavior of Applications Running on NUMA ArChitecture
Keywords: High-Performance Computing - Performance analysis - NUMA
Contact: David Beniamine
Performance engineering of scientific HPC applications requires repeatedly measuring the performance of applications or of computation kernels, which consumes a large amount of time and resources. It is essential to design experiments so as to reduce this cost as much as possible. Our contribution along this axis is twofold: (1) the investigation of sound exploration techniques and (2) the control of experiments to ensure that the measurements are as representative as possible of real workloads.
Writing, porting, and optimizing scientific applications makes autotuning techniques fundamental to lower the cost of leveraging the improvements on execution time and power consumption provided by the latest software and hardware platforms. Despite the need for economy, most autotuning techniques still require large budgets of costly experimental measurements to provide good results, while rarely providing exploitable knowledge after optimization. In , we investigate the use of Design of Experiments to propose a user-transparent autotuning technique that operates under tight budget constraints by significantly reducing the measurements needed to find good optimizations. Our approach enables users to make informed decisions on which optimizations to pursue and when to stop. We present an experimental evaluation of our approach and show it is capable of leveraging user decisions to find the best global configuration of a GPU Laplacian kernel using half of the measurement budget used by other common autotuning techniques. We show that our approach is also capable of finding speedups of up to
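The budget-constrained exploration idea can be sketched as follows; the configuration space and the cost function are hypothetical stand-ins for real kernel measurements, not the tool described above:

```python
import random

def cost(block, unroll):
    """Hypothetical kernel run time (ms) as a function of two tuning
    parameters; in a real campaign each call is a costly measurement."""
    return (block - 128) ** 2 / 1000 + (unroll - 4) ** 2 + 1.0

def autotune(budget, seed=0):
    """Screening under a tight budget: measure only a small random design
    instead of the full factorial grid, then return the best observed
    configuration. Real Design of Experiments would use a structured
    (e.g., fractional factorial) design rather than plain random sampling."""
    rng = random.Random(seed)
    blocks = [32, 64, 128, 256, 512]
    unrolls = [1, 2, 4, 8]
    space = [(b, u) for b in blocks for u in unrolls]   # 20 configurations
    design = rng.sample(space, budget)                  # fraction of the grid
    return min(design, key=lambda cfg: cost(*cfg))

best = autotune(budget=8)   # 8 measurements instead of 20
```

With a full budget the exhaustive search recovers the true optimum; the point of the approach above is that a well-chosen fraction of the budget already lands close to it.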
Our second contribution is related to the control of measurements. In , we relate a surprising observation on the performance of the highly optimized and regular DGEMM function on modern processors. The DGEMM function is a widely used implementation of the matrix product. While the asymptotic complexity of the algorithm only depends on the sizes of the matrices, we show that the performance is significantly impacted by the content of the matrices. Although one might expect special values like 1 or 0 to yield specific behavior, we show that arbitrary constant values are no different and that random values incur a significant performance drop. Our experiments suggest that this may be due to bit flips in the CPU causing an energy consumption overhead. This phenomenon is a reminder of the importance of thoroughly randomizing every single parameter of an experiment to avoid bias toward specific behaviors.
Finely tuning MPI applications (number of processes, granularity, collective operation algorithms, topology and process placement) is critical to obtaining good performance on supercomputers. Given the rising cost of modern supercomputers, running parallel applications at scale solely to optimize their performance is extremely expensive. Using SimGrid, we work toward a methodology providing inexpensive but faithful predictions of expected performance.
The methodology we propose relies on SimGrid/SMPI and captures the complexity of adaptive applications by emulating the MPI code while skipping insignificant parts. In , we demonstrate its capability with High Performance Linpack (HPL), the benchmark used to rank supercomputers in the TOP500, which requires careful tuning. We explain (1) how we both extended SimGrid's SMPI simulator and slightly modified the open-source version of HPL to allow a fast emulation, on a single commodity server, at the scale of a supercomputer, and (2) how to model the different components (network, BLAS, ...) of the system. We show that a careful modeling of both spatial and temporal node variability allows us to obtain predictions within a few percent of real experiments. The modeling of BLAS operations is particularly important, and we have thus started investigating, in the context of simulating a sparse direct solver, how to automatically build performance models for commonly used BLAS kernels. A key difficulty remains the acquisition of faithful performance measurements, as modern processors are often quite unstable. This effort is therefore particularly related to the aforementioned "Design of Experiments" line of research.
In , we present ASGriDS, an asynchronous Smart Grid simulation framework. ASGriDS is multi-domain: it simultaneously models the power network along with its physical loads/generators, controllers, and communication infrastructure. ASGriDS provides a unified workflow in a pythonic environment to describe, run and control complex Smart Grid deployment scenarios. ASGriDS is an event-driven simulator that can run in either real-time or accelerated real-time. As it is modular and its components interact asynchronously, it can run either locally or on a distributed infrastructure, including hardware-in-the-loop setups, and on top of emulated or physical communication links. In this paper, we present the design of our simulator and demonstrate its use with a generation control problem on a low-voltage network. We use ASGriDS to deploy a real-time controller based on optimal power flow, on top of TCP- and UDP-based communication networks, under various packet-loss conditions.
Despite the impressive growth and size of supercomputers, the computational power they provide still cannot match the demand. Efficient and fair resource allocation is a critical task. Supercomputers use Resource and Job Management Systems to schedule applications, which is generally done by relying on generic index policies such as First Come First Served and Shortest Processing Time First, in combination with backfilling strategies. Unfortunately, such generic policies often fail to exploit specific characteristics of real workloads.
In , we focus on improving the performance of online schedulers by studying mixed policies, which are created by combining multiple job characteristics in a weighted linear expression, as opposed to classical pure policies which use only a single characteristic. This larger class of scheduling policies aims at providing more flexibility and adaptability. We use space coverage and black-box optimization techniques to explore this new space of mixed policies, and we study how they can adapt to changes in the workload. We perform an extensive experimental campaign through which we show that (1) the best pure policy is far from optimal and that (2) using a carefully tuned mixed policy would significantly improve the performance of the system. (3) We also provide empirical evidence that there is no one-size-fits-all policy, by showing that the rapid workload evolution seems to prevent classical online learning algorithms from being effective.
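A mixed policy in this sense scores each job by a weighted linear combination of its characteristics; the sketch below (made-up weights and jobs) shows how pure FCFS and SPF arise as special cases:

```python
def mixed_priority(job, weights):
    """Score = weighted linear combination of job characteristics;
    jobs with lower scores run first. weights = (w_submit, w_walltime, w_cores)."""
    return (weights[0] * job["submit"]
            + weights[1] * job["walltime"]
            + weights[2] * job["cores"])

# Hypothetical pending jobs (submission time, requested walltime, cores).
jobs = [
    {"id": "a", "submit": 0,  "walltime": 3600, "cores": 64},
    {"id": "b", "submit": 10, "walltime": 60,   "cores": 1},
    {"id": "c", "submit": 20, "walltime": 600,  "cores": 8},
]

fcfs  = sorted(jobs, key=lambda j: mixed_priority(j, (1, 0, 0)))      # pure FCFS
spf   = sorted(jobs, key=lambda j: mixed_priority(j, (0, 1, 0)))      # pure SPF
mixed = sorted(jobs, key=lambda j: mixed_priority(j, (1, 0.005, 0)))  # a mixed policy
```

With these illustrative weights the mixed policy produces an order that neither pure policy yields, which is precisely the extra flexibility the study exploits.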
A careful investigation of why such mixed strategies fail to globally exploit weekly workload features reveals that some users sometimes provide widely inaccurate information, which dramatically fools the batch scheduling heuristic. Indeed, users typically provide loose upper-bound estimates for job execution times that are hardly useful. Previous studies attempted to improve these estimates using regression techniques. Although these attempts provide reasonable predictions, they require a long period of training data. Furthermore, aiming for perfect prediction may be of limited use for scheduling purposes. In , we propose a simpler approach by classifying jobs as small or large and prioritizing the execution of small jobs over large ones. Indeed, small jobs are the most impacted by queuing delays, but they typically represent a light load and place a small burden on the other jobs. The classifier operates online and learns by using data collected over the previous weeks, facilitating its deployment and enabling fast adaptation to changes in workload characteristics. We evaluate our approach using four scheduling policies on six HPC platform workload traces. We show that: (i) incorporating such classification significantly reduces the average bounded slowdown of jobs in all scenarios, and (ii) the obtained improvements are comparable, in most scenarios, to the ideal hypothetical situation where the scheduler would know the exact running time of jobs in advance.
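A minimal sketch of such a small/large classification; the median-of-history rule, the 10-minute threshold and the sliding window are illustrative assumptions, not the paper's exact classifier:

```python
from collections import defaultdict

class SmallJobClassifier:
    """Online classifier: a job is predicted 'small' if the median runtime of
    its user's recent jobs falls below a threshold (here, 10 minutes)."""
    def __init__(self, threshold=600):
        self.threshold = threshold
        self.history = defaultdict(list)

    def predict_small(self, user):
        runs = self.history[user]
        if not runs:
            return True                    # optimistic default for unknown users
        runs = sorted(runs)
        return runs[len(runs) // 2] < self.threshold

    def record(self, user, runtime):
        self.history[user].append(runtime)
        self.history[user] = self.history[user][-50:]   # sliding learning window

clf = SmallJobClassifier()
for rt in (30, 45, 20):                    # alice runs short jobs
    clf.record("alice", rt)
for rt in (7200, 3600, 5400):              # bob runs long jobs
    clf.record("bob", rt)

# Predicted-small jobs are placed ahead of predicted-large ones in the queue.
queue = sorted(["bob", "alice"], key=lambda u: not clf.predict_small(u))
```

The key property mirrored here is that the classifier keeps learning online from recently completed jobs, so it tracks changes in workload characteristics without a long offline training phase.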
In distributed systems, load balancing is a powerful concept to improve the distribution of jobs across multiple computing resources and to control performance metrics such as delays and throughputs while avoiding the overload of any single resource. This section describes three contributions:
In multi-server distributed queueing systems, the access of stochastically arriving jobs to resources is often regulated by a dispatcher, also known as a load balancer. A fundamental problem consists in designing a load balancing algorithm that minimizes the delays experienced by jobs. During the last two decades, the power-of-
When dispatching jobs to parallel servers, or queues, the highly scalable round-robin (RR) scheme reduces the variance of interarrival times at all queues to a great extent but has no impact on the variances of service processes. Contrariwise, size-interval task assignment (SITA) routing has little impact on the variances of interarrival times but makes the service processes as deterministic as possible. In , we unify both 'static' approaches into a scalable load balancing framework able to control the variances of the arrival and service processes jointly. The resulting combination significantly improves performance and is able to drive the mean job delay to zero in the large-system limit, a property that is known not to hold when either approach is used separately. Within realistic parameters, we show that the optimal number of size intervals partitioning the support of the job size distribution is small with respect to the system size, which enhances the applicability of the proposed load balancing scheme at large scale. In fact, we find that adding a little information about job sizes to a dispatcher operating under RR improves performance considerably. Under the optimal scaling of size intervals and assuming highly variable job sizes, numerical simulations indicate that the proposed algorithm is competitive with the (less scalable) join-the-shortest-workload algorithm even when the system size grows large.
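The combined dispatcher can be sketched as follows (class and parameter names are ours, for illustration): SITA-style size intervals, each mapped to a pool of servers, with round-robin inside each pool.

```python
import bisect
import itertools

class SitaRoundRobin:
    """Illustrative combined dispatcher: size intervals select a server pool
    (SITA), and jobs are spread round-robin within that pool (RR)."""

    def __init__(self, cutoffs, pools):
        # cutoffs: increasing size thresholds delimiting the intervals
        # pools: one list of server ids per interval (len(pools) == len(cutoffs) + 1)
        assert len(pools) == len(cutoffs) + 1
        self.cutoffs = cutoffs
        self.cycles = [itertools.cycle(pool) for pool in pools]

    def dispatch(self, job_size):
        interval = bisect.bisect_right(self.cutoffs, job_size)
        return next(self.cycles[interval])   # RR within the size interval

# Two size intervals (small/large) over four servers.
lb = SitaRoundRobin(cutoffs=[100.0], pools=[[0, 1], [2, 3]])
print([lb.dispatch(s) for s in [10, 20, 500, 30, 900]])  # → [0, 1, 2, 0, 3]
```

Within each pool the interarrival process is smoothed by RR, while the size intervals keep the service-time variability within each pool low.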
Size-based routing provides robust strategies to improve the performance of computer and communication systems with highly variable workloads, because it is able to isolate small jobs from large ones in a static manner. The basic idea is that each server is assigned all jobs whose sizes belong to a distinct and contiguous interval. In the literature, dispatching rules of this type are referred to as SITA (Size Interval Task Assignment) policies. Despite their evident benefits, the problem of finding a SITA policy that minimizes the overall mean (steady-state) waiting time is known to be intractable. In particular, it is not clear when it is preferable to balance or unbalance server loads and, in the latter case, how. In , we provide an answer to these questions in the celebrated limiting regime where the system capacity grows linearly with the system demand to infinity. Within this framework, we prove that the minimum mean waiting time achievable by a SITA policy necessarily converges to the mean waiting time achieved by SITA-E, the SITA policy that equalizes server loads, provided that servers are homogeneous. However, within the set of SITA policies, we also show that SITA-E can perform arbitrarily badly if servers are heterogeneous. In this case, we prove that there exist exactly C! asymptotically optimal policies, where C denotes the number of server types, and all of them are linked to the solution of a single strictly convex optimization problem. It turns out that the mean waiting time achieved by any such asymptotically optimal policy does not depend on how job-size intervals are mapped to servers. Our theoretical results are validated by numerical simulations with realistic parameters and suggest that the above insights remain accurate in small systems composed of as few as ten servers.
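For homogeneous servers, the SITA-E cutoffs can be estimated from a sample of job sizes by splitting the empirical workload into intervals of equal total work; the sketch below is our own illustration, not the paper's procedure.

```python
import random

def sita_e_cutoffs(sizes, C):
    """Estimate SITA-E interval boundaries from sampled job sizes: each of the
    C intervals should carry (approximately) the same total amount of work."""
    sizes = sorted(sizes)
    total = sum(sizes)
    cutoffs, acc, k = [], 0.0, 1
    for s in sizes:
        acc += s
        if acc >= k * total / C and k < C:
            cutoffs.append(s)               # boundary between intervals k-1 and k
            k += 1
    return cutoffs

random.seed(0)
# Heavy-tailed sample (Pareto), the regime where size-based routing shines.
sample = [random.paretovariate(1.5) for _ in range(10000)]
print(sita_e_cutoffs(sample, C=3))          # two boundaries for three servers
```

Note how few small jobs share an interval with the many large ones: with a heavy tail, the top interval contains only a handful of jobs, yet carries a third of the load.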
To this day, the Internet of Things (IoT) continues its explosive growth. Nevertheless, with the exceptional evolution of traffic demand, existing infrastructures are struggling to keep up. In this context, Fog computing is shaping the future of IoT applications: it offers nearby computational, networking and storage resources that can meet the stringent requirements of these applications. However, despite its several advantages, Fog computing raises new challenges that slow down its adoption; in particular, practical solutions enabling the exploitation of this novel concept are still lacking.
In , we propose FITOR, an orchestration system for IoT applications in the Fog environment. This solution builds a realistic Fog environment while offering efficient orchestration mechanisms. To optimize the provisioning of Fog-enabled IoT applications, FITOR relies on O-FSP, an optimized fog service provisioning strategy that minimizes the provisioning cost of IoT applications while meeting their requirements. Extensive experiments show that O-FSP optimizes the placement of IoT applications and outperforms related strategies in terms of (i) provisioning cost, (ii) resource usage and (iii) acceptance rate. In , we propose a novel strategy, called GO-FSP, which optimizes the placement of IoT application components while coping with their strict performance requirements. To do so, we first propose an Integer Linear Programming (ILP) formulation of the IoT application provisioning problem, which minimizes the deployment cost while ensuring load balancing across heterogeneous devices. Then, a GRASP-based approach is proposed to achieve these objectives. Finally, we make use of the FITOR orchestration system to evaluate the performance of our solution under real conditions. The results show that our scheme outperforms related strategies. We are currently comparing this strategy with others based on online learning mechanisms under various information scenarios (delayed and noisy feedback, inaccurate application load information, etc.).
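A generic GRASP skeleton of the kind used for such provisioning problems can be sketched on a deliberately simplified placement instance (assigning components with CPU demands to hosts of fixed capacity, minimizing the number of hosts used; the instance and names are ours, not the paper's formulation):

```python
import random

def grasp(demands, capacity, iterations=100, rcl_size=3, seed=1):
    """GRASP skeleton: repeat a greedy *randomized* construction and keep the
    best solution found. The restricted candidate list (RCL) holds the
    rcl_size largest remaining demands."""
    random.seed(seed)
    best = None
    for _ in range(iterations):
        remaining = sorted(demands, reverse=True)
        hosts = []
        while remaining:
            d = random.choice(remaining[:rcl_size])   # randomized greedy pick
            remaining.remove(d)
            for h in hosts:                           # first-fit placement
                if sum(h) + d <= capacity:
                    h.append(d)
                    break
            else:
                hosts.append([d])
        # (a local-search improvement phase would go here; omitted for brevity)
        if best is None or len(hosts) < len(best):
            best = hosts
    return best

solution = grasp([5, 4, 4, 3, 2, 2], capacity=8)
print(len(solution))  # number of hosts used
```

The randomization of the greedy choice is what lets repeated constructions escape the bias of a single deterministic heuristic; the omitted local search would then refine each construction.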
Last, Fog computing also extends the capacities of the cloud to the edge of the network, near the physical world, so that Internet of Things (IoT) applications can benefit from properties such as short delays, real-time operation and privacy. Unfortunately, devices in the Fog-IoT environment are usually unstable and prone to failures, whose consequences may impact the physical world and can therefore be critical. In , we present a framework for end-to-end resilience of Fog-IoT applications; the framework was implemented and evaluated on a smart home testbed.
We are actively promoting better research practices, in particular in terms of research reproducibility and contribution recognition. Our contribution this year is threefold.
First, we have participated in the writing of a book introducing reproducible research . For a researcher, there is nothing more frustrating than the failure to reproduce major results obtained a few months earlier. The causes of such disappointments can be multiple and insidious, and this phenomenon plays an important role in the so-called "research reproducibility crisis". The book takes a current perspective on a number of potentially dangerous situations and practices, to exemplify and highlight the symptoms of non-reproducibility in research. In each case, it provides efficient solutions, ranging from good practices that are easily and immediately implementable to more technical tools, all of which are free and have been put to the test by the authors themselves. Students, engineers, and researchers should find efficient and accessible ways to improve their reproducible research practices.
Second, to allow students, engineers, and researchers to receive proper training in reproducible research, we ran the second session of the MOOC "Reproducible research: Methodological principles for a transparent science" on the FUN platform from April 1 to June 13, 2019. This MOOC teaches scientists modern and reliable tools such as Markdown for taking structured notes, desktop search applications, GitLab for version control and collaborative working, and computational notebooks (Jupyter, RStudio, and Org-Mode) for efficiently combining the computation, presentation, and analysis of data. More than 2,100 persons registered for this session, and we are currently preparing a third session expected to start at the beginning of 2020.
Third, software is a fundamental pillar of modern scientific research, not only in computer science but across all fields and disciplines. However, there is a lack of adequate means to cite and reference software, for many reasons. An obvious first reason is software authorship, which can range from a single developer to a whole team and can even vary over time. The panorama is even more complex than that, because many roles can be involved in software development: software architect, coder, debugger, tester, team manager, and so on. Arguably, the researchers who invented the key algorithms underlying the software can also claim a part of the authorship, and there are many other reasons that make this issue complex. In , we contribute to the ongoing efforts to develop proper guidelines and recommendations for software citation, building upon the internal experience of Inria, the French research institute for digital sciences. As a central contribution, we make three key recommendations: (1) we propose a richer taxonomy for software contributions with a qualitative scale; (2) we claim that it is essential to put the human at the heart of the evaluation; and (3) we propose to distinguish citation from reference, which is particularly important in the context of reproducible research.
In , we consider mean field games with discrete state spaces (called discrete mean field games in the following) and we analyze these games in continuous and discrete time, over finite as well as infinite time horizons. We prove the existence of a mean field equilibrium assuming continuity of the cost and of the drift. These conditions are more general than those of existing papers studying finite state space mean field games. Besides, we also study the convergence of the equilibria of N-player games to mean field equilibria in our four settings. On the one hand, we define a class of strategies in which any sequence of equilibria of the finite games converges weakly to a mean field equilibrium when the number of players goes to infinity. On the other hand, we exhibit equilibria outside this class that do not converge to mean field equilibria and for which the value of the game does not converge. In discrete time, this non-convergence phenomenon implies that the Folk theorem does not scale to the mean field limit.
In , we consider a class of nonlinear systems of differential equations with uncertainties, i.e., with a lack of knowledge in some of the parameters, represented by time-varying unknown bounded functions. An under-approximation of such a system consists of a subset of its reachable set, for any value of the unknown parameters. By relying on optimal control theory through Pontryagin's principle, we provide an algorithm for the under-approximation of a linear combination of the state variables, implemented in a fully automated tool-chain named UTOPIC. This allows us to establish tight under-approximations of common benchmark models with dimensions as large as sixty-five.
This section describes four contributions on energy and network optimization.
One of the key challenges in Internet of Things (IoT) networks is to connect many different types of autonomous devices while reducing their individual power consumption. This problem is exacerbated by two main factors: first, the fact that these devices operate in and give rise to a highly dynamic and unpredictable environment where existing solutions (e.g., water-filling algorithms) are no longer relevant; and second, the lack of sufficient information at the device end. To address these issues, we propose a regret-based formulation that accounts for arbitrary network dynamics: this allows us to derive an online power control scheme that is provably capable of adapting to such changes, while relying solely on strictly causal feedback. In so doing, we identify an important tradeoff between the amount of feedback available at the transmitter side and the resulting system performance: if the device has access to unbiased gradient observations, the algorithm's regret after
Many businesses possess a small infrastructure that they can use for their computing tasks, but also often buy extra computing resources from clouds. Cloud vendors such as Amazon EC2 offer two types of purchase options: on-demand and spot instances. As tenants have limited budgets to satisfy their computing needs, it is crucial for them to determine how to purchase different options and utilize them (in addition to possible self-owned instances) in a cost-effective manner while respecting their response-time targets. In this paper, we propose a framework to design policies to allocate self-owned, on-demand and spot instances to arriving jobs. In particular, we propose a near-optimal policy to determine the number of self-owned instances and an optimal policy to determine the number of on-demand instances to buy and the number of spot instances to bid for at each time unit. Our policies rely on a small number of parameters and we use an online learning technique to infer their optimal values. Through numerical simulations, we show the effectiveness of our proposed policies, in particular that they achieve a cost reduction of up to 64.51% when spot and on-demand instances are considered and of up to 43.74% when self-owned instances are considered, compared to previously proposed or intuitive policies. This contribution appeared in .
In , we consider the classical problem of minimizing offline the total energy consumption required to execute a set of n real-time jobs on a single processor with varying speed. Each real-time job is defined by its release time, size, and deadline (all integers). The goal is to find a sequence of processor speeds, chosen among a finite set of available speeds, such that no job misses its deadline and the energy consumption is minimal. Such a sequence is called an optimal speed schedule. We propose a linear time algorithm that checks the schedulability of the given set of n jobs and computes an optimal speed schedule. The time complexity of our algorithm is in
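As a hedged illustration (not the paper's linear-time algorithm), the classical demand-bound criterion can be used to check whether a job set is EDF-schedulable at a given constant speed, and hence to find the smallest feasible speed in a finite set:

```python
def feasible_at_speed(jobs, s):
    """Demand-bound feasibility test: a job set is schedulable by preemptive
    EDF at constant speed s iff, for every interval [r, d], the total size of
    jobs released and due within [r, d] does not exceed s * (d - r).

    jobs: list of (release, size, deadline) triples, all integers."""
    releases = sorted({r for r, _, _ in jobs})
    deadlines = sorted({d for _, _, d in jobs})
    for r in releases:
        for d in deadlines:
            if d <= r:
                continue
            demand = sum(w for (rj, w, dj) in jobs if rj >= r and dj <= d)
            if demand > s * (d - r):
                return False
    return True

# Three jobs and a finite set of available speeds {1, 2, 3}.
jobs = [(0, 6, 4), (2, 4, 8), (3, 2, 10)]
print(min(s for s in (1, 2, 3) if feasible_at_speed(jobs, s)))  # → 2
```

Here speed 1 fails because the first job alone demands 6 units of work in the interval [0, 4], while speed 2 suffices for every interval; an energy-minimal schedule would additionally vary the speed over time, which this sketch does not attempt.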
Network utility maximization (NUM) is an iconic problem in network traffic management which is at the core of many current and emerging network design paradigms and, in particular, software-defined networks (SDNs). Given the exponential growth of modern-day networks (in both size and complexity), it is thus crucial to develop scalable algorithmic tools capable of providing efficient solutions in time which is dimension-free, i.e., independent, or nearly independent, of the size of the system. To do so, we leverage a suite of modified gradient methods known as “mirror descent” and we derive a scalable and efficient algorithm for the NUM problem based on gradient exponentiation. We show that the convergence speed of the proposed algorithm only carries a logarithmic dependence on the size of the network, so it can be implemented reliably and efficiently in massively large networks where traditional gradient methods are prohibitively slow. These theoretical results are subsequently validated by extensive numerical simulations showing an improvement of several orders of magnitude over standard gradient methods in large-scale networks. This contribution appeared in .
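The idea of gradient exponentiation can be illustrated on a toy NUM instance (ours, not the paper's): maximizing the sum of log-utilities of n flows sharing one unit of capacity. Entropic mirror descent keeps iterates feasible by construction and converges to the proportionally fair allocation x_i = 1/n.

```python
import math

def mirror_descent_num(n, steps=200, eta=0.1):
    """Entropic mirror descent (gradient exponentiation) for
    max sum(log x_i) subject to sum(x_i) = 1, x_i > 0."""
    x = [i + 1.0 for i in range(n)]
    x = [v / sum(x) for v in x]              # arbitrary feasible starting point
    for _ in range(steps):
        grad = [1.0 / v for v in x]          # gradient of sum(log x_i)
        x = [v * math.exp(eta * g) for v, g in zip(x, grad)]
        s = sum(x)
        x = [v / s for v in x]               # normalization = entropic projection
    return x

x = mirror_descent_num(4)
print([round(v, 3) for v in x])              # → [0.25, 0.25, 0.25, 0.25]
```

The multiplicative update never leaves the simplex, which is precisely why this family of methods scales so gracefully with the problem dimension.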
In the DNS resolution process, packet losses and the ensuing retransmission timeouts induce marked latencies: the current UDP-based resolution process takes up to 5 seconds to detect a loss event. In , , we find that persistent DNS connections based on TCP or TLS can provide an elegant solution to this problem. With controlled experiments on a testbed, we show that persistent DNS connections significantly reduce worst-case latency. We then leverage a large-scale platform to study the performance impact of TCP/TLS on recursive resolvers. We find that off-the-shelf software and reasonably powerful hardware can effectively provide recursive DNS service over TCP and TLS, with a manageable performance hit compared to UDP.
This section describes four contributions on privacy, fairness and transparency in online social media.
The Facebook advertising platform has been subject to a number of controversies in the past years regarding privacy violations, lack of transparency, as well as its capacity to be used by dishonest actors for discrimination or propaganda. In this study, we aim to provide a better understanding of the Facebook advertising ecosystem, focusing on how it is being used by advertisers. We first analyze the set of advertisers and then investigate how those advertisers are targeting users and customizing ads via the platform. Our analysis is based on the data we collected from over 600 real-world users via a browser extension that collects the ads our users receive when they browse their Facebook timeline, as well as the explanations for why users received these ads. Our results reveal that users are targeted by a wide range of advertisers (e.g., from popular to niche advertisers); that a non-negligible fraction of advertisers are part of potentially sensitive categories such as news and politics, health or religion; that a significant number of advertisers employ targeting strategies that could be either invasive or opaque; and that many advertisers use a variety of targeting parameters and ad texts. Overall, our work emphasizes the need for better mechanisms to audit ads and advertisers in social media and provides an overview of the platform usage that can help move towards such mechanisms.
To help their users to discover important items at a particular time, major websites like Twitter, Yelp, TripAdvisor or NYTimes provide Top-K recommendations (e.g., 10 Trending Topics, Top 5 Hotels in Paris or 10 Most Viewed News Stories), which rely on crowd-sourced popularity signals to select the items. However, different sections of a crowd may have different preferences, and there is a large silent majority who do not explicitly express their opinion. Also, the crowd often consists of actors like bots, spammers, or people running orchestrated campaigns. Recommendation algorithms today largely do not consider such nuances, hence are vulnerable to strategic manipulation by small but hyper-active user groups. To fairly aggregate the preferences of all users while recommending top-K items, we borrow ideas from prior research on social choice theory, and identify a voting mechanism called Single Transferable Vote (STV) as having many of the fairness properties we desire in top-K item (s)elections. We develop an innovative mechanism to attribute preferences of the silent majority, which also makes STV completely operational. We show the generalizability of our approach by implementing it on two different real-world datasets. Through extensive experimentation and comparison with state-of-the-art techniques, we show that our proposed approach provides maximum user satisfaction, and cuts down drastically on items disliked by most but hyper-actively promoted by a few users.
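A simplified STV sketch for top-K selection is shown below, with Droop quota, loser elimination and ballot transfer to the next surviving preference; surplus transfer is omitted for brevity, so this is not full STV nor the paper's exact mechanism.

```python
from collections import Counter

def stv_top_k(ballots, k):
    """Simplified single transferable vote for electing k items.
    ballots: sequences of candidates in decreasing order of preference."""
    remaining = {c for b in ballots for c in b}
    quota = len(ballots) // (k + 1) + 1          # Droop quota
    elected = []
    ballots = [list(b) for b in ballots]
    while len(elected) < k:
        if len(remaining) <= k - len(elected):
            elected += sorted(remaining)         # all leftovers get elected
            break
        tally = Counter()
        for b in ballots:                        # count current first preferences
            pref = next((c for c in b if c in remaining), None)
            if pref is not None:
                tally[pref] += 1
        top, votes = max(tally.items(), key=lambda cv: cv[1])
        if votes >= quota:
            elected.append(top)                  # elect; retire its ballots
            ballots = [b for b in ballots
                       if next((c for c in b if c in remaining), None) != top]
            remaining.discard(top)
        else:                                    # eliminate the weakest candidate;
            loser = min(remaining, key=lambda c: tally.get(c, 0))
            remaining.discard(loser)             # its votes transfer automatically
    return elected

ballots = [("A", "B"), ("A", "B"), ("A", "B"), ("C", "B"), ("C", "B"), ("B", "C")]
print(stv_top_k(ballots, k=2))  # → ['A', 'C']
```

Note how B's lone ballot transfers to C after B's elimination: this transfer step is what makes STV robust against a small hyper-active group splitting or concentrating its votes.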
The rise of algorithmic decision making has led to active research on how to define and guarantee fairness, mostly focusing on one-shot decision making. In several important applications such as hiring, however, decisions are made in multiple stages, with additional information available at each stage. In such cases, fairness issues remain poorly understood. In this paper, we study fairness in k-stage selection problems where additional features are observed at every stage. We first introduce two fairness notions, local (per stage) and global (final stage) fairness, that extend the classical fairness notions to the k-stage setting. We propose a simple model based on a probabilistic formulation and show that the locally and globally fair selections that maximize precision can be computed via a linear program. We then define the price of local fairness to measure the loss of precision induced by local constraints, and investigate this quantity both theoretically and empirically. In particular, our experiments show that the price of local fairness is generally smaller when the sensitive attribute is observed at the first stage, but globally fair selections are more locally fair when the sensitive attribute is observed at the second stage; hence, in both cases, it is often possible to have a selection that has a small price of local fairness and is close to locally fair.
Most social platforms offer mechanisms allowing users to delete their posts, and a significant fraction of users exercise this right to be forgotten. However, ironically, users' attempts to reduce attention to sensitive posts via deletion, in practice, attract unwanted attention from stalkers specifically to those (deleted) posts. Thus, deletions may leave users more vulnerable to attacks on their privacy in general. Users hoping to make their posts forgotten face a "damned if I do, damned if I don't" dilemma, and many are shifting towards ephemeral social platforms like Snapchat, which will deprive us of important user-data archives. We present Lethe, a novel solution to this problem of (really) forgetting the forgotten, based on intermittent withdrawals. If next-generation social platforms are willing to give up a very small fraction of the uninterrupted availability of non-deleted posts, Lethe provides privacy for the deleted posts over long durations. In the presence of Lethe, an adversarial observer becomes unsure whether some posts are permanently deleted or just temporarily withdrawn by Lethe; at the same time, the observer is overwhelmed by a large number of falsely flagged non-deleted posts. To demonstrate the feasibility and performance of Lethe, we analyze large-scale real data about users' deletions over Twitter and thoroughly investigate how to choose the time duration distributions for alternating between temporary withdrawals and resurrections of non-deleted posts. We find a favorable trade-off between privacy, availability and adversarial overhead in different settings for users exercising their right to delete. We show that, even against an ultimate adversary with uninterrupted access to the entire platform, Lethe offers deletion privacy for up to 3 months from the time of deletion, while maintaining content availability as high as 95% and keeping the adversarial precision to 20%.
This section describes six contributions on optimization.
In , we propose an interior-point method for linearly constrained – and possibly nonconvex – optimization problems. The proposed method – which we call the Hessian barrier algorithm (HBA) – combines a forward Euler discretization of Hessian Riemannian gradient flows with an Armijo backtracking step-size policy. In this way, HBA can be seen as an alternative to mirror descent (MD), and contains as special cases the affine scaling algorithm, regularized Newton processes, and several other iterative solution methods. Our main result is that, modulo a non-degeneracy condition, the algorithm converges to the problem’s critical set; hence, in the convex case, the algorithm converges globally to the problem’s minimum set. In the case of linearly constrained quadratic programs (not necessarily convex), we also show that the method's convergence rate is
In , Lipschitz continuity is a central requirement for achieving the optimal
In , we consider variational inequalities, which have recently attracted considerable interest in machine learning as a flexible paradigm for models that go beyond ordinary loss function minimization (such as generative adversarial networks and related deep learning systems). In this setting, the optimal O(1/t) convergence rate for solving smooth monotone variational inequalities is achieved by the Extra-Gradient (EG) algorithm and its variants. Aiming to alleviate the cost of an extra gradient step per iteration (which can become quite substantial in deep learning applications), several algorithms have been proposed as surrogates to Extra-Gradient with a single oracle call per iteration. We develop a synthetic view of such algorithms and complement the existing literature by showing that they retain an O(1/t) ergodic convergence rate in smooth, deterministic problems. Subsequently, beyond the monotone deterministic case, we also show that the last iterate of single-call, stochastic extra-gradient methods still enjoys an O(1/t) local convergence rate to solutions of non-monotone variational inequalities that satisfy a second-order sufficient condition.
In , we study a class of online convex optimization problems with long-term budget constraints that arise naturally as reliability guarantees or total consumption constraints. In this general setting, prior work by Mannor et al. (2009) has shown that achieving no regret is impossible if the functions defining the agent’s budget are chosen by an adversary. To overcome this obstacle, we refine the agent's regret metric by introducing the notion of a “
In , owing to their connection with generative adversarial networks (GANs), saddle-point problems have recently attracted considerable interest in machine learning and beyond. By necessity, most theoretical guarantees revolve around convex-concave (or even linear) problems; however, making theoretical inroads towards efficient GAN training depends crucially on moving beyond this classic framework. To make piecemeal progress along these lines, we analyze the behavior of mirror descent (MD) in a class of non-monotone problems whose solutions coincide with those of a naturally associated variational inequality - a property which we call coherence. We first show that ordinary, "vanilla" MD converges under a strict version of this condition, but not otherwise; in particular, it may fail to converge even in bilinear models with a unique solution. We then show that this deficiency is mitigated by optimism: by taking an "extra-gradient" step, optimistic mirror descent (OMD) converges in all coherent problems. Our analysis generalizes and extends the results of Daskalakis et al. (2018) for optimistic gradient descent (OGD) in bilinear problems, and makes concrete headway for establishing convergence beyond convex-concave games. We also provide stochastic analogues of these results, and we validate our analysis by numerical experiments in a wide array of GAN models (including Gaussian mixture models, as well as the CelebA and CIFAR-10 datasets).
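The contrast between vanilla and optimistic methods can be reproduced on the canonical bilinear saddle point min_x max_y xy (our toy instance, with our own step rule): vanilla gradient descent-ascent spirals away from the solution (0, 0), while the optimistic variant converges.

```python
def vanilla_gda(steps=1000, eta=0.1):
    """Simultaneous gradient descent-ascent on f(x, y) = x * y."""
    x, y = 1.0, 1.0
    for _ in range(steps):
        x, y = x - eta * y, y + eta * x      # norm grows every step
    return x, y

def optimistic_gda(steps=1000, eta=0.1):
    """Optimistic variant: extrapolate with 2 * current - previous gradient."""
    x, y = 1.0, 1.0
    gx_prev, gy_prev = 0.0, 0.0
    for _ in range(steps):
        gx, gy = y, -x                       # joint vector field of the game
        x, y = x - eta * (2 * gx - gx_prev), y - eta * (2 * gy - gy_prev)
        gx_prev, gy_prev = gx, gy
    return x, y

print("vanilla:   ", vanilla_gda())          # spirals outward
print("optimistic:", optimistic_gda())       # converges toward (0, 0)
```

The bilinear game is coherent but not strictly monotone, which is exactly the regime where the "extra-gradient" correction makes the difference between divergence and convergence.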
In , we develop a new stochastic algorithm with variance reduction for solving pseudo-monotone stochastic variational inequalities. Our method builds on Tseng’s forward-backward-forward algorithm, which is known in the deterministic literature to be a valuable alternative to Korpelevich’s extragradient method when solving variational inequalities over a convex and closed set governed by pseudo-monotone and Lipschitz continuous operators. The main computational advantage of Tseng’s algorithm is that it relies on a single projection step and two independent queries of a stochastic oracle. Our algorithm incorporates a variance reduction mechanism and leads to almost-sure convergence to solutions of a merely pseudo-monotone stochastic variational inequality problem. To the best of our knowledge, this is the first stochastic algorithm achieving this using only a single projection at each iteration.
This section describes three contributions on machine learning.
In , we examine the convergence of no-regret learning in games with continuous action sets. For concreteness, we focus on learning via "dual averaging", a widely used class of no-regret learning schemes where players take small steps along their individual payoff gradients and then "mirror" the output back to their action sets. In terms of feedback, we assume that players can only estimate their payoff gradients up to a zero-mean error with bounded variance. To study the convergence of the induced sequence of play, we introduce the notion of variational stability, and we show that stable equilibria are locally attracting with high probability whereas globally stable equilibria are globally attracting with probability 1. We also discuss some applications to mixed-strategy learning in finite games, and we provide explicit estimates of the method's convergence speed.
Resource allocation games such as the famous Colonel Blotto (CB) and Hide-and-Seek (HS) games are often used to model a large variety of practical problems, but only in their one-shot versions. Indeed, due to their extremely large strategy space, it remains an open question how one can efficiently learn in these games. In this work, we show that the online CB and HS games can be cast as path planning problems with side-observations (SOPPP): at each stage, a learner chooses a path on a directed acyclic graph and suffers the sum of losses that are adversarially assigned to the corresponding edges; and she then receives semi-bandit feedback with side-observations (i.e., she observes the losses on the chosen edges plus some others). We propose a novel algorithm, EXP3-OE, the first-of-its-kind with guaranteed efficient running time for SOPPP without requiring any auxiliary oracle. We provide an expected-regret bound of EXP3-OE in SOPPP matching the order of the best benchmark in the literature. Moreover, we introduce additional assumptions on the observability model under which we can further improve the regret bounds of EXP3-OE. We illustrate the benefit of using EXP3-OE in SOPPP by applying it to the online CB and HS games.
This contribution appeared in , . In an earlier article , we also studied the sequential Colonel Blotto game under bandit feedback and proposed a black-box optimization method to optimize the exploration distribution of the classical ComBand algorithm.
In , we study nonzero-sum hypothesis testing games that arise in the context of adversarial classification, in both the Bayesian as well as the Neyman-Pearson frameworks. We first show that these games admit mixed strategy Nash equilibria, and then we examine some interesting concentration phenomena of these equilibria. Our main results are on the exponential rates of convergence of classification errors at equilibrium, which are analogous to the well-known Chernoff-Stein lemma and Chernoff information that describe the error exponents in the classical binary hypothesis testing problem, but with parameters derived from the adversarial model. The results are validated through numerical experiments.
Nicolas Gast obtained funding from Enedis for a study on the PLC-G3 protocol.
Nicolas Gast received a grant from the IDEX UGA that funds a post-doctoral researcher (Takai Kennouche) for two years (2018 and 2019) to work on the smart-grid project, which focuses on distributed optimization in electrical distribution networks.
Patrick Loiseau and Panayotis Mertikopoulos received a grant from the IDEX UGA that partly funds a PhD student (Benjamin Roussillon) to work on game theoretic models for adversarial classification.
Arnaud Legrand is the leader of the HAC SPECIS project. The goal of the HAC SPECIS (High-performance Application and Computers: Studying PErformance and Correctness In Simulation) project is to answer the methodological needs of HPC application and runtime developers and to allow studying real HPC systems both from the correctness and the performance points of view. To this end, we gather experts from the HPC, formal verification and performance evaluation communities. Inria teams: AVALON, POLARIS, MYRIADS, SUMO, HIEPACS, STORM, MEXICO, VERIDIS.
Patrick Loiseau and Bary Pradelski received a grant from the Presidency of Grenoble INP that covers half of the funding of the PhD of Dimitrios Moustakas, who works on dynamic matching. This PhD is carried out in collaboration with Univ. Zurich (Heinrich Nax), which covers the rest.
Patrick Loiseau and Panayotis Mertikopoulos received a grant from DGA that complements the funding of PhD student (Benjamin Roussillon) to work on game theoretic models for adversarial classification.
PGMO projects are supported by the Jacques Hadamard Mathematical Foundation (FMJH). Our project (HEAVY.NET) is focused on congested networks and their asymptotic properties.
Panayotis Mertikopoulos is co-PI of a PEPS I3A project: MixedGAN ("Mixed-strategy generative adversarial networks") (PI: R. Laraki, U. Dauphine).
Project IAM (Informatique à la Main) is funded by the Fondation Blaise Pascal (PI: Jean-Marc Vincent).
MIAI @ Grenoble Alpes (Multidisciplinary Institute in Artificial Intelligence) is the 3IA institute of Grenoble that was selected by the government in 2019. With the MIAI institute, Patrick Loiseau is the co-holder of a chair on “Explainable and Responsible AI” of which Nicolas Gast and Bary Pradelski are also members; and Panayotis Mertikopoulos is a member of the “Optimization and Learning” chair.
Nicolas Gast obtained ANR JCJC funding for the project REFINO (250k euros, duration: 4 years).
Bary Pradelski (PI), P. Mertikopoulos and P. Loiseau obtained funding from the ANR for the project ALIAS (Adaptive Learning for Interactive Agents and Systems). This is a bilateral PRCI (collaboration internationale) project joint with Singapore University of Technology and Design (SUTD). The Singapore team consists of G. Piliouras and G. Panageas.
ORACLESS (2016–2021)
ORACLESS is an ANR starting grant (JCJC) coordinated by Panayotis Mertikopoulos. The goal of the project is to develop highly adaptive resource allocation methods for wireless communication networks that are provably capable of adapting to unpredictable changes in the network. In particular, the project will focus on the application of online optimization and online learning methodologies to multi-antenna systems and cognitive radio networks.
CONNECTED (2016–2019)
CONNECTED is an ANR Tremplin-ERC (T-ERC) grant coordinated by Patrick Loiseau. The goal of the project is to work on several game-theoretic models involving learning agents and data revealed by strategic agents in response to the learning algorithms, so as to derive learning algorithms that perform better on such strategically revealed data.
ReDaS
Title: Reproducible Data Science
International Partner (Institution - Laboratory - Researcher):
Universidade Federal do Rio Grande do Sul (Brazil) - Industrial Engineering and Operations Research Departments - Lucas Mello Schnorr
Start year: 2019
Data science builds on a variety of techniques and tools, which often makes analyses difficult to follow and to reproduce. The goal of this project is to develop interactive, reproducible and scalable analysis workflows that provide uncertainty and quality estimators for the analysis.
GENE
Title: Stochastic dynamics of large games and networks
International Partners (Institution - Laboratory - Researcher):
Universidad de Buenos Aires (Argentina) - Matthieu Jonckheere
Universidad de la Republica Uruguay (Uruguay) - Federico La Rocca
CNRS (France) - Balakrishna Prabhu
Universidad ORT Uruguay (Uruguay) - Andrés Ferragut
Duration: 2018 - 2019
Start year: 2018
Through the creation and consolidation of strong research and training exchanges between Argentina, France and Uruguay, the GENE project will contribute to the fields of performance evaluation and control of communication networks, using tools from game theory, probability theory and control theory. Some of the challenges this project will address are:
- Mean-field games and their application to load balancing and resource allocation;
- Scaling limits for centralized and decentralized load-balancing strategies, and the implementation of practical policies for web server farms;
- Information diffusion and communication protocols in large, distributed wireless networks.
We have hosted multiple international scientists for short (typically one-week) visits: Jonathan Newton, Paul Duetting, Jason Marden, and Bruno Ziliotto.
V. Danjean spent one week at UFRGS in Porto Alegre (Brazil), hosted by Lucas M. Schnorr, to work on the tracing of multi-tasked OpenMP applications.
A. Legrand spent 10 days at UFRGS in Porto Alegre (Brazil), hosted by Lucas M. Schnorr, to teach scientific methodology and performance evaluation, and to work on the visual performance analysis of dynamic task-based applications.
G. Huard visited UFRGS (Porto Alegre, Brazil) in the context of the ReDaS Inria associated team from Nov. 27th to Dec. 16th, along with Alexis Janon. During this visit, we worked with Lucas Schnorr on several application trace analysis cases, using our own custom analysis framework and leveraging UFRGS's expertise in the design and conduct of practical data analyses.
B. Pradelski was invited for seminars at several places: IHP Game Theory Seminar, Bar-Ilan University Economic Theory seminar, University of Oxford Game Theory seminar. He is also an associate member of the Oxford Man Institute.
P. Mertikopoulos was invited to spend a three-month research visit at the Ecole Polytechnique Fédérale de Lausanne (EPFL). He was hosted by the LIONS lab (headed by V. Cevher).
P. Mertikopoulos was a technical program co-chair of the 10th International Conference on NETwork Games, COntrol and OPtimisation (NetGCoop 2020).
B. Pradelski was a technical program co-chair of the 14th Workshop on the Economics of Networks, Systems and Computation (NetEcon), colocated with ACM SIGMETRICS and EC.
B. Gaujal organized a special day on potential games at GameNets (Paris).
P. Mertikopoulos co-organized the workshop “20PoA: Twenty years of the Price of Anarchy” (Chania, Greece, July 2019).
B. Gaujal: WiOpt, NeurIPS
J. Anselmi: ValueTools, ASMTA
P. Mertikopoulos: The 2020 French Days on Optimization and Decision Science (SMAI MODE 2020)
A. Legrand: ISC HIGH PERFORMANCE
N. Gast: SIGMETRICS, NeurIPS, ValueTools
J-M. Vincent: EPEW, ValueTools, SIMULTECH
P. Loiseau: NeurIPS, ICML, SIGMETRICS, PETS, NetEcon
P. Mertikopoulos: ICML, NeurIPS (area chair).
P. Mertikopoulos serves as an associate editor for JDG (Journal of Dynamics and Games) and MCAP (Methodology and Computing in Applied Probability).
N. Gast serves as an associate editor for Performance Evaluation and Stochastic Models.
P. Loiseau is an associate editor for ACM Transactions on Internet Technology and IEEE Transactions on Big Data.
All members of the team are active reviewers for several international journals and conferences.
B. Gaujal:
23/01: ENS Lyon seminars (Le Pleynet) “Evolutionary games and bounded rationality”
7/02: Eva Tardos seminar (Grenoble): “Price of anarchy in routing games”
3/05: Workshop for Eitan Altman's 60th Birthday (Avignon) “Sturmian words at work in optimal routing”
P. Mertikopoulos:
Invited instructor at the CONNECT Summer School on Machine Learning for Communications “Online learning and optimization for wireless systems”, Trinity College, Dublin
Invited talk at ICCOPT 2019 (2019 International Conference on Continuous Optimization), Berlin, August 2019
Invited talk at NPCG 2019 (Workshop on Network, Population and Congestion Games), Paris, April 2019
Invited talk at GDO 2019 (Workshop on Games, Dynamics and Optimization), Cluj-Napoca, April 2019
Invited talk at OSL 2019 (Workshop on Optimization and Statistical Learning), Les Houches, March 2019
Invited talk at EPFL Machine Learning Seminar, March 2019
Invited talk at the Criteo AI Lab, February 2019
A. Legrand:
Simulation of HPC applications and predictions, Scheduling workshop, Bordeaux (27/6/19)
Series of talks about reproducible research: TILECS workshop, Grenoble (3/7/19); UFRGS keynote, Porto Alegre (9/10/19); SBAC-PAD conference, Campo Grande (17/10/19); Formidex, UGA (6/11/19); Doctoral school, Neuchatel (7/11/19); Inria Alumni, Paris (12/11/19)
Nicolas Gast was invited to give a tutorial on “Mean field and refined mean field approximation” at the ITC conference.
B. Gaujal is a member of the scientific committee of the GDR-IM and a member of the council of the “pôle MSTIC” in Grenoble.
P. Mertikopoulos is a member of the steering committee (comité de liaison) of the optimization and decision theory group of the French Society for Industrial and Applied Mathematics (SMAI)
P. Mertikopoulos is the working group coordinator, core group member and management committee (MC) representative for France in the European Network for Game Theory (GAMENET).
P. Loiseau is the chair of the steering committee of NetEcon.
We only list master-level teaching.
B. Gaujal was involved in multiple courses:
M2 course at ENS Lyon with Panayotis Mertikopoulos: Online Optimization
M2 course at MPRI (Paris) with Ana Busic: Performance Evaluation in Communication Networks
M2 course (Ensimag) on network performance models
M1 exercise sessions (Ensimag) in applied probability
P. Mertikopoulos gave an invited PhD level course at EPFL on “Min-max optimization and variational inequalities”.
V. Danjean was involved in INFO3 and INFO4 at Polytech Grenoble (System Architecture, internship supervision, ...) and in M1 Info (Operating Systems and Parallel Programming course, Operating System project).
A. Legrand was involved in multiple courses:
Scientific Methodology and Performance Evaluation (M2 MOSIG, UGA)
Scientific Methodology and Performance Evaluation (M2 Univ. Federale do Rio Grande do Sul, Porto Alegre)
Parallel Systems (M2 MOSIG, UGA)
Probability and Simulation (M1, Polytech/UGA)
Performance Evaluation (M1, Polytech/UGA)
Reproducible Research (Doctoral School MSTII, UGA)
J. Anselmi taught in the course Probability and Simulation (M1, Polytech/UGA).
P. Loiseau taught in the courses Probability and Simulation (M1, Polytech/UGA) and “Algorithms for data processing” (M1 INFO, UGA).
N. Gast is responsible of the master course “Optimization under Uncertainties” (Master 2 ORCO in Grenoble).
J.-M. Vincent teaches Probability for Informatics and Performance Evaluation at Ensimag, as well as Mathematics for Computer Science (1st year) and Scientific Methodology and Performance Evaluation (2nd year) in the Master of Computer Science.
G. Huard taught the Object-Oriented Design course for the M1 INFO, UGA.
Supervision of PhD students and postdocs:
B. Jonglez (Bruno Gaujal and Martin Heusse)
S. Plassart (Bruno Gaujal and Alain Girault)
K. Khun (Bruno Gaujal and Nicolas Gast)
C. Yan (Bruno Gaujal and Nicolas Gast)
K. Antonakopoulos (P. Mertikopoulos and E. V. Belmega, ETIS/ENSEA)
B. Roussillon (P. Mertikopoulos and P. Loiseau)
B. Donassolo (P. Mertikopoulos and A. Legrand)
P. Rocha Bruel (A. Legrand and Alfredo Goldman)
T. Cornebize (A. Legrand)
C. Heinrich (A. Legrand)
S. Zrigui (A. Legrand and D. Trystram)
A. Janon (G. Huard and A. Legrand)
V. Emelianov (N. Gast and P. Loiseau)
T. Barzolla (N. Gast with Vincent Jost and Van-Dat Cung from G-SCOP laboratory)
M. Mendil (N. Gast)
T. Kennouche (N. Gast)
U. Ozeer (J-M. Vincent)
Dong Quan Vu (P. Loiseau)
Vera Sosnovik (O. Goga and P. Loiseau)
Eleni Gkiouzepi (P. Loiseau)
Lucas Leandro Nesi (A. Legrand and Lucas Mello Schnorr)
Dimitrios Moustakas (B. Pradelski and P. Loiseau, with H. Nax from UZH)
Simon Jantscheg (B. Pradelski and P. Loiseau, with H. Nax from UZH)
Supervision of M2 Students:
Manal Benaissa (V. Danjean)
Leo Gayral (Bruno Gaujal and Federica Garin)
Kimang Khun (Bruno Gaujal and Nicolas Gast)
Nicolas Rocher (Patrick Loiseau and Panayotis Mertikopoulos)
Chen Yan (Nicolas Gast)
Dimitrios Moustakas (B. Pradelski)
B. Gaujal was a reviewer of the PhD Thesis of Paulin Jacquot (Ecole Polytechnique).
V. Danjean was involved in several teaching juries: INFO3 at Polytech Grenoble, L3 M&I, M1 Info, DU ISN and DIU EIL at UGA.
A. Legrand was a reviewer of the PhD thesis of Mohamad El Sayah (Univ. Franche-Comté, Besançon).
N. Gast was a member of the PhD juries of Celine Comte and Eyal Castiel.
Patrick Loiseau wrote with Oana Goga an article, “Publicité en Ligne : reprenons la main !” (“Online advertising: let's take back control!”), co-published by the blog Binaire (Le Monde) and The Conversation France, June 3, 2019.
B. Gaujal is a member of the CR2 hiring committee in Grenoble.
J.-M. Vincent is in charge of relations between the Rectorat and Inria Grenoble for the organization of scientific events (Festival of Science, school visits, organization of cycles of conferences on research in CS and applied mathematics for teachers in middle schools).
J.-M. Vincent is
Member of the national coordination of the Diplôme Inter-Universitaire “Enseigner l’Informatique au Lycée” (50 universities involved).
Local head of the DIU EIL in the Academy of Grenoble
Member of the organization of teaching sessions for all CS teachers coming from abroad
Member of the national Commission Inter-Irem in Informatics
Member of the first national jury for the competitive recruitment of teachers in computer science (Capes NSI 2019-20)
Arnaud Legrand participated in the writing of a book on reproducible research, which aims at helping students, engineers and researchers find efficient and accessible ways to improve their reproducible research practices.
V. Danjean is the head of the DU ISN program (Diplôme Universitaire Informatique et Sciences du Numérique).
V. Danjean co-organized the new DIU EIL program (Diplôme Inter-Universitaire Enseigner l'Informatique au Lycée). He is involved both at the national level (coordination and definition of the content of this program, provided in more than 30 universities in France) and at the local level (coordination of the local teams, course scheduling, conference organization, ...).
V. Danjean participated in “La Fête de la Science”, animating several sessions of “unplugged computer science”
P. Loiseau co-organized and animated a workshop “IA, éthique et société” (“AI, ethics and society”), Forum Ecobiz Grenoble, October 2019.
P. Loiseau participated in a debate “Ethique et numérique : quels enjeux sociétaux ?”. Festival Transfo, Grenoble, France, January 2019.