## Section: Research Program

### Large System Modeling and Analysis

Participants: Nicolas Gast, Bruno Gaujal, Arnaud Legrand, Panayotis Mertikopoulos, Florence Perronnin, Olivier Richard, Jean-Marc Vincent.

Keywords: Markov chains, Queuing networks, Mean field approximation, Simulation, Performance evaluation, Discrete event dynamic systems.

#### Simulation of distributed systems

Since the advent of distributed computer systems, an active field of
research has been the investigation of *scheduling* strategies
for parallel applications. The common approach is to employ
scheduling heuristics that approximate an optimal schedule.
Unfortunately, it is often impossible to obtain analytical results
to compare the efficiency of these heuristics. One possibility is
to conduct large numbers of back-to-back experiments on real
platforms. While this is possible on tightly-coupled platforms, it
is infeasible on modern distributed platforms (e.g., grids or
peer-to-peer environments), as it is labor-intensive and does not
yield repeatable results. The solution is to resort to
*simulations*.

##### Flow Simulations

To make simulations of large systems efficient and trustworthy, we have used flow simulations (where streams of packets are abstracted into flows). SimGrid is a simulation platform that specifically targets the simulation of large distributed systems (grids, clusters, peer-to-peer systems, volunteer computing systems, clouds) from the perspective of applications. It makes it possible to obtain repeatable results and to explore wide ranges of platform and application scenarios.
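As an illustration of the flow abstraction, here is a minimal sketch in which individual packets are never simulated: each transfer is a fluid flow, and the simulator jumps from one flow completion to the next. The sharing rule (all active flows split a single link equally) and the function name `flow_completion_times` are assumptions made for this example; they are not SimGrid's actual model or API.

```python
def flow_completion_times(sizes, capacity):
    """Completion time of each flow when active flows share one link equally."""
    remaining = dict(enumerate(sizes))   # flow id -> amount of data left
    finish = {}
    now = 0.0
    while remaining:
        rate = capacity / len(remaining)            # fair share per active flow
        first = min(remaining, key=remaining.get)   # next flow to complete
        amount = remaining[first]
        now += amount / rate                        # jump straight to that event
        for flow in remaining:
            remaining[flow] -= amount               # every flow moved `amount`
        del remaining[first]
        finish[first] = now
    return [finish[i] for i in range(len(sizes))]


print(flow_completion_times([10.0, 20.0, 30.0], capacity=10.0))
```

With these hypothetical sizes, the three flows finish after about 3, 5 and 6 time units. The cost of the simulation depends on the number of flows, not on the number of packets, which is what makes this level of abstraction attractive for large systems.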

##### Perfect Simulation

Using a constructive representation of a Markovian queuing network based on events (often called GSMPs, for generalized semi-Markov processes), we have designed perfect simulation algorithms that compute samples distributed exactly according to the stationary distribution of the Markov process, with no bias. The tools based on our algorithms ($\psi$) can sample the stationary measure of Markov processes directly from the queuing network description. Some monotone networks with up to $10^{50}$ states can be handled within minutes on a regular PC.
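To give an idea of how perfect simulation works, here is a minimal sketch of monotone coupling from the past on a single M/M/1/K queue. The model, the parameter values and the function names are assumptions chosen for illustration; this is not the code of the $\psi$ tools. Because the event-driven update is monotone, it suffices to follow the two extremal trajectories (empty queue and full queue): once they meet at time 0, every other starting state has met as well.

```python
import random


def step(x, u, p_arrival, capacity):
    """One uniformized transition of an M/M/1/K queue driven by u in [0, 1)."""
    if u < p_arrival:
        return min(x + 1, capacity)   # arrival (rejected when the queue is full)
    return max(x - 1, 0)              # departure (no effect when empty)


def perfect_sample(lam, mu, capacity, rng):
    """Return one unbiased sample of the stationary queue length."""
    p_arrival = lam / (lam + mu)
    events = []       # events[k] drives the transition from time -(k+1) to -k
    horizon = 1
    while True:
        # Go further into the past; events already drawn are reused as-is.
        while len(events) < horizon:
            events.append(rng.random())
        low, high = 0, capacity
        for k in range(horizon - 1, -1, -1):   # from time -horizon up to 0
            low = step(low, events[k], p_arrival, capacity)
            high = step(high, events[k], p_arrival, capacity)
        if low == high:        # coalescence: the value no longer depends
            return low         # on the state at time -horizon
        horizon *= 2


rng = random.Random(1)
samples = [perfect_sample(1.0, 2.0, 5, rng) for _ in range(2000)]
print(samples.count(0) / len(samples))
```

For this toy queue the stationary probability of an empty system is about 0.51, and the printed frequency should be close to it. Reusing the same events when the horizon is doubled is essential: redrawing them would reintroduce a bias.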

#### Fluid models and mean field limits

When systems grow very large, one may use asymptotic techniques to get a faithful estimate of their behavior. Mean field analysis and fluid limits are such tools, and they can be used at both the modeling and the simulation level. Proving that large discrete dynamic systems can be approximated by continuous dynamics relies on the theory of stochastic approximation pioneered by Michel Benaïm, or on the population dynamics results introduced by Thomas Kurtz and others. We have extended the stochastic approximation approach to take into account discontinuities in the dynamics as well as to tackle optimization issues.
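The following minimal sketch illustrates the approximation on a hypothetical population model, chosen only for this example: each of $n$ nodes is idle or active, an idle node becomes active at rate $a + bx$ (where $x$ is the fraction of active nodes) and an active node becomes idle at rate 1. The mean field limit is then the ODE $\dot{x} = (1-x)(a+bx) - x$. The code compares an exact stochastic simulation of the finite system with an Euler integration of this ODE; all names and parameter values are assumptions.

```python
import random


def drift(x, a, b):
    """Right-hand side of the mean field ODE."""
    return (1.0 - x) * (a + b * x) - x


def ode_fraction(x0, a, b, t_end, dt=1e-3):
    """Explicit Euler integration of the ODE up to time t_end."""
    x = x0
    for _ in range(int(round(t_end / dt))):
        x += dt * drift(x, a, b)
    return x


def simulate_fraction(n, x0, a, b, t_end, rng):
    """Event-by-event simulation of the n-node Markov chain."""
    active = int(round(x0 * n))
    t = 0.0
    while True:
        x = active / n
        up = (n - active) * (a + b * x)   # total activation rate
        down = float(active)              # total deactivation rate
        total = up + down
        t += rng.expovariate(total)       # time of the next event
        if t > t_end:
            return active / n
        if rng.random() * total < up:
            active += 1
        else:
            active -= 1


x_ode = ode_fraction(0.0, 0.1, 2.0, 10.0)
x_sim = simulate_fraction(2000, 0.0, 0.1, 2.0, 10.0, random.Random(0))
print(x_ode, x_sim)
```

The two values should be close for $n = 2000$, with a gap that shrinks roughly like $1/\sqrt{n}$. The ODE costs the same to integrate whatever the value of $n$, whereas the number of simulated events grows linearly with $n$.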

Recent applications include call centers and peer-to-peer systems, where the mean field approach helps to better understand the behavior of the system and to solve several optimization problems. Another application concerns task brokering in desktop grids, taking into account the statistical features of tasks as well as the availability of processors. Mean field has also been applied to the performance evaluation of work stealing in large systems and to model central/local controllers as well as knitting systems.

#### Game Theory

Resources in large-scale distributed platforms (grid computing platforms, enterprise networks, peer-to-peer systems) are shared by a number of users having conflicting interests who are thus prone to act selfishly. A natural framework for studying such non-cooperative individual decision-making is game theory. In particular, game theory models the decentralized nature of decision-making.

It is well known that such non-cooperative behaviors can lead to
significant inefficiencies and unfairness. In other words, individual
optimizations often result in global resource waste. In the context
of game theory, a situation in which all users selfishly optimize their
own utility is known as a *Nash equilibrium* or *Wardrop
equilibrium*. In such equilibria, no user has an interest in
unilaterally deviating from its strategy. Such policies are thus
very natural to seek in fully distributed systems and have some
stability properties. However, a possible consequence is the
*Braess paradox*, in which adding resources degrades the
performance of *every* user. This is why the study of the
occurrence and degree of such inefficiency is of crucial
interest. Up until now, little is known about general conditions for
the optimality or the degree of efficiency of these equilibria.
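
The Braess paradox can be checked on the classical textbook road network, sketched here under assumed numbers: 4000 drivers travel from S to T, either through A (a congestible link costing load/100, then a fixed link costing 45) or through B (fixed 45, then load/100). A free shortcut from A to B is then added. The network, the figures and the function names are illustrative assumptions only.

```python
def path_costs(flows):
    """Travel time of each path given the number of drivers on each path."""
    load_sa = flows["upper"] + flows["shortcut"]   # drivers on link S -> A
    load_bt = flows["lower"] + flows["shortcut"]   # drivers on link B -> T
    return {
        "upper": load_sa / 100 + 45,                  # S -> A -> T
        "lower": 45 + load_bt / 100,                  # S -> B -> T
        "shortcut": load_sa / 100 + load_bt / 100,    # S -> A -> B -> T
    }


def is_equilibrium(flows, available):
    """Wardrop condition: every used path has minimal cost among available ones."""
    costs = path_costs(flows)
    best = min(costs[p] for p in available)
    return all(costs[p] <= best + 1e-9 for p in available if flows[p] > 0)


before = {"upper": 2000, "lower": 2000, "shortcut": 0}
after = {"upper": 0, "lower": 0, "shortcut": 4000}

print(is_equilibrium(before, ["upper", "lower"]), path_costs(before)["upper"])
print(is_equilibrium(after, ["upper", "lower", "shortcut"]),
      path_costs(after)["shortcut"])
```

Without the shortcut, the even split is an equilibrium with a travel time of 65 for everyone. Once the shortcut exists, that split is no longer stable (the shortcut path would cost only 40), and the new equilibrium sends all drivers through the shortcut with a travel time of 80: the extra resource makes every user worse off.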

Many techniques have been developed to enforce some form of
collaboration and improve these equilibria. In this context, taking
joint decisions is generally prohibitively costly, so a global
optimization cannot be achieved. A possible option relies on the
establishment of virtual prices, also called *shadow prices* in
congestion networks. These prices ensure a rational use of
resources.
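
As a minimal sketch of how a shadow price can coordinate selfish users, consider a single link of given capacity shared by users with logarithmic utilities $w_i \log x_i$. Both the single-link setting and the utility shape are assumptions made for this example. Each user selfishly picks the rate that maximizes its utility minus its bill, which gives $x_i = w_i / p$, and the link raises or lowers the price $p$ depending on whether it is overloaded.

```python
def shadow_price(weights, capacity, step=0.05, iterations=2000, price=1.0):
    """Gradient iteration on the link price; returns (price, user rates)."""
    for _ in range(iterations):
        # Each user reacts selfishly to the current price.
        demands = [w / price for w in weights]
        # The link only needs the aggregate load, not the users' utilities.
        excess = sum(demands) - capacity
        price = max(1e-6, price + step * excess)
    return price, [w / price for w in weights]


price, rates = shadow_price([1.0, 2.0, 3.0], capacity=10.0)
print(price, rates)
```

With these hypothetical weights the price settles near 0.6 and the rates near 1.67, 3.33 and 5, which fill the link exactly and coincide with the allocation that maximizes the sum of utilities. The step size has to be small enough for the iteration to converge; the value used here was chosen for this particular instance.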

Once the payoffs are fixed (using shadow prices or not), the main question is to design algorithms that allow the players to learn Nash equilibria in a distributed way, while being robust to noise and information delay and fast enough to outpace the changing conditions of the environment.
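
One classical family of such learning schemes is the exponential weights (multiplicative weights) update. It is sketched here on a hypothetical two-link routing game with noisy cost measurements, as an illustration only and not as the specific algorithms studied by the team. A share $x_1$ of the traffic uses link 1, with latency $2x_1$, and the rest uses link 2, with latency $1 + x_2$. At equilibrium both links cost the same, which gives $x_1 = 2/3$.

```python
import math
import random


def link_costs(x1):
    """Hypothetical latencies of the two links when a share x1 uses link 1."""
    return 2.0 * x1, 1.0 + (1.0 - x1)


def exponential_weights(steps, eta, sigma, rng, x1=0.5):
    """Iterate the exponential weights update under noisy cost feedback."""
    for _ in range(steps):
        c1, c2 = link_costs(x1)
        # Players only observe noisy measurements of the current latencies.
        c1 += rng.gauss(0.0, sigma)
        c2 += rng.gauss(0.0, sigma)
        # Links that looked expensive lose weight; no coordination is needed.
        w1 = x1 * math.exp(-eta * c1)
        w2 = (1.0 - x1) * math.exp(-eta * c2)
        x1 = w1 / (w1 + w2)
    return x1


print(exponential_weights(2000, eta=0.05, sigma=0.1, rng=random.Random(0)))
```

The printed share should be close to 2/3 despite the measurement noise. The step size `eta` embodies the trade-off mentioned in the text: a larger value tracks a changing environment faster but amplifies the effect of noise.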