## Section: New Results

### Stochastic processes, queueing, control theory and game theory

Participants : Eitan Altman, Julien Gaillard, Majed Haddad, Alain Jean-Marie.

#### Convergence of rolling horizon control

In collaboration with E. Della Vecchia and S. Di Marco (both from National Univ. Rosario, Argentina), A. Jean-Marie has investigated the performance (convergence and error bounds) of the Rolling Horizon heuristic for optimal stochastic control and stochastic games in different modeling situations.

In the case of the long-term average expected gain, they have shown [85] that convergence occurs whenever the value iteration algorithm converges. They have then considered zero-sum semi-Markov games with discounted payoff [54], [76], for which they have proved geometric convergence under the usual assumptions of the literature.
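The rolling horizon idea can be illustrated on a toy problem. The sketch below is not taken from [85], [54] or [76]: it assumes a hypothetical two-state, two-action discounted Markov decision process (a single controller rather than a zero-sum semi-Markov game), with made-up transition probabilities `P`, rewards `R` and discount factor `GAMMA`. At each state, the rolling horizon controller applies the first action of an optimal `n`-step plan, which is computed here by value iteration.

```python
# Hypothetical toy MDP (all numbers are illustrative assumptions).
# P[s][a] is a list of (next_state, probability); R[s][a] is the reward.
P = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(0, 0.2), (1, 0.8)]},
    1: {0: [(0, 0.5), (1, 0.5)], 1: [(1, 1.0)]},
}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.5, 1: 2.0}}
GAMMA = 0.9


def q_value(s, a, v):
    """One-step lookahead: immediate reward plus discounted continuation value."""
    return R[s][a] + GAMMA * sum(p * v[t] for t, p in P[s][a])


def finite_horizon_values(n):
    """Optimal n-step values, obtained by n sweeps of value iteration from zero."""
    v = {s: 0.0 for s in P}
    for _ in range(n):
        v = {s: max(q_value(s, a, v) for a in P[s]) for s in P}
    return v


def rolling_horizon_policy(n):
    """First action of an optimal n-step plan, for every state (n >= 1)."""
    v = finite_horizon_values(n - 1)
    return {s: max(P[s], key=lambda a: q_value(s, a, v)) for s in P}


myopic = rolling_horizon_policy(1)       # horizon 1: greedy on immediate reward
far_sighted = rolling_horizon_policy(200)
```

In this example the horizon-1 policy picks action 0 in state 0 (the larger immediate reward), whereas a long horizon picks action 1 there, which is the optimal stationary choice for these numbers because it steers the chain toward the more rewarding state 1.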

#### Impulse control versus continuous control

Impulse control is a modeling framework of optimal control theory, in which the control actions can provoke instantaneous changes in the value of the state. For modelers, it has the features of both continuous-time and discrete-time models, and it can help understand which one to choose in a given optimization situation. A. Jean-Marie has studied the question in conjunction with K. Erdlenbruch (Cemagref), M. Tidball (Inra) and M. Moreaux (Univ. Toulouse 1). In a quite generic one-dimensional model, they show that the optimality of impulse policies with respect to “smooth” control policies is strongly related to a submodularity property of the instantaneous cost function [101].

#### Routing games

Several fundamental results have been obtained in routing games that model a finite number of traffic sources (players) who decide how to split their traffic among various paths. When the number of players is large, the Wardrop equilibrium concept is often used: the problem is modeled with a continuum of decision makers, each of whom has a negligible impact on the others' performance (a nonatomic game). E. Altman and his co-workers have studied whether the Wardrop equilibrium is a good approximation for a problem with finitely many players, for which the Nash equilibrium is the solution concept. In [38], E. Altman, in collaboration with Z. Altman, R. Combes (both from Orange Labs, Issy les Moulineaux) and S. Sorin (Univ. Pierre and Marie Curie (UPMC)), establishes the convergence of the Nash equilibrium to the Wardrop equilibrium as the number of players grows, under mild convexity assumptions on the link costs (or delays). The proof is based on another fundamental result derived in that reference, later extended in [44] by E. Altman in collaboration with O. Pourtallier (Inria project-team Coprin), T. Jimenez (Univ. Avignon/LIA) and H. Kameda (Univ. Tsukuba, Japan): if there is some symmetry in a network, then any Nash equilibrium inherits the symmetry properties (for example, if two users have the same source, the same destination and the same demand, then at equilibrium they send the same amount of traffic over each link).
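A small numerical illustration of this convergence, under assumptions that are not taken from [38] or [44]: a hypothetical network of two parallel links with per-unit costs $c_1(x) = x$ and $c_2(x) = 1$, and one unit of total demand shared equally among `n_players` identical players. The Wardrop equilibrium of this network sends all the traffic on link 1. The sketch computes the Nash equilibrium by sequential best responses and shows that the total flow on link 1 equals $n/(n+1)$, which tends to the Wardrop value 1 as the number of players grows; the players end up with identical flows, as the symmetry property suggests.

```python
def nash_flows(n_players, sweeps=2000):
    """Per-player Nash flows on link 1 for the assumed two-link network.

    Link 1 costs c1(x) = x per unit, link 2 costs c2(x) = 1 per unit, and
    each player must route a demand of 1 / n_players (illustrative choices).
    """
    demand = 1.0 / n_players
    y = [0.0] * n_players  # y[i]: flow that player i sends on link 1
    total = 0.0            # running total flow on link 1
    for _ in range(sweeps):
        for i in range(n_players):
            others = total - y[i]
            # Player i's cost is y * (y + others) + (demand - y) * 1, a convex
            # quadratic in y whose unconstrained minimizer is (1 - others) / 2;
            # clip it to the feasible interval [0, demand].
            best = min(max((1.0 - others) / 2.0, 0.0), demand)
            total += best - y[i]
            y[i] = best
    return y


for n in (1, 2, 5, 20):
    flows = nash_flows(n)
    print(n, sum(flows), n / (n + 1))
```

Sequential (rather than simultaneous) best responses are used because, for these affine costs, each update decreases a strictly convex quadratic potential, so the iteration converges.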

All the above work assumes that the link cost (or delay) per packet is class independent, that is, it depends on the flows through the link only through their sum. In the case of the Wardrop equilibrium, this assumption implies that the game has an equivalent global optimization problem whose solution coincides with the equilibrium. The link cost evaluated at some $x$ in the equivalent problem is the integral of the original link cost (from 0 to $x$) and is in fact a potential. In the case of class-dependent costs, that is, when the cost depends in other ways on the traffic of each class, the result of the integration may depend on the integration path, and the problem cannot in general be transformed into an equivalent optimization problem. H. Kameda and J. Li (both from Univ. Tsukuba, Japan), in collaboration with E. Altman, identify in [27] other class-dependent costs that have the property of a gradient field, that is, they can be expressed as the gradient of a potential. They obtain the Wardrop equilibrium and study its properties.
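In symbols, the construction can be sketched as follows. The notation is assumed here for illustration and is not taken from [27]: $x_\ell$ is the total flow on link $\ell$, $c_\ell$ its per-packet cost, and $x_\ell^i$ the flow of class $i$ on link $\ell$, for $i = 1, \dots, K$. In the class-independent case, the Wardrop equilibrium solves

$$
\min_{x} \; \sum_{\ell} \int_0^{x_\ell} c_\ell(s)\, ds
$$

subject to the flow conservation and nonnegativity constraints. In the class-dependent case, with continuously differentiable costs $c_\ell^i(x_\ell^1, \dots, x_\ell^K)$ on a convex domain, a potential $\Phi_\ell$ such that

$$
c_\ell^i = \frac{\partial \Phi_\ell}{\partial x_\ell^i}, \qquad i = 1, \dots, K,
$$

exists exactly when the cross derivatives are symmetric:

$$
\frac{\partial c_\ell^i}{\partial x_\ell^j} = \frac{\partial c_\ell^j}{\partial x_\ell^i} \qquad \text{for all } i, j.
$$

When this symmetry fails, the line integral of the cost vector depends on the integration path, which is the obstruction described above.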

Another difficulty occurs in routing games when the available paths are not the same for all users. This is the case, in particular, when there are priorities. This problem is addressed in [25] by J. Elias (University Paris Descartes), F. Martignon (University Paris-Sud 11), A. Capone (Politecnico di Milano, Italy) and E. Altman, within an application to non-cooperative spectrum access in cognitive radio networks.

#### Bio-inspired paradigms

##### Epidemiology

For several years now, E. Altman has been developing techniques for dynamic optimal control and games in cooperation with S. Sarkar's group from the University of Pennsylvania (which used to be part of the DAWN associated team with Maestro). This year, this collaboration has resulted in three additional publications co-authored by M.H.R. Khouzani and S. Sarkar (both from Univ. of Pennsylvania, PA, USA) and E. Altman [61], [60], [104]. All three papers use the Pontryagin maximum principle to derive the structure of optimal policies applied to a mean-field approximation of the problem. The first two papers do so in the context of optimal control theory, while the third does so in the context of a dynamic game.

##### Sequential anonymous games (SAG)

Sequential Anonymous Games (SAG) can be viewed as an extension of Markov Decision evolutionary games. In both formalisms, the many players are modeled as a continuum. A Markov chain is associated with each player. There are several types of players. The fraction of players in each class is called the global state, and the state of the Markov chain of an individual is called the individual state. An individual chooses actions at successive decision opportunities. It earns some immediate reward (fitness) at each slot and moves with some probability to another individual state. In SAG, both the transition probabilities and the immediate fitness of an individual depend on its current state and action as well as on the current global system state. The latter evolves according to some function averaged over the fitness of the individuals in each class (the fraction of individuals in a class grows if they do better than those in the other classes). In [75], E. Altman investigates, in collaboration with P. Wiecek (Wroclaw Univ. of Technology, Poland), the case where the objective of an individual is to maximize either its total expected fitness during its lifetime or its expected average fitness. The authors establish the existence of equilibria and study their properties. Applications to power control have appeared in [32] by E. Altman, in collaboration with P. Wiecek (Wroclaw Univ. of Technology, Poland) and Y. Hayel (Univ. Avignon/LIA).

##### Markov decision evolutionary games (MDEGs)

Since his 2004 Infocom paper, E. Altman has been working on this novel paradigm. The model is similar to that of the previous paragraph (SAG) except that in MDEG both immediate fitness and transition probabilities depend linearly on the global state. This reflects a scenario where the interaction between a player and the rest of the population occurs through pairwise interactions: each player encounters from time to time a randomly chosen other player and it finds itself playing a matrix game with that player. The entries of the matrix corresponding to each player as well as the transition probabilities for each player depend on the individual states of the players. E. Altman has applied this model to the dynamic Hawk and Dove game, in which individuals have to choose the degree of aggressiveness in their behavior as a function of their energy state.
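The linear dependence on the global state can be written out as follows. The notation is assumed for illustration only and is not taken from the cited papers: $\alpha(s', a')$ denotes the fraction of the population that is in individual state $s'$ and uses action $a'$, $J(s, a; s', a')$ the matrix-game payoff of a player in state $s$ using action $a$ against such an opponent, and $q(\cdot \mid s, a; s', a')$ the corresponding transition probabilities. Since the opponent is drawn at random from the population, the expected immediate fitness and the transition probabilities of the player are

$$
r(s, a; \alpha) = \sum_{s', a'} \alpha(s', a')\, J(s, a; s', a'),
\qquad
Q(s'' \mid s, a; \alpha) = \sum_{s', a'} \alpha(s', a')\, q(s'' \mid s, a; s', a'),
$$

both of which are linear in $\alpha$.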

Three publications address this paradigm, with applications both to biology and to wireless communications (where, depending on one's remaining battery energy, one has to decide at what power to transmit). The first, [30], by E. Altman, H. Tembine (Supelec), R. El-Azouzi and Y. Hayel (both from Univ. Avignon/LIA), lays the foundations of MDEGs and presents the application to power control, in which the individual state is the battery energy level. The second, [59], by Y. Hayel (Univ. Avignon/LIA), E. V. Belmega (Supelec) and E. Altman, studies theoretical aspects that arise when the global state represents not the fractions of the different populations but their actual sizes. The third, [97], by E. Altman, J. Gaillard, M. Haddad and P. Wiecek (Wroclaw Univ. of Technology, Poland), again studies MDEGs (as in the first paper) but restricts attention to static policies, in which a player uses the same mixed strategy in every state. The authors compute the equilibrium explicitly within this class of policies.

##### Delayed evolutionary games

Evolutionary game theory devotes much attention to describing how the global system state evolves as a function of the fitness of individuals. The models are often described through differential equations (e.g. the “replicator dynamics”). In many scenarios, it is realistic to consider a delay between the moment a given fitness is received and the moment it translates into a change in the population size. For example, if the lifetime of a computer is three years, then an application that performs better on one type of computer may take more than a year to be adopted by users who do not have that computer. In [30], H. Tembine (Supelec), E. Altman, R. El-Azouzi and Y. Hayel (both from Univ. Avignon/LIA) investigate instability phenomena that are introduced by the delay and derive necessary stability conditions.
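The destabilizing effect of a delay can be observed on a textbook example. The sketch below is an illustration under stated assumptions, not the model or the conditions of the cited paper: it takes the classical Hawk and Dove matrix game with hypothetical resource value `v = 2` and fight cost `c = 4`, for which the replicator dynamics for the fraction $x$ of hawks, with the fitness evaluated at time $t - \tau$, reads $\dot{x}(t) = x(t)\,(1 - x(t))\,(v - c\,x(t - \tau))/2$. The equation is integrated with a plain Euler scheme. The two delays used are chosen on either side of the classical stability threshold $k\tau = \pi/2$ of the linearized equation $\dot{z}(t) = -k\,z(t - \tau)$, where $k = 1/2$ for these parameters.

```python
def delayed_hawk_dove(tau, v=2.0, c=4.0, x0=0.6, t_end=400.0, dt=0.01):
    """Euler scheme for x'(t) = x(t) (1 - x(t)) (v - c x(t - tau)) / 2.

    All parameter values are illustrative assumptions. The history on
    [-tau, 0] is taken constant and equal to x0.
    """
    lag = int(round(tau / dt))       # delay expressed in time steps
    steps = int(round(t_end / dt))
    xs = [x0] * (lag + 1)            # samples on [-tau, 0]
    for _ in range(steps):
        x_now = xs[-1]
        x_delayed = xs[-1 - lag]
        drift = x_now * (1.0 - x_now) * (v - c * x_delayed) / 2.0
        xs.append(x_now + dt * drift)
    return xs[lag:]                  # trajectory on [0, t_end]


def tail_amplitude(xs, n_tail=10000):
    """Peak-to-peak amplitude over the last n_tail samples."""
    tail = xs[-n_tail:]
    return max(tail) - min(tail)


short_delay = delayed_hawk_dove(tau=1.0)  # settles at the mixed equilibrium v / c
long_delay = delayed_hawk_dove(tau=5.0)   # keeps oscillating around v / c
print(tail_amplitude(short_delay), tail_amplitude(long_delay))
```

With the short delay the hawk fraction converges to the mixed equilibrium $v/c = 1/2$, whereas with the long delay it oscillates persistently around that value.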