## Section: New Results

### Network Science

Participants: Eitan Altman, Konstantin Avrachenkov, Arun Kadavankandy, Jithin Kazhuthuveettil Sreedharan, Hlib Mykhailenko, Giovanni Neglia, Alina Tuholukova.

#### Computation on Large Graphs

The Maestro team has been studying how to partition large graphs in distributed computation frameworks in order to reduce execution time.

In [43], H. Mykhailenko and G. Neglia, in collaboration with F. Huet (Univ. Côte d'Azur, CNRS, I3S), provide an overview of existing edge partitioning algorithms. Published work alone, however, does not allow a clear conclusion about the relative performance of these partitioners. For this reason, the authors compare all the edge partitioners currently available for Apache GraphX, a widely used graph-processing framework. Preliminary results suggest that the Hybrid-Cut partitioner provides the best performance.
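To make the Hybrid-Cut idea concrete, here is a minimal Python sketch. It assumes the rule popularized by the PowerLyra system rather than any code from [43]: an edge whose destination has in-degree at most a threshold is placed by hashing the destination, otherwise by hashing the source. The function name `hybrid_cut`, the threshold and the use of `v % num_parts` as a stand-in hash are illustrative assumptions; the sketch does not use the GraphX API.

```python
from collections import Counter


def hybrid_cut(edges, num_parts, threshold):
    """Return one partition id per directed edge (src, dst), Hybrid-Cut style.

    Vertex ids are assumed to be non-negative integers, and `v % num_parts`
    stands in for a real hash function (illustrative assumption).
    """
    in_degree = Counter(dst for _, dst in edges)
    parts = []
    for src, dst in edges:
        if in_degree[dst] <= threshold:
            key = dst  # low-degree target: keep all its in-edges together
        else:
            key = src  # high-degree target: spread its in-edges by source
        parts.append(key % num_parts)
    return parts


# Hypothetical example: hub 0 receives 8 edges, vertex 2 receives one.
edges = [(k, 0) for k in range(1, 9)] + [(1, 2)]
print(hybrid_cut(edges, num_parts=4, threshold=3))
```

The hub's in-edges end up spread over all four partitions, while the low-degree vertex 2 keeps its single in-edge in one place, which is the trade-off Hybrid-Cut is designed around.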

In [44], H. Mykhailenko and G. Neglia, in collaboration with F. Huet (Univ. Côte d'Azur, CNRS, I3S), focus on vertex-cut graph partitioning and investigate how the quality of a partition can be evaluated before running the computation. To this end, the authors scrutinize a set of metrics proposed in the literature. They carry out experiments with Apache GraphX and perform a careful statistical analysis. Preliminary experimental results show that communication metrics such as vertex-cut and communication cost are effective predictors in most cases.
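Such metrics can be computed directly from an edge-to-partition assignment, before any computation runs. The sketch below assumes common definitions, which may differ in detail from those studied in [44]: the replication factor is the average number of partitions in which a vertex appears, vertex-cut counts the vertices present in more than one partition, communication cost counts the extra copies (copies minus one) that must be synchronized, and balance is the largest partition load divided by the average load over non-empty partitions. The function name is hypothetical.

```python
from collections import defaultdict


def partition_metrics(edges, parts):
    """Quality metrics of a vertex-cut partition (edge i lives in partition parts[i])."""
    replicas = defaultdict(set)  # vertex -> partitions holding one of its edges
    load = defaultdict(int)      # partition -> number of edges
    for (u, v), p in zip(edges, parts):
        replicas[u].add(p)
        replicas[v].add(p)
        load[p] += 1
    copies = [len(s) for s in replicas.values()]
    return {
        "replication_factor": sum(copies) / len(copies),
        "vertex_cut": sum(1 for c in copies if c > 1),
        "communication_cost": sum(c - 1 for c in copies),
        "balance": max(load.values()) / (len(edges) / len(load)),
    }


print(partition_metrics([(0, 1), (1, 2), (2, 0), (2, 3)], [0, 0, 1, 1]))
```

In this toy assignment vertices 0 and 2 are split across both partitions, so two vertices are cut and two extra copies must be kept in sync.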

#### Network centrality measures

In [19], K. Avrachenkov, in collaboration with V. Mazalov (Karelian Institute of Applied Mathematical Research, Russia), L. Trukhina (Baikal State Univ. of Economics and Law, Russia) and B. Tsynguev (Transbaikal State Univ., Russia), worked on network centrality measures based on game-theoretic concepts. Betweenness centrality is one of the basic concepts in social network analysis. The original definition of the betweenness of a node is based on the fraction of the geodesics (shortest paths) between pairs of nodes that pass through the given node, relative to the total number of shortest paths connecting those nodes. Computing it has polynomial complexity. We propose a new concept of betweenness centrality for weighted graphs using methods of cooperative game theory. The characteristic function is defined in a special way for different coalitions (subsets of nodes of the graph), following two approaches. In the first approach, the characteristic function is determined by the number of direct and indirect weighted connecting paths in the coalition. In the second approach, the coalition is viewed as an electric network, and the characteristic function is the total current in this network, computed using Kirchhoff's laws. The betweenness centrality is then defined as the Myerson value. We present computer simulations on several example networks, in particular the popular social network “VKontakte”, together with a comparison with the PageRank method.
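The Myerson value is the Shapley value of the graph-restricted game v^G(S) = sum of v(C) over the connected components C of the subgraph induced by S. The brute-force sketch below illustrates this on tiny graphs. Instead of the path-counting or electric-current characteristic functions of [19], it assumes the simplest illustrative choice v(S) = |S|(|S|-1)/2, so that v^G(S) counts the node pairs of S joined by a path inside S. The function names are hypothetical, and the n! enumeration limits it to a handful of nodes.

```python
from itertools import permutations
from math import factorial


def connected_pairs(coalition, adj):
    """Graph-restricted worth: node pairs in `coalition` linked by a path inside it."""
    remaining = set(coalition)
    total = 0
    while remaining:
        stack = [remaining.pop()]
        size = 1
        while stack:  # depth-first search over one component
            u = stack.pop()
            for w in adj[u]:
                if w in remaining:
                    remaining.remove(w)
                    stack.append(w)
                    size += 1
        total += size * (size - 1) // 2
    return total


def myerson_value(adj, worth=connected_pairs):
    """Average marginal contribution of each node over all arrival orders."""
    nodes = list(adj)
    phi = {v: 0.0 for v in nodes}
    for order in permutations(nodes):
        coalition, prev = [], worth([], adj)
        for v in order:
            coalition.append(v)
            cur = worth(coalition, adj)
            phi[v] += cur - prev
            prev = cur
    n_orders = factorial(len(nodes))
    return {v: s / n_orders for v, s in phi.items()}


print(myerson_value({0: [1], 1: [0, 2], 2: [1]}))
```

On the path 0-1-2 the middle node receives 4/3 and each endpoint 5/6; the values add up to 3, the number of connected pairs, and the bridging node scores highest, as a betweenness measure should.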

#### Sampling and Inference of Complex Networks

In [32], K. Avrachenkov, G. Neglia and A. Tuholukova study chain-referral methods for sampling in social networks. These methods rely on study subjects recruiting other participants among their connections. This makes sampling possible when other methods, which require knowledge of the whole network or of its global characteristics, fail. Chain-referral methods can be implemented with random walks or, in the case of online social networks, by crawling. However, estimates computed from the collected samples can have high variance, especially for small sample sizes. Another drawback is the potential bias due to the way samples are collected. We propose and analyze a subsampling technique in which some users are asked only to recruit other users and do not participate in the study. Assuming that a referral costs less than actual participation, this technique explores a more diverse part of the population and thus significantly decreases the variance of the estimator. We test the method on real and synthetic social networks. As a by-product, we propose a Gibbs-like method for generating synthetic networks with desired properties.
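A minimal sketch of subsampled chain referral, under our own assumptions rather than the exact scheme of [32]: referrals follow a simple random walk, only every `keep_every`-th recruit is actually surveyed, and the population mean is estimated with the usual 1/degree reweighting that corrects the walk's bias toward high-degree users. The function name and parameters are hypothetical.

```python
import random


def referral_walk_estimate(adj, f, steps, keep_every, seed=0):
    """Random-walk chain referral in which only every `keep_every`-th recruit
    participates; returns a degree-reweighted estimate of the mean of f."""
    rng = random.Random(seed)
    node = rng.choice(list(adj))
    num = den = 0.0
    for t in range(1, steps + 1):
        node = rng.choice(adj[node])  # current person refers a random contact
        if t % keep_every == 0:       # this recruit is also surveyed
            w = 1.0 / len(adj[node])  # undo the walk's bias toward high degree
            num += w * f(node)
            den += w
    return num / den


# Hypothetical 4-node network: a triangle 0-1-2 plus a pendant node 3.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
print(referral_walk_estimate(adj, lambda v: float(v == 3), 200_000, keep_every=3))
```

The true fraction of nodes equal to 3 is 0.25; intermediate recruits only pass the referral on, which is where the cost saving of the subsampling idea comes from.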

Function estimation on Online Social Networks (OSNs) is an important topic in complex network analysis. An efficient way to estimate functions on large networks is to use random walks; the extensive theory of Markov chains can then be used for the error analysis of these estimators. In [29], K. Avrachenkov, A. Kadavankandy and J.K. Sreedharan, in collaboration with V. Borkar (IIT Bombay, India), compare two existing random-walk-based techniques for function estimation, Metropolis-Hastings MCMC and Respondent-Driven Sampling, with a new technique based on reinforcement learning. We provide both theoretical and empirical analyses of the estimators considered.
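Of the estimators being compared, Metropolis-Hastings MCMC is the easiest to show in a few lines. This is a minimal sketch of the textbook construction, not the code of [29]: a proposal to a uniformly chosen neighbour is accepted with probability min(1, deg(current)/deg(proposal)), which makes the stationary distribution uniform over nodes, so a plain average of f along the walk estimates the population mean. Respondent-Driven Sampling instead keeps the simple random walk and reweights samples by 1/degree. The function name is hypothetical.

```python
import random


def mh_walk_mean(adj, f, steps, seed=0):
    """Metropolis-Hastings random walk with uniform stationary law; returns the
    running average of f, which estimates the mean of f over all nodes."""
    rng = random.Random(seed)
    node = rng.choice(list(adj))
    total = 0.0
    for _ in range(steps):
        proposal = rng.choice(adj[node])
        # Accept with probability min(1, deg(node) / deg(proposal)).
        if rng.random() < len(adj[node]) / len(adj[proposal]):
            node = proposal
        total += f(node)  # a rejected move counts the current node again
    return total / steps


adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
print(mh_walk_mean(adj, lambda v: float(v == 3), 200_000))
```

With four nodes, the estimate of the fraction of nodes equal to 3 should approach 0.25.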

In [33], K. Avrachenkov and J.K. Sreedharan, in collaboration with B. Ribeiro (Purdue Univ., USA), develop random-walk-based methods for inference in Online Social Networks (OSNs) that answer questions such as: are OSN users more likely to form friendships with users who have similar attributes? Do users of OSN A rate content more favorably than users of OSN B? Such questions arise frequently in Social Network Analysis (SNA), but crawling an OSN through its Application Programming Interface (API) is often the only way for a third party to gather data. To date, such partial API crawls make up the majority of public datasets, and comparisons based on this incomplete data lack statistical guarantees, severely limiting progress in SNA research. Using regenerative properties of random walks, we propose estimation techniques based on short crawls that come with proven statistical guarantees. Moreover, our short crawls can be implemented in massively distributed algorithms. We also provide an adaptive crawler that makes our method parameter-free and significantly improves our statistical guarantees. We then derive a Bayesian approximation of the posterior of the estimates. In addition, we obtain an estimator for the expected value of node and edge statistics in an equivalent configuration model or Chung-Lu random graph model of the given network (where nodes are connected randomly), and use it as a basis for testing null hypotheses. The theoretical results are supported by simulations on a variety of real-world networks.
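The regenerative idea can be sketched as follows, assuming a single seed node instead of the super-node of seeds used in [33], and omitting the adaptive crawler and Bayesian layer. Each tour starts at the seed and ends on the first return to it, so tours are independent and can be run in parallel. By a renewal-reward argument, each directed edge is crossed on average 1/d_S times per tour, where d_S is the seed's degree; hence d_S times the mean per-tour sum of an edge function g is an unbiased estimate of the sum of g over all directed edges. Names are hypothetical.

```python
import random


def tour_estimate(adj, g, seed_node, num_tours, rng_seed=0):
    """Estimate the sum of g(u, v) over all directed edges (u, v) from
    random-walk tours that start and end at `seed_node`."""
    rng = random.Random(rng_seed)
    d_seed = len(adj[seed_node])
    total = 0.0
    for _ in range(num_tours):  # tours are i.i.d., so this loop parallelizes
        u = seed_node
        while True:
            v = rng.choice(adj[u])
            total += g(u, v)
            u = v
            if u == seed_node:  # regeneration: the tour is complete
                break
    # Each directed edge is crossed 1 / d_seed times per tour on average.
    return d_seed * total / num_tours


k5 = {i: [j for j in range(5) if j != i] for i in range(5)}
print(tour_estimate(k5, lambda u, v: 1.0, seed_node=0, num_tours=20_000))
```

With g identically 1 the estimator targets 2m, the number of directed edges; for the complete graph on five nodes this is 20.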

In [30], K. Avrachenkov, in collaboration with L. Iskhakov and M. Mironov (Moscow Institute of Physics and Technology, Russia), considers pairwise Markov random fields, which have a number of important applications in statistical physics, image processing and machine learning, such as the Ising model and the labeling problem. Our own motivation comes from the need to produce synthetic models of social networks with attributes. First, we give conditions for rapid mixing of the associated Glauber dynamics and consider interesting particular cases. Then, for pairwise Markov random fields with submodular energy functions, we construct a monotone perfect simulation.
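For intuition, here is a minimal sketch of monotone perfect simulation for the simplest submodular case, a ferromagnetic Ising model (coupling beta >= 0), using Propp-Wilson coupling from the past with heat-bath Glauber updates. It is a textbook construction chosen for illustration, not the specific scheme of [30]; the energy convention in the docstring and the function name are assumptions. Because the update is monotone, it suffices to track the all-plus and all-minus chains: once they agree at time 0, every starting state would have.

```python
import math
import random


def cftp_ising(adj, beta, h=0.0, seed=0):
    """Exact sample from P(x) proportional to
    exp(beta * sum_{edges} x_i x_j + beta * h * sum_i x_i), x_i in {-1, +1},
    by monotone coupling from the past (requires beta >= 0)."""
    rng = random.Random(seed)
    n = len(adj)
    moves = []  # moves[t] = (site, uniform) used at time -(t + 1); reused when T doubles
    T = 1
    while True:
        while len(moves) < T:
            moves.append((rng.randrange(n), rng.random()))
        top, bottom = [1] * n, [-1] * n  # maximal and minimal starting states
        for t in range(T - 1, -1, -1):   # run from time -T up to time -1
            i, u = moves[t]
            for x in (top, bottom):
                field = sum(x[j] for j in adj[i]) + h
                p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * field))  # heat-bath rule
                x[i] = 1 if u < p_plus else -1
        if top == bottom:  # all starting states have coalesced by time 0
            return top
        T *= 2


print(cftp_ising([[1, 2], [0, 2], [0, 1]], beta=0.4, seed=1))
```

As a sanity check, for a single edge with beta = 0.5 the fraction of samples whose two spins agree should be close to 1/(1 + e^-1), about 0.73.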

#### Distributed algorithms for complex network analysis

In [31], K. Avrachenkov and J.K. Sreedharan, in collaboration with P. Jacquet (Nokia Bell Labs, France), address the problem of finding the top-k eigenvalues and corresponding eigenvectors of symmetric graph matrices in a distributed way. We propose a novel idea called complex power iterations, which decomposes the eigenvalues and eigenvectors at the node level, analogously to time-frequency analysis in signal processing. At each node, the eigenvalues correspond to the frequencies of spectral peaks, and the respective eigenvector components are the amplitudes at those frequencies. Based on complex power iterations and motivated by fluid diffusion processes in networks, we devise distributed algorithms with different orders of approximation. We also introduce a Monte Carlo technique with gossiping that substantially reduces the computational overhead. An equivalent parallel random walk algorithm is also presented. We validate the algorithms with simulations on real-world networks. Our formulation of the spectral decomposition can easily be adapted into a simple algorithm based on quantum random walks, which could become particularly useful as quantum computing matures.
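A centralized NumPy sketch of the idea, under assumptions of our own: x(t), approximately exp(itA) applied to the indicator vector of node j, is advanced by repeated second-order Taylor steps I + i*eps*A - (eps*A)^2/2. The signal observed at node j is Hann-windowed and Fourier transformed, and local maxima of its magnitude above a relative threshold are read off as eigenvalues. The step size `eps`, number of steps, window, zero-padding and threshold are illustrative choices, not the parameters of [31]. In a distributed run, each product `A @ x` would be one round of message exchange between neighbours.

```python
import numpy as np


def node_spectrum(A, node, eps=0.05, steps=4000, pad=2**16):
    """Magnitude spectrum of the signal x_node(t), x(t) ~ exp(i t A) e_node."""
    x = np.zeros(A.shape[0], dtype=complex)
    x[node] = 1.0
    signal = np.empty(steps, dtype=complex)
    for k in range(steps):
        signal[k] = x[node]
        Ax = A @ x  # one neighbour-exchange round in a distributed setting
        x = x + 1j * eps * Ax - 0.5 * eps**2 * (A @ Ax)  # 2nd-order step of exp(i eps A)
    spectrum = np.abs(np.fft.fft(np.hanning(steps) * signal, n=pad))
    theta = 2 * np.pi * np.fft.fftfreq(pad, d=eps)  # angular frequencies
    order = np.argsort(theta)
    return theta[order], spectrum[order]


def spectral_peaks(theta, spectrum, rel=0.1):
    """Frequencies of local maxima above rel * max; they approximate eigenvalues."""
    s = spectrum
    is_peak = (s[1:-1] > s[:-2]) & (s[1:-1] >= s[2:]) & (s[1:-1] > rel * s.max())
    return theta[1:-1][is_peak]


A = np.ones((4, 4)) - np.eye(4)  # complete graph K4: eigenvalues 3, -1, -1, -1
print(spectral_peaks(*node_spectrum(A, node=0)))
```

For the complete graph on four nodes, the peaks found at node 0 should sit close to 3 and -1; the height of each peak reflects the eigenvector components at that node. The second-order step slightly shifts the frequencies, which is one reason higher-order approximations matter.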

In [56], K. Avrachenkov, in collaboration with V. Borkar and K. Saboo (IIT Bombay, India), proposes two asynchronous distributed approaches for graph-based semi-supervised learning. The first approach is based on stochastic approximation, whereas the second is based on the randomized Kaczmarz algorithm. Besides allowing a distributed implementation, both approaches can naturally be applied online to streaming data. We analyse both approaches theoretically and experimentally. There appears to be no clear winner, and we indicate the cases in which each approach is superior.
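The randomized Kaczmarz approach can be illustrated on a linear system for class scores. As an assumption for illustration, we use the label-propagation-style system (I - alpha * D^-1 A) F = (1 - alpha) Y, with A the adjacency matrix, D the degree matrix and Y the one-hot seed labels; the exact formulation in [56] may differ. Each Kaczmarz step projects the current iterate onto a single row equation, which involves only one node and its neighbours. This locality is what makes an asynchronous distributed implementation natural. Function names are hypothetical.

```python
import numpy as np


def randomized_kaczmarz(M, b, iters, seed=0):
    """Solve M x = b by repeatedly projecting onto one random row equation."""
    rng = np.random.default_rng(seed)
    row_norms = np.einsum("ij,ij->i", M, M)
    probs = row_norms / row_norms.sum()  # rows sampled proportionally to squared norm
    x = np.zeros(M.shape[1])
    for i in rng.choice(M.shape[0], size=iters, p=probs):
        x += (b[i] - M[i] @ x) / row_norms[i] * M[i]
    return x


def ssl_scores(A, Y, alpha=0.5, iters=10_000, seed=0):
    """Class scores F solving (I - alpha * D^-1 A) F = (1 - alpha) Y, column by column."""
    M = np.eye(len(A)) - alpha * A / A.sum(axis=1, keepdims=True)
    return np.column_stack([randomized_kaczmarz(M, (1 - alpha) * Y[:, c], iters, seed)
                            for c in range(Y.shape[1])])


# Two triangles joined by the edge 2-3; node 0 labelled class 0, node 5 class 1.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    A[u, v] = A[v, u] = 1.0
Y = np.zeros((6, 2))
Y[0, 0] = Y[5, 1] = 1.0
print(ssl_scores(A, Y).argmax(axis=1))
```

Each triangle should inherit the label of its seed node. In a streaming setting, new row equations could simply be added to the sampling pool.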

#### Random Matrix Theory for Complex Networks

In [41], A. Kadavankandy and K. Avrachenkov, in collaboration with L. Cottatellucci (Eurecom, France), describe a test statistic based on the L1-norm of the eigenvectors of a modularity matrix to detect the presence of an embedded Erdős-Rényi (ER) subgraph inside a larger ER random graph. An embedded subgraph may model a hidden community in a large network, such as a social network or a computer network. We use properties of the asymptotic distribution of the eigenvectors of random graphs to derive the distribution of the test statistic under certain conditions on the subgraph size and edge probabilities. We show that the distributions differ sufficiently for well-defined ranges of subgraph sizes and of edge probabilities of the background graph and the subgraph. This method can be applied when it suffices to know whether a given graph contains an anomaly, without the need to infer its location. Our results on the distribution of the eigenvector components may also be useful for detecting the subgraph nodes.

#### Network Growth Models

Network growth and evolution is a fundamental theme that has puzzled scientists over the past decades. A number of models have been proposed to capture important properties of real networks. In an attempt to better describe reality, more recent growth models embody local rules of attachment; however, they still require a primitive to randomly select an existing network node, and thus some kind of global knowledge about the network (at least the set of nodes and how to reach them). In [28], G. Neglia, in collaboration with B. Amorim, D. Figueiredo and G. Iacobelli (Federal Univ. of Rio de Janeiro, Brazil), proposes a purely local network growth model that makes no use of global sampling across the nodes. The model is based on a continuously moving random walk that, after s steps, connects a new node to its current location but never restarts. Through extensive simulations and theoretical arguments, they analyze the behavior of the model and find a fundamental dependence on the parity of s: networks with either an exponential or a conditional power-law degree distribution can emerge. As s increases, the parity dependence diminishes and the model recovers the degree distribution of the Barabási-Albert preferential attachment model. The proposed purely local model indicates that networks can grow to exhibit interesting properties even in the absence of any global rule, such as global node sampling.
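The growth rule is simple enough to simulate directly. In this minimal sketch, the seed triangle, the single edge per new node and the function name are our own assumptions: one walker moves forever on the growing graph, and after every s steps a new node is attached to the walker's current position. No node is ever sampled globally.

```python
import random


def local_growth(num_nodes, s, seed=0):
    """Grow a graph with a never-restarting random walk: after every s steps,
    a new node is attached to the walker's current position."""
    rng = random.Random(seed)
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # seed triangle (an assumption)
    walker = 0
    while len(adj) < num_nodes:
        for _ in range(s):
            walker = rng.choice(adj[walker])  # purely local move
        new = len(adj)
        adj[new] = [walker]
        adj[walker].append(new)
    return adj


g = local_growth(1000, s=3)
print(max(len(nbrs) for nbrs in g.values()))
```

Comparing the degree histograms of `local_growth(n, s)` for odd and even values of s is a quick way to observe the parity effect described above.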

#### Competition over popularity in online social networks

In [24], E. Altman, in collaboration with A. Jain and Y. Hayel (UAPV), considers a stochastic game in which players compete, through advertisement, over the popularity of their content. They show that the equilibrium may or may not be unique, depending on the system parameters, and identify structural properties of the equilibria. In particular, they show that a finite improvement property holds for best-response pure policies, which implies the existence of pure equilibria. They further show that all pure equilibria are fully ordered in the performance they provide to the players, and propose a procedure to obtain the best equilibrium.
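The finite improvement property means that any sequence of strict best-response moves must stop, and it can only stop at a pure equilibrium. Below is a sketch of sequential best-response dynamics for a generic finite game. The two-player payoff in the example, a_i * (10 - a_0 - a_1) with efforts 0..10, is a hypothetical Cournot-style toy, not the advertisement game of [24]; it is a potential game, so the dynamics are guaranteed to terminate. Function names are hypothetical.

```python
def deviation_payoff(payoff, profile, i, s):
    """Payoff of player i if it alone switches to strategy s."""
    trial = list(profile)
    trial[i] = s
    return payoff(i, tuple(trial))


def best_response_dynamics(strategies, payoff, start, max_rounds=1000):
    profile = list(start)
    for _ in range(max_rounds):
        changed = False
        for i, options in enumerate(strategies):
            best = max(options, key=lambda s: deviation_payoff(payoff, profile, i, s))
            # Move only on a strict improvement; with the finite improvement
            # property, such moves cannot go on forever.
            if (deviation_payoff(payoff, profile, i, best)
                    > deviation_payoff(payoff, profile, i, profile[i])):
                profile[i] = best
                changed = True
        if not changed:
            return tuple(profile)
    raise RuntimeError("no pure equilibrium reached within max_rounds")


def is_pure_nash(strategies, payoff, profile):
    return all(payoff(i, profile) >= deviation_payoff(payoff, profile, i, s)
               for i, options in enumerate(strategies) for s in options)


def toy_payoff(i, a):  # hypothetical stand-in game
    return a[i] * (10 - a[0] - a[1])


strategies = [range(11), range(11)]
eq = best_response_dynamics(strategies, toy_payoff, start=(0, 0))
print(eq, is_pure_nash(strategies, toy_payoff, eq))
```

Starting from (0, 0), the players settle after a few rounds at a pure equilibrium where neither can gain by changing effort unilaterally.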

#### Trend detection in social networks using Hawkes processes

In [18], J. C. Louzada Pinto and T. Chahed (Telecom SudParis), in collaboration with E. Altman, propose a general Hawkes-based framework to model information diffusion in social networks. The framework takes into account the hidden interactions between users as well as the interactions between contents and social networks, and it can also accommodate dynamic social networks and various temporal effects of the diffusion, thus providing a complete analysis of the hidden influences in social networks. The framework can be combined with topic modeling, for which modified collapsed Gibbs sampling and variational Bayes techniques are derived. We provide an estimation algorithm based on nonnegative tensor factorization techniques which, together with a dimensionality reduction argument, is able to discover the latent community structure of the social network. We provide numerical examples on real-life networks, using a Game of Thrones dataset and a MemeTracker dataset.
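The basic building block of such frameworks is a multivariate Hawkes process, in which each user's posting intensity is raised by recent posts of the users who influence them. The sketch below assumes exponential kernels phi_ij(s) = alpha[i, j] * beta * exp(-beta * s), where alpha[i, j] is the influence of user j's posts on user i, and simulates the process with Ogata's thinning method. It covers only this generic model; the topic modeling and tensor-factorization estimation of [18] are not reproduced. Names and parameters are hypothetical, and stability requires the spectral radius of alpha to be below 1.

```python
import numpy as np


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Ogata thinning for a multivariate Hawkes process with kernels
    phi_ij(s) = alpha[i, j] * beta * exp(-beta * s) (effect of j's events on i)."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    excitation = np.zeros_like(mu)  # self-exciting part of each intensity
    t, events = 0.0, []
    while True:
        bound = (mu + excitation).sum()  # intensities only decay until the next event
        wait = rng.exponential(1.0 / bound)
        if t + wait > horizon:
            return events
        t += wait
        excitation *= np.exp(-beta * wait)
        lam = mu + excitation
        if rng.random() * bound <= lam.sum():  # accept with prob lam_total / bound
            i = rng.choice(len(mu), p=lam / lam.sum())
            events.append((t, int(i)))
            excitation += beta * alpha[:, i]  # event of user i excites every user


events = simulate_hawkes([0.5, 0.5], [[0.2, 0.3], [0.3, 0.2]], beta=2.0, horizon=5000.0)
print(len(events) / 5000.0)
```

In this example the stationary rates solve (I - alpha) Lambda = mu, giving one event per unit time per user, so the printed total rate should be close to 2.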

#### Potential Game approach to defense against virus attacks in networks

The Susceptible-Infected-Susceptible (SIS) model is a classical epidemic model in which agents alternate between a healthy (susceptible) state and an infected state. Non-zero-sum SIS epidemic games have recently been used to analyse virus protection in networks. A potential game approach was proposed to solve the game for the case of a fully connected network. In [42], F.-X. Legenvre and Y. Hayel (UAPV), in collaboration with E. Altman, extend this result to an arbitrary topology by showing that the game on a general topology is a generalized ordinal potential game. We apply this result to numerically study some examples.
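Evaluating protection decisions on an arbitrary topology requires the infection levels that the SIS dynamics produce on that graph. For illustration only, we assume the N-intertwined mean-field approximation (NIMFA); it is not necessarily the model used in [42]. In NIMFA, the steady-state infection probabilities satisfy p_i = beta * s_i / (beta * s_i + delta), with s_i = sum_j A_ij p_j, beta the infection rate per infected neighbour and delta the recovery rate. The sketch below finds this fixed point by simple iteration; the function name is hypothetical.

```python
import numpy as np


def sis_steady_state(A, beta, delta, iters=1000, tol=1e-12):
    """Fixed point of the NIMFA SIS equations p_i = beta s_i / (beta s_i + delta),
    where s_i = sum_j A_ij p_j is the infection pressure from neighbours."""
    p = np.ones(A.shape[0])  # start from 'everyone infected'
    for _ in range(iters):
        pressure = beta * (A @ p)
        p_new = pressure / (pressure + delta)
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    return p


K5 = np.ones((5, 5)) - np.eye(5)
print(sis_steady_state(K5, beta=0.5, delta=1.0))
print(sis_steady_state(K5, beta=0.2, delta=1.0).max())
```

On the complete graph with five nodes, the symmetric solution is p = 1 - delta / (4 beta), that is 0.5 for beta = 0.5. Below the mean-field threshold (4 beta < delta), the infection dies out and the iteration converges to zero.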