## Section: Scientific Foundations

### Statistical analysis of time series

Many of the problems of machine learning can be seen as extensions of classical problems of mathematical statistics to their (extremely) non-parametric and model-free cases. Other machine learning problems are founded on such statistical problems. Statistical problems of sequential learning are mainly those that are concerned with the analysis of time series. These problems are as follows.

#### Sequence prediction

Given a series of observations ${x}_{1},\cdots ,{x}_{n}$, it is required to predict the probability distribution of the next outcome ${x}_{n+1}$ before it is revealed, after which the process continues. Different goals can be formulated in this setting. One can make some assumptions on the probability measure that generates the sequence ${x}_{1},\cdots ,{x}_{n},\cdots $, such as that the outcomes are independent and identically distributed (i.i.d.), that the sequence is a Markov chain, or that it is a stationary process. More generally, one can assume that the data is generated by a probability measure that belongs to a certain set $\mathcal{C}$. In these cases the goal is for the discrepancy between the predicted and the “true” probabilities to go to zero, if possible with guarantees on the speed of convergence.
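As a minimal illustration of the i.i.d. case, the sketch below predicts the distribution of the next outcome with add-one (Laplace) smoothing over a finite alphabet. The function name `laplace_predict` and the encoding of outcomes as integers `0..alphabet_size-1` are assumptions made for this example; it is only one of many possible predictors for this setting.

```python
def laplace_predict(history, alphabet_size=2):
    """Predict the distribution of the next symbol from past observations.

    Assumes i.i.d. outcomes encoded as integers in range(alphabet_size).
    Each symbol starts with a pseudo-count of 1 (add-one smoothing).
    """
    counts = [1] * alphabet_size
    for x in history:
        counts[x] += 1
    total = sum(counts)
    return [c / total for c in counts]


# With no data the prediction is uniform; it then moves towards
# the empirical frequencies as observations accumulate.
print(laplace_predict([]))
print(laplace_predict([1, 1, 1]))
```

For i.i.d. data the predicted probabilities approach the true ones as $n$ grows, which is the kind of convergence described above.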

Alternatively, rather than making some assumptions on the data, one can change the goal: the predicted probabilities should be asymptotically as good as those given by the best reference predictor from a certain pre-defined set.
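One standard way to compete with a finite set of reference predictors is a Bayesian mixture with uniform prior weights, evaluated under logarithmic loss. The sketch below assumes binary outcomes and reference predictors given as functions from the history to the probability that the next outcome is 1; the names `run_mixture` and `experts` are hypothetical. Under these assumptions the cumulative log loss of the mixture exceeds that of the best reference predictor by at most $\ln N$, where $N$ is the number of predictors.

```python
import math


def run_mixture(experts, sequence):
    """Run a uniform-prior Bayesian mixture over reference predictors.

    experts:  list of functions history -> P(next outcome = 1), in (0, 1).
    sequence: list of binary outcomes (0 or 1).
    Returns (cumulative log loss of the mixture, list of expert losses).
    """
    n = len(experts)
    weights = [1.0 / n] * n
    mix_loss = 0.0
    expert_loss = [0.0] * n
    for t, x in enumerate(sequence):
        history = sequence[:t]
        probs_one = [e(history) for e in experts]
        # Probability each expert assigned to the outcome that occurred.
        probs_x = [p if x == 1 else 1.0 - p for p in probs_one]
        mix_prob_x = sum(w * p for w, p in zip(weights, probs_x))
        mix_loss += -math.log(mix_prob_x)
        for i, p in enumerate(probs_x):
            expert_loss[i] += -math.log(p)
        # Posterior update: reweight experts by how well they predicted x.
        weights = [w * p / mix_prob_x for w, p in zip(weights, probs_x)]
    return mix_loss, expert_loss


# Three constant reference predictors (an assumption for the example).
experts = [lambda h: 0.1, lambda h: 0.5, lambda h: 0.9]
mix_loss, expert_loss = run_mixture(experts, [1, 1, 0, 1, 1, 1, 0, 1])
print(mix_loss, min(expert_loss))
```

No assumption on how the sequence is generated is needed for this bound; the guarantee is relative to the chosen reference set only.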

#### Hypothesis testing

Given a series of observations ${x}_{1},\cdots ,{x}_{n},\cdots $ generated by some unknown probability measure $\mu $, the problem is to test a certain given hypothesis ${H}_{0}$ about $\mu $ versus a given alternative hypothesis ${H}_{1}$. There are many different examples of this problem. Perhaps the simplest one is testing the simple hypothesis “$\mu $ is a Bernoulli i.i.d. measure with the probability of 0 equal to 1/2” versus “$\mu $ is Bernoulli i.i.d. with a parameter different from 1/2”. More interesting cases include the problems of model verification: for example, testing that $\mu $ is a Markov chain versus that it is a stationary ergodic process but not a Markov chain. When we have not one but several series of observations, we may wish to test the hypothesis that they are independent, or that they are generated by the same distribution. Applications of these problems to a more general class of machine learning tasks include the problem of feature selection and the problem of testing that a certain behaviour (such as pulling a certain arm of a bandit, or using a certain policy) is better (in terms of achieving some goal, or collecting some rewards) than another behaviour, or than a class of other behaviours.
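The simplest example above (Bernoulli i.i.d. with parameter 1/2 versus a different parameter) can be illustrated with an exact two-sided binomial test on a fixed-size sample. This is a minimal sketch of the classical test for that one example only, not of the non-parametric cases; the function name `binomial_p_value` is an assumption of the example.

```python
from math import comb


def binomial_p_value(sample):
    """Exact two-sided p-value for H0: Bernoulli i.i.d. with parameter 1/2.

    sample: list of binary outcomes (0 or 1).
    Sums the null probabilities of all counts that deviate from n/2
    at least as much as the observed count of ones.
    """
    n = len(sample)
    k = sum(sample)
    dev = abs(k - n / 2)
    favourable = sum(comb(n, j) for j in range(n + 1) if abs(j - n / 2) >= dev)
    return favourable / 2 ** n


# A balanced sample gives no evidence against H0; a constant one does.
print(binomial_p_value([0, 1] * 5))
print(binomial_p_value([1] * 10))
```

Rejecting ${H}_{0}$ when the p-value falls below a chosen level $\alpha$ bounds the probability of a false rejection by $\alpha$.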

The problem of hypothesis testing can also be studied in its general formulation: given two (abstract) hypotheses ${H}_{0}$ and ${H}_{1}$ about the unknown measure that generates the data, find out whether it is possible to test ${H}_{0}$ against ${H}_{1}$ (with confidence), and if so, how one can do it.

#### Clustering

The problem of clustering, while being a classical problem of mathematical statistics, belongs to the realm of unsupervised learning. For time series, this problem can be formulated as follows: given several samples ${x}^{1}=({x}_{1}^{1},\cdots ,{x}_{{n}_{1}}^{1}),\cdots ,{x}^{N}=({x}_{1}^{N},\cdots ,{x}_{{n}_{N}}^{N})$, we wish to group similar objects together. While this is of course not a precise formulation, it can be made precise if we assume that the samples were generated by $k$ different distributions. Alternatively, one may assume some specific model on the data, leading to different formalizations of the problem.
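To make the formulation with $k$ generating distributions concrete, the sketch below groups discrete-valued samples by comparing the empirical frequencies of short patterns. The distance, the pattern length `m`, the farthest-point choice of cluster centres, and all function names are assumptions chosen for illustration; they are one simple possibility, not a method prescribed by the text.

```python
from collections import Counter


def pattern_freqs(sample, m=2):
    """Empirical frequencies of all length-m patterns in a sample."""
    windows = [tuple(sample[i:i + m]) for i in range(len(sample) - m + 1)]
    total = len(windows)
    return {w: c / total for w, c in Counter(windows).items()}


def distance(a, b, m=2):
    """L1 distance between the pattern frequencies of two samples."""
    fa, fb = pattern_freqs(a, m), pattern_freqs(b, m)
    return sum(abs(fa.get(w, 0.0) - fb.get(w, 0.0)) for w in set(fa) | set(fb))


def cluster(samples, k, m=2):
    """Assign each sample to one of k clusters; returns a list of labels."""
    # Farthest-point initialisation: start from the first sample, then
    # repeatedly add the sample farthest from the centres chosen so far.
    centres = [0]
    while len(centres) < k:
        far = max(
            range(len(samples)),
            key=lambda i: min(distance(samples[i], samples[c], m) for c in centres),
        )
        centres.append(far)
    # Each sample joins the cluster of its nearest centre.
    return [
        min(range(k), key=lambda j: distance(s, samples[centres[j]], m))
        for s in samples
    ]


samples = [[0] * 20, [0, 1] * 10, [0] * 15, [1, 0] * 8]
print(cluster(samples, k=2))
```

Note that the samples may have different lengths ${n}_{i}$, as in the formulation above; the sketch only requires each sample to contain at least `m` observations.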