## Section: New Results

### Statistical analysis of time series

#### Change Point Analysis

*Nonparametric multiple change point estimation in highly dependent time series [17]*

Given a heterogeneous time-series sample, it is required to find the points in time (called change points) at which the probability distribution generating the data changes. The data are assumed to have been generated by arbitrary, unknown, stationary ergodic distributions. No modeling, independence, or mixing assumptions are made. A novel, computationally efficient, nonparametric method is proposed and shown to be asymptotically consistent in this general framework; the theoretical results are complemented with experimental evaluations.
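The method of [17] handles multiple change points efficiently; as a much-simplified illustration of the underlying idea (a hypothetical single change point, binary data, and a crude empirical distributional distance over short blocks), one can estimate the change point as the split maximizing the distance between the empirical distributions of the two segments:

```python
import random

def block_freqs(seq, k):
    """Empirical frequencies of length-k blocks (tuples) in seq."""
    counts = {}
    n = len(seq) - k + 1
    for i in range(n):
        b = tuple(seq[i:i + k])
        counts[b] = counts.get(b, 0) + 1
    return {b: c / n for b, c in counts.items()}

def dist(x, y, kmax=3):
    """Weighted sum over block lengths of the total-variation distance
    between empirical block frequencies -- a crude distributional distance."""
    d = 0.0
    for k in range(1, kmax + 1):
        fx, fy = block_freqs(x, k), block_freqs(y, k)
        keys = set(fx) | set(fy)
        d += 2.0 ** -k * 0.5 * sum(abs(fx.get(b, 0.0) - fy.get(b, 0.0))
                                   for b in keys)
    return d

def estimate_change_point(seq, margin=20):
    """Return the split point t maximizing the distance between the
    empirical distributions of seq[:t] and seq[t:]."""
    best_t, best_d = None, -1.0
    for t in range(margin, len(seq) - margin):
        d = dist(seq[:t], seq[t:])
        if d > best_d:
            best_t, best_d = t, d
    return best_t

random.seed(0)
# Bernoulli(0.2) before the change, Bernoulli(0.8) after it.
seq = [int(random.random() < 0.2) for _ in range(300)] + \
      [int(random.random() < 0.8) for _ in range(300)]
t_hat = estimate_change_point(seq)
print(t_hat)  # should land near the true change point at 300
```

This toy estimator is quadratic in the sample size and assumes a single change point; the actual algorithm of [17] is more efficient and is consistent for multiple change points under the general stationary ergodic assumptions above.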

#### Clustering Time Series, Online and Offline

*A Binary-Classification-Based Metric between Time-Series Distributions and Its Use in Statistical and Learning Problems [6]*

A metric between time-series distributions is proposed that can be evaluated using binary classification methods, which were originally developed for i.i.d. data. It is shown how this metric can be used to solve statistical problems that are seemingly unrelated to classification and that concern highly dependent time series: time-series clustering, homogeneity testing, and the three-sample problem. Universal consistency of the resulting algorithms is proven under the most general assumptions. The theoretical results are illustrated with experiments on synthetic and real-world data.
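The construction leaves the choice of binary classifier open; as an illustrative sketch (with a hypothetical 1-nearest-neighbour classifier on short sliding windows, not necessarily the classifier used in [6]), the metric can be approximated by how much better than chance the classifier distinguishes windows of one series from windows of the other:

```python
import random

def windows(seq, w):
    """All sliding windows of length w, as tuples."""
    return [tuple(seq[i:i + w]) for i in range(len(seq) - w + 1)]

def nn_accuracy(xs, ys, w=3):
    """Leave-one-out 1-nearest-neighbour accuracy at telling apart
    windows of xs (label 0) from windows of ys (label 1)."""
    data = [(v, 0) for v in windows(xs, w)] + [(v, 1) for v in windows(ys, w)]
    correct = 0
    for i, (v, lab) in enumerate(data):
        best_j, best_d = None, None
        for j, (u, _) in enumerate(data):
            if j == i:
                continue
            d = sum((a - b) ** 2 for a, b in zip(v, u))
            if best_d is None or d < best_d:
                best_j, best_d = j, d
        correct += data[best_j][1] == lab
    return correct / len(data)

def clf_distance(xs, ys, w=3):
    """Classifier-based distance: near 0 when the classifier cannot beat
    chance (similar distributions), near 1 when it separates the two
    series' windows almost perfectly."""
    return max(0.0, 2.0 * nn_accuracy(xs, ys, w) - 1.0)

random.seed(1)
a = [random.gauss(0.0, 1.0) for _ in range(200)]
b = [random.gauss(0.0, 1.0) for _ in range(200)]  # same distribution as a
c = [random.gauss(3.0, 1.0) for _ in range(200)]  # shifted distribution

d_ab = clf_distance(a, b)  # small: classifier cannot beat chance
d_ac = clf_distance(a, c)  # large: windows are easily separated
print(d_ab, d_ac)
```

The point of the construction is exactly this reduction: any binary classifier for i.i.d. data yields, through its held-out accuracy on windows, a usable distance between dependent time-series distributions.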

#### Semi-Supervised and Unsupervised Learning

*Learning from a Single Labeled Face and a Stream of Unlabeled Data [19]*

Face recognition from a single image per person is a challenging problem because the training sample is extremely small. We consider a variation of this problem. In our problem, we recognize only one person, and there are no labeled data for any other person. This setting naturally arises in authentication on personal computers and mobile devices, and poses additional challenges because it lacks negative examples. We formalize our problem as one-class classification, and propose and analyze an algorithm that learns a non-parametric model of the face from a single labeled image and a stream of unlabeled data. In many domains, for instance when a person interacts with a computer with a camera, unlabeled data are abundant and easy to utilize. This is the first paper that investigates how these data can help in learning better models in the single-image-per-person setting. Our method is evaluated on a dataset of 43 people, and we show that these people can be recognized 90% of the time at nearly zero false positives. This recall is more than 25% higher than that of our best-performing baseline. Finally, we conduct a comprehensive sensitivity analysis of our algorithm and provide a guideline for setting its parameters in practice.
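As a toy sketch of the general idea (not the paper's algorithm; the 2-D feature vectors, radii, and data-generating process here are entirely hypothetical), a one-class model can start from a single labeled example and cautiously absorb nearby samples from the unlabeled stream, extending its coverage of the target class without ever seeing negative labels:

```python
import random

def dist2(u, v):
    """Squared Euclidean distance between feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

class OneClassStreamModel:
    """Grow a set of exemplars from one labeled feature vector,
    absorbing unlabeled vectors only when they fall within a tight
    radius of the current model (hypothetical parameters)."""
    def __init__(self, labeled, absorb_radius=1.0, accept_radius=2.0):
        self.exemplars = [labeled]
        self.absorb2 = absorb_radius ** 2
        self.accept2 = accept_radius ** 2

    def observe(self, x):
        # Absorb an unlabeled sample only if it is very close to the model.
        if min(dist2(x, e) for e in self.exemplars) < self.absorb2:
            self.exemplars.append(x)

    def recognize(self, x):
        return min(dist2(x, e) for e in self.exemplars) < self.accept2

random.seed(2)
target = [5.0, 5.0]
noisy = lambda c, s: [random.gauss(ci, s) for ci in c]

model = OneClassStreamModel(noisy(target, 0.2))
# Unlabeled stream: mostly other people, occasionally the target person.
for _ in range(200):
    if random.random() < 0.3:
        model.observe(noisy(target, 0.2))                    # target person
    else:
        model.observe([random.uniform(-10.0, 0.0),
                       random.uniform(-10.0, 0.0)])          # someone else

hits = sum(model.recognize(noisy(target, 0.2)) for _ in range(100))
false_pos = sum(model.recognize([random.uniform(-10.0, 0.0),
                                 random.uniform(-10.0, 0.0)])
                for _ in range(100))
print(hits, false_pos)  # high recall, near-zero false positives
```

The cautious absorption threshold is what compensates for the missing negative examples: the model only grows where it is already confident, trading some recall for a near-zero false-positive rate.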

*Unsupervised model-free representation learning [23]*

Numerous control and learning problems face the situation where sequences of high-dimensional, highly dependent data are available, but little or no feedback is provided to the learner. In such situations it may be useful to find a concise representation of the input signal that preserves as much as possible of the relevant information. In this work we are interested in problems where the relevant information lies in the time-series dependence. Thus, the problem can be formalized as follows. Given a series of observations $X_0,\dots,X_n$ coming from a large (high-dimensional) space $\mathcal{X}$, find a representation function $f$ mapping $\mathcal{X}$ to a finite space $\mathcal{Y}$ such that the series $f(X_0),\dots,f(X_n)$ preserves as much as possible of the information about the original time-series dependence in $X_0,\dots,X_n$. For stationary time series, the function $f$ can be selected as the one maximizing the time-series information $I_\infty(f) = h_0(f(X)) - h_\infty(f(X))$, where $h_0(f(X))$ is the Shannon entropy of $f(X_0)$ and $h_\infty(f(X))$ is the entropy rate of the time series $f(X_0),\dots,f(X_n),\dots$. In this paper we study the functional $I_\infty(f)$ from the learning-theoretic point of view. Specifically, we provide some uniform approximation results and study the behaviour of $I_\infty(f)$ in the problem of optimal control.
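As a rough illustration of how $I_\infty(f)$ can be estimated from the transformed series alone, the following sketch (a hypothetical example, with the entropy rate replaced by a first-order conditional-entropy plug-in estimate) compares two candidate representations of a noisy two-state chain: one that recovers the hidden state and so keeps the dependence, and one that depends only on the noise and so yields i.i.d. output:

```python
import math
import random

def block_entropy(sym, k):
    """Empirical Shannon entropy (in bits) of length-k blocks of sym."""
    counts = {}
    n = len(sym) - k + 1
    for i in range(n):
        b = tuple(sym[i:i + k])
        counts[b] = counts.get(b, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def time_series_information(sym):
    """Plug-in estimate of I_inf = h_0 - h_inf, approximating the
    entropy rate by the order-1 conditional entropy H_2 - H_1."""
    h0 = block_entropy(sym, 1)
    h_inf = block_entropy(sym, 2) - h0
    return h0 - h_inf

random.seed(3)
# Hidden sticky two-state chain, observed with additive uniform noise.
state, xs = 0, []
for _ in range(20000):
    if random.random() < 0.05:   # rare state flips -> strong dependence
        state = 1 - state
    xs.append(state + random.random())  # X_t in [0, 1) or [1, 2)

f_state = lambda x: int(x >= 1.0)        # recovers the hidden state
f_noise = lambda x: int(x % 1.0 >= 0.5)  # depends only on the noise

i_state = time_series_information([f_state(x) for x in xs])  # high
i_noise = time_series_information([f_noise(x) for x in xs])  # near zero
print(i_state, i_noise)
```

Both estimates use only entropies of the finite-valued series $f(X_0),\dots,f(X_n)$, never the distribution of the original high-dimensional observations; maximizing the estimate over candidate functions $f$ favours representations that retain the time-series dependence.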

*Time-series information and learning [22]*

Given a time series $X_1,\dots,X_n,\dots$ taking values in a large (high-dimensional) space $\mathcal{X}$, we would like to find a function $f$ from $\mathcal{X}$ to a small (low-dimensional or finite) space $\mathcal{Y}$ such that the time series $f(X_1),\dots,f(X_n),\dots$ retains all the information about the time-series dependence in the original sequence, or as much of it as possible. This goal is formalized in this work, and it is shown that the target function $f$ can be found as the one that maximizes a certain quantity that can be expressed in terms of entropies of the series $(f(X_i))_{i\in\mathbb{N}}$. This quantity can be estimated empirically, and does not involve estimating the distribution of the original time series $(X_i)_{i\in\mathbb{N}}$.
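In display form, and assuming the quantity in question is the time-series information of [23], the target function is the maximizer

$$
f^* \in \operatorname*{arg\,max}_{f:\,\mathcal{X}\to\mathcal{Y}} \bigl( h_0(f(X)) - h_\infty(f(X)) \bigr),
\qquad
h_\infty(f(X)) = \lim_{k\to\infty} \tfrac{1}{k}\, H\bigl(f(X_1),\dots,f(X_k)\bigr),
$$

where $h_0(f(X))$ is the Shannon entropy of a single symbol $f(X_1)$ and $H(\cdot)$ denotes the joint Shannon entropy of a block. Both terms depend only on the transformed series, which is what makes empirical estimation possible without modeling the original process.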