Publications of the year

Doctoral Dissertations and Habilitation Theses

  • [1] A. Khaleghi.

    Sur quelques problèmes non-supervisés impliquant des séries temporelles hautement dépendantes, Institut national de recherche en informatique et en automatique (Inria), November 2013.


Articles in International Peer-Reviewed Journals

  • [2] M. G. Azar, R. Munos, H. Kappen.

    Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model, in: Machine Learning, 2013, vol. 91, no 3, pp. 325-349.

  • [3] O. Cappé, A. Garivier, O.-A. Maillard, R. Munos, G. Stoltz.

    Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation, in: Annals of Statistics, 2013, vol. 41, no 3, pp. 1516-1541.

  • [4] J. Fruitet, A. Carpentier, R. Munos, M. Clerc.

    Automatic motor task selection via a bandit algorithm for a brain-controlled button, in: Journal of Neural Engineering, January 2013, vol. 10, no 1. [ DOI : 10.1088/1741-2560/10/1/016012 ]

  • [5] M. Hauskrecht, I. Batal, M. Valko, S. Visweswaran, G. F. Cooper, G. Clermont.

    Outlier detection for patient monitoring and alerting, in: Journal of Biomedical Informatics, February 2013, vol. 46, pp. 47-55. [ DOI : 10.1016/j.jbi.2012.08.004 ]

  • [6] D. Ryabko, J. Mary.

    A Binary-Classification-Based Metric between Time-Series Distributions and Its Use in Statistical and Learning Problems, in: Journal of Machine Learning Research, 2013, vol. 14, pp. 2837-2856.

  • [7] B. Ryabko, D. Ryabko.

    A confidence-set approach to signal denoising, in: Statistical Methodology, 2013, vol. 15, pp. 115-120.


International Conferences with Proceedings

  • [8] B. Avila Pires, M. Ghavamzadeh, C. Szepesvari.

    Cost-sensitive Multiclass Classification Risk Bounds, in: International Conference on Machine Learning, Atlanta, United States, 2013.

  • [9] A. Carpentier, R. Munos.

    Toward optimal stratification for stratified Monte-Carlo integration, in: International Conference on Machine Learning, United States, 2013.

  • [10] P. Chainais, C. Richard.

    Learning a common dictionary over a sensor network, in: CAMSAP 2013, Saint-Martin, France, December 2013, pp. 1-4.

  • [11] R. Fonteneau, L. Busoniu, R. Munos.

    Optimistic planning for belief-augmented Markov decision processes, in: IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2013, Singapore, April 2013.

  • [12] V. Gabillon, M. Ghavamzadeh, B. Scherrer.

    Approximate Dynamic Programming Finally Performs Well in the Game of Tetris, in: Neural Information Processing Systems (NIPS) 2013, South Lake Tahoe, United States, 2013.

  • [13] M. Gheshlaghi Azar, A. Lazaric, E. Brunskill.

    Regret Bounds for Reinforcement Learning with Policy Advice, in: ECML/PKDD - European conference on machine learning and principles and practice of knowledge discovery in databases - 2013, Prague, Czech Republic, September 2013.

  • [14] M. Gheshlaghi Azar, A. Lazaric, E. Brunskill.

    Sequential Transfer in Multi-armed Bandit with Finite Set of Models, in: NIPS - Advances in Neural Information Processing Systems 25 - 2013, Lake Tahoe, United States, December 2013.

  • [15] H. Kadri, M. Ghavamzadeh, P. Preux.

    A Generalized Kernel Approach to Structured Output Learning, in: International Conference on Machine Learning (ICML), Atlanta, United States, 2013.

  • [16] G. Kedenburg, R. Fonteneau, R. Munos.

    Aggregating optimistic planning trees for solving Markov decision processes, in: Advances in Neural Information Processing Systems, United States, 2013, pp. 2382-2390.

  • [17] A. Khaleghi, D. Ryabko.

    Nonparametric multiple change point estimation in highly dependent time series, in: Proc. 24th International Conf. on Algorithmic Learning Theory (ALT'13), Singapore, Springer, 2013, pp. 382-396.

  • [18] N. Korda, E. Kaufmann, R. Munos.

    Thompson sampling for one-dimensional exponential family bandits, in: Advances in Neural Information Processing Systems, United States, 2013.

  • [19] B. Kveton, M. Valko.

    Learning from a Single Labeled Face and a Stream of Unlabeled Data, in: 10th IEEE International Conference on Automatic Face and Gesture Recognition, Shanghai, China, January 2013.

  • [20] O.-A. Maillard, P. Nguyen, R. Ortner, D. Ryabko.

    Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning, in: ICML - 30th International Conference on Machine Learning, Atlanta, United States, 2013, vol. 28(1), pp. 543-551.

  • [21] P. Nguyen, O.-A. Maillard, D. Ryabko, R. Ortner.

    Competing with an Infinite Set of Models in Reinforcement Learning, in: AISTATS, Arizona, United States, JMLR W&CP, 2013, vol. 31, pp. 463-471.

  • [22] D. Ryabko.

    Time-series information and learning, in: ISIT - International Symposium on Information Theory, Istanbul, Turkey, 2013, pp. 1392-1395.

  • [23] D. Ryabko.

    Unsupervised model-free representation learning, in: Proc. 24th International Conf. on Algorithmic Learning Theory (ALT'13), Singapore, Springer, 2013, pp. 354-366.

  • [24] B. Szorenyi, R. Busa-Fekete, I. Hegedüs, R. Ormandi, M. Jelasity, B. Kégl.

    Gossip-based distributed stochastic bandit algorithms, in: 30th International Conference on Machine Learning (ICML 2013), Atlanta, United States, S. Dasgupta, D. McAllester (editors), 2013, vol. 28, pp. 19-27.

  • [25] E. M. Thomas, M. Clerc, A. Carpentier, E. Daucé, D. Devlaminck, R. Munos.

    Optimizing P300-speller sequences by RIP-ping groups apart, in: IEEE/EMBS 6th international conference on neural engineering (2013), San Diego, United States, IEEE/EMBS, November 2013.

  • [26] M. Valko, A. Carpentier, R. Munos.

    Stochastic Simultaneous Optimistic Optimization, in: 30th International Conference on Machine Learning, Atlanta, United States, February 2013.

  • [27] M. Valko, N. Korda, R. Munos, I. Flaounas, N. Cristianini.

    Finite-Time Analysis of Kernelised Contextual Bandits, in: The 29th Conference on Uncertainty in Artificial Intelligence, Bellevue, United States, 2013.


National Conferences with Proceedings

  • [28] P. Bas, P. Chainais, E. Zidel-Cauffet.

    Quantification adaptative pour la stéganalyse d'images texturées, in: GRETSI 2013, Brest, France, September 2013.

  • [29] P. Chainais, C. Richard.

    Distributed dictionary learning over a sensor network, in: CaP 2013, Villeneuve d'Ascq, France, July 2013, pp. 1-4.


Scientific Books (or Scientific Book chapters)

  • [30] L. Busoniu, R. Munos, R. Babuska.

    A review of optimistic planning in Markov decision processes, in: Reinforcement Learning and Adaptive Dynamic Programming for Feedback Control, F. Lewis, D. Liu (editors), IEEE Press Series on Computational Intelligence, Wiley-IEEE Press, January 2013, chap. 22, pp. 494-516.


Internal Reports

References in notes
  • [34] P. Auer, N. Cesa-Bianchi, P. Fischer.

    Finite-time analysis of the multi-armed bandit problem, in: Machine Learning, 2002, vol. 47, no 2/3, pp. 235–256.
  • [35] R. Bellman.

    Dynamic Programming, Princeton University Press, 1957.
  • [36] D. Bertsekas, S. Shreve.

    Stochastic Optimal Control (The Discrete Time Case), Academic Press, New York, 1978.
  • [37] D. Bertsekas, J. Tsitsiklis.

    Neuro-Dynamic Programming, Athena Scientific, 1996.
  • [38] T. Ferguson.

    A Bayesian Analysis of Some Nonparametric Problems, in: The Annals of Statistics, 1973, vol. 1, no 2, pp. 209–230.
  • [39] T. Hastie, R. Tibshirani, J. Friedman.

    The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001.
  • [40] W. Powell.

    Approximate Dynamic Programming, Wiley, 2007.
  • [41] M. Puterman.

    Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley and Sons, 1994.
  • [42] H. Robbins.

    Some aspects of the sequential design of experiments, in: Bull. Amer. Math. Soc., 1952, vol. 58, pp. 527–535.
  • [43] J. Rust.

    How Social Security and Medicare Affect Retirement Behavior in a World of Incomplete Markets, in: Econometrica, July 1997, vol. 65, no 4, pp. 781–831.

  • [44] J. Rust.

    On the Optimal Lifetime of Nuclear Power Plants, in: Journal of Business & Economic Statistics, 1997, vol. 15, no 2, pp. 195–208.
  • [45] R. Sutton, A. Barto.

    Reinforcement learning: an introduction, MIT Press, 1998.
  • [46] G. Tesauro.

    Temporal Difference Learning and TD-Gammon, in: Communications of the ACM, March 1995, vol. 38, no 3.
  • [47] P. Werbos.

    ADP: Goals, Opportunities and Principles, in: Handbook of Learning and Approximate Dynamic Programming, IEEE Press, 2004, pp. 3–44.