Bibliography

Publications of the year

Doctoral Dissertations and Habilitation Theses

  • [1] A. Khaleghi.
    Sur quelques problèmes non-supervisés impliquant des séries temporelles hautement dépendantes, Institut national de recherche en informatique et en automatique (Inria), November 2013.
    http://hal.inria.fr/tel-00920184

Articles in International Peer-Reviewed Journals

  • [2] M. G. Azar, R. Munos, H. Kappen.
    Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model, in: Machine Learning, 2013, vol. 91, no 3, pp. 325-349.
    http://hal.inria.fr/hal-00831875
  • [3] O. Cappé, A. Garivier, O.-A. Maillard, R. Munos, G. Stoltz.
    Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation, in: Annals of Statistics, 2013, vol. 41, no 3, pp. 1516-1541.
    http://hal.inria.fr/hal-00738209
  • [4] J. Fruitet, A. Carpentier, R. Munos, M. Clerc.
    Automatic motor task selection via a bandit algorithm for a brain-controlled button, in: Journal of Neural Engineering, January 2013, vol. 10, no 1. [ DOI : 10.1088/1741-2560/10/1/016012 ]
    http://hal.inria.fr/hal-00798561
  • [5] M. Hauskrecht, I. Batal, M. Valko, S. Visweswaran, G. F. Cooper, G. Clermont.
    Outlier detection for patient monitoring and alerting, in: Journal of Biomedical Informatics, February 2013, vol. 46, pp. 47-55. [ DOI : 10.1016/j.jbi.2012.08.004 ]
    http://hal.inria.fr/hal-00742097
  • [6] D. Ryabko, J. Mary.
    A Binary-Classification-Based Metric between Time-Series Distributions and Its Use in Statistical and Learning Problems, in: Journal of Machine Learning Research, 2013, vol. 14, pp. 2837-2856.
    http://hal.inria.fr/hal-00913240
  • [7] B. Ryabko, D. Ryabko.
    A confidence-set approach to signal denoising, in: Statistical Methodology, 2013, vol. 15, pp. 115–120.
    http://hal.inria.fr/hal-00913253

International Conferences with Proceedings

  • [8] B. Avila Pires, M. Ghavamzadeh, C. Szepesvari.
    Cost-sensitive Multiclass Classification Risk Bounds, in: International Conference on Machine Learning, Atlanta, United States, 2013.
    http://hal.inria.fr/hal-00840485
  • [9] A. Carpentier, R. Munos.
    Toward optimal stratification for stratified Monte-Carlo integration, in: International Conference on Machine Learning, United States, 2013.
    http://hal.inria.fr/hal-00923685
  • [10] P. Chainais, C. Richard.
    Learning a common dictionary over a sensor network, in: CAMSAP 2013, Saint-Martin, France, December 2013, pp. 1-4.
    http://hal.inria.fr/hal-00923742
  • [11] R. Fonteneau, L. Busoniu, R. Munos.
    Optimistic planning for belief-augmented Markov decision processes, in: IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2013, Singapore, April 2013.
    http://hal.inria.fr/hal-00840202
  • [12] V. Gabillon, M. Ghavamzadeh, B. Scherrer.
    Approximate Dynamic Programming Finally Performs Well in the Game of Tetris, in: Neural Information Processing Systems (NIPS) 2013, South Lake Tahoe, United States, 2013.
    http://hal.inria.fr/hal-00921250
  • [13] M. Gheshlaghi Azar, A. Lazaric, E. Brunskill.
    Regret Bounds for Reinforcement Learning with Policy Advice, in: ECML/PKDD - European conference on machine learning and principles and practice of knowledge discovery in databases - 2013, Prague, Czech Republic, September 2013.
    http://hal.inria.fr/hal-00924021
  • [14] M. Gheshlaghi Azar, A. Lazaric, E. Brunskill.
    Sequential Transfer in Multi-armed Bandit with Finite Set of Models, in: NIPS - Advances in Neural Information Processing Systems 26 - 2013, Lake Tahoe, United States, December 2013.
    http://hal.inria.fr/hal-00924025
  • [15] H. Kadri, M. Ghavamzadeh, P. Preux.
    A Generalized Kernel Approach to Structured Output Learning, in: International Conference on Machine Learning (ICML), Atlanta, United States, 2013.
    http://hal.inria.fr/hal-00695631
  • [16] G. Kedenburg, R. Fonteneau, R. Munos.
    Aggregating optimistic planning trees for solving Markov decision processes, in: Advances in Neural Information Processing Systems, United States, 2013, pp. 2382-2390.
    http://hal.inria.fr/hal-00923681
  • [17] A. Khaleghi, D. Ryabko.
    Nonparametric multiple change point estimation in highly dependent time series, in: Proc. 24th International Conf. on Algorithmic Learning Theory (ALT'13), Singapore, Springer, 2013, pp. 382-396.
    http://hal.inria.fr/hal-00913250
  • [18] N. Korda, E. Kaufmann, R. Munos.
    Thompson sampling for one-dimensional exponential family bandits, in: Advances in Neural Information Processing Systems, United States, 2013.
    http://hal.inria.fr/hal-00923683
  • [19] B. Kveton, M. Valko.
    Learning from a Single Labeled Face and a Stream of Unlabeled Data, in: 10th IEEE International Conference on Automatic Face and Gesture Recognition, Shanghai, China, January 2013.
    http://hal.inria.fr/hal-00749197
  • [20] O.-A. Maillard, P. Nguyen, R. Ortner, D. Ryabko.
    Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning, in: ICML - 30th International Conference on Machine Learning, Atlanta, United States, 2013, vol. 28(1), pp. 543-551.
    http://hal.inria.fr/hal-00778586
  • [21] P. Nguyen, O.-A. Maillard, D. Ryabko, R. Ortner.
    Competing with an Infinite Set of Models in Reinforcement Learning, in: AISTATS, Arizona, United States, JMLR W&CP, 2013, vol. 31, pp. 463-471.
    http://hal.inria.fr/hal-00823230
  • [22] D. Ryabko.
    Time-series information and learning, in: ISIT - International Symposium on Information Theory, Istanbul, Turkey, 2013, pp. 1392-1395.
    http://hal.inria.fr/hal-00823233
  • [23] D. Ryabko.
    Unsupervised model-free representation learning, in: Proc. 24th International Conf. on Algorithmic Learning Theory (ALT'13), Singapore, Springer, 2013, pp. 354-366.
    http://hal.inria.fr/hal-00913244
  • [24] B. Szorenyi, R. Busa-Fekete, I. Hegedüs, R. Ormandi, M. Jelasity, B. Kégl.
    Gossip-based distributed stochastic bandit algorithms, in: 30th International Conference on Machine Learning (ICML 2013), Atlanta, United States, S. Dasgupta, D. McAllester (editors), 2013, vol. 28, pp. 19-27.
    http://hal.inria.fr/in2p3-00907406
  • [25] E. M. Thomas, M. Clerc, A. Carpentier, E. Daucé, D. Devlaminck, R. Munos.
    Optimizing P300-speller sequences by RIP-ping groups apart, in: IEEE/EMBS 6th international conference on neural engineering (2013), San Diego, United States, IEEE/EMBS, November 2013.
    http://hal.inria.fr/hal-00907781
  • [26] M. Valko, A. Carpentier, R. Munos.
    Stochastic Simultaneous Optimistic Optimization, in: 30th International Conference on Machine Learning, Atlanta, United States, February 2013.
    http://hal.inria.fr/hal-00789606
  • [27] M. Valko, N. Korda, R. Munos, I. Flaounas, N. Cristianini.
    Finite-Time Analysis of Kernelised Contextual Bandits, in: The 29th Conference on Uncertainty in Artificial Intelligence, Bellevue, United States, 2013.
    http://hal.inria.fr/hal-00826946

National Conferences with Proceedings

  • [28] P. Bas, P. Chainais, E. Zidel-Cauffet.
    Quantification adaptative pour la stéganalyse d'images texturées, in: GRETSI 2013, Brest, France, September 2013.
    http://hal.inria.fr/hal-00868550
  • [29] P. Chainais, C. Richard.
    Distributed dictionary learning over a sensor network, in: CaP 2013, Villeneuve d'Ascq, France, July 2013, pp. 1-4.
    http://hal.inria.fr/hal-00923741

Scientific Books (or Scientific Book chapters)

  • [30] L. Busoniu, R. Munos, R. Babuska.
    A review of optimistic planning in Markov decision processes, in: Reinforcement Learning and Adaptive Dynamic Programming for Feedback Control, F. Lewis, D. Liu (editors), IEEE Press Series on Computational Intelligence, Wiley-IEEE Press, January 2013, chap. 22, pp. 494-516.
    http://hal.inria.fr/hal-00756742

Internal Reports

References in notes

  • [34] P. Auer, N. Cesa-Bianchi, P. Fischer.
    Finite-time analysis of the multi-armed bandit problem, in: Machine Learning, 2002, vol. 47, no 2/3, pp. 235–256.
  • [35] R. Bellman.
    Dynamic Programming, Princeton University Press, 1957.
  • [36] D. Bertsekas, S. Shreve.
    Stochastic Optimal Control (The Discrete Time Case), Academic Press, New York, 1978.
  • [37] D. Bertsekas, J. Tsitsiklis.
    Neuro-Dynamic Programming, Athena Scientific, 1996.
  • [38] T. Ferguson.
    A Bayesian Analysis of Some Nonparametric Problems, in: The Annals of Statistics, 1973, vol. 1, no 2, pp. 209–230.
  • [39] T. Hastie, R. Tibshirani, J. Friedman.
    The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001.
  • [40] W. Powell.
    Approximate Dynamic Programming, Wiley, 2007.
  • [41] M. Puterman.
    Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley and Sons, 1994.
  • [42] H. Robbins.
    Some aspects of the sequential design of experiments, in: Bull. Amer. Math. Soc., 1952, vol. 58, pp. 527–535.
  • [43] J. Rust, C. Phelan.
    How Social Security and Medicare Affect Retirement Behavior in a World of Incomplete Markets, in: Econometrica, July 1997, vol. 65, no 4, pp. 781–831.
    http://gemini.econ.umd.edu/jrust/research/rustphelan.pdf
  • [44] J. Rust.
    On the Optimal Lifetime of Nuclear Power Plants, in: Journal of Business & Economic Statistics, 1997, vol. 15, no 2, pp. 195–208.
  • [45] R. Sutton, A. Barto.
    Reinforcement learning: an introduction, MIT Press, 1998.
  • [46] G. Tesauro.
    Temporal Difference Learning and TD-Gammon, in: Communications of the ACM, March 1995, vol. 38, no 3.
  • [47] P. Werbos.
    ADP: Goals, Opportunities and Principles, in: Handbook of Learning and Approximate Dynamic Programming, IEEE Press, 2004, pp. 3–44.