Bibliography

Publications of the year

Doctoral Dissertations and Habilitation Theses

  • [1] A. Carpentier.

    Toward optimal sampling in low and high dimension, Université Lille 1, Lille, France, October 2012.
  • [2] E. Delande.

    Multi-sensor PHD filtering with application to sensor management, Ecole Centrale, Lille, France, October 2012.

    http://www.theses.fr/2012ECLI0001
  • [3] J. F. Hren.

    Planification optimiste pour systèmes déterministes, Université Lille 1, Lille, France, June 2012.
  • [4] C. Salperwyck.

    Apprentissage incrémental en ligne sur flux de données, Université de Lille 3, November 2012.

Articles in International Peer-Reviewed Journals

  • [5] M. G. Azar, R. Munos, H. Kappen.

    Minimax PAC-Bounds on the Sample Complexity of Reinforcement Learning with a Generative Model, in: Machine Learning Journal, 2012, To appear.
  • [6] O. Cappé, A. Garivier, O.-A. Maillard, R. Munos, G. Stoltz.

    Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation, in: Annals of Statistics, 2012, Submitted.
  • [7] A. Carpentier, A. Lazaric, M. Ghavamzadeh, R. Munos, P. Auer.

    Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits, in: Theoretical Computer Science, 2012, To appear.
  • [8] A. Carpentier, R. Munos, A. Antos.

    Minimax strategy for Stratified Sampling for Monte Carlo, in: Journal of Machine Learning Research, 2012, Submitted.
  • [9] G. Dulac-Arnold, L. Denoyer, P. Preux, P. Gallinari.

    Sequential approaches for learning datum-wise sparse representations, in: Machine Learning, October 2012, vol. 89, no 1-2, p. 87-122.
  • [10] J. Fruitet, A. Carpentier, R. Munos, M. Clerc.

    Automatic motor task selection via a bandit algorithm for a brain-controlled button, in: Journal of Neural Engineering, 2012, To appear.
  • [11] S. Girgin, J. Mary, P. Preux, O. Nicol.

    Managing advertising campaigns – an approximate planning approach, in: Frontiers of Computer Science, 2012, vol. 6, no 2, p. 209-229. [ DOI : 10.1007/s11704-012-2873-5 ]

    http://hal.inria.fr/hal-00747722
  • [12] M. Hauskrecht, I. Batal, M. Valko, S. Visweswaran, G. F. Cooper, G. Clermont.

    Outlier detection for patient monitoring and alerting, in: Journal of Biomedical Informatics, August 2012. [ DOI : 10.1016/j.jbi.2012.08.004 ]

    http://hal.inria.fr/hal-00742097
  • [13] A. Lazaric, M. Ghavamzadeh, R. Munos.

    Analysis of Classification-based Policy Iteration Algorithms, in: Journal of Machine Learning Research, 2012, Submitted.
  • [14] A. Lazaric, M. Ghavamzadeh, R. Munos.

    Finite-Sample Analysis of Least-Squares Policy Iteration, in: Journal of Machine Learning Research, 2012, vol. 13, p. 3041-3074.
  • [15] A. Lazaric, R. Munos.

    Learning with stochastic inputs and adversarial outputs, in: Journal of Computer and System Sciences (JCSS), 2012, vol. 78, no 5, p. 1516–1537. [ DOI : 10.1016/j.jcss.2011.12.027 ]

    http://www.sciencedirect.com/science/article/pii/S002200001200027X
  • [16] O.-A. Maillard, R. Munos.

    Linear Regression with Random Projections, in: Journal of Machine Learning Research, 2012, vol. 13, p. 2735-2772.
  • [17] R. Munos.

    The Optimistic Principle applied to Games, Optimization and Planning: Towards Foundations of Monte-Carlo Tree Search, in: Foundations and Trends in Machine Learning, 2012, Submitted.

    http://hal.archives-ouvertes.fr/hal-00747575
  • [18] O. Nicol, J. Mary, P. Preux.

    ICML Exploration & Exploitation challenge: Keep it simple!, in: Journal of Machine Learning Research: Workshop and Conference Proceedings, 2012, vol. 26, p. 62-85.

    http://hal.inria.fr/hal-00747725
  • [19] A. Rabaoui, N. Viandier, J. Marais, E. Duflos, P. Vanheeghe.

    Dirichlet Process Mixtures for Density Estimation in Dynamic Nonlinear Modeling: Application to GPS Positioning in Urban Canyons, in: IEEE Transactions on Signal Processing, April 2012, vol. 60, no 4, p. 1638-1655. [ DOI : 10.1109/TSP.2011.2180901 ]

    http://hal.inria.fr/hal-00712718
  • [20] S. Razavi, E. Duflos, C. Haas, P. Vanheeghe.

    Dislocation detection in field environments: A belief functions contribution, in: Expert Systems with Applications, August 2012, vol. 39, no 10, p. 8505-8513. [ DOI : 10.1016/j.eswa.2011.12.014 ]

    http://hal.inria.fr/hal-00712720
  • [21] D. Ryabko.

    Testing composite hypotheses about discrete ergodic processes, in: Test, 2012, vol. 21, no 2, p. 317-329.
  • [22] D. Ryabko.

    Uniform hypothesis testing for finite-valued stationary processes, in: Statistics, 2013.
  • [23] M. Valko, M. Ghavamzadeh, A. Lazaric.

    Semi-Supervised Apprenticeship Learning, in: Journal of Machine Learning Research: Workshop and Conference Proceedings, November 2012, vol. 24.

    http://hal.inria.fr/hal-00747921

International Conferences with Proceedings

  • [24] M. G. Azar, R. Munos, H. Kappen.

    On the Sample Complexity of Reinforcement Learning with a Generative Model, in: International Conference on Machine Learning, 2012.
  • [25] L. Busoniu, R. Munos.

    Optimistic planning in Markov decision processes, in: International Conference on Artificial Intelligence and Statistics, 2012.
  • [26] A. Carpentier, R. Munos.

    Adaptive Stratified Sampling for Monte-Carlo integration of Differentiable functions, in: Advances in Neural Information Processing Systems, 2012.
  • [27] A. Carpentier, R. Munos.

    Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit, in: International Conference on Artificial Intelligence and Statistics, 2012.
  • [28] A. Carpentier, R. Munos.

    Minimax number of strata for online Stratified Sampling given Noisy Samples, in: International Conference on Algorithmic Learning Theory, 2012.
  • [29] P. Chainais.

    Towards dictionary learning from images with non Gaussian noise, in: IEEE Int. Workshop on Machine Learning for Signal Processing, Santander, Spain, September 2012.

    http://hal.inria.fr/hal-00749035
  • [30] R. Coulom.

    CLOP: Confident Local Optimization for Noisy Black-Box Parameter Tuning, in: Advances in Computer Games - 13th International Conference, Tilburg, Netherlands, H. J. van den Herik, A. Plaat (editors), Lecture Notes in Computer Science, Springer, 2012, vol. 7168, p. 146-157. [ DOI : 10.1007/978-3-642-31866-5_13 ]

    http://hal.inria.fr/hal-00750326
  • [31] G. Dulac-Arnold, L. Denoyer, P. Preux, P. Gallinari.

    Fast Reinforcement Learning with Large Action Sets Using Error-Correcting Output Codes for MDP Factorization, in: European Conference on Machine Learning, Bristol, United Kingdom, Springer, 2012, vol. 2, p. 180-194. [ DOI : 10.1007/978-3-642-33486-3_12 ]

    http://hal.inria.fr/hal-00747729
  • [32] J. Fruitet, A. Carpentier, R. Munos, M. Clerc.

    Bandit Algorithms boost motor-task selection for Brain Computer Interfaces, in: Advances in Neural Information Processing Systems, 2012.
  • [33] V. Gabillon, M. Ghavamzadeh, A. Lazaric.

    Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence, in: Proceedings of Advances in Neural Information Processing Systems 25, MIT Press, 2012.
  • [34] N. Gatti, A. Lazaric, F. Trovò.

    A Truthful Learning Mechanism for Multi-Slot Sponsored Search Auctions with Externalities (Extended Abstract), in: AAMAS, 2012.
  • [35] N. Gatti, A. Lazaric, F. Trovò.

    A Truthful Learning Mechanism for Multi-Slot Sponsored Search Auctions with Externalities, in: Proceedings of the 13th ACM Conference on Electronic Commerce (EC'12), 2012.
  • [36] M. Geist, B. Scherrer, A. Lazaric, M. Ghavamzadeh.

    A Dantzig Selector Approach to Temporal Difference Learning, in: Proceedings of the Twenty-Ninth International Conference on Machine Learning, 2012, p. 1399-1406.
  • [37] M. Ghavamzadeh, A. Lazaric.

    Conservative and Greedy Approaches to Classification-based Policy Iteration, in: Proceedings of the Twenty-Sixth Conference on Artificial Intelligence, 2012, p. 914-920.
  • [38] E. Kaufmann, N. Korda, R. Munos.

    Thompson Sampling: an Asymptotically Optimal Finite Time Analysis, in: International Conference on Algorithmic Learning Theory, 2012.
  • [39] A. Khaleghi, D. Ryabko.

    Locating Changes in Highly Dependent Data with Unknown Number of Change Points, in: NIPS, Lake Tahoe, USA, 2012.
  • [40] A. Khaleghi, D. Ryabko, J. Mary, P. Preux.

    Online Clustering of Processes, in: AISTATS, JMLR W&CP 22, 2012, p. 601-609.
  • [41] B. Kveton, M. Valko.

    Learning from a Single Labeled Face and a Stream of Unlabeled Data, in: 10th IEEE International Conference on Automatic Face and Gesture Recognition, Shanghai, China, November 2012.

    http://hal.inria.fr/hal-00749197
  • [42] O.-A. Maillard, A. Carpentier.

    Online allocation and homogeneous partitioning for piecewise constant mean approximation, in: Advances in Neural Information Processing Systems, 2012.
  • [43] R. Ortner, D. Ryabko, P. Auer, R. Munos.

    Regret Bounds for Restless Markov Bandits, in: Proc. 23rd International Conf. on Algorithmic Learning Theory (ALT'12), Lyon, France, LNCS 7568, Springer, Berlin, 2012, p. 214–228.
  • [44] R. Ortner, D. Ryabko.

    Online Regret Bounds for Undiscounted Continuous Reinforcement Learning, in: NIPS, Lake Tahoe, USA, 2012.
  • [45] D. Ryabko, J. Mary.

    Reducing statistical time-series problems to binary classification, in: NIPS, Lake Tahoe, USA, 2012.
  • [46] A. Sani, A. Lazaric, R. Munos.

    Risk-Aversion in Multi-Armed Bandits, in: Advances in Neural Information Processing Systems, 2012.
  • [47] B. Scherrer, M. Ghavamzadeh, V. Gabillon, M. Geist.

    Approximate Modified Policy Iteration, in: Proceedings of the Twenty-Ninth International Conference on Machine Learning, 2012, p. 1207-1214.

National Conferences with Proceedings

  • [48] G. Dulac-Arnold, L. Denoyer, P. Preux, P. Gallinari.

    Apprentissage par renforcement rapide pour des grands ensembles d'actions en utilisant des codes correcteurs d'erreur, in: Journées Francophones sur la planification, la décision et l'apprentissage pour le contrôle des systèmes - JFPDA 2012, Villers-lès-Nancy, France, O. Buffet (editor), 2012, 12 p.

    http://hal.inria.fr/hal-00736322
  • [49] M. Geist, B. Scherrer, A. Lazaric, M. Ghavamzadeh.

    Un sélecteur de Dantzig pour l'apprentissage par différences temporelles, in: Journées Francophones sur la planification, la décision et l'apprentissage pour le contrôle des systèmes - JFPDA 2012, Villers-lès-Nancy, France, O. Buffet (editor), 2012, 13 p.

    http://hal.inria.fr/hal-00736229
  • [50] N. Jaoua, E. Duflos, P. Vanheeghe.

    DPM pour l'inférence dans les modèles dynamiques non linéaires avec des bruits de mesure alpha-stable, in: 44ème Journées de Statistique, Bruxelles, Belgium, May 2012, p. 1-4.

    http://hal.inria.fr/hal-00713857
  • [51] B. Scherrer, V. Gabillon, M. Ghavamzadeh, M. Geist.

    Approximations de l'Algorithme Itérations sur les Politiques Modifié, in: Journées Francophones sur la planification, la décision et l'apprentissage pour le contrôle des systèmes - JFPDA 2012, Villers-lès-Nancy, France, O. Buffet (editor), 2012, 1 p. The body of this article was published, in English, in ICML'2012 (Proceedings of the International Conference on Machine Learning).

    http://hal.inria.fr/hal-00736226

Conferences without Proceedings

  • [52] C. Dhanjal, R. Gaudel, S. Clémençon.

    Incremental Spectral Clustering with the Normalised Laplacian, in: DISCML - 3rd NIPS Workshop on Discrete Optimization in Machine Learning - 2011, Sierra Nevada, Spain, 2012.

    http://hal.inria.fr/hal-00745666
  • [53] A. Farahmand, D. Precup, M. Ghavamzadeh.

    On Classification-based Approximate Policy Iteration, in: Thirtieth International Conference on Machine Learning, 2012, submitted.
  • [54] D. Ryabko.

    Asymptotic statistics of stationary ergodic time series, in: WITMSE, Amsterdam, 2012.

Scientific Books (or Scientific Book chapters)

  • [55] L. Busoniu, A. Lazaric, M. Ghavamzadeh, R. Munos, R. Babuska, B. De Schutter.

    Least-Squares Methods for Policy Iteration, in: Reinforcement Learning: State of the Art, M. Wiering, M. van Otterlo (editors), Springer Verlag, 2012, p. 75-110.
  • [56] A. Lazaric.

    Transfer in Reinforcement Learning: a Framework and a Survey, in: Reinforcement Learning: State of the Art, M. Wiering, M. van Otterlo (editors), Springer, 2012.
  • [57] N. Vlassis, M. Ghavamzadeh, S. Mannor, P. Poupart.

    Bayesian Reinforcement Learning, in: Reinforcement Learning: State of the Art, M. Wiering, M. van Otterlo (editors), Springer Verlag, 2012, p. 359-386.

Internal Reports

  • [58] V. Gabillon, M. Ghavamzadeh, A. Lazaric.

    Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence, Inria, 2012, no inria-00747005.
  • [59] H. Kadri, M. Ghavamzadeh, P. Preux.

    A Generalized Kernel Approach to Structured Output Learning, Inria, May 2012, no RR-7956.

    http://hal.inria.fr/hal-00695631
  • [60] H. Kadri, A. Rakotomamonjy, F. Bach, P. Preux.

    Multiple Operator-valued Kernel Learning, Inria, March 2012, no RR-7900.

    http://hal.inria.fr/hal-00677012
  • [61] B. Pires, M. Ghavamzadeh, Cs. Szepesvári.

    Risk Bounds in Cost-sensitive Multiclass Classification: an Application to Reinforcement Learning, Inria, 2012, in preparation.
  • [62] B. Scherrer, V. Gabillon, M. Ghavamzadeh, M. Geist.

    Approximate Modified Policy Iteration, Inria, May 2012.

    http://hal.inria.fr/hal-00697169

References in notes

  • [63] P. Auer, N. Cesa-Bianchi, P. Fischer.

    Finite-time analysis of the multi-armed bandit problem, in: Machine Learning, 2002, vol. 47, no 2/3, p. 235–256.
  • [64] R. Bellman.

    Dynamic Programming, Princeton University Press, 1957.
  • [65] D. Bertsekas, S. Shreve.

    Stochastic Optimal Control (The Discrete Time Case), Academic Press, New York, 1978.
  • [66] D. Bertsekas, J. Tsitsiklis.

    Neuro-Dynamic Programming, Athena Scientific, 1996.
  • [67] T. Ferguson.

    A Bayesian Analysis of Some Nonparametric Problems, in: The Annals of Statistics, 1973, vol. 1, no 2, p. 209–230.
  • [68] T. Hastie, R. Tibshirani, J. Friedman.

    The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001.
  • [69] W. Powell.

    Approximate Dynamic Programming, Wiley, 2007.
  • [70] M. Puterman.

    Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley and Sons, 1994.
  • [71] H. Robbins.

    Some aspects of the sequential design of experiments, in: Bull. Amer. Math. Soc., 1952, vol. 58, p. 527–535.
  • [72] J. Rust.

    How Social Security and Medicare Affect Retirement Behavior in a World of Incomplete Markets, in: Econometrica, July 1997, vol. 65, no 4, p. 781–831.

    http://gemini.econ.umd.edu/jrust/research/rustphelan.pdf
  • [73] J. Rust.

    On the Optimal Lifetime of Nuclear Power Plants, in: Journal of Business & Economic Statistics, 1997, vol. 15, no 2, p. 195–208.
  • [74] R. Sutton, A. Barto.

    Reinforcement learning: an introduction, MIT Press, 1998.
  • [75] G. Tesauro.

    Temporal Difference Learning and TD-Gammon, in: Communications of the ACM, March 1995, vol. 38, no 3.

    http://www.research.ibm.com/massive/tdl.html
  • [76] P. Werbos.

    ADP: Goals, Opportunities and Principles, in: Handbook of Learning and Approximate Dynamic Programming, IEEE Press, 2004, p. 3–44.