

Bibliography

Major publications by the team in recent years
  • [1] O. Cappé, A. Garivier, O.-A. Maillard, R. Munos, G. Stoltz.

    Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation, in: Annals of Statistics, 2013, vol. 41, no 3, pp. 1516-1541.

    https://hal.archives-ouvertes.fr/hal-00738209
  • [2] A. Carpentier, M. Valko.

    Revealing graph bandits for maximizing local influence, in: International Conference on Artificial Intelligence and Statistics, Seville, Spain, May 2016.

    https://hal.inria.fr/hal-01304020
  • [3] N. Gatti, A. Lazaric, M. Rocco, F. Trovò.

    Truthful Learning Mechanisms for Multi-Slot Sponsored Search Auctions with Externalities, in: Artificial Intelligence, October 2015, vol. 227, pp. 93-139.

    https://hal.inria.fr/hal-01237670
  • [4] M. Ghavamzadeh, Y. Engel, M. Valko.

    Bayesian Policy Gradient and Actor-Critic Algorithms, in: Journal of Machine Learning Research, January 2016, vol. 17, no 66, pp. 1-53.

    https://hal.inria.fr/hal-00776608
  • [5] H. Kadri, E. Duflos, P. Preux, S. Canu, A. Rakotomamonjy, J. Audiffren.

    Operator-valued Kernels for Learning from Functional Response Data, in: Journal of Machine Learning Research (JMLR), 2016.

    https://hal.archives-ouvertes.fr/hal-01221329
  • [6] E. Kaufmann, O. Cappé, A. Garivier.

    On the Complexity of Best Arm Identification in Multi-Armed Bandit Models, in: Journal of Machine Learning Research, January 2016, vol. 17, pp. 1-42.

    https://hal.archives-ouvertes.fr/hal-01024894
  • [7] A. Lazaric, M. Ghavamzadeh, R. Munos.

    Analysis of Classification-based Policy Iteration Algorithms, in: Journal of Machine Learning Research, 2016, vol. 17, pp. 1-30.

    https://hal.inria.fr/hal-01401513
  • [8] R. Munos.

    From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, 2014, 130 pages.

    https://hal.archives-ouvertes.fr/hal-00747575
  • [9] R. Ortner, D. Ryabko, P. Auer, R. Munos.

    Regret bounds for restless Markov bandits, in: Theoretical Computer Science (TCS), 2014, vol. 558, pp. 62-76. [ DOI : 10.1016/j.tcs.2014.09.026 ]

    https://hal.inria.fr/hal-01074077
  • [10] D. Ryabko, J. Mary.

    A Binary-Classification-Based Metric between Time-Series Distributions and Its Use in Statistical and Learning Problems, in: Journal of Machine Learning Research, 2013, vol. 14, pp. 2837-2856.

    https://hal.inria.fr/hal-00913240
Publications of the year

Doctoral Dissertations and Habilitation Theses

Articles in International Peer-Reviewed Journals

  • [15] M. Ghavamzadeh, Y. Engel, M. Valko.

    Bayesian Policy Gradient and Actor-Critic Algorithms, in: Journal of Machine Learning Research, January 2016, vol. 17, no 66, pp. 1-53.

    https://hal.inria.fr/hal-00776608
  • [16] H. Kadri, E. Duflos, P. Preux, S. Canu, A. Rakotomamonjy, J. Audiffren.

    Operator-valued Kernels for Learning from Functional Response Data, in: Journal of Machine Learning Research (JMLR), 2016.

    https://hal.archives-ouvertes.fr/hal-01221329
  • [17] E. Kaufmann, O. Cappé, A. Garivier.

    On the Complexity of Best Arm Identification in Multi-Armed Bandit Models, in: Journal of Machine Learning Research, January 2016, vol. 17, pp. 1-42.

    https://hal.archives-ouvertes.fr/hal-01024894
  • [18] A. Khaleghi, D. Ryabko.

    Nonparametric multiple change point estimation in highly dependent time series, in: Theoretical Computer Science, 2016, vol. 620, pp. 119-133. [ DOI : 10.1016/j.tcs.2015.10.041 ]

    https://hal.inria.fr/hal-01235330
  • [19] A. Khaleghi, D. Ryabko, J. Mary, P. Preux.

    Consistent Algorithms for Clustering Time Series, in: Journal of Machine Learning Research, 2016, vol. 17, no 3, pp. 1-32.

    https://hal.inria.fr/hal-01399613
  • [20] A. Lazaric, M. Ghavamzadeh, R. Munos.

    Analysis of Classification-based Policy Iteration Algorithms, in: Journal of Machine Learning Research, 2016, vol. 17, pp. 1-30.

    https://hal.inria.fr/hal-01401513
  • [21] V. Musco, M. Monperrus, P. Preux.

    A Large-scale Study of Call Graph-based Impact Prediction using Mutation Testing, in: Software Quality Journal, 2016. [ DOI : 10.1007/s11219-016-9332-8 ]

    https://hal.inria.fr/hal-01346046
  • [22] G. Neu, G. Bartók.

    Importance Weighting Without Importance Weights: An Efficient Algorithm for Combinatorial Semi-Bandits, in: Journal of Machine Learning Research, August 2016, vol. 17, no 154, pp. 1-21.

    https://hal.archives-ouvertes.fr/hal-01380278

International Conferences with Proceedings

  • [23] K. Azizzadenesheli, A. Lazaric, A. Anandkumar.

    Reinforcement Learning of POMDPs using Spectral Methods, in: Proceedings of the 29th Annual Conference on Learning Theory (COLT 2016), New York City, United States, June 2016.

    https://hal.inria.fr/hal-01322207
  • [24] M. Barlier, R. Laroche, O. Pietquin.

    A Stochastic Model for Computer-Aided Human-Human Dialogue, in: Interspeech 2016, San Francisco, United States, September 2016, vol. 2016, pp. 2051-2055.

    https://hal.inria.fr/hal-01406894
  • [25] M. Barlier, R. Laroche, O. Pietquin.

    Learning Dialogue Dynamics with the Method of Moments, in: Workshop on Spoken Language Technology (SLT 2016), San Diego, United States, December 2016.

    https://hal.inria.fr/hal-01406904
  • [26] D. Calandriello, A. Lazaric, M. Valko.

    Analysis of Nyström method with sequential ridge leverage score sampling, in: Uncertainty in Artificial Intelligence, New York City, United States, June 2016.

    https://hal.inria.fr/hal-01343674
  • [27] A. Carpentier, M. Valko.

    Revealing graph bandits for maximizing local influence, in: International Conference on Artificial Intelligence and Statistics, Seville, Spain, May 2016.

    https://hal.inria.fr/hal-01304020
  • [28] L. El Asri, R. Laroche, O. Pietquin.

    Compact and Interpretable Dialogue State Representation with Genetic Sparse Distributed Memory, in: 7th International Workshop on Spoken Dialogue Systems (IWSDS 2016), Saariselkä, Finland, January 2016.

    https://hal.inria.fr/hal-01406873
  • [29] L. El Asri, B. Piot, M. Geist, R. Laroche, O. Pietquin.

    Score-based Inverse Reinforcement Learning, in: International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2016), Singapore, May 2016.

    https://hal.inria.fr/hal-01406886
  • [30] A. Erraqabi, M. Valko, A. Carpentier, O.-A. Maillard.

    Pliable rejection sampling, in: International Conference on Machine Learning, New York City, United States, June 2016.

    https://hal.inria.fr/hal-01322168
  • [31] C. Z. Felício, K. V. R. Paixão, C. A. Z. Barcelos, P. Preux.

    Preference-like Score to Cope with Cold-Start User in Recommender Systems, in: Proceedings of the IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI), San Jose, United States, November 2016.

    https://hal.inria.fr/hal-01390762
  • [32] V. Gabillon, A. Lazaric, M. Ghavamzadeh, R. Ortner, P. Bartlett.

    Improved Learning Complexity in Combinatorial Pure Exploration Bandits, in: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS), Cadiz, Spain, May 2016.

    https://hal.inria.fr/hal-01322198
  • [33] A. Garivier, E. Kaufmann.

    Optimal Best Arm Identification with Fixed Confidence, in: 29th Annual Conference on Learning Theory (COLT), New York, United States, JMLR Workshop and Conference Proceedings, June 2016, vol. 49.

    https://hal.archives-ouvertes.fr/hal-01273838
  • [34] A. Garivier, E. Kaufmann, W. M. Koolen.

    Maximin Action Identification: A New Bandit Framework for Games, in: 29th Annual Conference on Learning Theory (COLT), New York, United States, JMLR Workshop and Conference Proceedings, June 2016, vol. 49.

    https://hal.archives-ouvertes.fr/hal-01273842
  • [35] A. Garivier, E. Kaufmann, T. Lattimore.

    On Explore-Then-Commit Strategies, in: Advances in Neural Information Processing Systems 29 (NIPS 2016), Barcelona, Spain, December 2016.

    https://hal.archives-ouvertes.fr/hal-01322906
  • [36] H. Glaude, O. Pietquin.

    PAC learning of Probabilistic Automaton based on the Method of Moments, in: International Conference on Machine Learning (ICML 2016), New York, United States, June 2016.

    https://hal.inria.fr/hal-01406889
  • [37] J.-B. Grill, M. Valko, R. Munos.

    Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning, in: NIPS 2016 - Thirtieth Annual Conference on Neural Information Processing Systems, Barcelona, Spain, December 2016.

    https://hal.inria.fr/hal-01389107
  • [38] F. Guillou, R. Gaudel, P. Preux.

    Large-scale Bandit Recommender System, in: the 2nd International Workshop on Machine Learning, Optimization and Big Data (MOD'16), Volterra, Italy, August 2016.

    https://hal.inria.fr/hal-01406389
  • [39] F. Guillou, R. Gaudel, P. Preux.

    Scalable explore-exploit Collaborative Filtering, in: Pacific Asia Conference on Information Systems (PACIS'16), Chiayi, Taiwan, 2016.

    https://hal.inria.fr/hal-01406418
  • [40] F. Guillou, R. Gaudel, P. Preux.

    Sequential Collaborative Ranking Using (No-)Click Implicit Feedback, in: The 23rd International Conference on Neural Information Processing (ICONIP'16), Kyoto, Japan, Lecture Notes in Computer Science, October 2016, vol. 9948, pp. 288-296. [ DOI : 10.1007/978-3-319-46672-9_33 ]

    https://hal.inria.fr/hal-01406338
  • [41] E. Kaufmann, T. Bonald, M. Lelarge.

    A Spectral Algorithm with Additive Clustering for the Recovery of Overlapping Communities in Networks, in: ALT 2016 - Algorithmic Learning Theory, Bari, Italy, R. Ortner, H. U. Simon, S. Zilles (editors), Lecture Notes in Computer Science, Springer, October 2016, vol. 9925, pp. 355-370. [ DOI : 10.1007/978-3-319-46379-7_24 ]

    https://hal.archives-ouvertes.fr/hal-01163147
  • [42] T. Kocák, G. Neu, M. Valko.

    Online learning with Erdős-Rényi side-observation graphs, in: Uncertainty in Artificial Intelligence, New York City, United States, June 2016.

    https://hal.inria.fr/hal-01320588
  • [43] T. Kocák, G. Neu, M. Valko.

    Online learning with noisy side observations, in: International Conference on Artificial Intelligence and Statistics, Seville, Spain, May 2016.

    https://hal.inria.fr/hal-01303377
  • [44] V. Musco, A. Carette, M. Monperrus, P. Preux.

    A Learning Algorithm for Change Impact Prediction, in: 5th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering, Austin, United States, May 2016.

    https://hal.inria.fr/hal-01279620
  • [45] V. Musco, M. Monperrus, P. Preux.

    Mutation-Based Graph Inference for Fault Localization, in: International Working Conference on Source Code Analysis and Manipulation, Raleigh, United States, October 2016.

    https://hal.inria.fr/hal-01350515
  • [46] J. Pérolat, B. Piot, M. Geist, B. Scherrer, O. Pietquin.

    Softened Approximate Policy Iteration for Markov Games, in: ICML 2016 - 33rd International Conference on Machine Learning, New York City, United States, June 2016.

    https://hal.inria.fr/hal-01393328
  • [47] J. Pérolat, B. Piot, B. Scherrer, O. Pietquin.

    On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games, in: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS 2016), Cadiz, Spain, May 2016.

    https://hal.inria.fr/hal-01291495
  • [48] D. Ryabko.

    Things Bayes can't do, in: Proceedings of the 27th International Conference on Algorithmic Learning Theory (ALT'16), Bari, Italy, Lecture Notes in Computer Science, October 2016, vol. 9925, pp. 253-260. [ DOI : 10.1007/978-3-319-46379-7_17 ]

    https://hal.inria.fr/hal-01380063
  • [49] F. Strub, R. Gaudel, J. Mary.

    Hybrid Recommender System based on Autoencoders, in: the 1st Workshop on Deep Learning for Recommender Systems, Boston, United States, September 2016, pp. 11-16. [ DOI : 10.1145/2988450.2988456 ]

    https://hal.inria.fr/hal-01336912
  • [50] A. C. Y. Tossou, C. Dimitrakakis.

    Algorithms for Differentially Private Multi-Armed Bandits, in: AAAI 2016, Phoenix, Arizona, United States, February 2016.

    https://hal.inria.fr/hal-01234427
  • [51] Z. Zhang, B. Rubinstein, C. Dimitrakakis.

    On the Differential Privacy of Bayesian Inference, in: AAAI 2016, Phoenix, Arizona, United States, February 2016.

    https://hal.inria.fr/hal-01234215

Conferences without Proceedings

  • [52] A. Bérard, C. Servan, O. Pietquin, L. Besacier.

    MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP, in: The 10th edition of the Language Resources and Evaluation Conference (LREC), Portorož, Slovenia, May 2016.

    https://hal.archives-ouvertes.fr/hal-01335930
  • [53] F. Guillou, R. Gaudel, P. Preux.

    Compromis exploration-exploitation pour système de recommandation à grande échelle [Exploration-exploitation trade-off for a large-scale recommender system], in: Conférence francophone sur l'Apprentissage Automatique (CAp'16), Marseille, France, July 2016.

    https://hal.inria.fr/hal-01406439
  • [54] F. Strub, J. Mary, R. Gaudel.

    Filtrage Collaboratif Hybride avec des Auto-encodeurs [Hybrid Collaborative Filtering with Autoencoders], in: Conférence francophone sur l'Apprentissage Automatique (CAp'16), Marseille, France, July 2016.

    https://hal.inria.fr/hal-01406432

Internal Reports

Other Publications

References in notes
  • [61] P. Auer, N. Cesa-Bianchi, P. Fischer.

    Finite-time analysis of the multi-armed bandit problem, in: Machine Learning, 2002, vol. 47, no 2/3, pp. 235–256.
  • [62] R. Bellman.

    Dynamic Programming, Princeton University Press, 1957.
  • [63] D. Bertsekas, S. Shreve.

    Stochastic Optimal Control (The Discrete Time Case), Academic Press, New York, 1978.
  • [64] D. Bertsekas, J. Tsitsiklis.

    Neuro-Dynamic Programming, Athena Scientific, 1996.
  • [65] M. Puterman.

    Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley and Sons, 1994.
  • [66] H. Robbins.

    Some aspects of the sequential design of experiments, in: Bull. Amer. Math. Soc., 1952, vol. 58, pp. 527–535.
  • [67] R. Sutton, A. Barto.

    Reinforcement Learning: An Introduction, MIT Press, 1998.
  • [68] P. Werbos.

    ADP: Goals, Opportunities and Principles, in: Handbook of Learning and Approximate Dynamic Programming, IEEE Press, 2004, pp. 3–44.