Bibliography

Major publications by the team in recent years
  • [1] O. Cappé, A. Garivier, O.-A. Maillard, R. Munos, G. Stoltz.
    Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation, in: Annals of Statistics, 2013, vol. 41, no 3, pp. 1516-1541.
    https://hal.archives-ouvertes.fr/hal-00738209
  • [2] A. Carpentier, M. Valko.
    Revealing graph bandits for maximizing local influence, in: International Conference on Artificial Intelligence and Statistics, Seville, Spain, May 2016.
    https://hal.inria.fr/hal-01304020
  • [3] H. De Vries, F. Strub, J. Mary, H. Larochelle, O. Pietquin, A. Courville.
    Modulating early visual processing by language, in: Conference on Neural Information Processing Systems, Long Beach, United States, December 2017.
    https://hal.inria.fr/hal-01648683
  • [4] N. Gatti, A. Lazaric, M. Rocco, F. Trovò.
    Truthful Learning Mechanisms for Multi-Slot Sponsored Search Auctions with Externalities, in: Artificial Intelligence, October 2015, vol. 227, pp. 93-139.
    https://hal.inria.fr/hal-01237670
  • [5] M. Ghavamzadeh, Y. Engel, M. Valko.
    Bayesian Policy Gradient and Actor-Critic Algorithms, in: Journal of Machine Learning Research, January 2016, vol. 17, no 66, pp. 1-53.
    https://hal.inria.fr/hal-00776608
  • [6] H. Kadri, E. Duflos, P. Preux, S. Canu, A. Rakotomamonjy, J. Audiffren.
    Operator-valued Kernels for Learning from Functional Response Data, in: Journal of Machine Learning Research (JMLR), 2016.
    https://hal.archives-ouvertes.fr/hal-01221329
  • [7] E. Kaufmann, O. Cappé, A. Garivier.
    On the Complexity of Best Arm Identification in Multi-Armed Bandit Models, in: Journal of Machine Learning Research, January 2016, vol. 17, pp. 1-42.
    https://hal.archives-ouvertes.fr/hal-01024894
  • [8] A. Lazaric, M. Ghavamzadeh, R. Munos.
    Analysis of Classification-based Policy Iteration Algorithms, in: Journal of Machine Learning Research, 2016, vol. 17, pp. 1-30.
    https://hal.inria.fr/hal-01401513
  • [9] R. Munos.
    From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, in: Foundations and Trends® in Machine Learning, 2014, vol. 7, no 1, pp. 1-129.
    http://dx.doi.org/10.1561/2200000038
  • [10] R. Ortner, D. Ryabko, P. Auer, R. Munos.
    Regret bounds for restless Markov bandits, in: Journal of Theoretical Computer Science (TCS), 2014, vol. 558, pp. 62-76. [DOI: 10.1016/j.tcs.2014.09.026]
    https://hal.inria.fr/hal-01074077
Publications of the year

Doctoral Dissertations and Habilitation Theses

  • [11] M. Abeille.
    Exploration-Exploitation with Thompson Sampling in Linear Systems, Université de Lille, December 2017.
  • [12] D. Calandriello.
    Efficient Sequential Learning in Structured and Constrained Environments, Université de Lille, December 2017.
  • [13] P. Gajane.
    Multi-armed bandits with unconventional feedback, Université de Lille, November 2017.
  • [14] J. Pérolat.
    Reinforcement learning: the multiplayer case, Université de Lille, December 2017.

Articles in International Peer-Reviewed Journals

International Conferences with Proceedings

  • [21] M. Abeille, A. Lazaric.
    Linear Thompson Sampling Revisited, in: AISTATS 2017 - 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, United States, April 2017.
    https://hal.inria.fr/hal-01493561
  • [22] M. Abeille, A. Lazaric.
    Thompson Sampling for Linear-Quadratic Control Problems, in: AISTATS 2017 - 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, United States, April 2017.
    https://hal.inria.fr/hal-01493564
  • [23] B. Balle, O.-A. Maillard.
    Spectral Learning from a Single Trajectory under Finite-State Policies, in: International Conference on Machine Learning, Sydney, Australia, Proceedings of the International Conference on Machine Learning, July 2017.
    https://hal.archives-ouvertes.fr/hal-01590940
  • [24] S. Brodeur, E. Perez, A. Anand, F. Golemo, L. Celotti, F. Strub, J. Rouat, H. Larochelle, A. Courville.
    HoME: a Household Multimodal Environment, in: NIPS 2017's Visually-Grounded Interaction and Language Workshop, Long Beach, United States, December 2017, https://arxiv.org/abs/1711.11017.
    https://hal.inria.fr/hal-01653037
  • [25] A. Bérard, O. Pietquin, L. Besacier.
    LIG-CRIStAL System for the WMT17 Automatic Post-Editing Task, in: Second Conference on Machine Translation (WMT17) at EMNLP 2017, Copenhagen, Denmark, September 2017.
    https://hal.archives-ouvertes.fr/hal-01580881
  • [26] D. Calandriello, A. Lazaric, M. Valko.
    Distributed adaptive sampling for kernel matrix approximation, in: International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, United States, 2017.
    https://hal.inria.fr/hal-01482760
  • [27] D. Calandriello, A. Lazaric, M. Valko.
    Efficient second-order online kernel learning with adaptive embedding, in: NIPS 2017: The Thirty-first Annual Conference on Neural Information Processing Systems, Long Beach, United States, December 2017, pp. 1-17.
    https://hal.inria.fr/hal-01643961
  • [28] D. Calandriello, A. Lazaric, M. Valko.
    Second-Order Kernel Online Convex Optimization with Adaptive Sketching, in: International Conference on Machine Learning, Sydney, Australia, 2017.
    https://hal.inria.fr/hal-01537799
  • [29] H. De Vries, F. Strub, S. Chandar, O. Pietquin, H. Larochelle, A. Courville.
    GuessWhat?! Visual object discovery through multi-modal dialogue, in: Conference on Computer Vision and Pattern Recognition, Honolulu, United States, July 2017, https://arxiv.org/abs/1611.08481.
    https://hal.inria.fr/hal-01549641
  • [30] H. De Vries, F. Strub, J. Mary, H. Larochelle, O. Pietquin, A. Courville.
    Modulating early visual processing by language, in: NIPS 2017 - Conference on Neural Information Processing Systems, Long Beach, United States, December 2017, pp. 1-14, https://arxiv.org/abs/1707.00683.
    https://hal.inria.fr/hal-01648683
  • [31] A. Erraqabi, A. Lazaric, M. Valko, E. Brunskill, Y.-E. Liu.
    Trading off rewards and errors in multi-armed bandits, in: International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, United States, 2017.
    https://hal.inria.fr/hal-01482765
  • [32] C. Z. Felício, K. V. R. Paixão, C. A. Z. Barcelos, P. Preux.
    A Multi-Armed Bandit Model Selection for Cold-Start User Recommendation, in: 25th ACM Conference on User Modelling, Adaptation and Personalization (UMAP), Bratislava, Slovakia, July 2017.
    https://hal.inria.fr/hal-01517967
  • [33] R. Fruit, A. Lazaric.
    Exploration–Exploitation in MDPs with Options, in: AISTATS 2017 - 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, United States, April 2017.
    https://hal.inria.fr/hal-01493567
  • [34] R. Fruit, M. Pirotta, A. Lazaric, E. Brunskill.
    Regret Minimization in MDPs with Options without Prior Knowledge, in: NIPS 2017 - Neural Information Processing Systems, Long Beach, United States, December 2017, pp. 1-36.
    https://hal.inria.fr/hal-01649082
  • [35] G. Gautier, R. Bardenet, M. Valko.
    Zonotope hit-and-run for efficient sampling from projection DPPs, in: International Conference on Machine Learning, Sydney, Australia, 2017.
    https://hal.inria.fr/hal-01526577
  • [36] M. Geist, B. Piot, O. Pietquin.
    Is the Bellman residual a bad proxy?, in: NIPS 2017 - Advances in Neural Information Processing Systems, Long Beach, United States, December 2017, pp. 1-13.
    https://hal.archives-ouvertes.fr/hal-01629739
  • [37] E. Kaufmann, W. M. Koolen.
    Monte-Carlo Tree Search by Best Arm Identification, in: NIPS 2017 - 31st Annual Conference on Neural Information Processing Systems, Long Beach, United States, Advances in Neural Information Processing Systems, December 2017, pp. 1-23, https://arxiv.org/abs/1706.02986.
    https://hal.archives-ouvertes.fr/hal-01535907
  • [38] R. Laroche, M. Barlier.
    Transfer Reinforcement Learning with Shared Dynamics, in: AAAI-17 - Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, United States, February 2017, 7 p.
    https://hal.archives-ouvertes.fr/hal-01548649
  • [39] O.-A. Maillard.
    Boundary Crossing for General Exponential Families, in: Algorithmic Learning Theory, Kyoto, Japan, Proceedings of Algorithmic Learning Theory, October 2017, vol. 1, pp. 1-34.
    https://hal.archives-ouvertes.fr/hal-01615427
  • [40] A. M. Metelli, M. Pirotta, M. Restelli.
    Compatible Reward Inverse Reinforcement Learning, in: The Thirty-first Annual Conference on Neural Information Processing Systems - NIPS 2017, Long Beach, United States, December 2017.
    https://hal.inria.fr/hal-01653328
  • [41] J. Mourtada, O.-A. Maillard.
    Efficient tracking of a growing number of experts, in: Algorithmic Learning Theory, Kyoto, Japan, Proceedings of Algorithmic Learning Theory, October 2017, vol. 76, pp. 1-23.
    https://hal.archives-ouvertes.fr/hal-01615424
  • [42] M. Papini, M. Pirotta, M. Restelli.
    Adaptive Batch Size for Safe Policy Gradients, in: The Thirty-first Annual Conference on Neural Information Processing Systems (NIPS), Long Beach, United States, December 2017.
    https://hal.inria.fr/hal-01653330
  • [43] G. Papoudakis, P. Preux, M. Monperrus.
    A generative model for sparse, evolving digraphs, in: 6th International Conference on Complex Networks and their Applications, Lyon, France, November 2017, https://arxiv.org/abs/1710.06298. [DOI: 10.1007/978-3-319-72150-7_43]
    https://hal.inria.fr/hal-01617851
  • [44] E. Perez, H. De Vries, F. Strub, V. Dumoulin, A. Courville.
    Learning Visual Reasoning Without Strong Priors, in: ICML 2017's Machine Learning in Speech and Language Processing Workshop, Sydney, Australia, August 2017, https://arxiv.org/abs/1709.07871.
    https://hal.inria.fr/hal-01648684
  • [45] E. Perez, F. Strub, H. De Vries, V. Dumoulin, A. Courville.
    FiLM: Visual Reasoning with a General Conditioning Layer, in: AAAI Conference on Artificial Intelligence, New Orleans, United States, February 2018, https://arxiv.org/abs/1707.03017.
    https://hal.inria.fr/hal-01648685
  • [46] J. Pérolat, F. Strub, B. Piot, O. Pietquin.
    Learning Nash Equilibrium for General-Sum Markov Games from Batch Data, in: AISTATS 2017 - The 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, United States, April 2017, pp. 1-14.
    https://hal.inria.fr/hal-01648489
  • [47] C. Riquelme, M. Ghavamzadeh, A. Lazaric.
    Active Learning for Accurate Estimation of Linear Models, in: ICML 2017 - 34th International Conference on Machine Learning, Sydney, Australia, August 2017, 36 p.
    https://hal.inria.fr/hal-01538762
  • [48] D. Ryabko.
    Hypotheses testing on infinite random graphs, in: ALT 2017 - 28th International Conference on Algorithmic Learning Theory, Kyoto, Japan, October 2017, pp. 1-12, https://arxiv.org/abs/1708.03131.
    https://hal.inria.fr/hal-01627330
  • [49] D. Ryabko.
    Independence clustering (without a matrix), in: NIPS 2017 - Thirty-first Annual Conference on Neural Information Processing Systems, Long Beach, United States, December 2017, pp. 1-14, https://arxiv.org/abs/1703.06700.
    https://hal.inria.fr/hal-01627333
  • [50] D. Ryabko.
    Universality of Bayesian mixture predictors, in: ALT 2017 - 28th International Conference on Algorithmic Learning Theory, Kyoto, Japan, October 2017, pp. 1-13, https://arxiv.org/abs/1610.08249.
    https://hal.inria.fr/hal-01627332
  • [51] F. Strub, H. De Vries, J. Mary, B. Piot, A. Courville, O. Pietquin.
    End-to-end optimization of goal-driven and visually grounded dialogue systems, in: International Joint Conference on Artificial Intelligence, Melbourne, Australia, August 2017, https://arxiv.org/abs/1703.05423.
    https://hal.inria.fr/hal-01549642
  • [52] S. Tosatto, M. Pirotta, C. D'Eramo, M. Restelli.
    Boosted Fitted Q-Iteration, in: 34th International Conference on Machine Learning (ICML), Sydney, Australia, August 2017.
    https://hal.inria.fr/hal-01653332
  • [53] N. Tziortziotis, C. Dimitrakakis.
    Bayesian Inference for Least Squares Temporal Difference Regularization, in: ECML 2017 - European Conference on Machine Learning, Skopje, Macedonia, September 2017.
    https://hal.inria.fr/hal-01593212
  • [54] Z. Wen, B. Kveton, M. Valko, S. Vaswani.
    Online influence maximization under independent cascade model with semi-bandit feedback, in: NIPS 2017 - Neural Information Processing Systems, Long Beach, United States, December 2017, pp. 1-24.
    https://hal.inria.fr/hal-01643976
  • [55] M. Zanon Boito, A. Bérard, A. Villavicencio, L. Besacier.
    Unwritten Languages Demand Attention Too! Word Discovery with Encoder-Decoder Models, in: IEEE Automatic Speech Recognition and Understanding (ASRU), Okinawa, Japan, December 2017.
    https://hal.archives-ouvertes.fr/hal-01592091

National Conferences with Proceedings

  • [56] M. Geist, B. Piot, O. Pietquin.
    Faut-il minimiser le résidu de Bellman ou maximiser la valeur moyenne ?, in: Journées Francophones sur la Planification, la Décision et l'Apprentissage pour la conduite de systèmes (JFPDA 2017), Caen, France, Actes des Journées Francophones sur la Planification, la Décision et l'Apprentissage pour la conduite de systèmes (JFPDA 2017), July 2017.
    https://hal.archives-ouvertes.fr/hal-01576347

Conferences without Proceedings

  • [57] R. Bonnefoi, L. Besson, C. Moy, E. Kaufmann, J. Palicot.
    Multi-Armed Bandit Learning in IoT Networks: Learning helps even in non-stationary settings, in: CROWNCOM 2017 - 12th EAI International Conference on Cognitive Radio Oriented Wireless Networks, Lisbon, Portugal, September 2017.
    https://hal.archives-ouvertes.fr/hal-01575419
  • [58] N. Carrara, R. Laroche, O. Pietquin.
    Online learning and transfer for user adaptation in dialogue systems, in: SIGDIAL/SEMDIAL joint special session on negotiation dialog 2017, Saarbrücken, Germany, August 2017.
    https://hal.archives-ouvertes.fr/hal-01557775

Other Publications

References in notes
  • [64] R. Allesiardo, R. Féraud, O.-A. Maillard.
    The Non-stationary Stochastic Multi-armed Bandit Problem, in: International Journal of Data Science and Analytics, 2017, vol. 3, no 4, pp. 267–283. [DOI: 10.1007/s41060-017-0050-5]
    https://hal.archives-ouvertes.fr/hal-01575000
  • [65] P. Auer, N. Cesa-Bianchi, P. Fischer.
    Finite-time analysis of the multi-armed bandit problem, in: Machine Learning, 2002, vol. 47, no 2/3, pp. 235–256.
  • [66] R. Bellman.
    Dynamic Programming, Princeton University Press, 1957.
  • [67] D. Bertsekas, S. Shreve.
    Stochastic Optimal Control (The Discrete Time Case), Academic Press, New York, 1978.
  • [68] D. Bertsekas, J. Tsitsiklis.
    Neuro-Dynamic Programming, Athena Scientific, 1996.
  • [69] M. Puterman.
    Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley and Sons, 1994.
  • [70] H. Robbins.
    Some aspects of the sequential design of experiments, in: Bull. Amer. Math. Soc., 1952, vol. 58, pp. 527–535.
  • [71] R. Sutton, A. Barto.
    Reinforcement learning: an introduction, MIT Press, 1998.
  • [72] P. Werbos.
    ADP: Goals, Opportunities and Principles, in: Handbook of Learning and Approximate Dynamic Programming, IEEE Press, 2004, pp. 3–44.