Bibliography

Major publications by the team in recent years
  • [1] O. Cappé, A. Garivier, O.-A. Maillard, R. Munos, G. Stoltz.

    Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation, in: Annals of Statistics, 2013, vol. 41, no 3, pp. 1516-1541.

    https://hal.archives-ouvertes.fr/hal-00738209
  • [2] A. Carpentier, M. Valko.

    Revealing graph bandits for maximizing local influence, in: International Conference on Artificial Intelligence and Statistics, Seville, Spain, May 2016.

    https://hal.inria.fr/hal-01304020
  • [3] H. De Vries, F. Strub, J. Mary, H. Larochelle, O. Pietquin, A. Courville.

    Modulating early visual processing by language, in: Conference on Neural Information Processing Systems, Long Beach, United States, December 2017.

    https://hal.inria.fr/hal-01648683
  • [4] N. Gatti, A. Lazaric, M. Rocco, F. Trovò.

    Truthful Learning Mechanisms for Multi-Slot Sponsored Search Auctions with Externalities, in: Artificial Intelligence, October 2015, vol. 227, pp. 93-139.

    https://hal.inria.fr/hal-01237670
  • [5] M. Ghavamzadeh, Y. Engel, M. Valko.

    Bayesian Policy Gradient and Actor-Critic Algorithms, in: Journal of Machine Learning Research, January 2016, vol. 17, no 66, pp. 1-53.

    https://hal.inria.fr/hal-00776608
  • [6] H. Kadri, E. Duflos, P. Preux, S. Canu, A. Rakotomamonjy, J. Audiffren.

    Operator-valued Kernels for Learning from Functional Response Data, in: Journal of Machine Learning Research (JMLR), 2016.

    https://hal.archives-ouvertes.fr/hal-01221329
  • [7] E. Kaufmann, O. Cappé, A. Garivier.

    On the Complexity of Best Arm Identification in Multi-Armed Bandit Models, in: Journal of Machine Learning Research, January 2016, vol. 17, pp. 1-42.

    https://hal.archives-ouvertes.fr/hal-01024894
  • [8] A. Lazaric, M. Ghavamzadeh, R. Munos.

    Analysis of Classification-based Policy Iteration Algorithms, in: Journal of Machine Learning Research, 2016, vol. 17, pp. 1-30.

    https://hal.inria.fr/hal-01401513
  • [9] R. Munos.

    From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, in: Foundations and Trends in Machine Learning, 2014, vol. 7, no 1, pp. 1-129.

    http://dx.doi.org/10.1561/2200000038
  • [10] R. Ortner, D. Ryabko, P. Auer, R. Munos.

    Regret bounds for restless Markov bandits, in: Theoretical Computer Science (TCS), 2014, vol. 558, pp. 62-76. [ DOI : 10.1016/j.tcs.2014.09.026 ]

    https://hal.inria.fr/hal-01074077
Publications of the year

Doctoral Dissertations and Habilitation Theses

Articles in International Peer-Reviewed Journals

International Conferences with Proceedings

  • [18] Y. Abbasi-Yadkori, P. Bartlett, V. Gabillon, A. Malek, M. Valko.

    Best of both worlds: Stochastic & adversarial best-arm identification, in: Conference on Learning Theory, Stockholm, Sweden, 2018.

    https://hal.inria.fr/hal-01808948
  • [19] M. Aziz, J. Anderton, E. Kaufmann, J. Aslam.

    Pure Exploration in Infinitely-Armed Bandit Models with Fixed-Confidence, in: ALT 2018 - Algorithmic Learning Theory, Lanzarote, Spain, JMLR Workshop and Conference Proceedings, April 2018, https://arxiv.org/abs/1803.04665.

    https://hal.archives-ouvertes.fr/hal-01729969
  • [20] M. Barlier, R. Laroche, O. Pietquin.

    Training Dialogue Systems With Human Advice, in: AAMAS 2018 - the 17th International Conference on Autonomous Agents and Multiagent Systems, Stockholm, Sweden, International Foundation for Autonomous Agents and MultiAgent Systems (IFAAMAS), July 2018, 9 p.

    https://hal.archives-ouvertes.fr/hal-01945831
  • [21] P. Bartlett, V. Gabillon, M. Valko.

    A simple parameter-free and adaptive approach to optimization under a minimal local smoothness assumption, in: Algorithmic Learning Theory, Chicago, United States, 2019.

    https://hal.inria.fr/hal-01885368
  • [22] L. Besson, E. Kaufmann.

    Multi-Player Bandits Revisited, in: Algorithmic Learning Theory, Lanzarote, Spain, M. Mohri, K. Sridharan (editors), April 2018, https://arxiv.org/abs/1711.02317.

    https://hal.inria.fr/hal-01629733
  • [23] L. Besson, E. Kaufmann, C. Moy.

    Aggregation of Multi-Armed Bandits Learning Algorithms for Opportunistic Spectrum Access, in: IEEE WCNC - IEEE Wireless Communications and Networking Conference, Barcelona, Spain, April 2018. [ DOI : 10.1109/wcnc.2018.8377070 ]

    https://hal.inria.fr/hal-01705292
  • [24] A. Bérard, L. Besacier, A. C. Kocabiyikoglu, O. Pietquin.

    End-to-End Automatic Speech Translation of Audiobooks, in: ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, Alberta, Canada, April 2018.

    https://hal.archives-ouvertes.fr/hal-01709586
  • [25] D. Calandriello, I. Koutis, A. Lazaric, M. Valko.

    Improved large-scale graph learning through ridge spectral sparsification, in: ICML 2018 - Thirty-fifth International Conference on Machine Learning, Stockholm, Sweden, July 2018.

    https://hal.inria.fr/hal-01810980
  • [26] N. Carrara, R. Laroche, J.-L. Bouraoui, T. Urvoy, O. Pietquin.

    A Fitted-Q Algorithm for Budgeted MDPs, in: EWRL 2018 - 14th European workshop on Reinforcement Learning, Lille, France, October 2018.

    https://hal.archives-ouvertes.fr/hal-01928092
  • [27] N. Carrara, R. Laroche, J.-L. Bouraoui, T. Urvoy, O. Pietquin.

    Safe transfer learning for dialogue applications, in: SLSP 2018 - 6th International Conference on Statistical Language and Speech Processing, Mons, Belgium, October 2018.

    https://hal.archives-ouvertes.fr/hal-01928102
  • [28] R. Fruit, M. Pirotta, A. Lazaric.

    Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes, in: 32nd Conference on Neural Information Processing Systems, Montréal, Canada, December 2018.

    https://hal.inria.fr/hal-01941220
  • [29] R. Fruit, M. Pirotta, A. Lazaric, R. Ortner.

    Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning, in: ICML 2018 - The 35th International Conference on Machine Learning, Stockholm, Sweden, Proceedings of Machine Learning Research, July 2018, vol. 80, pp. 1578-1586.

    https://hal.inria.fr/hal-01941206
  • [30] P. Gajane, T. Urvoy, E. Kaufmann.

    Corrupt Bandits for Preserving Local Privacy, in: ALT 2018 - Algorithmic Learning Theory, Lanzarote, Spain, Proceedings of Machine Learning Research, April 2018.

    https://hal.archives-ouvertes.fr/hal-01757297
  • [31] J.-B. Grill, M. Valko, R. Munos.

    Optimistic optimization of a Brownian, in: NeurIPS 2018 - Thirty-second Conference on Neural Information Processing Systems, Montréal, Canada, December 2018.

    https://hal.inria.fr/hal-01906601
  • [32] J.-H. Jacobsen, A. Smeulders, E. Oyallon.

    i-RevNet: Deep Invertible Networks, in: ICLR 2018 - International Conference on Learning Representations, Vancouver, Canada, April 2018, https://arxiv.org/abs/1802.07088.

    https://hal.archives-ouvertes.fr/hal-01712808
  • [33] E. Oyallon, E. Belilovsky, S. Zagoruyko, M. Valko.

    Compressing the Input for CNNs with the First-Order Scattering Transform, in: European Conference on Computer Vision, Munich, Germany, 2018.

    https://hal.inria.fr/hal-01850921
  • [34] M. Papini, D. Binaghi, G. Canonaco, M. Pirotta, M. Restelli.

    Stochastic Variance-Reduced Policy Gradient, in: ICML 2018 - 35th International Conference on Machine Learning, Stockholm, Sweden, Proceedings of Machine Learning Research, July 2018, vol. 80, pp. 4026-4035.

    https://hal.inria.fr/hal-01940394
  • [35] E. Perez, F. Strub, H. De Vries, V. Dumoulin, A. Courville.

    FiLM: Visual Reasoning with a General Conditioning Layer, in: AAAI Conference on Artificial Intelligence, New Orleans, United States, February 2018, https://arxiv.org/abs/1707.03017.

    https://hal.inria.fr/hal-01648685
  • [36] J. Pérolat, B. Piot, O. Pietquin.

    Actor-Critic Fictitious Play in Simultaneous Move Multistage Games, in: AISTATS 2018 - 21st International Conference on Artificial Intelligence and Statistics, Playa Blanca, Lanzarote, Canary Islands, Spain, April 2018.

    https://hal.inria.fr/hal-01724227
  • [37] J. Seznec, A. Locatelli, A. Carpentier, A. Lazaric, M. Valko.

    Rotting bandits are no harder than stochastic ones, in: International Conference on Artificial Intelligence and Statistics, Okinawa, Japan, 2019.

    https://hal.inria.fr/hal-01936894
  • [38] A. Tirinzoni, A. Sessa, M. Pirotta, M. Restelli.

    Importance Weighted Transfer of Samples in Reinforcement Learning, in: ICML 2018 - The 35th International Conference on Machine Learning, Stockholm, Sweden, Proceedings of Machine Learning Research, July 2018, vol. 80, pp. 4936-4945.

    https://hal.inria.fr/hal-01941213

Conferences without Proceedings

  • [39] E. Kaufmann, W. Koolen, A. Garivier.

    Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling, in: Advances in Neural Information Processing Systems (NIPS), Montréal, Canada, December 2018, https://arxiv.org/abs/1806.00973.

    https://hal.archives-ouvertes.fr/hal-01804581
  • [40] E. Leurent, Y. Blanco, D. Efimov, O.-A. Maillard.

    Approximate Robust Control of Uncertain Dynamical Systems, in: 32nd Conference on Neural Information Processing Systems (NeurIPS 2018) Workshop, Montréal, Canada, December 2018.

    https://hal.archives-ouvertes.fr/hal-01931744
  • [41] X. Shang, E. Kaufmann, M. Valko.

    Adaptive black-box optimization got easier: HCT only needs local smoothness, in: European Workshop on Reinforcement Learning, Lille, France, October 2018.

    https://hal.inria.fr/hal-01874637
  • [42] F. Strub, M. Seurin, E. Perez, H. De Vries, J. Mary, P. Preux, A. Courville, O. Pietquin.

    Visual Reasoning with Multi-hop Feature Modulation, in: ECCV 2018 - 15th European Conference on Computer Vision, Munich, Germany, V. Ferrari, M. Hebert, C. Sminchisescu, Y. Weiss (editors), Part of the Lecture Notes in Computer Science book series - LNCS, September 2018, vol. 11205-11220, no 11209, pp. 808-831, https://arxiv.org/abs/1808.04446.

    https://hal.archives-ouvertes.fr/hal-01927811
  • [43] R. Warlop, A. Lazaric, J. Mary.

    Fighting Boredom in Recommender Systems with Linear Reinforcement Learning, in: Neural Information Processing Systems, Montréal, Canada, December 2018.

    https://hal.inria.fr/hal-01915468

Other Publications

References in notes
  • [56] P. Auer, N. Cesa-Bianchi, P. Fischer.

    Finite-time analysis of the multi-armed bandit problem, in: Machine Learning, 2002, vol. 47, no 2/3, pp. 235–256.
  • [57] R. Bellman.

    Dynamic Programming, Princeton University Press, 1957.
  • [58] D. Bertsekas, S. Shreve.

    Stochastic Optimal Control (The Discrete Time Case), Academic Press, New York, 1978.
  • [59] D. Bertsekas, J. Tsitsiklis.

    Neuro-Dynamic Programming, Athena Scientific, 1996.
  • [60] M. Puterman.

    Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley and Sons, 1994.
  • [61] H. Robbins.

    Some aspects of the sequential design of experiments, in: Bull. Amer. Math. Soc., 1952, vol. 58, pp. 527–535.
  • [62] R. Sutton, A. Barto.

    Reinforcement learning: an introduction, MIT Press, 1998.
  • [63] P. Werbos.

    ADP: Goals, Opportunities and Principles, in: Handbook of Learning and Approximate Dynamic Programming, IEEE Press, 2004, pp. 3–44.