Bibliography

Major publications by the team in recent years
  • [1] O. Cappé, A. Garivier, O.-A. Maillard, R. Munos, G. Stoltz.
    Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation, in: Annals of Statistics, 2013, vol. 41, no 3, pp. 1516-1541.
    https://hal.archives-ouvertes.fr/hal-00738209
  • [2] A. Carpentier, M. Valko.
    Revealing graph bandits for maximizing local influence, in: International Conference on Artificial Intelligence and Statistics, Seville, Spain, May 2016.
    https://hal.inria.fr/hal-01304020
  • [3] H. De Vries, F. Strub, J. Mary, H. Larochelle, O. Pietquin, A. Courville.
    Modulating early visual processing by language, in: Conference on Neural Information Processing Systems, Long Beach, United States, December 2017.
    https://hal.inria.fr/hal-01648683
  • [4] N. Gatti, A. Lazaric, M. Rocco, F. Trovò.
    Truthful Learning Mechanisms for Multi-Slot Sponsored Search Auctions with Externalities, in: Artificial Intelligence, October 2015, vol. 227, pp. 93-139.
    https://hal.inria.fr/hal-01237670
  • [5] M. Ghavamzadeh, Y. Engel, M. Valko.
    Bayesian Policy Gradient and Actor-Critic Algorithms, in: Journal of Machine Learning Research, January 2016, vol. 17, no 66, pp. 1-53.
    https://hal.inria.fr/hal-00776608
  • [6] H. Kadri, E. Duflos, P. Preux, S. Canu, A. Rakotomamonjy, J. Audiffren.
    Operator-valued Kernels for Learning from Functional Response Data, in: Journal of Machine Learning Research (JMLR), 2016.
    https://hal.archives-ouvertes.fr/hal-01221329
  • [7] E. Kaufmann, O. Cappé, A. Garivier.
    On the Complexity of Best Arm Identification in Multi-Armed Bandit Models, in: Journal of Machine Learning Research, January 2016, vol. 17, pp. 1-42.
    https://hal.archives-ouvertes.fr/hal-01024894
  • [8] A. Lazaric, M. Ghavamzadeh, R. Munos.
    Analysis of Classification-based Policy Iteration Algorithms, in: Journal of Machine Learning Research, 2016, vol. 17, pp. 1-30.
    https://hal.inria.fr/hal-01401513
  • [9] R. Munos.
    From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, in: Foundations and Trends in Machine Learning, 2014, vol. 7, no 1, pp. 1-129.
    http://dx.doi.org/10.1561/2200000038
  • [10] R. Ortner, D. Ryabko, P. Auer, R. Munos.
    Regret bounds for restless Markov bandits, in: Theoretical Computer Science, 2014, vol. 558, pp. 62-76. [ DOI : 10.1016/j.tcs.2014.09.026 ]
    https://hal.inria.fr/hal-01074077
Publications of the year

Doctoral Dissertations and Habilitation Theses

Articles in International Peer-Reviewed Journals

International Conferences with Proceedings

  • [18] Y. Abbasi-Yadkori, P. Bartlett, V. Gabillon, A. Malek, M. Valko.
    Best of both worlds: Stochastic & adversarial best-arm identification, in: Conference on Learning Theory, Stockholm, Sweden, 2018.
    https://hal.inria.fr/hal-01808948
  • [19] M. Aziz, J. Anderton, E. Kaufmann, J. Aslam.
    Pure Exploration in Infinitely-Armed Bandit Models with Fixed-Confidence, in: ALT 2018 - Algorithmic Learning Theory, Lanzarote, Spain, JMLR Workshop and Conference Proceedings, April 2018, https://arxiv.org/abs/1803.04665.
    https://hal.archives-ouvertes.fr/hal-01729969
  • [20] M. Barlier, R. Laroche, O. Pietquin.
    Training Dialogue Systems With Human Advice, in: AAMAS 2018 - the 17th International Conference on Autonomous Agents and Multiagent Systems, Stockholm, Sweden, International Foundation for Autonomous Agents and MultiAgent Systems (IFAAMAS), July 2018, 9 p.
    https://hal.archives-ouvertes.fr/hal-01945831
  • [21] P. Bartlett, V. Gabillon, M. Valko.
    A simple parameter-free and adaptive approach to optimization under a minimal local smoothness assumption, in: Algorithmic Learning Theory, Chicago, United States, 2019.
    https://hal.inria.fr/hal-01885368
  • [22] L. Besson, E. Kaufmann.
    Multi-Player Bandits Revisited, in: Algorithmic Learning Theory, Lanzarote, Spain, Mehryar Mohri and Karthik Sridharan (editors), April 2018, https://arxiv.org/abs/1711.02317.
    https://hal.inria.fr/hal-01629733
  • [23] L. Besson, E. Kaufmann, C. Moy.
    Aggregation of Multi-Armed Bandits Learning Algorithms for Opportunistic Spectrum Access, in: IEEE WCNC - IEEE Wireless Communications and Networking Conference, Barcelona, Spain, April 2018. [ DOI : 10.1109/wcnc.2018.8377070 ]
    https://hal.inria.fr/hal-01705292
  • [24] A. Bérard, L. Besacier, A. C. Kocabiyikoglu, O. Pietquin.
    End-to-End Automatic Speech Translation of Audiobooks, in: ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, Alberta, Canada, April 2018.
    https://hal.archives-ouvertes.fr/hal-01709586
  • [25] D. Calandriello, I. Koutis, A. Lazaric, M. Valko.
    Improved large-scale graph learning through ridge spectral sparsification, in: ICML 2018 - Thirty-fifth International Conference on Machine Learning, Stockholm, Sweden, July 2018.
    https://hal.inria.fr/hal-01810980
  • [26] N. Carrara, R. Laroche, J.-L. Bouraoui, T. Urvoy, O. Pietquin.
    A Fitted-Q Algorithm for Budgeted MDPs, in: EWRL 2018 - 14th European workshop on Reinforcement Learning, Lille, France, October 2018.
    https://hal.archives-ouvertes.fr/hal-01928092
  • [27] N. Carrara, R. Laroche, J.-L. Bouraoui, T. Urvoy, O. Pietquin.
    Safe transfer learning for dialogue applications, in: SLSP 2018 - 6th International Conference on Statistical Language and Speech Processing, Mons, Belgium, October 2018.
    https://hal.archives-ouvertes.fr/hal-01928102
  • [28] R. Fruit, M. Pirotta, A. Lazaric.
    Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes, in: 32nd Conference on Neural Information Processing Systems, Montréal, Canada, December 2018.
    https://hal.inria.fr/hal-01941220
  • [29] R. Fruit, M. Pirotta, A. Lazaric, R. Ortner.
    Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning, in: ICML 2018 - The 35th International Conference on Machine Learning, Stockholm, Sweden, Proceedings of Machine Learning Research, July 2018, vol. 80, pp. 1578-1586.
    https://hal.inria.fr/hal-01941206
  • [30] P. Gajane, T. Urvoy, E. Kaufmann.
    Corrupt Bandits for Preserving Local Privacy, in: ALT 2018 - Algorithmic Learning Theory, Lanzarote, Spain, Proceedings of Machine Learning Research, April 2018.
    https://hal.archives-ouvertes.fr/hal-01757297
  • [31] J.-B. Grill, M. Valko, R. Munos.
    Optimistic optimization of a Brownian, in: NeurIPS 2018 - Thirty-second Conference on Neural Information Processing Systems, Montréal, Canada, December 2018.
    https://hal.inria.fr/hal-01906601
  • [32] J.-H. Jacobsen, A. Smeulders, E. Oyallon.
    i-RevNet: Deep Invertible Networks, in: ICLR 2018 - International Conference on Learning Representations, Vancouver, Canada, April 2018, https://arxiv.org/abs/1802.07088.
    https://hal.archives-ouvertes.fr/hal-01712808
  • [33] E. Oyallon, E. Belilovsky, S. Zagoruyko, M. Valko.
    Compressing the Input for CNNs with the First-Order Scattering Transform, in: European Conference on Computer Vision, Munich, Germany, 2018.
    https://hal.inria.fr/hal-01850921
  • [34] M. Papini, D. Binaghi, G. Canonaco, M. Pirotta, M. Restelli.
    Stochastic Variance-Reduced Policy Gradient, in: ICML 2018 - 35th International Conference on Machine Learning, Stockholm, Sweden, Proceedings of Machine Learning Research, July 2018, vol. 80, pp. 4026-4035.
    https://hal.inria.fr/hal-01940394
  • [35] E. Perez, F. Strub, H. De Vries, V. Dumoulin, A. Courville.
    FiLM: Visual Reasoning with a General Conditioning Layer, in: AAAI Conference on Artificial Intelligence, New Orleans, United States, February 2018, https://arxiv.org/abs/1707.03017.
    https://hal.inria.fr/hal-01648685
  • [36] J. Pérolat, B. Piot, O. Pietquin.
    Actor-Critic Fictitious Play in Simultaneous Move Multistage Games, in: AISTATS 2018 - 21st International Conference on Artificial Intelligence and Statistics, Playa Blanca, Lanzarote, Canary Islands, Spain, April 2018.
    https://hal.inria.fr/hal-01724227
  • [37] J. Seznec, A. Locatelli, A. Carpentier, A. Lazaric, M. Valko.
    Rotting bandits are no harder than stochastic ones, in: International Conference on Artificial Intelligence and Statistics, Okinawa, Japan, 2019.
    https://hal.inria.fr/hal-01936894
  • [38] A. Tirinzoni, A. Sessa, M. Pirotta, M. Restelli.
    Importance Weighted Transfer of Samples in Reinforcement Learning, in: ICML 2018 - The 35th International Conference on Machine Learning, Stockholm, Sweden, Proceedings of Machine Learning Research, July 2018, vol. 80, pp. 4936-4945.
    https://hal.inria.fr/hal-01941213

Conferences without Proceedings

  • [39] E. Kaufmann, W. Koolen, A. Garivier.
    Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling, in: Advances in Neural Information Processing Systems (NIPS), Montréal, Canada, December 2018, https://arxiv.org/abs/1806.00973.
    https://hal.archives-ouvertes.fr/hal-01804581
  • [40] E. Leurent, Y. Blanco, D. Efimov, O.-A. Maillard.
    Approximate Robust Control of Uncertain Dynamical Systems, in: 32nd Conference on Neural Information Processing Systems (NeurIPS 2018) Workshop, Montréal, Canada, December 2018.
    https://hal.archives-ouvertes.fr/hal-01931744
  • [41] X. Shang, E. Kaufmann, M. Valko.
    Adaptive black-box optimization got easier: HCT only needs local smoothness, in: European Workshop on Reinforcement Learning, Lille, France, October 2018.
    https://hal.inria.fr/hal-01874637
  • [42] F. Strub, M. Seurin, E. Perez, H. De Vries, J. Mary, P. Preux, A. Courville, O. Pietquin.
    Visual Reasoning with Multi-hop Feature Modulation, in: ECCV 2018 - 15th European Conference on Computer Vision, Munich, Germany, V. Ferrari, M. Hebert, C. Sminchisescu, Y. Weiss (editors), Part of the Lecture Notes in Computer Science book series - LNCS, September 2018, vol. 11205-11220, no 11209, pp. 808-831, https://arxiv.org/abs/1808.04446.
    https://hal.archives-ouvertes.fr/hal-01927811
  • [43] R. Warlop, A. Lazaric, J. Mary.
    Fighting Boredom in Recommender Systems with Linear Reinforcement Learning, in: Neural Information Processing Systems, Montréal, Canada, December 2018.
    https://hal.inria.fr/hal-01915468

Other Publications

References in notes
  • [56] P. Auer, N. Cesa-Bianchi, P. Fischer.
    Finite-time analysis of the multi-armed bandit problem, in: Machine Learning, 2002, vol. 47, no 2/3, pp. 235–256.
  • [57] R. Bellman.
    Dynamic Programming, Princeton University Press, 1957.
  • [58] D. Bertsekas, S. Shreve.
    Stochastic Optimal Control (The Discrete Time Case), Academic Press, New York, 1978.
  • [59] D. Bertsekas, J. Tsitsiklis.
    Neuro-Dynamic Programming, Athena Scientific, 1996.
  • [60] M. Puterman.
    Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley and Sons, 1994.
  • [61] H. Robbins.
    Some aspects of the sequential design of experiments, in: Bull. Amer. Math. Soc., 1952, vol. 58, pp. 527–535.
  • [62] R. Sutton, A. Barto.
    Reinforcement learning: an introduction, MIT Press, 1998.
  • [63] P. Werbos.
    ADP: Goals, Opportunities and Principles, in: Handbook of Learning and Approximate Dynamic Programming, IEEE Press, 2004, pp. 3–44.