Inria | Raweb 2019 | Presentation of the Project-Team SEQUEL | SEQUEL Web Site



Bibliography

Major publications by the team in recent years

[1] O. Cappé, A. Garivier, O.-A. Maillard, R. Munos, G. Stoltz.
Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation, in: Annals of Statistics, 2013, vol. 41, no. 3, pp. 1516-1541.
https://hal.archives-ouvertes.fr/hal-00738209
[2] A. Carpentier, M. Valko.
Revealing Graph Bandits for Maximizing Local Influence, in: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, Cadiz, Spain, A. Gretton, C. C. Robert (editors), Proceedings of Machine Learning Research, PMLR, May 2016, vol. 51, pp. 10-18.
http://proceedings.mlr.press/v51/carpentier16a.html
[3] H. De Vries, F. Strub, J. Mary, H. Larochelle, O. Pietquin, A. Courville.
Modulating early visual processing by language, in: Conference on Neural Information Processing Systems, Long Beach, United States, December 2017, pp. 6594-6604.
https://hal.inria.fr/hal-01648683
[4] N. Gatti, A. Lazaric, M. Rocco, F. Trovò.
Truthful Learning Mechanisms for Multi–Slot Sponsored Search Auctions with Externalities, in: Artificial Intelligence, October 2015, vol. 227, pp. 93-139.
https://hal.inria.fr/hal-01237670
[5] M. Ghavamzadeh, Y. Engel, M. Valko.
Bayesian Policy Gradient and Actor-Critic Algorithms, in: Journal of Machine Learning Research, January 2016, vol. 17, no. 66, pp. 1-53.
https://hal.inria.fr/hal-00776608
[6] H. Kadri, E. Duflos, P. Preux, S. Canu, A. Rakotomamonjy, J. Audiffren.
Operator-valued Kernels for Learning from Functional Response Data, in: Journal of Machine Learning Research (JMLR), April 2016, vol. 17, no. 20, pp. 1-54.
https://hal.archives-ouvertes.fr/hal-01221329
[7] E. Kaufmann, O. Cappé, A. Garivier.
On the Complexity of Best Arm Identification in Multi-Armed Bandit Models, in: Journal of Machine Learning Research, January 2016, vol. 17, pp. 1-42.
https://hal.archives-ouvertes.fr/hal-01024894
[8] A. Lazaric, M. Ghavamzadeh, R. Munos.
Analysis of Classification-based Policy Iteration Algorithms, in: Journal of Machine Learning Research, 2016, vol. 17, pp. 1-30.
https://hal.inria.fr/hal-01401513
[9] R. Munos.
From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, in: Foundations and Trends in Machine Learning, 2014, vol. 7, no. 1, pp. 1-129.
http://dx.doi.org/10.1561/2200000038
[10] R. Ortner, D. Ryabko, P. Auer, R. Munos.
Regret bounds for restless Markov bandits, in: Journal of Theoretical Computer Science (TCS), 2014, vol. 558, pp. 62-76. [ DOI : 10.1016/j.tcs.2014.09.026 ]
https://hal.inria.fr/hal-01074077

Publications of the year

Doctoral Dissertations and Habilitation Theses

[11] N. Carrara.
Reinforcement learning for Dialogue Systems optimization with user adaptation, Ecole Doctoral Science pour l'Ingénieur Université Lille Nord-de-France, December 2019.
https://tel.archives-ouvertes.fr/tel-02422691
[12] R. Fruit.
Exploration-exploitation dilemma in Reinforcement Learning under various form of prior knowledge, Université de Lille 1, Sciences et Technologies; CRIStAL UMR 9189, November 2019.
https://tel.archives-ouvertes.fr/tel-02388395
[13] O.-A. Maillard.
Mathematics of Statistical Sequential Decision Making, Université de Lille Nord de France, February 2019, Habilitation à diriger des recherches.
https://hal.archives-ouvertes.fr/tel-02077035

Articles in International Peer-Reviewed Journals

[14] M.-A. Charpagne, F. Strub, T. M. Pollock.
Accurate reconstruction of EBSD datasets by a multimodal data approach using an evolutionary algorithm, in: Materials Characterization, April 2019, vol. 150, pp. 184-198, https://arxiv.org/abs/1903.02988 - A short version of this paper, aimed at readers working in Machine Learning, is available as arXiv:1903.02982. [ DOI : 10.1016/j.matchar.2019.01.033 ]
https://hal.archives-ouvertes.fr/hal-02062098
[15] A. R. Luedtke, E. Kaufmann, A. Chambaz.
Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits, in: Machine Learning Journal, September 2019, vol. 108, no. 11, pp. 1919-1949, https://arxiv.org/abs/1606.09388.
https://hal.archives-ouvertes.fr/hal-01338733

International Conferences with Proceedings

[17] P. Bartlett, V. Gabillon, J. Healey, M. Valko.
Scale-free adaptive planning for deterministic dynamics & discounted rewards, in: International Conference on Machine Learning, Long Beach, United States, 2019.
https://hal.inria.fr/hal-02387484
[18] P. Bartlett, V. Gabillon, M. Valko.
A simple parameter-free and adaptive approach to optimization under a minimal local smoothness assumption, in: Algorithmic Learning Theory, Chicago, United States, 2019.
https://hal.inria.fr/hal-01885368
[19] D. Calandriello, L. Carratino, A. Lazaric, M. Valko, L. Rosasco.
Gaussian process optimization with adaptive sketching: Scalable and no regret, in: Conference on Learning Theory, Phoenix, United States, 2019.
https://hal.inria.fr/hal-02144311
[20] N. Carrara, E. Leurent, R. Laroche, T. Urvoy, O.-A. Maillard, O. Pietquin.
Budgeted Reinforcement Learning in Continuous State Space, in: Conference on Neural Information Processing Systems, Vancouver, Canada, Advances in Neural Information Processing Systems, December 2019, vol. 32, https://arxiv.org/abs/1903.01004.
https://hal.archives-ouvertes.fr/hal-02375727
[21] M. Dereziński, D. Calandriello, M. Valko.
Exact sampling of determinantal point processes with sublinear time preprocessing, in: Neural Information Processing Systems, Vancouver, Canada, 2019.
https://hal.inria.fr/hal-02387524
[22] C. Dimitrakakis, Y. Liu, D. Parkes, G. Radanovic.
Bayesian Fairness, in: AAAI 2019 - Thirty-Third AAAI Conference on Artificial Intelligence, Honolulu, United States, January 2019.
https://hal.inria.fr/hal-01953311
[23] G. Gautier, R. Bardenet, M. Valko.
On two ways to use determinantal point processes for Monte Carlo integration – Long version, in: NeurIPS 2019 - Thirty-third Conference on Neural Information Processing Systems, Vancouver, Canada, Advances in Neural Information Processing Systems, 2019.
https://hal.archives-ouvertes.fr/hal-02277739
[24] J.-B. Grill, O. D. Domingues, P. Ménard, R. Munos, M. Valko.
Planning in entropy-regularized Markov decision processes and games, in: Neural Information Processing Systems, Vancouver, Canada, 2019.
https://hal.inria.fr/hal-02387515
[25] E. Leurent, O.-A. Maillard.
Practical Open-Loop Optimistic Planning, in: European Conference on Machine Learning, Würzburg, Germany, European Conference on Machine Learning, September 2019, https://arxiv.org/abs/1904.04700.
https://hal.archives-ouvertes.fr/hal-02375697
[26] A. Locatelli, A. Carpentier, M. Valko.
Active multiple matrix completion with adaptive confidence sets, in: International Conference on Artificial Intelligence and Statistics, Okinawa, Japan, 2019.
https://hal.inria.fr/hal-02387468
[27] O.-A. Maillard.
Sequential change-point detection: Laplace concentration of scan statistics and non-asymptotic delay bounds, in: Algorithmic Learning Theory, Chicago, United States, 2019, vol. 98, pp. 1-23.
https://hal.archives-ouvertes.fr/hal-02351665
[28] C. Moy, L. Besson.
Decentralized Spectrum Learning for IoT Wireless Networks Collision Mitigation, in: ISIoT 2019 - 1st International Workshop on Intelligent Systems for the Internet of Things, Santorin, Greece, May 2019, https://arxiv.org/abs/1906.00614.
https://hal.inria.fr/hal-02144465
[29] R. Ortner, M. Pirotta, R. Fruit, A. Lazaric, O.-A. Maillard.
Regret Bounds for Learning State Representations in Reinforcement Learning, in: Conference on Neural Information Processing Systems, Vancouver, Canada, Conference on Neural Information Processing Systems, December 2019.
https://hal.archives-ouvertes.fr/hal-02375715
[30] P. Perrault, V. Perchet, M. Valko.
Exploiting structure of uncertainty for efficient matroid semi-bandits, in: International Conference on Machine Learning, Long Beach, United States, 2019.
https://hal.inria.fr/hal-02387478
[31] P. Perrault, V. Perchet, M. Valko.
Finding the bandit in a graph: Sequential search-and-stop, in: International Conference on Artificial Intelligence and Statistics, Okinawa, Japan, 2019.
https://hal.inria.fr/hal-02387465
[32] J. Seznec, A. Locatelli, A. Carpentier, A. Lazaric, M. Valko.
Rotting bandits are not harder than stochastic ones, in: International Conference on Artificial Intelligence and Statistics, Naha, Japan, 2019.
https://hal.inria.fr/hal-01936894
[33] X. Shang, E. Kaufmann, M. Valko.
A simple dynamic bandit algorithm for hyper-parameter tuning, in: Workshop on Automated Machine Learning at International Conference on Machine Learning, Long Beach, United States, AutoML@ICML 2019 - 6th ICML Workshop on Automated Machine Learning, June 2019.
https://hal.inria.fr/hal-02145200
[34] X. Shang, E. Kaufmann, M. Valko.
General parallel optimization without a metric, in: Algorithmic Learning Theory, Chicago, United States, 2019, vol. 98.
https://hal.inria.fr/hal-02047225
[35] M. S. Talebi, O.-A. Maillard.
Learning Multiple Markov Chains via Adaptive Allocation, in: Advances in Neural Information Processing Systems 32 (NIPS 2019), Vancouver, Canada, December 2019.
https://hal.archives-ouvertes.fr/hal-02387345

National Conferences with Proceedings

[36] L. Besson, E. Kaufmann.
Non-asymptotic analysis of a sequential rupture detection test and its application to non-stationary bandits, in: GRETSI 2019 - XXVIIème Colloque francophone de traitement du signal et des images, Lille, France, August 2019.
https://hal.inria.fr/hal-02152243

Conferences without Proceedings

[37] L. Besson, R. Bonnefoi, C. Moy.
GNU Radio Implementation of MALIN: "Multi-Armed bandits Learning for Internet-of-things Networks", in: IEEE WCNC 2019 - IEEE Wireless Communications and Networking Conference, Marrakech, Morocco, April 2019, https://arxiv.org/abs/1902.01734.
https://hal.inria.fr/hal-02006825
[38] R. Bonnefoi, L. Besson, J. Manco-Vasquez, C. Moy.
Upper-Confidence Bound for Channel Selection in LPWA Networks with Retransmissions, in: The 1st International Workshop on Mathematical Tools and technologies for IoT and mMTC Networks Modeling, Marrakech, Morocco, Philippe Mary, Samir Perlaza, Petar Popovski, April 2019, https://arxiv.org/abs/1902.10615 - The source code (MATLAB or Octave) used for the simulations and the figures is open-sourced under the MIT License, at Bitbucket.org/scee_ietr/ucb_smart_retrans.
https://hal.inria.fr/hal-02049824
[39] Y. Flet-Berliac, P. Preux.
MERL: Multi-Head Reinforcement Learning, in: NeurIPS 2019 Deep Reinforcement Learning Workshop, Vancouver, Canada, December 2019, https://arxiv.org/abs/1909.11939.
https://hal.inria.fr/hal-02305105
[40] G. Gautier, R. Bardenet, M. Valko.
On two ways to use determinantal point processes for Monte Carlo integration, in: NEGDEPML 2019 - ICML Workshop on Negative Dependence in ML, Long Beach, CA, United States, June 2019.
https://hal.archives-ouvertes.fr/hal-02160382
[41] T. Levent, P. Preux, E. Le Pennec, J. Badosa, G. Henri, Y. Bonnassieux.
Energy Management for Microgrids: a Reinforcement Learning Approach, in: ISGT-Europe 2019 - IEEE PES Innovative Smart Grid Technologies Europe, Bucharest, Romania, IEEE, September 2019, pp. 1-5. [ DOI : 10.1109/ISGTEurope.2019.8905538 ]
https://hal.archives-ouvertes.fr/hal-02382232
[42] M. Seurin, P. Preux, O. Pietquin.
"I'm sorry Dave, I'm afraid I can't do that" Deep Q-Learning From Forbidden Actions, in: Workshop on Safety and Robustness in Decision Making (NeurIPS 2019), Vancouver, Canada, December 2019.
https://hal.inria.fr/hal-02387419

Other Publications

[43] L. Besson, E. Kaufmann.
The Generalized Likelihood Ratio Test meets klUCB: an Improved Algorithm for Piece-Wise Non-Stationary Bandits, February 2019, https://arxiv.org/abs/1902.01575 - working paper or preprint.
https://hal.inria.fr/hal-02006471
[44] E. Boursier, E. Kaufmann, A. Mehrabian, V. Perchet.
A Practical Algorithm for Multiplayer Bandits when Arm Means Vary Among Players, May 2019, https://arxiv.org/abs/1902.01239 - working paper or preprint.
https://hal.archives-ouvertes.fr/hal-02006069
[45] G. Cideron, M. Seurin, F. Strub, O. Pietquin.
Self-Educated Language Agent With Hindsight Experience Replay For Instruction Following, November 2019, https://arxiv.org/abs/1910.09451 - working paper or preprint.
https://hal.archives-ouvertes.fr/hal-02386585
[46] R. Degenne, W. M. Koolen, P. Ménard.
Non-Asymptotic Pure Exploration by Solving Games, December 2019, working paper or preprint.
https://hal.archives-ouvertes.fr/hal-02402665
[47] Y. Flet-Berliac, P. Preux.
High-Dimensional Control Using Generalized Auxiliary Tasks, November 2019, working paper or preprint.
https://hal.inria.fr/hal-02295705
[48] Y. Flet-Berliac, P. Preux.
Samples Are Useful? Not Always: denoising policy gradient updates using variance explained, September 2019, https://arxiv.org/abs/1904.04025 - working paper or preprint.
https://hal.inria.fr/hal-02091547
[49] A. Garivier, H. Hadiji, P. Ménard, G. Stoltz.
KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints, November 2019, working paper or preprint.
https://hal.archives-ouvertes.fr/hal-01785705
[50] A. Garivier, E. Kaufmann.
Non-Asymptotic Sequential Tests for Overlapping Hypotheses and application to near optimal arm identification in bandit models, May 2019, https://arxiv.org/abs/1905.03495 - working paper or preprint.
https://hal.archives-ouvertes.fr/hal-02123833
[51] E. Leurent, Y. Blanco, D. Efimov, O.-A. Maillard.
Approximate Robust Control of Uncertain Dynamical Systems, February 2019, https://arxiv.org/abs/1903.00220 - working paper or preprint.
https://hal.archives-ouvertes.fr/hal-01931744
[52] E. Leurent, J. Mercat.
Social Attention for Autonomous Decision-Making in Dense Traffic, November 2019, https://arxiv.org/abs/1911.12250 - working paper or preprint.
https://hal.archives-ouvertes.fr/hal-02383940
[53] O.-A. Maillard, T. A. Mann, R. Ortner, S. Mannor.
Active Roll-outs in MDP with Irreversible Dynamics, July 2019, working paper or preprint.
https://hal.archives-ouvertes.fr/hal-02177808
[54] X. Shang, R. De Heide, E. Kaufmann, P. Ménard, M. Valko.
Fixed-confidence guarantees for Bayesian best-arm identification, October 2019, https://arxiv.org/abs/1910.10945 - working paper or preprint.
https://hal.archives-ouvertes.fr/hal-02330187
[55] F. Strub, M.-A. Charpagne, T. M. Pollock.
Accurate reconstruction of EBSD datasets by a multimodal data approach using an evolutionary algorithm, March 2019, https://arxiv.org/abs/1903.02988 - A short version of this paper, aimed at readers working in Machine Learning, is available as arXiv:1903.02982. [ DOI : 10.1016/j.matchar.2019.01.033 ]
https://hal.archives-ouvertes.fr/hal-02062104
[56] C. Trinh, E. Kaufmann, C. Vernade, R. Combes.
Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling, December 2019, https://arxiv.org/abs/1912.03074 - working paper or preprint.
https://hal.archives-ouvertes.fr/hal-02396943

References in notes

[57] P. Auer, N. Cesa-Bianchi, P. Fischer.
Finite-time analysis of the multi-armed bandit problem, in: Machine Learning, 2002, vol. 47, no. 2/3, pp. 235–256.
[58] R. Bellman.
Dynamic Programming, Princeton University Press, 1957.
[59] D. Bertsekas, S. Shreve.
Stochastic Optimal Control (The Discrete Time Case), Academic Press, New York, 1978.
[60] D. Bertsekas, J. Tsitsiklis.
Neuro-Dynamic Programming, Athena Scientific, 1996.
[61] M. Puterman.
Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley and Sons, 1994.
[62] H. Robbins.
Some aspects of the sequential design of experiments, in: Bull. Amer. Math. Soc., 1952, vol. 58, pp. 527–535.
[63] R. Sutton, A. Barto.
Reinforcement learning: an introduction, MIT Press, 1998.
[64] P. Werbos.
ADP: Goals, Opportunities and Principles, IEEE Press, 2004, pp. 3–44, Handbook of learning and approximate dynamic programming.
