Inria | Raweb 2017 | Presentation of the Project-Team SEQUEL | SEQUEL Web Site



Bibliography

Major publications by the team in recent years

[1] O. Cappé, A. Garivier, O.-A. Maillard, R. Munos, G. Stoltz.
Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation, in: Annals of Statistics, 2013, vol. 41, no. 3, pp. 1516-1541.
https://hal.archives-ouvertes.fr/hal-00738209
[2] A. Carpentier, M. Valko.
Revealing graph bandits for maximizing local influence, in: International Conference on Artificial Intelligence and Statistics, Seville, Spain, May 2016.
https://hal.inria.fr/hal-01304020
[3] H. De Vries, F. Strub, J. Mary, H. Larochelle, O. Pietquin, A. Courville.
Modulating early visual processing by language, in: Conference on Neural Information Processing Systems, Long Beach, United States, December 2017.
https://hal.inria.fr/hal-01648683
[4] N. Gatti, A. Lazaric, M. Rocco, F. Trovò.
Truthful Learning Mechanisms for Multi-Slot Sponsored Search Auctions with Externalities, in: Artificial Intelligence, October 2015, vol. 227, pp. 93-139.
https://hal.inria.fr/hal-01237670
[5] M. Ghavamzadeh, Y. Engel, M. Valko.
Bayesian Policy Gradient and Actor-Critic Algorithms, in: Journal of Machine Learning Research, January 2016, vol. 17, no. 66, pp. 1-53.
https://hal.inria.fr/hal-00776608
[6] H. Kadri, E. Duflos, P. Preux, S. Canu, A. Rakotomamonjy, J. Audiffren.
Operator-valued Kernels for Learning from Functional Response Data, in: Journal of Machine Learning Research (JMLR), 2016.
https://hal.archives-ouvertes.fr/hal-01221329
[7] E. Kaufmann, O. Cappé, A. Garivier.
On the Complexity of Best Arm Identification in Multi-Armed Bandit Models, in: Journal of Machine Learning Research, January 2016, vol. 17, pp. 1-42.
https://hal.archives-ouvertes.fr/hal-01024894
[8] A. Lazaric, M. Ghavamzadeh, R. Munos.
Analysis of Classification-based Policy Iteration Algorithms, in: Journal of Machine Learning Research, 2016, vol. 17, pp. 1-30.
https://hal.inria.fr/hal-01401513
[9] R. Munos.
From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, in: Foundations and Trends® in Machine Learning, 2014, vol. 7, no. 1, pp. 1-129.
http://dx.doi.org/10.1561/2200000038
[10] R. Ortner, D. Ryabko, P. Auer, R. Munos.
Regret bounds for restless Markov bandits, in: Journal of Theoretical Computer Science (TCS), 2014, vol. 558, pp. 62-76. [ DOI : 10.1016/j.tcs.2014.09.026 ]
https://hal.inria.fr/hal-01074077

Publications of the year

Doctoral Dissertations and Habilitation Theses

[11] M. Abeille.
Exploration-Exploitation with Thompson Sampling in Linear Systems, Université de Lille, December 2017.
[12] D. Calandriello.
Efficient Sequential Learning in Structured and Constrained Environments, Université de Lille, December 2017.
[13] P. Gajane.
Multi-armed bandits with unconventional feedback, Université de Lille, November 2017.
[14] J. Pérolat.
Reinforcement learning: the multiplayer case, Université de Lille, December 2017.

Articles in International Peer-Reviewed Journals

[15] B. Danglot, P. Preux, B. Baudry, M. Monperrus.
Correctness Attraction: A Study of Stability of Software Behavior Under Runtime Perturbation, in: Empirical Software Engineering, 2017, https://arxiv.org/abs/1611.09187. [ DOI : 10.1007/s10664-017-9571-8 ]
https://hal.archives-ouvertes.fr/hal-01378523
[16] C. Dimitrakakis, B. Nelson, Z. Zhang, A. Mitrokotsa, B. I. P. Rubinstein.
Differential Privacy for Bayesian Inference through Posterior Sampling, in: Journal of Machine Learning Research, April 2017, vol. 18, no. 11, pp. 1-39.
https://hal.inria.fr/hal-01500302
[17] E. Kaufmann, T. Bonald, M. Lelarge.
A Spectral Algorithm with Additive Clustering for the Recovery of Overlapping Communities in Networks, in: Journal of Theoretical Computer Science (TCS), 2017, https://arxiv.org/abs/1506.04158, forthcoming.
https://hal.archives-ouvertes.fr/hal-01163147
[18] E. Kaufmann, A. Garivier.
Learning the distribution with largest mean: two bandit frameworks, in: ESAIM: Proceedings and Surveys, 2017, vol. 2017, pp. 1-10, https://arxiv.org/abs/1702.00001, forthcoming.
https://hal.archives-ouvertes.fr/hal-01449822
[19] E. Kaufmann.
On Bayesian index policies for sequential resource allocation, in: Annals of Statistics, 2017, https://arxiv.org/abs/1601.01190, forthcoming.
https://hal.archives-ouvertes.fr/hal-01251606
[20] V. Musco, M. Monperrus, P. Preux.
A Large-scale Study of Call Graph-based Impact Prediction using Mutation Testing, in: Software Quality Journal, September 2017, vol. 25, no. 3, pp. 921–950. [ DOI : 10.1007/s11219-016-9332-8 ]
https://hal.inria.fr/hal-01346046

International Conferences with Proceedings

[21] M. Abeille, A. Lazaric.
Linear Thompson Sampling Revisited, in: AISTATS 2017 - 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, United States, April 2017.
https://hal.inria.fr/hal-01493561
[22] M. Abeille, A. Lazaric.
Thompson Sampling for Linear-Quadratic Control Problems, in: AISTATS 2017 - 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, United States, April 2017.
https://hal.inria.fr/hal-01493564
[23] B. Balle, O.-A. Maillard.
Spectral Learning from a Single Trajectory under Finite-State Policies, in: International Conference on Machine Learning, Sydney, Australia, Proceedings of the International Conference on Machine Learning, July 2017.
https://hal.archives-ouvertes.fr/hal-01590940
[24] S. Brodeur, E. Perez, A. Anand, F. Golemo, L. Celotti, F. Strub, J. Rouat, H. Larochelle, A. Courville.
HoME: a Household Multimodal Environment, in: NIPS 2017's Visually-Grounded Interaction and Language Workshop, Long Beach, United States, December 2017, https://arxiv.org/abs/1711.11017.
https://hal.inria.fr/hal-01653037
[25] A. Bérard, O. Pietquin, L. Besacier.
LIG-CRIStAL System for the WMT17 Automatic Post-Editing Task, in: Second Conference on Machine Translation (WMT17) during EMNLP 2017, Copenhagen, Denmark, September 2017.
https://hal.archives-ouvertes.fr/hal-01580881
[26] D. Calandriello, A. Lazaric, M. Valko.
Distributed adaptive sampling for kernel matrix approximation, in: International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, United States, 2017.
https://hal.inria.fr/hal-01482760
[27] D. Calandriello, A. Lazaric, M. Valko.
Efficient second-order online kernel learning with adaptive embedding, in: NIPS 2017 - The Thirty-first Annual Conference on Neural Information Processing Systems, Long Beach, United States, December 2017, pp. 1-17.
https://hal.inria.fr/hal-01643961
[28] D. Calandriello, A. Lazaric, M. Valko.
Second-Order Kernel Online Convex Optimization with Adaptive Sketching, in: International Conference on Machine Learning, Sydney, Australia, 2017.
https://hal.inria.fr/hal-01537799
[29] H. De Vries, F. Strub, S. Chandar, O. Pietquin, H. Larochelle, A. Courville.
GuessWhat?! Visual object discovery through multi-modal dialogue, in: Conference on Computer Vision and Pattern Recognition, Honolulu, United States, July 2017, https://arxiv.org/abs/1611.08481.
https://hal.inria.fr/hal-01549641
[30] H. De Vries, F. Strub, J. Mary, H. Larochelle, O. Pietquin, A. Courville.
Modulating early visual processing by language, in: NIPS 2017 - Conference on Neural Information Processing Systems, Long Beach, United States, December 2017, pp. 1-14, https://arxiv.org/abs/1707.00683.
https://hal.inria.fr/hal-01648683
[31] A. Erraqabi, A. Lazaric, M. Valko, E. Brunskill, Y.-E. Liu.
Trading off rewards and errors in multi-armed bandits, in: International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, United States, 2017.
https://hal.inria.fr/hal-01482765
[32] C. Z. Felício, K. V. R. Paixão, C. A. Z. Barcelos, P. Preux.
A Multi-Armed Bandit Model Selection for Cold-Start User Recommendation, in: 25th ACM Conference on User Modelling, Adaptation and Personalization (UMAP), Bratislava, Slovakia, July 2017.
https://hal.inria.fr/hal-01517967
[33] R. Fruit, A. Lazaric.
Exploration–Exploitation in MDPs with Options, in: AISTATS 2017 - 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, United States, April 2017.
https://hal.inria.fr/hal-01493567
[34] R. Fruit, M. Pirotta, A. Lazaric, E. Brunskill.
Regret Minimization in MDPs with Options without Prior Knowledge, in: NIPS 2017 - Neural Information Processing Systems, Long Beach, United States, December 2017, pp. 1-36.
https://hal.inria.fr/hal-01649082
[35] G. Gautier, R. Bardenet, M. Valko.
Zonotope hit-and-run for efficient sampling from projection DPPs, in: International Conference on Machine Learning, Sydney, Australia, 2017.
https://hal.inria.fr/hal-01526577
[36] M. Geist, B. Piot, O. Pietquin.
Is the Bellman residual a bad proxy?, in: NIPS 2017 - Advances in Neural Information Processing Systems, Long Beach, United States, December 2017, pp. 1-13.
https://hal.archives-ouvertes.fr/hal-01629739
[37] E. Kaufmann, W. M. Koolen.
Monte-Carlo Tree Search by Best Arm Identification, in: NIPS 2017 - 31st Annual Conference on Neural Information Processing Systems, Long Beach, United States, Advances in Neural Information Processing Systems, December 2017, pp. 1-23, https://arxiv.org/abs/1706.02986.
https://hal.archives-ouvertes.fr/hal-01535907
[38] R. Laroche, M. Barlier.
Transfer Reinforcement Learning with Shared Dynamics, in: AAAI-17 - Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, United States, February 2017, 7 p.
https://hal.archives-ouvertes.fr/hal-01548649
[39] O.-A. Maillard.
Boundary Crossing for General Exponential Families, in: Algorithmic Learning Theory, Kyoto, Japan, Proceedings of Algorithmic Learning Theory, October 2017, vol. 1, pp. 1-34.
https://hal.archives-ouvertes.fr/hal-01615427
[40] A. M. Metelli, M. Pirotta, M. Restelli.
Compatible Reward Inverse Reinforcement Learning, in: The Thirty-first Annual Conference on Neural Information Processing Systems - NIPS 2017, Long Beach, United States, December 2017.
https://hal.inria.fr/hal-01653328
[41] J. Mourtada, O.-A. Maillard.
Efficient tracking of a growing number of experts, in: Algorithmic Learning Theory, Kyoto, Japan, Proceedings of Algorithmic Learning Theory, October 2017, vol. 76, pp. 1-23.
https://hal.archives-ouvertes.fr/hal-01615424
[42] M. Papini, M. Pirotta, M. Restelli.
Adaptive Batch Size for Safe Policy Gradients, in: The Thirty-first Annual Conference on Neural Information Processing Systems (NIPS), Long Beach, United States, December 2017.
https://hal.inria.fr/hal-01653330
[43] G. Papoudakis, P. Preux, M. Monperrus.
A generative model for sparse, evolving digraphs, in: 6th International Conference on Complex Networks and their applications, Lyon, France, November 2017, https://arxiv.org/abs/1710.06298. [ DOI : 10.1007/978-3-319-72150-7_43 ]
https://hal.inria.fr/hal-01617851
[44] E. Perez, H. De Vries, F. Strub, V. Dumoulin, A. Courville.
Learning Visual Reasoning Without Strong Priors, in: ICML 2017's Machine Learning in Speech and Language Processing Workshop, Sydney, Australia, August 2017, https://arxiv.org/abs/1707.03017.
https://hal.inria.fr/hal-01648684
[45] E. Perez, F. Strub, H. De Vries, V. Dumoulin, A. Courville.
FiLM: Visual Reasoning with a General Conditioning Layer, in: AAAI Conference on Artificial Intelligence, New Orleans, United States, February 2018, https://arxiv.org/abs/1709.07871.
https://hal.inria.fr/hal-01648685
[46] J. Pérolat, F. Strub, B. Piot, O. Pietquin.
Learning Nash Equilibrium for General-Sum Markov Games from Batch Data, in: AISTATS 2017 - The 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, United States, April 2017, pp. 1-14.
https://hal.inria.fr/hal-01648489
[47] C. Riquelme, M. Ghavamzadeh, A. Lazaric.
Active Learning for Accurate Estimation of Linear Models, in: ICML 2017 - 34th International Conference on Machine Learning, Sydney, Australia, August 2017, 36 p.
https://hal.inria.fr/hal-01538762
[48] D. Ryabko.
Hypotheses testing on infinite random graphs, in: ALT 2017 - 28th International Conference on Algorithmic Learning Theory, Kyoto, Japan, October 2017, pp. 1-12, https://arxiv.org/abs/1708.03131.
https://hal.inria.fr/hal-01627330
[49] D. Ryabko.
Independence clustering (without a matrix), in: NIPS 2017 - Thirty-first Annual Conference on Neural Information Processing Systems, Long Beach, United States, December 2017, pp. 1-14, https://arxiv.org/abs/1703.06700.
https://hal.inria.fr/hal-01627333
[50] D. Ryabko.
Universality of Bayesian mixture predictors, in: ALT 2017 - 28th International Conference on Algorithmic Learning Theory, Kyoto, Japan, October 2017, pp. 1-13, https://arxiv.org/abs/1610.08249.
https://hal.inria.fr/hal-01627332
[51] F. Strub, H. De Vries, J. Mary, B. Piot, A. Courville, O. Pietquin.
End-to-end optimization of goal-driven and visually grounded dialogue systems, in: International Joint Conference on Artificial Intelligence, Melbourne, Australia, August 2017, https://arxiv.org/abs/1703.05423.
https://hal.inria.fr/hal-01549642
[52] S. Tosatto, M. Pirotta, C. D'Eramo, M. Restelli.
Boosted Fitted Q-Iteration, in: 34th International Conference on Machine Learning (ICML), Sydney, Australia, August 2017.
https://hal.inria.fr/hal-01653332
[53] N. Tziortziotis, C. Dimitrakakis.
Bayesian Inference for Least Squares Temporal Difference Regularization, in: ECML 2017 - European Conference on Machine Learning, Skopje, Macedonia, September 2017.
https://hal.inria.fr/hal-01593212
[54] Z. Wen, B. Kveton, M. Valko, S. Vaswani.
Online influence maximization under independent cascade model with semi-bandit feedback, in: NIPS 2017 - Neural Information Processing Systems, Long Beach, United States, December 2017, pp. 1-24.
https://hal.inria.fr/hal-01643976
[55] M. Zanon Boito, A. Bérard, A. Villavicencio, L. Besacier.
Unwritten Languages Demand Attention Too! Word Discovery with Encoder-Decoder Models, in: IEEE Automatic Speech Recognition and Understanding (ASRU), Okinawa, Japan, December 2017.
https://hal.archives-ouvertes.fr/hal-01592091

National Conferences with Proceedings

[56] M. Geist, B. Piot, O. Pietquin.
Faut-il minimiser le résidu de Bellman ou maximiser la valeur moyenne ?, in: Journées Francophones sur la Planification, la Décision et l'Apprentissage pour la conduite de systèmes (JFPDA 2017), Caen, France, Actes des Journées Francophones sur la Planification, la Décision et l'Apprentissage pour la conduite de systèmes (JFPDA 2017), July 2017.
https://hal.archives-ouvertes.fr/hal-01576347

Conferences without Proceedings

[57] R. Bonnefoi, L. Besson, C. Moy, E. Kaufmann, J. Palicot.
Multi-Armed Bandit Learning in IoT Networks: Learning helps even in non-stationary settings, in: CROWNCOM 2017 - 12th EAI International Conference on Cognitive Radio Oriented Wireless Networks, Lisbon, Portugal, September 2017.
https://hal.archives-ouvertes.fr/hal-01575419
[58] N. Carrara, R. Laroche, O. Pietquin.
Online learning and transfer for user adaptation in dialogue systems, in: SIGDIAL/SEMDIAL joint special session on negotiation dialog 2017, Saarbrücken, Germany, August 2017.
https://hal.archives-ouvertes.fr/hal-01557775

Other Publications

[59] L. Besson, E. Kaufmann.
Multi-Player Bandits Models Revisited, October 2017, https://arxiv.org/abs/1711.02317 - working paper or preprint.
https://hal.inria.fr/hal-01629733
[60] C. Dimitrakakis, F. Jarboui, D. Parkes, L. Seeman.
Multi-view Sequential Games: The Helper-Agent Problem, February 2017, working paper or preprint.
https://hal.archives-ouvertes.fr/hal-01408294
[61] C. Dimitrakakis, Y. Liu, D. Parkes, G. Radanovic.
Subjective Fairness: Fairness is in the eye of the beholder, July 2017, https://arxiv.org/abs/1706.00119 - working paper or preprint.
https://hal.inria.fr/hal-01531849
[62] A. R. Luedtke, E. Kaufmann, A. Chambaz.
Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits, October 2017, https://arxiv.org/abs/1606.09388 - working paper or preprint.
https://hal.archives-ouvertes.fr/hal-01338733
[63] O.-A. Maillard.
Basic Concentration Properties of Real-Valued Distributions, September 2017, Lecture.
https://hal.archives-ouvertes.fr/cel-01632228

References in notes

[64] R. Allesiardo, R. Féraud, O.-A. Maillard.
The Non-stationary Stochastic Multi-armed Bandit Problem, in: International Journal of Data Science and Analytics, 2017, vol. 3, no. 4, pp. 267–283. [ DOI : 10.1007/s41060-017-0050-5 ]
https://hal.archives-ouvertes.fr/hal-01575000
[65] P. Auer, N. Cesa-Bianchi, P. Fischer.
Finite-time analysis of the multi-armed bandit problem, in: Machine Learning, 2002, vol. 47, no. 2/3, pp. 235–256.
[66] R. Bellman.
Dynamic Programming, Princeton University Press, 1957.
[67] D. Bertsekas, S. Shreve.
Stochastic Optimal Control (The Discrete Time Case), Academic Press, New York, 1978.
[68] D. Bertsekas, J. Tsitsiklis.
Neuro-Dynamic Programming, Athena Scientific, 1996.
[69] M. Puterman.
Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley and Sons, 1994.
[70] H. Robbins.
Some aspects of the sequential design of experiments, in: Bull. Amer. Math. Soc., 1952, vol. 58, pp. 527–535.
[71] R. Sutton, A. Barto.
Reinforcement learning: an introduction, MIT Press, 1998.
[72] P. Werbos.
ADP: Goals, Opportunities and Principles, in: Handbook of Learning and Approximate Dynamic Programming, IEEE Press, 2004, pp. 3–44.
