Inria | Raweb 2018 | Presentation of the Project-Team SEQUEL | SEQUEL Web Site



Bibliography

Major publications by the team in recent years

[1] O. Cappé, A. Garivier, O.-A. Maillard, R. Munos, G. Stoltz.
Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation, in: Annals of Statistics, 2013, vol. 41, no. 3, pp. 1516-1541.
https://hal.archives-ouvertes.fr/hal-00738209
[2] A. Carpentier, M. Valko.
Revealing graph bandits for maximizing local influence, in: International Conference on Artificial Intelligence and Statistics, Seville, Spain, May 2016.
https://hal.inria.fr/hal-01304020
[3] H. De Vries, F. Strub, J. Mary, H. Larochelle, O. Pietquin, A. Courville.
Modulating early visual processing by language, in: Conference on Neural Information Processing Systems, Long Beach, United States, December 2017.
https://hal.inria.fr/hal-01648683
[4] N. Gatti, A. Lazaric, M. Rocco, F. Trovò.
Truthful Learning Mechanisms for Multi-Slot Sponsored Search Auctions with Externalities, in: Artificial Intelligence, October 2015, vol. 227, pp. 93-139.
https://hal.inria.fr/hal-01237670
[5] M. Ghavamzadeh, Y. Engel, M. Valko.
Bayesian Policy Gradient and Actor-Critic Algorithms, in: Journal of Machine Learning Research, January 2016, vol. 17, no. 66, pp. 1-53.
https://hal.inria.fr/hal-00776608
[6] H. Kadri, E. Duflos, P. Preux, S. Canu, A. Rakotomamonjy, J. Audiffren.
Operator-valued Kernels for Learning from Functional Response Data, in: Journal of Machine Learning Research (JMLR), 2016.
https://hal.archives-ouvertes.fr/hal-01221329
[7] E. Kaufmann, O. Cappé, A. Garivier.
On the Complexity of Best Arm Identification in Multi-Armed Bandit Models, in: Journal of Machine Learning Research, January 2016, vol. 17, pp. 1-42.
https://hal.archives-ouvertes.fr/hal-01024894
[8] A. Lazaric, M. Ghavamzadeh, R. Munos.
Analysis of Classification-based Policy Iteration Algorithms, in: Journal of Machine Learning Research, 2016, vol. 17, pp. 1-30.
https://hal.inria.fr/hal-01401513
[9] R. Munos.
From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, in: Foundations and Trends in Machine Learning, 2014, vol. 7, no. 1, pp. 1-129.
http://dx.doi.org/10.1561/2200000038
[10] R. Ortner, D. Ryabko, P. Auer, R. Munos.
Regret bounds for restless Markov bandits, in: Theoretical Computer Science, 2014, vol. 558, pp. 62-76. [ DOI : 10.1016/j.tcs.2014.09.026 ]
https://hal.inria.fr/hal-01074077

Publications of the year

Doctoral Dissertations and Habilitation Theses

[11] R. Warlop.
Novel Learning and Exploration-Exploitation Methods for Effective Recommender Systems, Lille1, October 2018.
https://hal.inria.fr/tel-01915499

Articles in International Peer-Reviewed Journals

[12] B. Danglot, P. Preux, B. Baudry, M. Monperrus.
Correctness Attraction: A Study of Stability of Software Behavior Under Runtime Perturbation, in: Empirical Software Engineering, August 2018, vol. 23, no. 4, pp. 2086–2119, https://arxiv.org/abs/1611.09187. [ DOI : 10.1007/s10664-017-9571-8 ]
https://hal.archives-ouvertes.fr/hal-01378523
[13] V. Dumoulin, E. Perez, H. De Vries, F. Strub, N. Schucher, A. Courville, Y. Bengio.
Feature-wise transformations: A simple and surprisingly effective family of conditioning mechanisms, in: Distill, July 2018, vol. 3, no. 7. [ DOI : 10.23915/distill.00011 ]
https://hal.inria.fr/hal-01841985
[14] A. Durand, O.-A. Maillard, J. Pineau.
Streaming kernel regression with provably adaptive mean, variance, and regularization, in: Journal of Machine Learning Research, 2018, vol. 1, pp. 1-48, https://arxiv.org/abs/1708.00768.
https://hal.archives-ouvertes.fr/hal-01927007
[15] E. Kaufmann, T. Bonald, M. Lelarge.
A spectral algorithm with additive clustering for the recovery of overlapping communities in networks, in: Theoretical Computer Science, September 2018, vol. 742, pp. 3-26.
https://hal.archives-ouvertes.fr/hal-01963868
[16] O.-A. Maillard.
Boundary Crossing Probabilities for General Exponential Families, in: Mathematical Methods of Statistics, 2018, vol. 27.
https://hal.archives-ouvertes.fr/hal-01737150
[17] M. S. Talebi, O.-A. Maillard.
Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs, in: Journal of Machine Learning Research, April 2018, pp. 1-36.
https://hal.archives-ouvertes.fr/hal-01737142

International Conferences with Proceedings

[18] Y. Abbasi-Yadkori, P. Bartlett, V. Gabillon, A. Malek, M. Valko.
Best of both worlds: Stochastic & adversarial best-arm identification, in: Conference on Learning Theory, Stockholm, Sweden, 2018.
https://hal.inria.fr/hal-01808948
[19] M. Aziz, J. Anderton, E. Kaufmann, J. Aslam.
Pure Exploration in Infinitely-Armed Bandit Models with Fixed-Confidence, in: ALT 2018 - Algorithmic Learning Theory, Lanzarote, Spain, JMLR Workshop and Conference Proceedings, April 2018, https://arxiv.org/abs/1803.04665.
https://hal.archives-ouvertes.fr/hal-01729969
[20] M. Barlier, R. Laroche, O. Pietquin.
Training Dialogue Systems With Human Advice, in: AAMAS 2018 - the 17th International Conference on Autonomous Agents and Multiagent Systems, Stockholm, Sweden, International Foundation for Autonomous Agents and MultiAgent Systems (IFAAMAS), July 2018, 9 p.
https://hal.archives-ouvertes.fr/hal-01945831
[21] P. Bartlett, V. Gabillon, M. Valko.
A simple parameter-free and adaptive approach to optimization under a minimal local smoothness assumption, in: Algorithmic Learning Theory, Chicago, United States, 2019.
https://hal.inria.fr/hal-01885368
[22] L. Besson, E. Kaufmann.
Multi-Player Bandits Revisited, in: Algorithmic Learning Theory, Lanzarote, Spain, M. Mohri, K. Sridharan (editors), April 2018, https://arxiv.org/abs/1711.02317.
https://hal.inria.fr/hal-01629733
[23] L. Besson, E. Kaufmann, C. Moy.
Aggregation of Multi-Armed Bandits Learning Algorithms for Opportunistic Spectrum Access, in: IEEE WCNC - IEEE Wireless Communications and Networking Conference, Barcelona, Spain, April 2018. [ DOI : 10.1109/wcnc.2018.8377070 ]
https://hal.inria.fr/hal-01705292
[24] A. Bérard, L. Besacier, A. C. Kocabiyikoglu, O. Pietquin.
End-to-End Automatic Speech Translation of Audiobooks, in: ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, Alberta, Canada, April 2018.
https://hal.archives-ouvertes.fr/hal-01709586
[25] D. Calandriello, I. Koutis, A. Lazaric, M. Valko.
Improved large-scale graph learning through ridge spectral sparsification, in: ICML 2018 - Thirty-fifth International Conference on Machine Learning, Stockholm, Sweden, July 2018.
https://hal.inria.fr/hal-01810980
[26] N. Carrara, R. Laroche, J.-L. Bouraoui, T. Urvoy, O. Pietquin.
A Fitted-Q Algorithm for Budgeted MDPs, in: EWRL 2018 - 14th European workshop on Reinforcement Learning, Lille, France, October 2018.
https://hal.archives-ouvertes.fr/hal-01928092
[27] N. Carrara, R. Laroche, J.-L. Bouraoui, T. Urvoy, O. Pietquin.
Safe transfer learning for dialogue applications, in: SLSP 2018 - 6th International Conference on Statistical Language and Speech Processing, Mons, Belgium, October 2018.
https://hal.archives-ouvertes.fr/hal-01928102
[28] R. Fruit, M. Pirotta, A. Lazaric.
Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes, in: 32nd Conference on Neural Information Processing Systems, Montréal, Canada, December 2018.
https://hal.inria.fr/hal-01941220
[29] R. Fruit, M. Pirotta, A. Lazaric, R. Ortner.
Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning, in: ICML 2018 - The 35th International Conference on Machine Learning, Stockholm, Sweden, Proceedings of Machine Learning Research, July 2018, vol. 80, pp. 1578-1586.
https://hal.inria.fr/hal-01941206
[30] P. Gajane, T. Urvoy, E. Kaufmann.
Corrupt Bandits for Preserving Local Privacy, in: ALT 2018 - Algorithmic Learning Theory, Lanzarote, Spain, Proceedings of Machine Learning Research, April 2018.
https://hal.archives-ouvertes.fr/hal-01757297
[31] J.-B. Grill, M. Valko, R. Munos.
Optimistic optimization of a Brownian, in: NeurIPS 2018 - Thirty-second Conference on Neural Information Processing Systems, Montréal, Canada, December 2018.
https://hal.inria.fr/hal-01906601
[32] J.-H. Jacobsen, A. Smeulders, E. Oyallon.
i-RevNet: Deep Invertible Networks, in: ICLR 2018 - International Conference on Learning Representations, Vancouver, Canada, April 2018, https://arxiv.org/abs/1802.07088.
https://hal.archives-ouvertes.fr/hal-01712808
[33] E. Oyallon, E. Belilovsky, S. Zagoruyko, M. Valko.
Compressing the Input for CNNs with the First-Order Scattering Transform, in: European Conference on Computer Vision, Munich, Germany, 2018.
https://hal.inria.fr/hal-01850921
[34] M. Papini, D. Binaghi, G. Canonaco, M. Pirotta, M. Restelli.
Stochastic Variance-Reduced Policy Gradient, in: ICML 2018 - 35th International Conference on Machine Learning, Stockholm, Sweden, Proceedings of Machine Learning Research, July 2018, vol. 80, pp. 4026-4035.
https://hal.inria.fr/hal-01940394
[35] E. Perez, F. Strub, H. De Vries, V. Dumoulin, A. Courville.
FiLM: Visual Reasoning with a General Conditioning Layer, in: AAAI Conference on Artificial Intelligence, New Orleans, United States, February 2018, https://arxiv.org/abs/1707.03017.
https://hal.inria.fr/hal-01648685
[36] J. Pérolat, B. Piot, O. Pietquin.
Actor-Critic Fictitious Play in Simultaneous Move Multistage Games, in: AISTATS 2018 - 21st International Conference on Artificial Intelligence and Statistics, Playa Blanca, Lanzarote, Canary Islands, Spain, April 2018.
https://hal.inria.fr/hal-01724227
[37] J. Seznec, A. Locatelli, A. Carpentier, A. Lazaric, M. Valko.
Rotting bandits are no harder than stochastic ones, in: International Conference on Artificial Intelligence and Statistics, Okinawa, Japan, 2019.
https://hal.inria.fr/hal-01936894
[38] A. Tirinzoni, A. Sessa, M. Pirotta, M. Restelli.
Importance Weighted Transfer of Samples in Reinforcement Learning, in: ICML 2018 - The 35th International Conference on Machine Learning, Stockholm, Sweden, Proceedings of Machine Learning Research, July 2018, vol. 80, pp. 4936-4945.
https://hal.inria.fr/hal-01941213

Conferences without Proceedings

[39] E. Kaufmann, W. Koolen, A. Garivier.
Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling, in: Advances in Neural Information Processing Systems (NIPS), Montréal, Canada, December 2018, https://arxiv.org/abs/1806.00973.
https://hal.archives-ouvertes.fr/hal-01804581
[40] E. Leurent, Y. Blanco, D. Efimov, O.-A. Maillard.
Approximate Robust Control of Uncertain Dynamical Systems, in: 32nd Conference on Neural Information Processing Systems (NeurIPS 2018) Workshop, Montréal, Canada, December 2018.
https://hal.archives-ouvertes.fr/hal-01931744
[41] X. Shang, E. Kaufmann, M. Valko.
Adaptive black-box optimization got easier: HCT only needs local smoothness, in: European Workshop on Reinforcement Learning, Lille, France, October 2018.
https://hal.inria.fr/hal-01874637
[42] F. Strub, M. Seurin, E. Perez, H. De Vries, J. Mary, P. Preux, A. Courville, O. Pietquin.
Visual Reasoning with Multi-hop Feature Modulation, in: ECCV 2018 - 15th European Conference on Computer Vision, Munich, Germany, V. Ferrari, M. Hebert, C. Sminchisescu, Y. Weiss (editors), Part of the Lecture Notes in Computer Science book series - LNCS, September 2018, vol. 11205-11220, no. 11209, pp. 808-831, https://arxiv.org/abs/1808.04446.
https://hal.archives-ouvertes.fr/hal-01927811
[43] R. Warlop, A. Lazaric, J. Mary.
Fighting Boredom in Recommender Systems with Linear Reinforcement Learning, in: Neural Information Processing Systems, Montréal, Canada, December 2018.
https://hal.inria.fr/hal-01915468

Other Publications

[44] R. Alami, O.-A. Maillard, R. Féraud.
Memory Bandits: Towards the Switching Bandit Problem Best Resolution, August 2018, MLSS 2018 - Machine Learning Summer School, Poster.
https://hal.archives-ouvertes.fr/hal-01879251
[45] L. Besson.
A Note on the Ei Function and a Useful Sum-Inequality, July 2018.
https://hal.inria.fr/hal-01847480
[46] L. Besson.
SMPyBandits: an Experimental Framework for Single and Multi-Players Multi-Arms Bandits Algorithms in Python, July 2018, working paper or preprint.
https://hal.inria.fr/hal-01840022
[47] L. Besson, E. Kaufmann.
What Doubling Tricks Can and Can't Do for Multi-Armed Bandits, February 2018, https://arxiv.org/abs/1803.06971 - working paper or preprint.
https://hal.inria.fr/hal-01736357
[48] N. Carrara, R. Laroche, J.-L. Bouraoui, T. Urvoy, O. Pietquin.
A Fitted-Q Algorithm for Budgeted MDPs, August 2018, Workshop on Safety, Risk and Uncertainty in Reinforcement Learning. https://sites.google.com/view/rl-uai2018/.
https://hal.archives-ouvertes.fr/hal-01867353
[49] G. Gautier, R. Bardenet, M. Valko.
DPPy: Sampling Determinantal Point Processes with Python, September 2018, working paper or preprint.
https://hal.inria.fr/hal-01879424
[50] E. Kaufmann, W. M. Koolen.
Mixture Martingales Revisited with Applications to Sequential Tests and Confidence Intervals, October 2018, https://arxiv.org/abs/1811.11419 - working paper or preprint.
https://hal.archives-ouvertes.fr/hal-01886612
[51] E. Leurent.
A Survey of State-Action Representations for Autonomous Driving, October 2018, working paper or preprint.
https://hal.archives-ouvertes.fr/hal-01908175
[52] Y. Liu, G. Radanovic, C. Dimitrakakis, D. Mandal, D. C. Parkes.
Calibrated Fairness in Bandits, December 2018, working paper or preprint.
https://hal.inria.fr/hal-01953314
[53] O.-A. Maillard, M. Asadi.
Upper Confidence Reinforcement Learning exploiting state-action equivalence, December 2018, working paper or preprint.
https://hal.archives-ouvertes.fr/hal-01945034
[54] K. Villatel, E. Smirnova, J. Mary, P. Preux.
Recurrent Neural Networks for Long and Short-Term Sequential Recommendation, July 2018, https://arxiv.org/abs/1807.09142 - working paper or preprint.
https://hal.inria.fr/hal-01847127
[55] H. van Hasselt, Y. Doron, F. Strub, M. Hessel, N. Sonnerat, J. Modayil.
Deep Reinforcement Learning and the Deadly Triad, December 2018, https://arxiv.org/abs/1812.02648 - working paper or preprint.
https://hal.archives-ouvertes.fr/hal-01949304

References in notes

[56] P. Auer, N. Cesa-Bianchi, P. Fischer.
Finite-time analysis of the multi-armed bandit problem, in: Machine Learning, 2002, vol. 47, no. 2/3, pp. 235–256.
[57] R. Bellman.
Dynamic Programming, Princeton University Press, 1957.
[58] D. Bertsekas, S. Shreve.
Stochastic Optimal Control (The Discrete Time Case), Academic Press, New York, 1978.
[59] D. Bertsekas, J. Tsitsiklis.
Neuro-Dynamic Programming, Athena Scientific, 1996.
[60] M. Puterman.
Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley and Sons, 1994.
[61] H. Robbins.
Some aspects of the sequential design of experiments, in: Bull. Amer. Math. Soc., 1952, vol. 58, pp. 527–535.
[62] R. Sutton, A. Barto.
Reinforcement learning: an introduction, MIT Press, 1998.
[63] P. Werbos.
ADP: Goals, Opportunities and Principles, in: Handbook of Learning and Approximate Dynamic Programming, IEEE Press, 2004, pp. 3–44.
