Bibliography
Publications of the year
Doctoral Dissertations and Habilitation Theses
[1] A. Carpentier.
Toward optimal sampling in low and high dimension, Université Lille 1, Lille, France, October 2012.
[2] E. Delande.
Multi-sensor PHD filtering with application to sensor management, Ecole Centrale, Lille, France, October 2012.
http://www.theses.fr/2012ECLI0001
[3] J. F. Hren.
Planification optimiste pour systèmes déterministes, Université Lille 1, Lille, France, June 2012.
[4] C. Salperwyck.
Apprentissage incrémental en ligne sur flux de données, Université de Lille 3, November 2012.
Articles in International Peer-Reviewed Journals
[5] M. G. Azar, R. Munos, H. Kappen.
Minimax PAC-Bounds on the Sample Complexity of Reinforcement Learning with a Generative Model, in: Machine Learning Journal, 2012, To appear.
[6] O. Cappé, A. Garivier, O.-A. Maillard, R. Munos, G. Stoltz.
Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation, in: Annals of Statistics, 2012, Submitted.
[7] A. Carpentier, A. Lazaric, M. Ghavamzadeh, R. Munos, P. Auer.
Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits, in: Theoretical Computer Science, 2012, To appear.
[8] A. Carpentier, R. Munos, A. Antos.
Minimax strategy for Stratified Sampling for Monte Carlo, in: Journal of Machine Learning Research, 2012, Submitted.
[9] G. Dulac-Arnold, L. Denoyer, P. Preux, P. Gallinari.
Sequential approaches for learning datum-wise sparse representations, in: Machine Learning, October 2012, vol. 89, no 1-2, p. 87-122.
[10] J. Fruitet, A. Carpentier, R. Munos, M. Clerc.
Automatic motor task selection via a bandit algorithm for a brain-controlled button, in: Journal of Neural Engineering, 2012, To appear.
[11] S. Girgin, J. Mary, P. Preux, O. Nicol.
Managing advertising campaigns – an approximate planning approach, in: Frontiers of Computer Science, 2012, vol. 6, no 2, p. 209-229. [ DOI : 10.1007/s11704-012-2873-5 ]
http://hal.inria.fr/hal-00747722
[12] M. Hauskrecht, I. Batal, M. Valko, S. Visweswaran, G. F. Cooper, G. Clermont.
Outlier detection for patient monitoring and alerting, in: Journal of Biomedical Informatics, August 2012. [ DOI : 10.1016/j.jbi.2012.08.004 ]
http://hal.inria.fr/hal-00742097
[13] A. Lazaric, M. Ghavamzadeh, R. Munos.
Analysis of Classification-based Policy Iteration Algorithms, in: Journal of Machine Learning Research, 2012, Submitted.
[14] A. Lazaric, M. Ghavamzadeh, R. Munos.
Finite-Sample Analysis of Least-Squares Policy Iteration, in: Journal of Machine Learning Research, 2012, vol. 13, p. 3041-3074.
[15] A. Lazaric, R. Munos.
Learning with stochastic inputs and adversarial outputs, in: Journal of Computer and System Sciences (JCSS), 2012, vol. 78, no 5, p. 1516–1537. [ DOI : 10.1016/j.jcss.2011.12.027 ]
http://www.sciencedirect.com/science/article/pii/S002200001200027X
[16] O.-A. Maillard, R. Munos.
Linear Regression with Random Projections, in: Journal of Machine Learning Research, 2012, vol. 13, p. 2735-2772.
[17] R. Munos.
The Optimistic Principle applied to Games, Optimization and Planning: Towards Foundations of Monte-Carlo Tree Search, in: Foundations and Trends in Machine Learning, 2012, Submitted.
http://hal.archives-ouvertes.fr/hal-00747575
[18] O. Nicol, J. Mary, P. Preux.
ICML Exploration & Exploitation challenge: Keep it simple!, in: Journal of Machine Learning Research Workshop and Conference Proceedings, 2012, vol. 26, p. 62-85.
http://hal.inria.fr/hal-00747725
[19] A. Rabaoui, N. Viandier, J. Marais, E. Duflos, P. Vanheeghe.
Dirichlet Process Mixtures for Density Estimation in Dynamic Nonlinear Modeling: Application to GPS Positioning in Urban Canyons, in: IEEE Transactions on Signal Processing, April 2012, vol. 60, no 4, p. 1638-1655. [ DOI : 10.1109/TSP.2011.2180901 ]
http://hal.inria.fr/hal-00712718
[20] S. Razavi, E. Duflos, C. Haas, P. Vanheeghe.
Dislocation detection in field environments: A belief functions contribution, in: Expert Systems with Applications, August 2012, vol. 39, no 10, p. 8505-8513. [ DOI : 10.1016/j.eswa.2011.12.014 ]
http://hal.inria.fr/hal-00712720
[21] D. Ryabko.
Testing composite hypotheses about discrete ergodic processes, in: Test, 2012, vol. 21, no 2, p. 317-329.
[22] D. Ryabko.
Uniform hypothesis testing for finite-valued stationary processes, in: Statistics, 2013.
[23] M. Valko, M. Ghavamzadeh, A. Lazaric.
Semi-Supervised Apprenticeship Learning, in: Journal of Machine Learning Research: Workshop and Conference Proceedings, November 2012, vol. 24.
http://hal.inria.fr/hal-00747921
International Conferences with Proceedings
[24] M. G. Azar, R. Munos, H. Kappen.
On the Sample Complexity of Reinforcement Learning with a Generative Model, in: International Conference on Machine Learning, 2012.
[25] L. Busoniu, R. Munos.
Optimistic planning in Markov decision processes, in: International conference on Artificial Intelligence and Statistics, 2012.
[26] A. Carpentier, R. Munos.
Adaptive Stratified Sampling for Monte-Carlo integration of Differentiable functions, in: Advances in Neural Information Processing Systems, 2012.
[27] A. Carpentier, R. Munos.
Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit, in: International conference on Artificial Intelligence and Statistics, 2012.
[28] A. Carpentier, R. Munos.
Minimax number of strata for online Stratified Sampling given Noisy Samples, in: International Conference on Algorithmic Learning Theory, 2012.
[29] P. Chainais.
Towards dictionary learning from images with non Gaussian noise, in: IEEE Int. Workshop on Machine Learning for Signal Processing, Santander, Spain, September 2012.
http://hal.inria.fr/hal-00749035
[30] R. Coulom.
CLOP: Confident Local Optimization for Noisy Black-Box Parameter Tuning, in: Advances in Computer Games - 13th International Conference, Tilburg, Netherlands, H. J. van den Herik, A. Plaat (editors), Lecture Notes in Computer Science, Springer, 2012, vol. 7168, p. 146-157. [ DOI : 10.1007/978-3-642-31866-5_13 ]
http://hal.inria.fr/hal-00750326
[31] G. Dulac-Arnold, L. Denoyer, P. Preux, P. Gallinari.
Fast Reinforcement Learning with Large Action Sets Using Error-Correcting Output Codes for MDP Factorization, in: European Conference on Machine Learning, Bristol, United Kingdom, Springer, 2012, vol. 2, p. 180-194. [ DOI : 10.1007/978-3-642-33486-3_12 ]
http://hal.inria.fr/hal-00747729
[32] J. Fruitet, A. Carpentier, R. Munos, M. Clerc.
Bandit Algorithms boost motor-task selection for Brain Computer Interfaces, in: Advances in Neural Information Processing Systems, 2012.
[33] V. Gabillon, M. Ghavamzadeh, A. Lazaric.
Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence, in: Proceedings of Advances in Neural Information Processing Systems 25, MIT Press, 2012.
[34] N. Gatti, A. Lazaric, F. Trovò.
A Truthful Learning Mechanism for Multi-Slot Sponsored Search Auctions with Externalities (Extended Abstract), in: AAMAS, 2012.
[35] N. Gatti, A. Lazaric, F. Trovò.
A Truthful Learning Mechanism for Multi-Slot Sponsored Search Auctions with Externalities, in: Proceedings of the 13th ACM Conference on Electronic Commerce (EC'12), 2012.
[36] M. Geist, B. Scherrer, A. Lazaric, M. Ghavamzadeh.
A Dantzig Selector Approach to Temporal Difference Learning, in: Proceedings of the Twenty-Ninth International Conference on Machine Learning, 2012, p. 1399-1406.
[37] M. Ghavamzadeh, A. Lazaric.
Conservative and Greedy Approaches to Classification-based Policy Iteration, in: Proceedings of the Twenty-Sixth Conference on Artificial Intelligence, 2012, p. 914-920.
[38] E. Kaufmann, N. Korda, R. Munos.
Thompson Sampling: an Asymptotically Optimal Finite Time Analysis, in: International Conference on Algorithmic Learning Theory, 2012.
[39] A. Khaleghi, D. Ryabko.
Locating Changes in Highly Dependent Data with Unknown Number of Change Points, in: NIPS, Lake Tahoe, USA, 2012.
[40] A. Khaleghi, D. Ryabko, J. Mary, P. Preux.
Online Clustering of Processes, in: AISTATS, JMLR W&CP 22, 2012, p. 601-609.
[41] B. Kveton, M. Valko.
Learning from a Single Labeled Face and a Stream of Unlabeled Data, in: 10th IEEE International Conference on Automatic Face and Gesture Recognition, Shanghai, China, November 2012.
http://hal.inria.fr/hal-00749197
[42] O.-A. Maillard, A. Carpentier.
Online allocation and homogeneous partitioning for piecewise constant mean approximation, in: Advances in Neural Information Processing Systems, 2012.
[43] R. Ortner, D. Ryabko, P. Auer, R. Munos.
Regret Bounds for Restless Markov Bandits, in: Proc. 23rd International Conf. on Algorithmic Learning Theory (ALT'12), Lyon, France, LNCS 7568, Springer, Berlin, 2012, p. 214–228.
[44] R. Ortner, D. Ryabko.
Online Regret Bounds for Undiscounted Continuous Reinforcement Learning, in: NIPS, Lake Tahoe, USA, 2012.
[45] D. Ryabko, J. Mary.
Reducing statistical time-series problems to binary classification, in: NIPS, Lake Tahoe, USA, 2012.
[46] A. Sani, A. Lazaric, R. Munos.
Risk-Aversion in Multi-Armed Bandits, in: Advances in Neural Information Processing Systems, 2012.
[47] B. Scherrer, M. Ghavamzadeh, V. Gabillon, M. Geist.
Approximate Modified Policy Iteration, in: Proceedings of the Twenty-Ninth International Conference on Machine Learning, 2012, p. 1207-1214.
National Conferences with Proceedings
[48] G. Dulac-Arnold, L. Denoyer, P. Preux, P. Gallinari.
Apprentissage par renforcement rapide pour des grands ensembles d'actions en utilisant des codes correcteurs d'erreur, in: Journées Francophones sur la planification, la décision et l'apprentissage pour le contrôle des systèmes - JFPDA 2012, Villers-lès-Nancy, France, O. Buffet (editor), 2012, 12 p.
http://hal.inria.fr/hal-00736322
[49] M. Geist, B. Scherrer, A. Lazaric, M. Ghavamzadeh.
Un sélecteur de Dantzig pour l'apprentissage par différences temporelles, in: Journées Francophones sur la planification, la décision et l'apprentissage pour le contrôle des systèmes - JFPDA 2012, Villers-lès-Nancy, France, O. Buffet (editor), 2012, 13 p.
http://hal.inria.fr/hal-00736229
[50] N. Jaoua, E. Duflos, P. Vanheeghe.
DPM pour l'inférence dans les modèles dynamiques non linéaires avec des bruits de mesure alpha-stable, in: 44ème Journées de Statistique, Bruxelles, Belgium, May 2012, p. 1-4.
http://hal.inria.fr/hal-00713857
[51] B. Scherrer, V. Gabillon, M. Ghavamzadeh, M. Geist.
Approximations de l'Algorithme Itérations sur les Politiques Modifié, in: Journées Francophones sur la planification, la décision et l'apprentissage pour le contrôle des systèmes - JFPDA 2012, Villers-lès-Nancy, France, O. Buffet (editor), 2012, 1 p. The body of this article appeared, in English, in ICML'2012 (Proceedings of the International Conference on Machine Learning).
http://hal.inria.fr/hal-00736226
Conferences without Proceedings
[52] C. Dhanjal, R. Gaudel, S. Clémençon.
Incremental Spectral Clustering with the Normalised Laplacian, in: DISCML - 3rd NIPS Workshop on Discrete Optimization in Machine Learning - 2011, Sierra Nevada, Spain, 2012.
http://hal.inria.fr/hal-00745666
[53] A. Farahmand, D. Precup, M. Ghavamzadeh.
On Classification-based Approximate Policy Iteration, in: Thirtieth International Conference on Machine Learning, 2012, submitted.
[54] D. Ryabko.
Asymptotic statistics of stationary ergodic time series, in: WITMSE, Amsterdam, 2012.
Scientific Books (or Scientific Book chapters)
[55] L. Busoniu, A. Lazaric, M. Ghavamzadeh, R. Munos, R. Babuska, B. De Schutter.
Least-Squares Methods for Policy Iteration, in: Reinforcement Learning: State of the Art, M. Wiering, M. van Otterlo (editors), Springer Verlag, 2012, p. 75-110.
[56] A. Lazaric.
Transfer in Reinforcement Learning: a Framework and a Survey, in: Reinforcement Learning: State of the Art, M. Wiering, M. van Otterlo (editors), Springer, 2012.
[57] N. Vlassis, M. Ghavamzadeh, S. Mannor, P. Poupart.
Bayesian Reinforcement Learning, in: Reinforcement Learning: State of the Art, M. Wiering, M. van Otterlo (editors), Springer Verlag, 2012, p. 359-386.
Internal Reports
[58] V. Gabillon, M. Ghavamzadeh, A. Lazaric.
Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence, Inria, 2012, no inria-00747005.
[59] H. Kadri, M. Ghavamzadeh, P. Preux.
A Generalized Kernel Approach to Structured Output Learning, Inria, May 2012, no RR-7956.
http://hal.inria.fr/hal-00695631
[60] H. Kadri, A. Rakotomamonjy, F. Bach, P. Preux.
Multiple Operator-valued Kernel Learning, Inria, March 2012, no RR-7900.
http://hal.inria.fr/hal-00677012
[61] B. Pires, M. Ghavamzadeh, Cs. Szepesvári.
Risk Bounds in Cost-sensitive Multiclass Classification: an Application to Reinforcement Learning, Inria, 2012, in preparation.
[62] B. Scherrer, V. Gabillon, M. Ghavamzadeh, M. Geist.
Approximate Modified Policy Iteration, Inria, May 2012.
http://hal.inria.fr/hal-00697169
-
[63] P. Auer, N. Cesa-Bianchi, P. Fischer.
Finite-time analysis of the multi-armed bandit problem, in: Machine Learning, 2002, vol. 47, no 2/3, p. 235–256.
[64] R. Bellman.
Dynamic Programming, Princeton University Press, 1957.
[65] D. Bertsekas, S. Shreve.
Stochastic Optimal Control (The Discrete Time Case), Academic Press, New York, 1978.
[66] D. Bertsekas, J. Tsitsiklis.
Neuro-Dynamic Programming, Athena Scientific, 1996.
[67] T. Ferguson.
A Bayesian Analysis of Some Nonparametric Problems, in: The Annals of Statistics, 1973, vol. 1, no 2, p. 209–230.
[68] T. Hastie, R. Tibshirani, J. Friedman.
The elements of statistical learning: Data Mining, Inference, and Prediction, Springer, 2001.
[69] W. Powell.
Approximate Dynamic Programming, Wiley, 2007.
[70] M. Puterman.
Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley and Sons, 1994.
[71] H. Robbins.
Some aspects of the sequential design of experiments, in: Bull. Amer. Math. Soc., 1952, vol. 58, p. 527–535.
[72] J. Rust.
How Social Security and Medicare Affect Retirement Behavior in a World of Incomplete Markets, in: Econometrica, July 1997, vol. 65, no 4, p. 781–831.
http://gemini.econ.umd.edu/jrust/research/rustphelan.pdf
[73] J. Rust.
On the Optimal Lifetime of Nuclear Power Plants, in: Journal of Business & Economic Statistics, 1997, vol. 15, no 2, p. 195–208.
[74] R. Sutton, A. Barto.
Reinforcement learning: an introduction, MIT Press, 1998.
[75] G. Tesauro.
Temporal Difference Learning and TD-Gammon, in: Communications of the ACM, March 1995, vol. 38, no 3.
http://www.research.ibm.com/massive/tdl.html
[76] P. Werbos.
ADP: Goals, Opportunities and Principles, IEEE Press, 2004, p. 3–44, Handbook of learning and approximate dynamic programming.