Bibliography
Publications of the year
Doctoral Dissertations and Habilitation Theses
[1] A. Khaleghi.
Sur quelques problèmes non-supervisés impliquant des séries temporelles hautement dépendantes, Institut national de recherche en informatique et en automatique (Inria), November 2013.
http://hal.inria.fr/tel-00920184
Articles in International Peer-Reviewed Journals
[2] M. G. Azar, R. Munos, H. Kappen.
Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model, in: Machine Learning, 2013, vol. 91, no 3, pp. 325-349.
http://hal.inria.fr/hal-00831875
[3] O. Cappé, A. Garivier, O.-A. Maillard, R. Munos, G. Stoltz.
Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation, in: Annals of Statistics, 2013, vol. 41, no 3, pp. 1516-1541, Accepted.
http://hal.inria.fr/hal-00738209
[4] J. Fruitet, A. Carpentier, R. Munos, M. Clerc.
Automatic motor task selection via a bandit algorithm for a brain-controlled button, in: Journal of Neural Engineering, January 2013, vol. 10, no 1. [DOI: 10.1088/1741-2560/10/1/016012]
http://hal.inria.fr/hal-00798561
[5] M. Hauskrecht, I. Batal, M. Valko, S. Visweswaran, G. F. Cooper, G. Clermont.
Outlier detection for patient monitoring and alerting, in: Journal of Biomedical Informatics, February 2013, vol. 46, pp. 47-55. [DOI: 10.1016/j.jbi.2012.08.004]
http://hal.inria.fr/hal-00742097
[6] D. Ryabko, J. Mary.
A Binary-Classification-Based Metric between Time-Series Distributions and Its Use in Statistical and Learning Problems, in: Journal of Machine Learning Research, 2013, vol. 14, pp. 2837-2856.
http://hal.inria.fr/hal-00913240
[7] B. Ryabko, D. Ryabko.
A confidence-set approach to signal denoising, in: Statistical Methodology, 2013, vol. 15, pp. 115–120.
http://hal.inria.fr/hal-00913253
International Conferences with Proceedings
[8] B. Avila Pires, M. Ghavamzadeh, C. Szepesvari.
Cost-sensitive Multiclass Classification Risk Bounds, in: International Conference on Machine Learning, Atlanta, United States, 2013.
http://hal.inria.fr/hal-00840485
[9] A. Carpentier, R. Munos.
Toward optimal stratification for stratified Monte-Carlo integration, in: International Conference on Machine Learning, United States, 2013.
http://hal.inria.fr/hal-00923685
[10] P. Chainais, C. Richard.
Learning a common dictionary over a sensor network, in: CAMSAP 2013, Saint-Martin, France, December 2013, pp. 1-4.
http://hal.inria.fr/hal-00923742
[11] R. Fonteneau, L. Busoniu, R. Munos.
Optimistic planning for belief-augmented Markov decision processes, in: IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2013, Singapore, April 2013.
http://hal.inria.fr/hal-00840202
[12] V. Gabillon, M. Ghavamzadeh, B. Scherrer.
Approximate Dynamic Programming Finally Performs Well in the Game of Tetris, in: Neural Information Processing Systems (NIPS) 2013, South Lake Tahoe, United States, 2013.
http://hal.inria.fr/hal-00921250
[13] M. Gheshlaghi Azar, A. Lazaric, E. Brunskill.
Regret Bounds for Reinforcement Learning with Policy Advice, in: ECML/PKDD - European conference on machine learning and principles and practice of knowledge discovery in databases - 2013, Prague, Czech Republic, September 2013.
http://hal.inria.fr/hal-00924021
[14] M. Gheshlaghi Azar, A. Lazaric, E. Brunskill.
Sequential Transfer in Multi-armed Bandit with Finite Set of Models, in: NIPS - Advances in Neural Information Processing Systems 26 - 2013, Lake Tahoe, United States, December 2013.
http://hal.inria.fr/hal-00924025
[15] H. Kadri, M. Ghavamzadeh, P. Preux.
A Generalized Kernel Approach to Structured Output Learning, in: International Conference on Machine Learning (ICML), Atlanta, United States, 2013.
http://hal.inria.fr/hal-00695631
[16] G. Kedenburg, R. Fonteneau, R. Munos.
Aggregating optimistic planning trees for solving Markov decision processes, in: Advances in Neural Information Processing Systems, United States, 2013, pp. 2382-2390.
http://hal.inria.fr/hal-00923681
[17] A. Khaleghi, D. Ryabko.
Nonparametric multiple change point estimation in highly dependent time series, in: Proc. 24th International Conf. on Algorithmic Learning Theory (ALT'13), Singapore, Springer, 2013, pp. 382-396.
http://hal.inria.fr/hal-00913250
[18] N. Korda, E. Kaufmann, R. Munos.
Thompson sampling for one-dimensional exponential family bandits, in: Advances in Neural Information Processing Systems, United States, 2013.
http://hal.inria.fr/hal-00923683
[19] B. Kveton, M. Valko.
Learning from a Single Labeled Face and a Stream of Unlabeled Data, in: 10th IEEE International Conference on Automatic Face and Gesture Recognition, Shanghai, China, January 2013.
http://hal.inria.fr/hal-00749197
[20] O.-A. Maillard, P. Nguyen, R. Ortner, D. Ryabko.
Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning, in: ICML - 30th International Conference on Machine Learning, Atlanta, United States, 2013, vol. 28(1), pp. 543-551.
http://hal.inria.fr/hal-00778586
[21] P. Nguyen, O.-A. Maillard, D. Ryabko, R. Ortner.
Competing with an Infinite Set of Models in Reinforcement Learning, in: AISTATS, Arizona, United States, JMLR W&CP, 2013, vol. 31, pp. 463-471.
http://hal.inria.fr/hal-00823230
[22] D. Ryabko.
Time-series information and learning, in: ISIT - International Symposium on Information Theory, Istanbul, Turkey, 2013, pp. 1392-1395.
http://hal.inria.fr/hal-00823233
[23] D. Ryabko.
Unsupervised model-free representation learning, in: Proc. 24th International Conf. on Algorithmic Learning Theory (ALT'13), Singapore, Springer, 2013, pp. 354-366.
http://hal.inria.fr/hal-00913244
[24] B. Szorenyi, R. Busa-Fekete, I. Hegedüs, R. Ormandi, M. Jelasity, B. Kégl.
Gossip-based distributed stochastic bandit algorithms, in: 30th International Conference on Machine Learning (ICML 2013), Atlanta, United States, S. Dasgupta, D. McAllester (editors), 2013, vol. 28, pp. 19-27.
http://hal.inria.fr/in2p3-00907406
[25] E. M. Thomas, M. Clerc, A. Carpentier, E. Daucé, D. Devlaminck, R. Munos.
Optimizing P300-speller sequences by RIP-ping groups apart, in: IEEE/EMBS 6th international conference on neural engineering (2013), San Diego, United States, IEEE/EMBS, November 2013.
http://hal.inria.fr/hal-00907781
[26] M. Valko, A. Carpentier, R. Munos.
Stochastic Simultaneous Optimistic Optimization, in: 30th International Conference on Machine Learning, Atlanta, United States, February 2013.
http://hal.inria.fr/hal-00789606
[27] M. Valko, N. Korda, R. Munos, I. Flaounas, N. Cristianini.
Finite-Time Analysis of Kernelised Contextual Bandits, in: The 29th Conference on Uncertainty in Artificial Intelligence, Bellevue, United States, 2013.
http://hal.inria.fr/hal-00826946
National Conferences with Proceedings
[28] P. Bas, P. Chainais, E. Zidel-Cauffet.
Quantification adaptative pour la stéganalyse d'images texturées, in: GRETSI 2013, Brest, France, September 2013.
http://hal.inria.fr/hal-00868550
[29] P. Chainais, C. Richard.
Distributed dictionary learning over a sensor network, in: CaP 2013, Villeneuve d'Ascq, France, July 2013, pp. 1-4.
http://hal.inria.fr/hal-00923741
Scientific Books (or Scientific Book chapters)
[30] L. Busoniu, R. Munos, R. Babuska.
A review of optimistic planning in Markov decision processes, in: Reinforcement Learning and Adaptive Dynamic Programming for Feedback Control, F. Lewis, D. Liu (editors), IEEE Press Series on Computational Intelligence, Wiley-IEEE Press, January 2013, chap. 22, pp. 494-516.
http://hal.inria.fr/hal-00756742
Internal Reports
[31] M. Ghavamzadeh, Y. Engel.
Bayesian Policy Gradient and Actor-Critic Algorithms, January 2013.
http://hal.inria.fr/hal-00776608
[32] P. L.A., M. Ghavamzadeh.
Actor-Critic Algorithms for Risk-Sensitive MDPs, February 2013.
http://hal.inria.fr/hal-00794721
[33] R. Munos.
From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, 2013.
http://hal.inria.fr/hal-00747575
References in notes
[34] P. Auer, N. Cesa-Bianchi, P. Fischer.
Finite-time analysis of the multi-armed bandit problem, in: Machine Learning, 2002, vol. 47, no 2/3, pp. 235–256.
[35] R. Bellman.
Dynamic Programming, Princeton University Press, 1957.
[36] D. Bertsekas, S. Shreve.
Stochastic Optimal Control (The Discrete Time Case), Academic Press, New York, 1978.
[37] D. Bertsekas, J. Tsitsiklis.
Neuro-Dynamic Programming, Athena Scientific, 1996.
[38] T. Ferguson.
A Bayesian Analysis of Some Nonparametric Problems, in: The Annals of Statistics, 1973, vol. 1, no 2, pp. 209–230.
[39] T. Hastie, R. Tibshirani, J. Friedman.
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001.
[40] W. Powell.
Approximate Dynamic Programming, Wiley, 2007.
[41] M. Puterman.
Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley and Sons, 1994.
[42] H. Robbins.
Some aspects of the sequential design of experiments, in: Bull. Amer. Math. Soc., 1952, vol. 58, pp. 527–535.
[43] J. Rust.
How Social Security and Medicare Affect Retirement Behavior in a World of Incomplete Markets, in: Econometrica, July 1997, vol. 65, no 4, pp. 781–831.
http://gemini.econ.umd.edu/jrust/research/rustphelan.pdf
[44] J. Rust.
On the Optimal Lifetime of Nuclear Power Plants, in: Journal of Business & Economic Statistics, 1997, vol. 15, no 2, pp. 195–208.
[45] R. Sutton, A. Barto.
Reinforcement learning: an introduction, MIT Press, 1998.
[46] G. Tesauro.
Temporal Difference Learning and TD-Gammon, in: Communications of the ACM, March 1995, vol. 38, no 3.
[47] P. Werbos.
ADP: Goals, Opportunities and Principles, in: Handbook of Learning and Approximate Dynamic Programming, IEEE Press, 2004, pp. 3–44.