Publications of the year

Articles in International Peer-Reviewed Journals

  • [1] R. Busa-Fekete, W. Cheng, E. Hüllermeier, B. Szörényi, P. Weng.

    Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm, in: Machine Learning, December 2014, vol. 97, no 3, pp. 327-351. [ DOI : 10.1007/s10994-014-5458-8 ]

  • [2] C. Dhanjal, R. Gaudel, S. Clémençon.

    Efficient Eigen-updating for Spectral Graph Clustering, in: Neurocomputing, May 2014, vol. 131, pp. 440-452. [ DOI : 10.1016/j.neucom.2013.11.015 ]

  • [3] A. György, G. Neu.

    Near-Optimal Rates for Limited-Delay Universal Lossy Source Coding, in: IEEE Transactions on Information Theory, 2014, pp. 2823-2834. [ DOI : 10.1109/TIT.2014.2307062 ]

  • [4] G. Neu, A. György, C. Szepesvári, A. Antos.

    Online Markov Decision Processes Under Bandit Feedback, in: IEEE Transactions on Automatic Control, 2014, vol. 59, pp. 676-691. [ DOI : 10.1109/TAC.2013.2292137 ]

  • [5] R. Ortner, D. Ryabko, P. Auer, R. Munos.

    Regret bounds for restless Markov bandits, in: Theoretical Computer Science, 2014, vol. 558, pp. 62-76. [ DOI : 10.1016/j.tcs.2014.09.026 ]

  • [6] D. Ryabko.

    Uniform hypothesis testing for finite-valued stationary processes, in: Statistics, 2014, vol. 48, no 1, pp. 121-128. [ DOI : 10.1080/02331888.2012.719511 ]

  • [7] B. Scherrer, M. Ghavamzadeh, V. Gabillon, B. Lesner, M. Geist.

    Approximate Modified Policy Iteration and its Application to the Game of Tetris, in: Journal of Machine Learning Research, 2015, 47 p., forthcoming.


International Conferences with Proceedings

  • [8] R. Busa-Fekete, E. Hüllermeier, B. Szörényi.

    Preference-Based Rank Elicitation using Statistical Models: The Case of Mallows, in: Proceedings of the 31st International Conference on Machine Learning, Beijing, China, JMLR Workshop and Conference Proceedings, June 2014, vol. 32.

  • [9] D. Calandriello, A. Lazaric, M. Restelli.

    Sparse Multi-task Reinforcement Learning, in: Advances in Neural Information Processing Systems 27, Montreal, Canada, December 2014.

  • [10] A. Carpentier, M. Valko.

    Extreme bandits, in: Advances in Neural Information Processing Systems 27, Montreal, Canada, December 2014.

  • [11] P. Chainais, P. Pfennig, A. Leray.

    Quantitative control of the error bounds of a fast super-resolution technique for microscopy and astronomy, in: Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, May 2014, pp. 2853-2857. [ DOI : 10.1109/ICASSP.2014.6854121 ]

  • [12] P. Chainais, C. Richard.

    A diffusion strategy for distributed dictionary learning, in: 2nd international Traveling Workshop on Interactions between Sparse models and Technology (iTWIST'14), Namur, Belgium, L. Jacques (editor), August 2014.

  • [13] E. Daucé, E. Thomas.

    Evidence build-up facilitates on-line adaptivity in dynamic environments: example of the BCI P300-speller, in: 22nd European Symposium on Artificial Neural Networks, Bruges, Belgium, April 2014.

  • [14] C. Dhanjal, R. Gaudel, S. Clémençon.

    Online Matrix Completion Through Nuclear Norm Regularisation, in: SDM - SIAM International Conference on Data Mining, Philadelphia, United States, April 2014. [ DOI : 10.1137/1.9781611973440.72 ]

  • [15] M. Gheshlaghi Azar, A. Lazaric, E. Brunskill.

    Online Stochastic Optimization under Correlated Bandit Feedback, in: 31st International Conference on Machine Learning, Beijing, China, June 2014.

  • [16] S. Iván, Á. D. Lelkes, J. Nagy-György, B. Szörényi, G. Turán.

    Biclique Coverings, Rectifier Networks and the Cost of ε-Removal, in: 16th International Workshop on Descriptional Complexity of Formal Systems, Turku, Finland, August 2014, pp. 174-185. [ DOI : 10.1007/978-3-319-09704-6_16 ]

  • [17] A. Khaleghi, D. Ryabko.

    Asymptotically consistent estimation of the number of change points in highly dependent time series, in: International Conference on Machine Learning (ICML), Beijing, China, June 2014, pp. 539-547.

  • [18] T. Kocák, G. Neu, M. Valko, R. Munos.

    Efficient learning by implicit exploration in bandit problems with side observations, in: Advances in Neural Information Processing Systems 27, Montreal, Canada, December 2014.

  • [19] T. Kocák, M. Valko, R. Munos, S. Agrawal.

    Spectral Thompson Sampling, in: Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, Québec City, Canada, July 2014.

  • [20] T. Kocák, M. Valko, R. Munos, B. Kveton, S. Agrawal.

    Spectral Bandits for Smooth Graph Functions with Applications in Recommender Systems, in: AAAI Workshop on Sequential Decision-Making with Big Data, Québec City, Canada, July 2014.

  • [21] G. Neu, M. Valko.

    Online combinatorial optimization with stochastic decision sets and adversarial losses, in: Advances in Neural Information Processing Systems 27, Montreal, Canada, December 2014.

  • [22] O. Nicol, J. Mary, P. Preux.

    Improving offline evaluation of contextual bandit algorithms via bootstrapping techniques, in: Proceedings of the 31st International Conference on Machine Learning, Beijing, China, E. Xing, T. Jebara (editors), JMLR Workshop and Conference Proceedings, June 2014, vol. 32.

  • [23] R. Ortner, O.-A. Maillard, D. Ryabko.

    Selecting Near-Optimal Approximate State Representations in Reinforcement Learning, in: International Conference on Algorithmic Learning Theory (ALT), Bled, Slovenia, LNCS, Springer, October 2014, vol. 8776, pp. 140-154.

  • [24] O. Pietquin, H. Glaude, C. Enderli.

    Subspace Identification for Predictive State Representation by Nuclear Norm Minimization, in: IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2014), Orlando, United States, December 2014.

  • [25] B. Piot, M. Geist, O. Pietquin.

    Difference of Convex Functions Programming for Reinforcement Learning, in: Advances in Neural Information Processing Systems (NIPS 2014), Montreal, Canada, December 2014.

  • [26] B. Piot, O. Pietquin, M. Geist.

    Predicting when to laugh with structured classification, in: InterSpeech 2014, Singapore, September 2014, pp. 1786-1790.

  • [27] P. Preux, R. Munos, M. Valko.

    Bandits attack function optimization, in: IEEE Congress on Evolutionary Computation, Beijing, China, July 2014.

  • [28] A. Sani, G. Neu, A. Lazaric.

    Exploiting easy data in online optimization, in: Advances in Neural Information Processing Systems 27, Montreal, Canada, December 2014.

  • [29] M. Soare, A. Lazaric, R. Munos.

    Best-Arm Identification in Linear Bandits, in: Advances in Neural Information Processing Systems 27, Montreal, Canada, December 2014.

  • [30] B. Szörényi, G. Kedenburg, R. Munos.

    Optimistic planning in Markov decision processes using a generative model, in: Advances in Neural Information Processing Systems 27, Montreal, Canada, December 2014.

  • [31] E. Thomas, E. Daucé, D. Devlaminck, L. Mahé, A. Carpentier, R. Munos, M. Perrin, E. Maby, J. Mattout, T. Papadopoulo, M. Clerc.

    CoAdapt P300 speller: optimized flashing sequences and online learning, in: 6th International Brain Computer Interface Conference, Graz, Austria, September 2014.

  • [32] M. Valko, R. Munos, B. Kveton, T. Kocák.

    Spectral Bandits for Smooth Graph Functions, in: 31st International Conference on Machine Learning, Beijing, China, June 2014.


National Conferences with Proceedings

  • [33] M. Pachebat, N. Totaro, P. Chainais, O. Collery.

    Synthèse en espace et temps du rayonnement acoustique d'une paroi sous excitation turbulente par synthèse spectrale 2D+T et formulation vibro-acoustique directe, in: Congrès Français d'acoustique 2014, Poitiers, France, April 2014, pp. 1915-1921, paper N183.


Conferences without Proceedings

  • [34] B. Piot, M. Geist, O. Pietquin.

    Méthode de minimisation du résidu de Bellman boostée qui tient compte des démonstrations expertes, in: JFPDA - 9èmes Journées Francophones de Planification, Décision et Apprentissage, Liège, Belgium, May 2014.


Internal Reports

Other Publications

  • [38] P. Chainais, A. Leray.

    Statistical performance analysis of a fast super-resolution technique using noisy translations, November 2014, 15 pages, forthcoming.

  • [39] F. Guillou, R. Gaudel, J. Mary, P. Preux.

    User Engagement as Evaluation: a Ranking or a Regression Problem?, October 2014. [ DOI : 10.1145/2668067.2668073 ]

References in notes

  • [40] P. Auer, N. Cesa-Bianchi, P. Fischer.

    Finite-time analysis of the multi-armed bandit problem, in: Machine Learning, 2002, vol. 47, no 2/3, pp. 235-256.

  • [41] R. Bellman.

    Dynamic Programming, Princeton University Press, 1957.

  • [42] D. Bertsekas, S. Shreve.

    Stochastic Optimal Control (The Discrete Time Case), Academic Press, New York, 1978.

  • [43] D. Bertsekas, J. Tsitsiklis.

    Neuro-Dynamic Programming, Athena Scientific, 1996.

  • [44] T. Ferguson.

    A Bayesian Analysis of Some Nonparametric Problems, in: The Annals of Statistics, 1973, vol. 1, no 2, pp. 209-230.

  • [45] T. Hastie, R. Tibshirani, J. Friedman.

    The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001.

  • [46] P. Nguyen, O.-A. Maillard, D. Ryabko, R. Ortner.

    Competing with an Infinite Set of Models in Reinforcement Learning, in: AISTATS, Arizona, United States, JMLR W&CP, 2013, vol. 31, pp. 463-471.

  • [47] W. Powell.

    Approximate Dynamic Programming, Wiley, 2007.

  • [48] M. Puterman.

    Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley and Sons, 1994.

  • [49] H. Robbins.

    Some aspects of the sequential design of experiments, in: Bull. Amer. Math. Soc., 1952, vol. 58, pp. 527-535.

  • [50] J. Rust.

    How Social Security and Medicare Affect Retirement Behavior in a World of Incomplete Markets, in: Econometrica, July 1997, vol. 65, no 4, pp. 781-831.

  • [51] J. Rust.

    On the Optimal Lifetime of Nuclear Power Plants, in: Journal of Business & Economic Statistics, 1997, vol. 15, no 2, pp. 195-208.

  • [52] R. Sutton, A. Barto.

    Reinforcement Learning: An Introduction, MIT Press, 1998.

  • [53] G. Tesauro.

    Temporal Difference Learning and TD-Gammon, in: Communications of the ACM, March 1995, vol. 38, no 3.

  • [54] P. Werbos.

    ADP: Goals, Opportunities and Principles, in: Handbook of Learning and Approximate Dynamic Programming, IEEE Press, 2004, pp. 3-44.