Bibliography

Publications of the year

Doctoral Dissertations and Habilitation Theses

1E. Grave.

A Markovian approach to distributional semantics, Université Pierre et Marie Curie - Paris VI, January 2014.

http://hal.inria.fr/tel-00940575
2M. Solnon.

Apprentissage statistique multi-tâches, Université Pierre et Marie Curie - Paris VI, November 2013.

http://hal.inria.fr/tel-00911498

Articles in International Peer-Reviewed Journals

3Z. Harchaoui, F. Bach, O. Cappé, E. Moulines.

Kernel-Based Methods for Hypothesis Testing: A Unified View, in: IEEE Signal Processing Magazine, June 2013, vol. 30, n^o 4, pp. 87-97. [ DOI : 10.1109/MSP.2013.2253631 ]

http://hal.inria.fr/hal-00841978
4B. Mishra, G. Meyer, F. Bach, R. Sepulchre.

Low-rank optimization with trace norm penalty, in: SIAM Journal on Optimization, 2013, vol. 23, n^o 4, pp. 2124-2149. [ DOI : 10.1137/110859646 ]

http://hal.inria.fr/hal-00924110
5A. d'Aspremont, N. E. Karoui.

Weak Recovery Conditions from Graph Partitioning Bounds and Order Statistics, in: Mathematics of Operations Research, July 2013, vol. 38, n^o 2, Final version.

http://pubsonline.informs.org/doi/abs/10.1287/moor.1120.0581, http://hal.inria.fr/hal-00907541

International Conferences with Proceedings

6A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski, A. J. Smola.

Distributed Large-scale Natural Graph Factorization, in: IW3C2 - International World Wide Web Conference, Rio de Janeiro, Brazil, May 2013, 37 p.

http://hal.inria.fr/hal-00918478
7F. Bach.

Sharp analysis of low-rank kernel matrix approximations, in: International Conference on Learning Theory (COLT), United States, 2013.

http://hal.inria.fr/hal-00723365
8F. Bach, E. Moulines.

Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), in: Neural Information Processing Systems (NIPS), United States, 2013.

http://hal.inria.fr/hal-00831977
9P. Bojanowski, F. Bach, I. Laptev, J. Ponce, C. Schmid, J. Sivic.

Finding Actors and Actions in Movies, in: ICCV 2013 - IEEE International Conference on Computer Vision, Sydney, Australia, IEEE, 2013.

http://hal.inria.fr/hal-00904991
10M. Cuturi, A. d'Aspremont.

Mean Reversion with a Variance Threshold, in: International Conference on Machine Learning, United States, October 2013, pp. 271-279.

http://hal.inria.fr/hal-00939566
11M. Eickenberg, F. Pedregosa, S. Mehdi, A. Gramfort, B. Thirion.

Second order scattering descriptors predict fMRI activity due to visual textures, in: PRNI 2013 - 3rd International Workshop on Pattern Recognition in NeuroImaging, Philadelphia, United States, Conference Publishing Services, June 2013.

http://hal.inria.fr/hal-00834928
12F. Fogel, R. Jenatton, F. Bach, A. d'Aspremont.

Convex Relaxations for Permutation Problems, in: Neural Information Processing Systems (NIPS) 2013, United States, August 2013.

http://nips.cc/Conferences/2013/Program/speaker-info.php?ID=12863, http://hal.inria.fr/hal-00907528
13E. Grave, G. Obozinski, F. Bach.

Hidden Markov tree models for semantic class induction, in: CoNLL - Seventeenth Conference on Computational Natural Language Learning, Sofia, Bulgaria, 2013.

http://hal.inria.fr/hal-00833288
14P. Gronat, G. Obozinski, J. Sivic, T. Pajdla.

Learning and calibrating per-location classifiers for visual place recognition, in: CVPR 2013 - 26th IEEE Conference on Computer Vision and Pattern Recognition, Portland, United States, June 2013.

http://hal.inria.fr/hal-00934332
15S. Jegelka, F. Bach, S. Sra.

Reflection methods for user-friendly submodular optimization, in: NIPS 2013 - Neural Information Processing Systems, Lake Tahoe, Nevada, United States, 2013.

http://hal.inria.fr/hal-00905258
16S. Lacoste-Julien, M. Jaggi, M. Schmidt, P. Pletscher.

Block-Coordinate Frank-Wolfe Optimization for Structural SVMs, in: ICML 2013 International Conference on Machine Learning, Atlanta, United States, 2013, pp. 53-61.

http://hal.inria.fr/hal-00720158
17S. Lacoste-Julien, K. Palla, A. Davies, G. Kasneci, T. Graepel, Z. Ghahramani.

SiGMa: Simple Greedy Matching for Aligning Large Knowledge Bases, in: KDD 2013 - The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, United States, August 2013, pp. 572-580. [ DOI : 10.1145/2487575.2487592 ]

http://hal.inria.fr/hal-00918671
18N. Le Roux, F. Bach.

Local Component Analysis, in: ICLR - International Conference on Learning Representations 2013, Scottsdale, United States, 2013.

http://hal.inria.fr/inria-00617965
19A. Nelakanti, C. Archambeau, J. Mairal, F. Bach, G. Bouchard.

Structured Penalties for Log-linear Language Models, in: EMNLP - Empirical Methods in Natural Language Processing - 2013, Seattle, United States, Association for Computational Linguistics, October 2013, pp. 233-243.

http://hal.inria.fr/hal-00904820
20F. Pedregosa, M. Eickenberg, B. Thirion, A. Gramfort.

HRF estimation improves sensitivity of fMRI encoding and decoding models, in: 3rd International Workshop on Pattern Recognition in NeuroImaging, Philadelphia, United States, May 2013.

http://hal.inria.fr/hal-00821946
21E. Richard, F. Bach, J.-P. Vert.

Intersecting singularities for multi-structured estimation, in: ICML 2013 - 30th International Conference on Machine Learning, Atlanta, United States, 2013.

http://hal.inria.fr/hal-00918253
22G. Rigaill, T. D. Hocking, F. Bach, J.-P. Vert.

Learning Sparse Penalties for Change-Point Detection using Max Margin Interval Regression, in: ICML 2013 - 30th International Conference on Machine Learning, Atlanta, United States, Supported by the International Machine Learning Society (IMLS), May 2013.

http://hal.inria.fr/hal-00824075
23T. Schatz, V. Peddinti, F. Bach, A. Jansen, H. Hermansky, E. Dupoux.

Evaluating speech features with the Minimal-Pair ABX task: Analysis of the classical MFC/PLP pipeline, in: INTERSPEECH 2013: 14th Annual Conference of the International Speech Communication Association, Lyon, France, 2013, pp. 1-5.

http://hal.inria.fr/hal-00918599
24K. S. Sesh Kumar, F. Bach.

Convex Relaxations for Learning Bounded Treewidth Decomposable Graphs, in: International Conference on Machine Learning, Atlanta, United States, 2013, Extended version of the ICML-2013 paper.

http://hal.inria.fr/hal-00763921

Conferences without Proceedings

25E. Grave, G. Obozinski, F. Bach.

Domain adaptation for sequence labeling using hidden Markov models, in: New Directions in Transfer and Multi-Task: Learning Across Domains and Tasks (NIPS Workshop), Lake Tahoe, United States, 2013.

http://hal.inria.fr/hal-00918371

Scientific Books (or Scientific Book chapters)

26F. Bach.

Learning with Submodular Functions: A Convex Optimization Perspective, Foundations and Trends in Machine Learning, Now Publishers, 2013, 228 p. [ DOI : 10.1561/2200000039 ]

http://hal.inria.fr/hal-00645271

Other Publications

27F. Bach.

Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression, October 2013.

http://hal.inria.fr/hal-00804431
28F. Bach.

Convex relaxations of structured matrix factorizations, September 2013.

http://hal.inria.fr/hal-00861118
29F. Fogel, I. Waldspurger, A. d'Aspremont.

Phase retrieval for imaging problems, 2013.

http://hal.inria.fr/hal-00907529
30R. Gribonval, R. Jenatton, F. Bach, M. Kleinsteuber, M. Seibert.

Sample Complexity of Dictionary Learning and other Matrix Factorizations, December 2013, submitted.

http://hal.inria.fr/hal-00918142
31R. Lajugie, S. Arlot, F. Bach.

Large-Margin Metric Learning for Partitioning Problems, March 2013.

http://hal.inria.fr/hal-00796921
32M. Schmidt, N. Le Roux, F. Bach.

Minimizing Finite Sums with the Stochastic Average Gradient, September 2013.

http://hal.inria.fr/hal-00860051
33M. Schmidt, N. Le Roux.

Fast Convergence of Stochastic Gradient Descent under a Strong Growth Condition, August 2013.

http://hal.inria.fr/hal-00855113
34K. S. Sesh Kumar, F. Bach.

Maximizing submodular functions using probabilistic graphical models, September 2013.

http://hal.inria.fr/hal-00860575
35M. Solnon.

Comparison between multi-task and single-task oracle risks in kernel ridge regression, 2013, Submitted to the Electronic Journal of Statistics.

http://hal.inria.fr/hal-00846715
36I. Waldspurger, A. d'Aspremont, S. Mallat.

Phase Recovery, MaxCut and Complex Semidefinite Programming, 2013, Submitted revision.

http://hal.inria.fr/hal-00907535
37A. d'Aspremont, M. Jaggi.

An Optimal Affine Invariant Smooth Minimization Algorithm, 2013.

http://hal.inria.fr/hal-00907547

References in notes

38F. Bach.

Learning with Submodular Functions: A Convex Optimization Perspective, in: ArXiv e-prints, 2011.
39F. Bach, M. Jordan.

Thin junction trees, in: Adv. NIPS, 2002.
40F. Bach, M. Jordan.

Learning spectral clustering, in: Adv. NIPS, 2003.
41A. Bar-Hillel, T. Hertz, N. Shental, D. Weinshall.

Learning a Mahalanobis metric from equivalence constraints, in: Journal of Machine Learning Research, 2006, vol. 6, n^o 1, 937 p.
42C. Bishop, et al.

Pattern recognition and machine learning, Springer New York, 2006.
43D. Blatt, A. O. Hero, H. Gauchman.

A convergent incremental gradient method with a constant step size, in: SIOPT, 2007, vol. 18, n^o 1, pp. 29–51.
44Y. Boykov, O. Veksler, R. Zabih.

Fast approximate energy minimization via graph cuts, in: IEEE Trans. PAMI, 2001, vol. 23, n^o 11, pp. 1222–1239.
45L. Burget, P. Matejka, P. Schwarz, O. Glembek, J. Cernocky.

Analysis of Feature Extraction and Channel Compensation in a GMM Speaker Recognition System, in: IEEE Transactions on Audio, Speech and Language Processing, September 2007, vol. 15, n^o 7, pp. 1979-1986.
46M. A. Carlin, S. Thomas, A. Jansen, H. Hermansky.

Rapid evaluation of speech representations for spoken term discovery, in: Proceedings of Interspeech, 2011.
47Y.-W. Chang, M. Collins.

Exact Decoding of Phrase-based Translation Models through Lagrangian Relaxation, in: Proceedings of the Conference on Empirical Methods for Natural Language Processing, 2011, pp. 26–37.
48A. Chechetka, C. Guestrin.

Efficient Principled Learning of Thin Junction Trees, in: Adv. NIPS, 2007.
49J. Chen, A. K. Gupta.

Parametric Statistical Change Point Analysis, Birkhäuser, 2011.
50S. Chen, R. Rosenfeld.

A survey of smoothing techniques for ME models, in: IEEE Transactions on Speech and Audio Processing, 2000, vol. 8, n^o 1, pp. 37–50.
51Y. Cheng.

Mean shift, mode seeking, and clustering, in: IEEE Trans. PAMI, 1995, vol. 17, n^o 8, pp. 790–799.
52C. I. Chow, C. N. Liu.

Approximating discrete probability distributions with dependence trees, in: IEEE Trans. Inf. Theory, 1968, vol. 14.
53F. De la Torre, T. Kanade.

Discriminative cluster analysis, in: Proc. ICML, 2006.
54F. Desobry, M. Davy, C. Doncarli.

An online kernel change detection algorithm, in: IEEE Trans. Sig. Proc., 2005, vol. 53, n^o 8, pp. 2961–2974.
55B. Efron, C. N. Morris.

Stein's paradox in statistics, in: Scientific American, 1977, vol. 236, pp. 119–127.
56T. Evgeniou, C. A. Micchelli, M. Pontil.

Learning Multiple Tasks with Kernel Methods, in: Journal of Machine Learning Research, 2005, vol. 6, pp. 615–637.
57P. Fousek, P. Svojanovsky, F. Grezl, H. Hermansky.

New Nonsense Syllables Database – Analyses and Preliminary ASR Experiments, in: Proceedings of the International Conference on Spoken Language Processing (ICSLP), 2004, pp. 2004-29.
58S. Fujishige.

Submodular Functions and Optimization, Annals of Discrete Mathematics, Elsevier, 2005.
59V. Gogate, W. Webb, P. Domingos.

Learning Efficient Markov Networks, in: Adv. NIPS, 2010.
60J. Goodman.

A bit of progress in language modelling, in: Computer Speech and Language, October 2001, pp. 403–434.
61J. C. Gower, G. J. S. Ross.

Minimum spanning trees and single linkage cluster analysis, in: Applied Statistics, 1969, pp. 54–64.
62T. D. Hocking, G. Schleiermacher, I. Janoueix-Lerosey, O. Delattre, F. Bach, J.-P. Vert.

Learning smoothing models of copy number profiles using breakpoint annotations, in: HAL, archives ouvertes, 2012.
63L. Jacob, F. Bach, J.-P. Vert.

Clustered Multi-Task Learning: A Convex Formulation, in: Computing Research Repository, 2008.
64W. James, C. Stein.

Estimation with quadratic loss, in: Proceedings of the fourth Berkeley symposium on mathematical statistics and probability, 1961, vol. 1, n^o 1961, pp. 361–379.
65R. Jenatton, J. Mairal, G. Obozinski, F. Bach.

Proximal Methods for Hierarchical Sparse Coding, in: Journal of Machine Learning Research, 2011, pp. 2297-2334.
66R. Kneser, H. Ney.

Improved backing-off for m-gram language modeling, in: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 1995, vol. 1.
67D. Koller, N. Friedman.

Probabilistic graphical models: principles and techniques, MIT Press, 2009.
68V. Kolmogorov, T. Schoenemann.

Generalized sequential tree-reweighted message passing, in: ArXiv e-prints, May 2012.
69A. Krause, C. Guestrin.

Submodularity and its Applications in Optimized Information Gathering, in: ACM Transactions on Intelligent Systems and Technology, 2011, vol. 2, n^o 4.
70H. Lin, J. Bilmes.

A Class of Submodular Functions for Document Summarization, in: Proc. NAACL/HLT, 2011.
71D. Luenberger, Y. Ye.

Linear and nonlinear programming, Springer Verlag, 2008.
72N. A. Macmillan, C. D. Creelman.

Detection theory: A user's guide, Lawrence Erlbaum, 2004.
73F. Malvestuto.

Approximating discrete probability distributions with decomposable models, in: IEEE Trans. Systems, Man, Cybernetics, 1991, vol. 21, n^o 5.
74A. F. T. Martins, N. A. Smith, A. M. Q. Pedro, M. A. T. Figueiredo.

Structured sparsity in structured prediction, in: Proceedings of the Conference on Empirical Methods for Natural Language Processing, 2011, pp. 1500–1511.
75M. Narasimhan, J. Bilmes.

PAC-learning bounded tree-width graphical models, in: Proc. UAI, 2004.
76G. Nemhauser, L. Wolsey, M. Fisher.

An analysis of approximations for maximizing submodular set functions–I, in: Mathematical Programming, 1978, vol. 14, n^o 1, pp. 265–294.
77A. Nemirovski, A. Juditsky, G. Lan, A. Shapiro.

Robust stochastic approximation approach to stochastic programming, in: SIOPT, 2009, vol. 19, n^o 4, pp. 1574–1609.
78A. Nemirovski.

Efficient methods in convex programming, in: Lecture notes, 1994.
79Y. Nesterov.

Introductory lectures on convex optimization: A basic course, Springer, 2004.
80A. Y. Ng, M. Jordan, Y. Weiss.

On spectral clustering: Analysis and an algorithm, in: Adv. NIPS, 2002.
81B. Roark, M. Saraclar, M. Collins, M. Johnson.

Discriminative language modeling with conditional random fields and the perceptron algorithm, in: Proceedings of the Association for Computational Linguistics, 2004.
82L. Saul, M. Jordan.

Exploiting Tractable Substructures in Intractable Networks, in: Adv. NIPS, 1995.
83H. D. Sherali, W. P. Adams.

A Hierarchy of Relaxations Between the Continuous and Convex Hull Representations for Zero-One Programming Problems, in: SIAM J. Discrete Math., 1990.
84J. Shi, J. Malik.

Normalized Cuts and Image Segmentation, in: IEEE Trans. PAMI, 1997, vol. 22, pp. 888–905.
85GSVS. Sivaram, H. Hermansky.

Sparse Multilayer Perceptron for Phoneme Recognition, in: IEEE Transactions on Audio, Speech, and Language Processing, 2012, vol. 20, n^o 1, pp. 23-29.
86M. Solnon, S. Arlot, F. Bach.

Multi-task Regression using Minimal Penalties, in: Journal of Machine Learning Research, September 2012, vol. 13, pp. 2773-2812.
87M. Solodov.

Incremental gradient algorithms with stepsizes bounded away from zero, in: Computational Optimization and Applications, 1998, vol. 11, n^o 1, pp. 23–35.
88C. Stein.

Inadmissibility of the usual estimator for the mean of a multivariate normal distribution, in: Proceedings of the Third Berkeley symposium on mathematical statistics and probability, 1956, vol. 1, n^o 399, pp. 197–206.
89T. Szántai, E. Kovács.

Discovering a junction tree behind a Markov network by a greedy algorithm, in: ArXiv e-prints, April 2011.
90P. Tseng.

An incremental gradient(-projection) method with momentum term and adaptive stepsize rule, in: SIOPT, 1998, vol. 8, n^o 2, pp. 506-531.
91I. Tsochantaridis, T. Hofmann, T. Joachims, Y. Altun.

Support Vector Machine Learning for Interdependent and Structured Output Spaces, in: Proc. ICML, 2004.
92S. Vargas, P. Castells, D. Vallet.

Explicit relevance models in intent-oriented information retrieval diversification, in: Proceedings of the 35th ACM SIGIR International Conference on Research and development in information retrieval, Portland, Oregon, USA, SIGIR'12, ACM, 2012, pp. 75-84.

http://doi.acm.org/10.1145/2348283.2348297
93M. Wainwright, M. Jordan.

Graphical models, exponential families, and variational inference, in: Found. and Trends in Mach. Learn., 2008, vol. 1, n^o 1-2.
94F. Wood, C. Archambeau, J. Gasthaus, J. Lancelot, Y.-W. Teh.

A Stochastic Memoizer for Sequence Data, in: Proceedings of the 26th International Conference on Machine Learning, 2009.
95E. P. Xing, A. Y. Ng, M. Jordan, S. Russell.

Distance metric learning with applications to clustering with side-information, in: Adv. NIPS, 2002.
96P. Zhao, G. Rocha, B. Yu.

The composite absolute penalties family for grouped and hierarchical variable selection, in: The Annals of Statistics, 2009, vol. 37(6A), pp. 3468-3497.
