Bibliography

Major publications by the team in recent years

1M. Baboulin, D. Becker, J. Dongarra.

A Parallel Tiled Solver for Dense Symmetric Indefinite Systems on Multicore Architectures, in: Proceedings of IEEE International Parallel & Distributed Processing Symposium (IPDPS 2012), 2012, pp. 14-24.
2M. Baboulin, S. Donfack, J. Dongarra, L. Grigori, A. Rémy, S. Tomov.

A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines, in: International Conference on Computational Science (ICCS 2012), Procedia Computer Science, Elsevier, 2012, vol. 9, pp. 17–26.
3M. Baboulin, J. Dongarra, J. Herrmann, S. Tomov.

Accelerating linear system solutions using randomization techniques, in: ACM Trans. Math. Softw., 2013, vol. 39, n^o 2.
4M. Baboulin, S. Gratton.

A contribution to the conditioning of the total least squares problem, in: SIAM J. Matrix Anal. and Appl., 2011, vol. 32, n^o 3, pp. 685–699.
5M. Bahi, C. Eisenbeis.

Impact of Reverse Computing on Information Locality in Register Allocation for High Performance Computing, in: International Journal of Parallel Programming, 2012, pp. 1–28.
6D. Barthou, O. Brand-Foissac, O. Pene, G. Grosdidier, R. Dolbeau, C. Eisenbeis, M. Kruse, K. Petrov, C. Tadonki.

Automated Code Generation for Lattice Quantum Chromodynamics and beyond, in: Journal of Physics: Conference Series, 2014, vol. 510, 012005 p, LPT-Orsay-13-142. [ DOI : 10.1088/1742-6596/510/1/012005 ]

http://hal.inria.fr/hal-00926513
7P. Esterie, J. Falcou, M. Gaunard, J.-T. Lapresté, L. Lacassagne.

The Numerical Template toolbox: A Modern C++ Design for Scientific Computing, in: Journal of Parallel and Distributed Computing, July 2014. [ DOI : 10.1016/j.jpdc.2014.07.002 ]

https://hal.inria.fr/hal-01061305
8P. Esterie, M. Gaunard, J. Falcou, J.-T. Lapresté.

Exploiting Multimedia Extensions in C++: A Portable Approach, in: Computing in Science & Engineering, 2012, vol. 14, n^o 5, pp. 72–77.
9A. Ferreira Leite.

A User-Centered and Autonomic Multi-Cloud Architecture for High Performance Computing Applications, Paris-Sud XI ; Universidade de Brasília, December 2014.

https://hal.inria.fr/tel-01097295
10G. Fursin, Y. Kashnikov, A. W. Memon, Z. Chamski, O. Temam, M. Namolaru, E. Yom-Tov, B. Mendelson, A. Zaks, E. Courtois, F. Bodin, P. Barnard, E. Ashton, E. Bonilla, J. Thomson, C. Williams, M. F. P. O'Boyle.

Milepost GCC: Machine Learning Enabled Self-tuning Compiler, in: International Journal of Parallel Programming, 2011, vol. 39, pp. 296-327. [ DOI : 10.1007/s10766-010-0161-2 ]
11M. Kruse.

Lattice QCD Optimization and Polytopic Representations of Distributed Memory, Paris-Sud XI, September 2014.

https://hal.inria.fr/tel-01078440
12S. Tomov, J. Dongarra, M. Baboulin.

Towards dense linear algebra for hybrid GPU accelerated manycore systems, in: Parallel Computing, 2010, vol. 36, n^o 5&6, pp. 232–240.

Publications of the year

Doctoral Dissertations and Habilitation Theses

13A. Ferreira Leite.

A User-Centered and Autonomic Multi-Cloud Architecture for High Performance Computing Applications, Paris-Sud XI, December 2014.

https://hal.inria.fr/tel-01097295
14M. Kruse.

Lattice QCD Optimization and Polytopic Representations of Distributed Memory, Paris-Sud XI, September 2014.

https://hal.inria.fr/tel-01078440

Articles in International Peer-Reviewed Journals

15M. Baboulin, D. Becker, G. Bosilca, A. Danalis, J. Dongarra.

An efficient distributed randomized algorithm for solving large dense symmetric indefinite linear systems, in: Parallel Computing, July 2014, vol. 40, n^o 7, pp. 213-223. [ DOI : 10.1016/j.parco.2013.12.003 ]

https://hal.inria.fr/hal-01024857
16M. Baboulin, S. Gratton, R. Lacroix, A. J. Laub.

Statistical estimates for the conditioning of linear least squares problems, in: Lecture notes in computer science, 2014, vol. 8384, pp. 124-133. [ DOI : 10.1007/978-3-642-55224-3_13 ]

https://hal.inria.fr/hal-00991710
17D. Barthou, O. Brand-Foissac, O. Pene, G. Grosdidier, R. Dolbeau, C. Eisenbeis, M. Kruse, K. Petrov, C. Tadonki.

Automated Code Generation for Lattice Quantum Chromodynamics and beyond, in: Journal of Physics: Conference Series, 2014, vol. 510, 012005 p, LPT-Orsay-13-142. [ DOI : 10.1088/1742-6596/510/1/012005 ]

https://hal.inria.fr/hal-00926513
18P. Esterie, J. Falcou, M. Gaunard, J.-T. Lapresté, L. Lacassagne.

The Numerical Template toolbox: A Modern C++ Design for Scientific Computing, in: Journal of Parallel and Distributed Computing, July 2014. [ DOI : 10.1016/j.jpdc.2014.07.002 ]

https://hal.inria.fr/hal-01061305
19G. Fursin, R. Miceli, A. Lokhmotov, M. Gerndt, M. Baboulin, A. D. Malony, Z. Chamski, D. Novillo, D. Del Vento.

Collective mind: Towards practical and collaborative auto-tuning, in: Scientific Programming, July 2014, vol. 22, n^o 4, pp. 309-329. [ DOI : 10.3233/SPR-140396 ]

https://hal.inria.fr/hal-01054763
20A. Romero, L. Lacassagne, M. Gouiffès, A. Hassan Zahraee.

Covariance tracking: architecture optimizations for embedded systems, in: EURASIP Journal on Advances in Signal Processing, December 2014, 25 p. [ DOI : 10.1186/1687-6180-2014-175 ]

https://hal.inria.fr/hal-01094903
21M. Szydlarski, P. Esterie, J. Falcou, L. Grigori, R. Stompor.

Spherical harmonic transform on heterogeneous architectures using hybrid programming, in: Concurrency and Computation: Practice and Experience, March 2014, vol. 26, n^o 3, 28 p. [ DOI : 10.1002/cpe.3038 ]

https://hal.inria.fr/hal-01091256

International Conferences with Proceedings

22L. Bagnères, C. Bastoul.

Switchable Scheduling for Runtime Adaptation of Optimization, in: Euro-Par 2014 Parallel Processing, Porto, Portugal, Lecture Notes in Computer Science, Springer International Publishing, August 2014, vol. 8632, pp. 222-233. [ DOI : 10.1007/978-3-319-09873-9_19 ]

https://hal.inria.fr/hal-01097200
23L. Cabaret, L. Lacassagne.

What Is the World's Fastest Connected Component Labeling Algorithm?, in: SiPS: IEEE International Workshop on Signal Processing Systems, Belfast, United Kingdom, IEEE, October 2014, 6 p.

https://hal.inria.fr/hal-01094905
24L. Cabaret, L. Lacassagne, L. Oudni.

A Review of World's Fastest Connected Component Labeling Algorithms: Speed and Energy Estimation, in: International Conference on Design and Architectures for Signal and Image Processing, Madrid, Spain, October 2014.

https://hal.inria.fr/hal-01081962
25A. Ferreira Leite, C. Tadonki, C. Eisenbeis, T. Raiol, M. E. Walter, A. C. Alves De Melo.

Excalibur: An Autonomic Cloud Architecture for Executing Parallel Applications, in: Fourth International Workshop on Cloud Data and Platforms (CloudDP), Amsterdam, Netherlands, April 2014. [ DOI : 10.1145/2592784.2592786 ]

https://hal-mines-paristech.archives-ouvertes.fr/hal-01087315
26L. Lacassagne, D. Etiemble, A. Hassan Zahraee, A. Dominguez, P. Vezolle.

High Level Transforms for SIMD and Low-Level Computer Vision Algorithms, in: Symposium on Principles and Practice of Parallel Programming / WPMVP, Orlando, Florida, United States, February 2014, 8 p. [ DOI : 10.1145/2568058.2568067 ]

https://hal.inria.fr/hal-01094906
27A. Leite, C. Tadonki, C. Eisenbeis, A. De Melo.

A Fine-grained Approach for Power Consumption Analysis and Prediction, in: International Conference on Computational Science - ICCS, Cairns, Australia, June 2014. [ DOI : 10.1016/j.procs.2014.05.211 ]

https://hal.inria.fr/hal-01074959
28A. Tran Tan, J. Falcou, D. Etiemble, H. Kaiser.

Automatic Task-based Code Generation for High Performance Domain Specific Embedded Language, in: HLPP 2014, Amsterdam, Netherlands, July 2014.

https://hal.inria.fr/hal-01061423
29O. Zinenko, C. Bastoul, S. Huot.

Manipulating Visualization, Not Codes, in: International Workshop on Polyhedral Compilation Techniques (IMPACT), Amsterdam, Netherlands, January 2015, 8 p.

https://hal.inria.fr/hal-01100974

Scientific Books (or Scientific Book chapters)

30A. Rémy, M. Baboulin, M. Sosonkina, B. Rozoy.

Locality Optimization on a NUMA Architecture for Hybrid LU Factorization, in: Advances in Parallel Computing, 2014, vol. 25, pp. 153-162. [ DOI : 10.3233/978-1-61499-381-0-153 ]

https://hal.inria.fr/hal-00987284

Internal Reports

31M. Baboulin, J. Dongarra, R. Lacroix.

Computing least squares condition numbers on hybrid multicore/GPU systems, February 2014, n^o RR-8479.

https://hal.inria.fr/hal-00947204
32M. Baboulin, J. Falcou, I. Masliah.

Towards an automatic generation of dense linear algebra solvers on parallel architectures, Université Paris-Sud, October 2014, n^o RR-8615, 20 p.

https://hal.inria.fr/hal-01075663
33M. Baboulin, X. S. Li, F.-H. Rouet.

Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods, Inria, February 2014, n^o RR-8481, also appeared as LAPACK Working Note 285.

https://hal.inria.fr/hal-00950612
34G. Fursin, C. Dubach.

Experience report: community-driven reviewing and validation of publications, June 2014.

https://hal.inria.fr/hal-01006563
35A. Rémy, M. Baboulin, M. Sosonkina, B. Rozoy.

Locality optimization on a NUMA architecture for hybrid LU factorization, March 2014, n^o RR-8497.

https://hal.inria.fr/hal-00957673

Other Publications

36D. Barthou, O. Brand-Foissac, R. Dolbeau, G. Grosdidier, C. Eisenbeis, M. Kruse, O. Pene, K. Petrov, C. Tadonki.

Automated Code Generation for Lattice Quantum Chromodynamics and beyond, January 2014.

https://hal.archives-ouvertes.fr/hal-00930288
37J. Lambert, H. Chouh, G. Rougeron, V. Bergeaud, S. Chatillon, L. Lacassagne, J.-C. Iehl, J.-P. Farrugia, V. Ostromoukhov.

Interactive Ultrasonic Field Simulation For Non-Destructive Testing, June 2014, vol. 33, n^o 2, 25th Eurographics Symposium on Rendering.

https://hal.inria.fr/hal-01093294
38J. Lambert, G. Rougeron, L. Lacassagne.

Calcul de champ ultrasonore interactif pour le contrôle non destructif, May 2014, Les Journées COFREND.

https://hal.inria.fr/hal-01093131

References in notes

39The HiPEAC vision on high-performance and embedded architecture and compilation (2012-2020), 2012.

http://www.hipeac.net/roadmap
40European Union Framework Program 6 MILEPOST project No 035307 (MachIne Learning for Embedded PrOgramS opTimization).

http://cordis.europa.eu/project/rcn/79763_en.html
41PRACE: Partnership for Advanced Computing in Europe.

http://www.prace-project.eu
42AMD.

AMD Core Math Library.

http://developer.amd.com/libraries/acml/
43E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, D. Sorensen.

LAPACK Users' Guide, SIAM, 1999, Third edition.
44K. Aneja, F. Laguzet, L. Lacassagne, A. Merigot.

Video rate image segmentation by means of region splitting and merging, in: IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2009.
45M. Arioli, M. Baboulin, S. Gratton.

A partial condition number for linear least-squares problems, in: SIAM J. Matrix Anal. and Appl., 2007, vol. 29, n^o 2, pp. 413–433.
46K. Asanovic.

The landscape of parallel computing research: a view from Berkeley, Electrical Engineering and Computer Sciences, University of California at Berkeley, December 2006, n^o UCB/EECS-2006-183.

http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.pdf
47H. Avron, P. Maymounkov, S. Toledo.

Blendenpik: Supercharging LAPACK’s least-squares solvers, in: SIAM J. Sci. Comput., 2010, vol. 32, pp. 1217–1236.
48M. Baboulin, D. Becker, G. Bosilca, A. Danalis, J. Dongarra.

An efficient distributed randomized algorithm for solving large dense symmetric indefinite linear systems, in: Parallel Computing, 2014, vol. 40, n^o 7, pp. 213–223.
49M. Baboulin, D. Becker, J. Dongarra.

A Parallel Tiled Solver for Dense Symmetric Indefinite Systems on Multicore Architectures, in: Proceedings of IEEE International Parallel & Distributed Processing Symposium (IPDPS 2012), 2012, pp. 14-24.
50M. Baboulin, A. Buttari, J. Dongarra, J. Kurzak, J. Langou, J. Langou, P. Luszczek, S. Tomov.

Accelerating scientific computations with mixed precision algorithms, in: Computer Physics Communications, 2009, vol. 180, n^o 12, pp. 2526–2533.
51M. Baboulin, S. Donfack, J. Dongarra, L. Grigori, A. Rémy, S. Tomov.

A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines, in: International Conference on Computational Science (ICCS 2012), Procedia Computer Science, Elsevier, 2012, vol. 9, pp. 17–26.
52M. Baboulin, J. Dongarra, J. Demmel, S. Tomov, V. Volkov.

Enhancing the performance of dense linear algebra solvers on GPUs in the MAGMA project, November 15, 2008.

http://www.lri.fr/~baboulin/SC08.pdf
53M. Baboulin, J. Dongarra, S. Gratton, J. Langou.

Computing the conditioning of the components of a linear least squares solution, in: Numerical Linear Algebra with Applications, 2009, vol. 16, n^o 7, pp. 517–533.
54M. Baboulin, J. Dongarra, J. Herrmann, S. Tomov.

Accelerating linear system solutions using randomization techniques, in: ACM Trans. Math. Softw., 2013, vol. 39, n^o 2.
55M. Baboulin, J. Dongarra, R. Lacroix.

Computing least squares condition numbers on hybrid multicore/GPU systems, in: Proceedings of the International Conference of Applied Mathematics, Modeling and Computational Science (AMMCS 2013), 2013.
56M. Baboulin, J. Dongarra, S. Tomov.

Some Issues in Dense Linear Algebra for Multicore and Special Purpose Architectures, in: 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA'08), Lecture Notes in Computer Science, Springer-Verlag, 2008, vol. 6126-6127.
57M. Baboulin, S. Gratton.

A contribution to the conditioning of the total least squares problem, in: SIAM J. Matrix Anal. and Appl., 2011, vol. 32, n^o 3, pp. 685–699.
58M. Baboulin, S. Gratton, R. Lacroix, A. J. Laub.

Statistical estimates for the conditioning of linear least squares problems, in: 10th International Conference on Parallel Processing and Applied Mathematics (PPAM 2013), Heidelberg, R. Wyrzykowski (editor), Lecture Notes in Computer Science, Springer-Verlag, 2014, vol. 8384, pp. 124-133.
59M. Baboulin, X. S. Li, F.-H. Rouet.

Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods, in: Proceedings of VECPAR 2014, 2014.
60J. C. Baez, M. Stay.

Algorithmic thermodynamics, in: Mathematical Structures in Computer Science, 2012, vol. 22, n^o 5, pp. 771–787.

http://dx.doi.org/10.1017/S0960129511000521
61M. Bahi, C. Eisenbeis.

Spatial complexity of reversibly computable DAG, in: Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems, ACM, 2009, pp. 47–56.
62M. Bahi, C. Eisenbeis.

Impact of Reverse Computing on Information Locality in Register Allocation for High Performance Computing, in: International Journal of Parallel Programming, 2012, pp. 1–28.
63D. Barthou, O. Brand-Foissac, O. Pene, G. Grosdidier, R. Dolbeau, C. Eisenbeis, M. Kruse, K. Petrov, C. Tadonki.

Automated Code Generation for Lattice Quantum Chromodynamics and beyond, in: Journal of Physics: Conference Series, 2014, vol. 510, 012005, LPT-Orsay-13-142. [ DOI : 10.1088/1742-6596/510/1/012005 ]

http://hal.inria.fr/hal-00926513
64D. Barthou, G. Grosdidier, C. Eisenbeis, P. Guichon, M. Kruse, O. Pene, K. Petrov, C. Tadonki.

PetaQCD: En Route for the automatic code generation for lattice QCD, in: Proceedings of the 29th International Symposium on Lattice field theory (Lattice 2011), 2011, vol. 2011.
65P. Basu, S. Williams, B. Van Straalen, A. Venkat, L. Oliker, M. Hall.

Compiler Generation and Autotuning of Communication-Avoiding Operators for Geometric Multigrid, in: High Performance Computing Conference (HiPC), December 2013.
66D. Becker, M. Baboulin, J. Dongarra.

Reducing the amount of pivoting in symmetric indefinite systems, in: 9th International Conference on Parallel Processing and Applied Mathematics (PPAM 2011), Heidelberg, R. Wyrzykowski (editor), Lecture Notes in Computer Science, Springer-Verlag, 2012, vol. 7203, pp. 133–142.
67T. Betcke, N. J. Higham, V. Mehrmann, C. Schröder, F. Tisseur.

NLEVP: A Collection of Nonlinear Eigenvalue Problems, in: ACM Trans. Math. Software, February 2013, vol. 39, n^o 2, pp. 7:1-7:28. [ DOI : 10.1145/2427023.2427024 ]
68L. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, R. Whaley.

ScaLAPACK Users' Guide, SIAM, 1997, pp. 58–60.
69Blaze.

The Blaze Library, 2014.

https://code.google.com/p/blaze-lib/
70G. Bradski.

The OpenCV Library, in: Dr. Dobb's Journal of Software Tools, 2000.
71L. Cabaret, L. Lacassagne.

A Review of World's Fastest Connected Component Labeling Algorithms: Speed and Energy Estimation, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2014, pp. 1-8.
72L. Cabaret, L. Lacassagne.

What Is the World's Fastest Connected Component Labeling Algorithm?, in: IEEE International Workshop on Signal Processing Systems (SiPS), 2014, pp. 1-6.
73V. G. Cerf.

Where is the science in computer science?, in: Communications of the ACM, 2012, vol. 55, n^o 10, pp. 5-5.
74M. O. Cheema, L. Lacassagne, O. Hammami.

System-Platforms-Based SystemC TLM Design of Image Processing Chains for Embedded Applications, in: EURASIP Journal on Embedded Systems, 2007, pp. 1-14. [ DOI : 10.1155/2007/71043 ]
75P. Courbin, A. Pédron, T. Saidani, L. Lacassagne.

Parallélisation d'opérateurs de TI: multi-coeurs, Cell ou GPU ?, in: GRETSI, 2009.
76K. Czarnecki, U. W. Eisenecker, R. Glück, D. Vandevoorde, T. L. Veldhuizen.

Generative Programming and Active Libraries, in: Generic Programming, 1998, pp. 25-39.
77P. I. Davies, N. J. Higham.

Numerically Stable Generation of Correlation Matrices and their Factors, in: BIT, 2000, vol. 40, n^o 4, pp. 640-651.
78J. W. Demmel, L. Grigori, M. Hoemmen, J. Langou.

Communication-optimal parallel and sequential QR and LU factorizations, in: SIAM Journal on Scientific Computing, 2012, vol. 34, n^o 1, pp. 206–239.
79J. W. Demmel, A. McKenney.

A Test Matrix Generation Suite, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA, March 1989, n^o MCS-P69-0389, 16 p, LAPACK Working Note 9.
80J. Dongarra et al.

The International Exascale Software Project roadmap, in: Int. J. High Perform. Comput. Appl., February 2011, vol. 25, n^o 1, pp. 3–60.

http://dx.doi.org/10.1177/1094342010391989
81A. Elouardi, S. Bouaziz, A. Dupret, L. Lacassagne, J. Klein, R. Reynaud.

A smart sensor based vision system: implementation and evaluation, in: Journal of Physics D: Applied Physics, 2006, vol. 39, pp. 1694-1705. [ DOI : 10.1088/0022-3727/39/8/033 ]
82A. Elouardi, S. Bouaziz, A. Dupret, L. Lacassagne, J. Klein, R. Reynaud.

A Smart Architecture for Low-Level Image Computing, in: International Journal of Computer Sciences and Application, 2008, vol. 5, n^o 3, pp. 1-19.
83P. Esterie, J. Falcou, M. Gaunard, J.-T. Lapresté, L. Lacassagne.

The numerical template toolbox: A modern C++ design for scientific computing, in: Journal of Parallel and Distributed Computing, 2014.
84P. Esterie, M. Gaunard, J. Falcou, J.-T. Lapresté.

Exploiting Multimedia Extensions in C++: A Portable Approach, in: Computing in Science & Engineering, 2012, vol. 14, n^o 5, pp. 72–77.
85P. Estérie, M. Gaunard, J. Falcou.

A proposal to add single instruction multiple data computation to the standard library, in: N3561, 2013.
86D. Etiemble, S. Piskorski, L. Lacassagne.

Performance evaluation of Altera C2H compiler on image processing benchmarks, in: TCHA: Workshop on Tools And Compiler for Hardware Acceleration, 2006.
87J. Falcou, L. Lacassagne, S. Schaetz.

Cell MPI: Mastering the Cell Broadband Engine architecture through a Boost based parallel communication library, in: Boost Conference, 2011.
88J. Falcou, T. Saidani, L. Lacassagne, D. Etiemble.

Programmation par squelettes algorithmiques pour le processeur Cell, in: SYMPA, 2008.
89J. Falcou, J. Sérot, L. Pech, J.-T. Lapresté.

Meta-programming applied to automatic SMP parallelization of linear algebra code, in: Euro-Par 2008–Parallel Processing, Springer Berlin Heidelberg, 2008, pp. 729–738.
90G. Fursin, C. Dubach.

Experience report: community-driven reviewing and validation of publications, in: Proceedings of the 1st Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering (ACM SIGPLAN TRUST'14), ACM, 2014.

http://dx.doi.org/10.1145/2618137.2618142
91G. Fursin.

Collective Tuning Initiative: automating and accelerating development and optimization of computing systems, in: Proceedings of the GCC Developers' Summit, June 2009.
92G. Fursin, Y. Kashnikov, A. W. Memon, Z. Chamski, O. Temam, M. Namolaru, E. Yom-Tov, B. Mendelson, A. Zaks, E. Courtois, F. Bodin, P. Barnard, E. Ashton, E. Bonilla, J. Thomson, C. Williams, M. F. P. O'Boyle.

Milepost GCC: Machine Learning Enabled Self-tuning Compiler, in: International Journal of Parallel Programming, 2011, vol. 39, pp. 296-327, 10.1007/s10766-010-0161-2.
93G. Fursin, R. Miceli, A. Lokhmotov, M. Gerndt, M. Baboulin, A. D. Malony, Z. Chamski, D. Novillo, D. D. Vento.

Collective Mind: towards practical and collaborative auto-tuning, in: Special issue on Automatic Performance Tuning for HPC Architectures, Scientific Programming Journal, 2014.
94M. Gouiffès, F. Laguzet, L. Lacassagne.

Color Connectedness Degree For Mean-Shift Tracking, in: IEEE International Conference on Pattern Recognition (ICPR), 2010.
95M. Gouiffès, F. Laguzet, L. Lacassagne.

Projection Histogram For Mean-Shift Tracking, in: IEEE International Conference on Image Processing (ICIP), 2010.
96C. Grana, D. Borghesani, R. Cucchiara.

Connected Component Labeling Techniques on Modern Architectures, in: ICIAP, IEEE, 2009, pp. 816-824.
97L. Grigori, J. Demmel, H. Xiang.

CALU: a communication optimal LU factorization algorithm, in: SIAM J. Matrix Anal. and Appl., 2011, vol. 32, pp. 1317-1350.
98M. Gu, S. C. Eisenstat.

Efficient Algorithms for Computing a Strong Rank-revealing QR Factorization, in: SIAM Journal on Scientific Computing, July 1996, vol. 17, n^o 4, pp. 848–869.

http://dx.doi.org/10.1137/0917055
99S. Guelton, J. Falcou, P. Brunet.

Exploring the vectorization of python constructs using pythran and boost SIMD, in: Proceedings of the 2014 Workshop on programming models for SIMD/Vector processing, ACM, 2014, pp. 79–86.
100G. Guennebaud, B. Jacob.

Eigen v3, 2010.

http://eigen.tuxfamily.org
101N. Halko, P. G. Martinsson, J. A. Tropp.

Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions, in: SIAM Review, 2011, vol. 53, pp. 217–288.
102C. Harris, M. Stephens.

A combined corner and edge detector, in: 4th ALVEY Vision Conference, Editions Hermes, Paris, 1988.
103L. He, Y. Chao, K. Suzuki.

A run-based two-scan labeling algorithm, in: ICIAR, LNCS 4633, 2007, pp. 131-142.
104R. M. Heiberger.

Algorithm AS 127: Generation of Random Orthogonal Matrices, in: J. Roy. Statist. Soc. Ser. C (Applied Statistics), 1978, vol. 27, n^o 2, pp. 199-206.
105N. J. Higham.

$J$-Orthogonal Matrices: Properties and Generation, in: SIAM Rev., September 2003, vol. 45, n^o 3, pp. 504-519. [ DOI : 10.1137/S0036144502414930 ]
106G. E. Hinton, S. Osindero.

A fast learning algorithm for deep belief nets, in: Neural Computation, 2006, vol. 18.
107S. Horowitz, T. Pavlidis.

Picture segmentation by a tree traversal algorithm, in: Journal of the ACM, 1976, vol. 23, pp. 368-388.
108T. Ikegami, T. Sakurai, U. Nagashima.

A filter diagonalization for generalized eigenvalue problems based on the Sakurai-Sugiura projection method, in: Journal of Computational and Applied Mathematics, 2010, vol. 233, n^o 8, pp. 1927–1936.
109Intel.

Math Kernel Library.

http://developer.intel.com/software/products/mkl/
110V. Jimenez, I. Gelado, L. Vilanova, M. Gil, G. Fursin, N. Navarro.

Predictive runtime code scheduling for heterogeneous architectures, in: Proceedings of the International Conference on High Performance Embedded Architectures & Compilers (HiPEAC 2009), January 2009.
111C. S. Kenney, A. J. Laub.

Small-sample statistical condition estimates for general matrix functions, in: SIAM J. Sci. Comput., 1994, vol. 15, pp. 36–61.
112A. Khabou, J. Demmel, L. Grigori, M. Gu.

LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version, in: SIAM Journal on Matrix Analysis and Applications, 2013, vol. 34, n^o 3, pp. 1401-1429.

http://epubs.siam.org/doi/abs/10.1137/120863691
113M. Kruse.

Lattice QCD Optimization and Polytopic Representations of Distributed Memory, Université Paris-Sud 11, September 26, 2014.
114T. Kunlin, L. Lacassagne, A. Mérigot.

A Fast image segmentation scheme, in: International Conference on Information and Communication Technologies, IEEE, 2004.
115L. Lacassagne, D. Etiemble, A. Hassan Zahraee, A. Dominguez, P. Vezolle.

High Level Transforms for SIMD and low-level computer vision algorithms, in: ACM Workshop on Programming Models for SIMD/Vector Processing (PPoPP), 2014, pp. 49-56.
116L. Lacassagne, D. Etiemble, S. Kablia.

16-bit Floating Point Instructions for embedded Multimedia Applications, in: CAMP: Computer Architecture and Machine Perception, IEEE, 2005.
117L. Lacassagne, D. Etiemble.

16-bit floating point operations for low-end and high-end embedded processors, in: ODES: Optimizations for DSP and Embedded Systems, IEEE/ACM, 2005.
118L. Lacassagne, A. Manzanera, J. Denoulet, A. Mérigot.

High Performance Motion Detection: Some trends toward new embedded architectures for vision systems, in: Journal of Real Time Image Processing, October 2008, pp. 127-148. [ DOI : 10.1007/s11554-008-0096-7 ]
119L. Lacassagne, A. B. Zavidovique.

Light Speed Labeling for RISC architectures, in: IEEE International Conference on Image Processing (ICIP), 2009.
120L. Lacassagne, B. Zavidovique.

Light Speed Labeling: efficient connected component labeling on RISC architectures, in: Journal of Real-Time Image Processing, 2011, vol. 6, n^o 2, pp. 117-135.
121F. Laguzet, M. Gouiffès, L. Lacassagne.

Automatic color space switching for robust tracking, in: IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2011.
122F. Laguzet, A. Romero, M. Gouiffès, L. Lacassagne, D. Etiemble.

Color tracking with contextual switching: Real-time implementation on CPU, in: Journal of Real-Time Image Processing, 2013, pp. 1-18.
123J. Lambert, L. Lacassagne, G. Rougeron, S. Le Berre, S. Chatillon.

High Performance simulation of ultrasonic fields for Non Destructive Testing, in: International Symposium in Nuclear Application and Monte-Carlo, 2013.
124J. Lambert, A. Pédron, G. Gens, F. Bimbard, L. Lacassagne, E. Iakovleva, S. Le Berre.

Analysis of multicore CPU and GPU toward parallelization of Total Focusing Method ultrasound reconstruction, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2012.
125J. Lambert, G. Rougeron, L. Lacassagne, S. Chatillon.

A fast ultrasonic simulation tool based on massively parallel implementations, in: Review of Progress of Quantitative Nondestructive Evaluation, 2013.
126Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, A. Ng.

Building high-level features using large scale unsupervised learning, in: International Conference on Machine Learning, 2012.
127W. Ledermann, C. Alexander, D. Ledermann.

Random Orthogonal Matrix Simulation, in: Linear Algebra Appl., 2011, vol. 434, n^o 6, pp. 1444-1467. [ DOI : 10.1016/j.laa.2010.10.023 ]
128A. Leite, C. Tadonki, C. Eisenbeis, A. de Melo.

A Fine-grained Approach for Power Consumption Analysis and Prediction, in: Procedia Computer Science, 2014, vol. 29, pp. 2260–2271.
129S. Liu, C. Eisenbeis, J.-L. Gaudiot.

A theoretical framework for value prediction in parallel systems, in: Parallel Processing (ICPP), 2010 39th International Conference on, IEEE, 2010, pp. 11–20.
130M. W. Mahoney.

Randomized algorithms for matrices and data, in: Foundations and Trends in Machine Learning, 2011, vol. 3, n^o 2, pp. 123–224.
131D. Menard, R. Serizel, R. Rocher, O. Sentieys.

Accuracy Constraint Determination in Fixed-Point System Design, in: Journal on Embedded Systems (JES), 2008, vol. 2008, pp. 1-12. [ DOI : 10.1155/2008/242584 ]
132P. Monasse, F. Guichard.

Fast computation of contrast-invariant image representation, in: Transaction on, 2000, vol. 9, n^o 5, pp. 860-872.
133S. Moufawad.

Demmel type communication-avoiding generalized minimal residual method (CA-GMRES) on multicore hardwares: an application in QCD, American University of Beirut, Beirut, Lebanon, June 2011, defended on June 10, 2010.
134M. Odersky.

An Overview of the SCALA Programming Language, EPFL Lausanne, Switzerland, 2004, n^o IC/2004/64.
135D. S. Parker.

Random Butterfly Transformations with Applications in Computational Linear Algebra, Computer Science Department, UCLA, 1995, n^o CSD-950023.
136D. Petcu.

Consuming Resources and Services from Multiple Clouds, in: Journal of Grid Computing, 2014, pp. 1–25.
137M. Pharr, W. R. Mark.

ISPC: A SPMD Compiler for High-Performance CPU Programming, in: Innovative Parallel Computing (InPar), 2012.
138S. Piskorski, L. Lacassagne, D. Etiemble.

IPLG: un outil pour la fusion d'opérateurs en Traitement d'Images, in: SYMPA, 2009.
139S. Piskorski, L. Lacassagne, M. Kieffer, D. Etiemble.

Efficient floating point interval processing for embedded systems and applications, in: SCAN - International Symposium of Scientific computing, Computer Arithmetic and Validated Numerics, 2006.
140S. Pop, A. Cohen, C. Bastoul, S. Girbal, G. A. Silber, N. Vasilache.

GRAPHITE: Loop optimizations based on the polyhedral model for GCC, in: Proc. of the 4th GCC Developers' Summit, June 2006, pp. 179–198.
141A. Pédron, L. Lacassagne, V. Barbillon, F. Bimbard, G. Rougeron, S. Le Berre.

Performance analysis of an ultrasound reconstruction algorithm for non destructive testing, in: IEEE International Conference on Parallel Computing (ParCo), 2011.
142A. Pédron, L. Lacassagne, F. Bimbard, S. Le Berre.

Parallelization of an ultrasound reconstruction algorithm for non destructive testing on multicore CPU and GPU, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2011.
143A. Romero, M. Gouiffès, L. Lacassagne.

Feature Points tracking adaptative to Saturation, in: IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2011.
144A. Romero, M. Gouiffès, L. Lacassagne.

Covariance Descriptor Multiple Object Tracking and Re-Identification with Colorspace Evaluation, in: IEEE ACCV - Workshop on Detection and Tracking in Challenging Environments, 2012.
145A. Romero, M. Gouiffès, L. Lacassagne.

Enhanced Local Binary Covariance Matrices (ELBCM) for texture analysis and object tracking, in: ACM International Conference on Computer Vision / Computer Graphics Collaboration Techniques and Applications, 2013.
146A. Romero, L. Lacassagne, M. Gouiffès.

Real-time covariance tracking algorithm for embedded systems, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2013.
147A. Rosenfeld, J. L. Pfaltz.

Sequential operations in digital picture processing, in: Journal of the ACM, 1966, vol. 13, n^o 4, pp. 471-494.
148A. Rémy, M. Baboulin, M. Sosonkina, B. Rozoy.

Locality optimization on a NUMA architecture for hybrid LU factorization, in: International Conference on Parallel Computing (PARCO 2013), Advances in Parallel Computing, IOS Press, 2014, vol. 25, pp. 153-162.
149T. Saidani, J. Falcou, C. Tadonki, L. Lacassagne, D. Etiemble.

Algorithmic Skeletons within an Embedded Domain Specific Language for the Cell Processor, in: Parallel Architectures and Compilation Techniques, PACT, 2009, pp. 67-76.
150T. Saidani, L. Lacassagne, S. Bouaziz, T. M. Khan.

Parallelization Strategies for the Points of Interests Algorithm on the Cell Processor, in: Lecture Notes in Computer Science, Springer, 2007, pp. 104-112. [ DOI : 10.1007/978-3-540-74742-0 ]
151T. Saidani, S. Piskorski, L. Lacassagne, S. Bouaziz.

Parallelization Schemes for Memory Optimization on the Cell Processor: A Case Study of Image Processing Algorithm, in: PACT-MEDEA, 2007, pp. 15-19.
152C. Sanderson.

Armadillo: An open source C++ linear algebra library for fast prototyping and computationally intensive experiments, in: Report Version, 2010, vol. 2.
153J. Siek, L.-Q. Lee, A. Lumsdaine.

The Boost Graph Library, June 2000.

http://www.boost.org/libs/graph/
154D. Spinellis.

Notable design patterns for domain-specific languages, in: Journal of Systems and Software, 2001, vol. 56, n^o 1, pp. 91–99. [ DOI : 10.1016/S0164-1212(00)00089-3 ]

http://www.sciencedirect.com/science/article/pii/S0164121200000893
155G. W. Stewart.

The Efficient Generation of Random Orthogonal Matrices With an Application to Condition Estimators, in: SIAM J. Numer. Anal., 1980, vol. 17, n^o 3, pp. 403-409.
156A. K. Sujeeth, A. Gibbons, K. J. Brown, H. Lee, T. Rompf, M. Odersky, K. Olukotun.

Forge: Generating a High Performance DSL Implementation from a Declarative Specification, in: 12th International Conference on Generative Programming: Concepts and Experiences, 2013.
157A. K. Sujeeth, T. Rompf, K. J. Brown, H. Lee, H. Chafi, V. Popic, M. Wu, A. Prokopec, V. Jovanovic, M. Odersky, K. Olukotun.

Composition and Reuse with Compiled Domain-Specific Languages, in: ECOOP'13: European Conference on Object-Oriented Programming, 2013.
158V. Sundriyal, M. Sosonkina, A. Gaenko, Z. Zhang.

Energy saving strategies for parallel applications with point-to-point communication phases, in: Journal of Parallel and Distributed Computing, 2013. [ DOI : 10.1016/j.jpdc.2013.03.006 ]
159V. Sundriyal, M. Sosonkina, Z. Zhang.

Automatic runtime frequency-scaling system for energy savings in parallel applications, in: The Journal of Supercomputing, 2014, vol. 68, n^o 2, pp. 777–797.
160K. Suzuki, I. Horiba, N. Sugie.

Linear-time connected component labeling based on sequential local operations, in: Computer Vision and Image Understanding, January 2003, vol. 89, n^o 1, pp. 1-23. [ DOI : 10.1016/S1077-3142(02)00030-9 ]
161H. Tabia, M. Gouiffès, L. Lacassagne.

Motion histogram quantification for human action recognition, in: IEEE International Conference on Pattern Recognition (ICPR), 2012.
162H. Tabia, M. Gouiffès, L. Lacassagne.

Motion modeling for abnormal event detection in crowd scenes, in: IEEE International Conference on Pattern Recognition (ISCIVC), 2012.
163C. Tadonki, L. Lacassagne, T. Saïdani, J. Falcou, K. Hamidouche.

The Harris algorithm revisited on the Cell processor, in: International Workshop on Highly-Efficient Accelerators and Reconfigurable Technologies (HEART), 2010.
164S. Tomov, J. Dongarra, M. Baboulin.

Towards dense linear algebra for hybrid GPU accelerated manycore systems, in: Parallel Computing, 2010, vol. 36, n^o 5&6, pp. 232–240.
165University of Tennessee.

PLASMA Users' Guide, Parallel Linear Algebra Software for Multicore Architectures, Version 2.3, 2010.
166T. L. Veldhuizen.

Active Libraries and Universal Languages, Indiana University Computer Science, May 2004.

http://www.ubietylab.net/ubigraph/content/Papers/pdf/VeldhuizenThesis.pdf
167H. Wang, H. Andrade, B. Gedik, K.-L. Wu.

A Code Generation Approach for Auto-Vectorization in the Spade Compiler, in: LCPC'09, 2009, pp. 383-390.
168Y. Wang, M. Baboulin, J. Dongarra, J. Falcou, Y. Fraigneau, O. Le Maître.

A parallel solver for incompressible fluid flows, in: International Conference on Computational Science (ICCS 2013), Procedia Computer Science, Elsevier, 2013, vol. 18, pp. 439–448.
169Y. Wang, M. Baboulin, K. Rupp, O. Le Maître, Y. Fraigneau.

Solving 3D Incompressible Navier-Stokes Equations on Hybrid CPU/GPU Systems, in: Proceedings of the High Performance Computing Symposium, San Diego, CA, USA, HPC '14, Society for Computer Simulation International, 2014, pp. 12:1–12:8.

http://dl.acm.org/citation.cfm?id=2663510.2663522
170H. Ye, L. Lacassagne, D. Etiemble, L. Cabaret, J. Falcou, O. Florent.

Impact of High Level Transforms on High Level Synthesis for motion detection algorithm, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2012, pp. 1-8.
171H. Ye, L. Lacassagne, J. Falcou, D. Etiemble, L. Cabaret, O. Florent.

High Level Transforms to reduce energy consumption of signal and image processing operators, in: IEEE International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), 2013, pp. 247-254.
