

Bibliography

Major publications by the team in recent years
  • [1] M. Baboulin, D. Becker, J. Dongarra.
    A Parallel Tiled Solver for Dense Symmetric Indefinite Systems on Multicore Architectures, in: Proceedings of IEEE International Parallel & Distributed Processing Symposium (IPDPS 2012), 2012, pp. 14-24.
  • [2] M. Baboulin, S. Donfack, J. Dongarra, L. Grigori, A. Rémy, S. Tomov.
    A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines, in: International Conference on Computational Science (ICCS 2012), Procedia Computer Science, Elsevier, 2012, vol. 9, pp. 17–26.
  • [3] M. Baboulin, J. Dongarra, J. Herrmann, S. Tomov.
    Accelerating linear system solutions using randomization techniques, in: ACM Trans. Math. Softw., 2013, vol. 39, no 2.
  • [4] M. Baboulin, S. Gratton.
    A contribution to the conditioning of the total least squares problem, in: SIAM J. Matrix Anal. and Appl., 2011, vol. 32, no 3, pp. 685–699.
  • [5] M. Bahi, C. Eisenbeis.
    Impact of Reverse Computing on Information Locality in Register Allocation for High Performance Computing, in: International Journal of Parallel Programming, 2012, pp. 1–28.
  • [6] D. Barthou, O. Brand-Foissac, O. Pene, G. Grosdidier, R. Dolbeau, C. Eisenbeis, M. Kruse, K. Petrov, C. Tadonki.
    Automated Code Generation for Lattice Quantum Chromodynamics and beyond, in: Journal of Physics: Conference Series, 2014, vol. 510, 012005, LPT-Orsay-13-142. [ DOI : 10.1088/1742-6596/510/1/012005 ]
    http://hal.inria.fr/hal-00926513
  • [7] P. Esterie, J. Falcou, M. Gaunard, J.-T. Lapresté, L. Lacassagne.
    The Numerical Template toolbox: A Modern C++ Design for Scientific Computing, in: Journal of Parallel and Distributed Computing, July 2014. [ DOI : 10.1016/j.jpdc.2014.07.002 ]
    https://hal.inria.fr/hal-01061305
  • [8] P. Esterie, M. Gaunard, J. Falcou, J.-T. Lapresté.
    Exploiting Multimedia Extensions in C++: A Portable Approach, in: Computing in Science & Engineering, 2012, vol. 14, no 5, pp. 72–77.
  • [9] A. Ferreira Leite.
    A User-Centered and Autonomic Multi-Cloud Architecture for High Performance Computing Applications, Paris-Sud XI ; Universidade de Brasília, December 2014.
    https://hal.inria.fr/tel-01097295
  • [10] G. Fursin, Y. Kashnikov, A. W. Memon, Z. Chamski, O. Temam, M. Namolaru, E. Yom-Tov, B. Mendelson, A. Zaks, E. Courtois, F. Bodin, P. Barnard, E. Ashton, E. Bonilla, J. Thomson, C. Williams, M. F. P. O'Boyle.
    Milepost GCC: Machine Learning Enabled Self-tuning Compiler, in: International Journal of Parallel Programming, 2011, vol. 39, pp. 296-327, 10.1007/s10766-010-0161-2.
  • [11] M. Kruse.
    Lattice QCD Optimization and Polytopic Representations of Distributed Memory, Paris-Sud XI, September 2014.
    https://hal.inria.fr/tel-01078440
  • [12] S. Tomov, J. Dongarra, M. Baboulin.
    Towards dense linear algebra for hybrid GPU accelerated manycore systems, in: Parallel Computing, 2010, vol. 36, no 5&6, pp. 232–240.
Publications of the year

Doctoral Dissertations and Habilitation Theses

  • [13] A. Ferreira Leite.
    A User-Centered and Autonomic Multi-Cloud Architecture for High Performance Computing Applications, Paris-Sud XI, December 2014.
    https://hal.inria.fr/tel-01097295
  • [14] M. Kruse.
    Lattice QCD Optimization and Polytopic Representations of Distributed Memory, Paris-Sud XI, September 2014.
    https://hal.inria.fr/tel-01078440

Articles in International Peer-Reviewed Journals

  • [15] M. Baboulin, D. Becker, G. Bosilca, A. Danalis, J. Dongarra.
    An efficient distributed randomized algorithm for solving large dense symmetric indefinite linear systems, in: Parallel Computing, July 2014, vol. 40, no 7, pp. 213-223. [ DOI : 10.1016/j.parco.2013.12.003 ]
    https://hal.inria.fr/hal-01024857
  • [16] M. Baboulin, S. Gratton, R. Lacroix, A. J. Laub.
    Statistical estimates for the conditioning of linear least squares problems, in: Lecture notes in computer science, 2014, vol. 8384, pp. 124-133. [ DOI : 10.1007/978-3-642-55224-3_13 ]
    https://hal.inria.fr/hal-00991710
  • [17] D. Barthou, O. Brand-Foissac, O. Pene, G. Grosdidier, R. Dolbeau, C. Eisenbeis, M. Kruse, K. Petrov, C. Tadonki.
    Automated Code Generation for Lattice Quantum Chromodynamics and beyond, in: Journal of Physics: Conference Series, 2014, vol. 510, 012005, LPT-Orsay-13-142. [ DOI : 10.1088/1742-6596/510/1/012005 ]
    https://hal.inria.fr/hal-00926513
  • [18] P. Esterie, J. Falcou, M. Gaunard, J.-T. Lapresté, L. Lacassagne.
    The Numerical Template toolbox: A Modern C++ Design for Scientific Computing, in: Journal of Parallel and Distributed Computing, July 2014. [ DOI : 10.1016/j.jpdc.2014.07.002 ]
    https://hal.inria.fr/hal-01061305
  • [19] G. Fursin, R. Miceli, A. Lokhmotov, M. Gerndt, M. Baboulin, A. D. Malony, Z. Chamski, D. Novillo, D. D. Vento.
    Collective mind: Towards practical and collaborative auto-tuning, in: Scientific Programming, July 2014, vol. 22, no 4, pp. 309-329. [ DOI : 10.3233/SPR-140396 ]
    https://hal.inria.fr/hal-01054763
  • [20] A. Romero, L. Lacassagne, M. Gouiffès, A. Hassan Zahraee.
    Covariance tracking: architecture optimizations for embedded systems, in: EURASIP Journal on Advances in Signal Processing, December 2014, 25 p. [ DOI : 10.1186/1687-6180-2014-175 ]
    https://hal.inria.fr/hal-01094903
  • [21] M. Szydlarski, P. Esterie, J. Falcou, L. Grigori, R. Stompor.
    Spherical harmonic transform on heterogeneous architectures using hybrid programming, in: Concurrency and Computation Practice and Experience, March 2014, vol. 26, no 3, 28 p. [ DOI : 10.1002/cpe.3038 ]
    https://hal.inria.fr/hal-01091256

International Conferences with Proceedings

  • [22] L. Bagnères, C. Bastoul.
    Switchable Scheduling for Runtime Adaptation of Optimization, in: Euro-Par 2014 Parallel Processing, Porto, Portugal, Lecture Notes in Computer Science, Springer International Publishing, August 2014, vol. 8632, pp. 222-233. [ DOI : 10.1007/978-3-319-09873-9_19 ]
    https://hal.inria.fr/hal-01097200
  • [23] L. Cabaret, L. Lacassagne.
    What Is the World's Fastest Connected Component Labeling Algorithm?, in: SiPS: IEEE International Workshop on Signal Processing Systems, Belfast, United Kingdom, IEEE, October 2014, 6 p.
    https://hal.inria.fr/hal-01094905
  • [24] L. Cabaret, L. Lacassagne, L. Oudni.
    A Review of World's Fastest Connected Component Labeling Algorithms: Speed and Energy Estimation, in: International Conference on Design and Architectures for Signal and Image Processing, Madrid, Spain, October 2014.
    https://hal.inria.fr/hal-01081962
  • [25] A. Ferreira Leite, C. Tadonki, C. Eisenbeis, T. Raiol, M. E. Walter, A. C. Alves De Melo.
    Excalibur: An Autonomic Cloud Architecture for Executing Parallel Applications, in: Fourth International Workshop on Cloud Data and Platforms (CloudDP), Amsterdam, Netherlands, April 2014. [ DOI : 10.1145/2592784.2592786 ]
    https://hal-mines-paristech.archives-ouvertes.fr/hal-01087315
  • [26] L. Lacassagne, D. Etiemble, A. Hassan Zahraee, A. Dominguez, P. Vezolle.
    High Level Transforms for SIMD and Low-Level Computer Vision Algorithms, in: Symposium on Principles and Practice of Parallel Programming / WPMVP, Orlando, Florida, United States, February 2014, 8 p. [ DOI : 10.1145/2568058.2568067 ]
    https://hal.inria.fr/hal-01094906
  • [27] A. Leite, C. Tadonki, C. Eisenbeis, A. De Melo.
    A Fine-grained Approach for Power Consumption Analysis and Prediction, in: International Conference on Computational Science - ICCS, Cairns, Australia, June 2014. [ DOI : 10.1016/j.procs.2014.05.211 ]
    https://hal.inria.fr/hal-01074959
  • [28] A. Tran Tan, J. Falcou, D. Etiemble, H. Kaiser.
    Automatic Task-based Code Generation for High Performance Domain Specific Embedded Language, in: HLPP 2014, Amsterdam, Netherlands, July 2014.
    https://hal.inria.fr/hal-01061423
  • [29] O. Zinenko, C. Bastoul, S. Huot.
    Manipulating Visualization, Not Codes, in: International Workshop on Polyhedral Compilation Techniques (IMPACT), Amsterdam, Netherlands, January 2015, 8 p.
    https://hal.inria.fr/hal-01100974

Scientific Books (or Scientific Book chapters)

  • [30] A. Rémy, M. Baboulin, M. Sosonkina, B. Rozoy.
    Locality Optimization on a NUMA Architecture for Hybrid LU Factorization, in: Advances in Parallel Computing, 2014, vol. 25, pp. 153-162. [ DOI : 10.3233/978-1-61499-381-0-153 ]
    https://hal.inria.fr/hal-00987284

Internal Reports

  • [31] M. Baboulin, J. Dongarra, R. Lacroix.
    Computing least squares condition numbers on hybrid multicore/GPU systems, February 2014, no RR-8479.
    https://hal.inria.fr/hal-00947204
  • [32] M. Baboulin, J. Falcou, I. Masliah.
    Towards an automatic generation of dense linear algebra solvers on parallel architectures, Université Paris-Sud, October 2014, no RR-8615, 20 p.
    https://hal.inria.fr/hal-01075663
  • [33] M. Baboulin, X. S. Li, F.-H. Rouet.
    Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods, Inria, February 2014, no RR-8481, Also appeared as Lapack Working Note 285.
    https://hal.inria.fr/hal-00950612
  • [34] G. Fursin, C. Dubach.
    Experience report: community-driven reviewing and validation of publications, June 2014.
    https://hal.inria.fr/hal-01006563
  • [35] A. Rémy, M. Baboulin, M. Sosonkina, B. Rozoy.
    Locality optimization on a NUMA architecture for hybrid LU factorization, March 2014, no RR-8497.
    https://hal.inria.fr/hal-00957673

Other Publications

  • [36] D. Barthou, O. Brand-Foissac, R. Dolbeau, G. Grosdidier, C. Eisenbeis, M. Kruse, O. Pene, K. Petrov, C. Tadonki.
    Automated Code Generation for Lattice Quantum Chromodynamics and beyond, January 2014.
    https://hal.archives-ouvertes.fr/hal-00930288
  • [37] J. Lambert, H. Chouh, G. Rougeron, V. Bergeaud, S. Chatillon, L. Lacassagne, J.-C. Iehl, J.-P. Farrugia, V. Ostromoukhov.
    Interactive Ultrasonic Field Simulation For Non-Destructive Testing, June 2014, vol. 33, no 2, 25th Eurographics Symposium on Rendering.
    https://hal.inria.fr/hal-01093294
  • [38] J. Lambert, G. Rougeron, L. Lacassagne.
    Calcul de champ ultrasonore interactif pour le contrôle non destructif, May 2014, Les Journées COFREND.
    https://hal.inria.fr/hal-01093131
References in notes
  • [39] The HiPEAC vision on high-performance and embedded architecture and compilation (2012-2020), 2012.
    http://www.hipeac.net/roadmap
  • [40] European Union Framework Program 6 MILEPOST project No 035307 (MachIne Learning for Embedded PrOgramS opTimization).
    http://cordis.europa.eu/project/rcn/79763_en.html
  • [41] PRACE: Partnership for Advanced Computing in Europe.
    http://www.prace-project.eu
  • [42] AMD.
    AMD Core Math Library.
    http://developer.amd.com/libraries/acml/
  • [43] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. D. Croz, A. Greenbaum, S. Hammarling, A. McKenney, D. Sorensen.
    LAPACK Users' Guide, SIAM, 1999, Third edition.
  • [44] K. Aneja, F. Laguzet, L. Lacassagne, A. Merigot.
    Video rate image segmentation by means of region splitting and merging, in: IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2009.
  • [45] M. Arioli, M. Baboulin, S. Gratton.
    A partial condition number for linear least-squares problems, in: SIAM J. Matrix Anal. and Appl., 2007, vol. 29, no 2, pp. 413–433.
  • [46] K. Asanovic.
    The landscape of parallel computing research: a view from Berkeley, Electrical Engineering and Computer Sciences, University of California at Berkeley, December 2006, no UCB/EECS-2006-183.
    http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.pdf
  • [47] H. Avron, P. Maymounkov, S. Toledo.
    Blendenpik: Supercharging LAPACK’s least-squares solvers, in: SIAM J. Sci. Comput., 2010, vol. 32, pp. 1217–1236.
  • [48] M. Baboulin, D. Becker, G. Bosilca, A. Danalis, J. Dongarra.
    An efficient distributed randomized algorithm for solving large dense symmetric indefinite linear systems, in: Parallel Computing, 2014, vol. 40, no 7, pp. 213–223.
  • [49] M. Baboulin, D. Becker, J. Dongarra.
    A Parallel Tiled Solver for Dense Symmetric Indefinite Systems on Multicore Architectures, in: Proceedings of IEEE International Parallel & Distributed Processing Symposium (IPDPS 2012), 2012, pp. 14-24.
  • [50] M. Baboulin, A. Buttari, J. Dongarra, J. Kurzak, J. Langou, J. Langou, P. Luszczek, S. Tomov.
    Accelerating scientific computations with mixed precision algorithms, in: Computer Physics Communications, 2009, vol. 180, no 12, pp. 2526–2533.
  • [51] M. Baboulin, S. Donfack, J. Dongarra, L. Grigori, A. Rémy, S. Tomov.
    A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines, in: International Conference on Computational Science (ICCS 2012), Procedia Computer Science, Elsevier, 2012, vol. 9, pp. 17–26.
  • [52] M. Baboulin, J. Dongarra, J. Demmel, S. Tomov, V. Volkov.
    Enhancing the performance of dense linear algebra solvers on GPUs in the MAGMA project, November 15, 2008.
    http://www.lri.fr/~baboulin/SC08.pdf
  • [53] M. Baboulin, J. Dongarra, S. Gratton, J. Langou.
    Computing the conditioning of the components of a linear least squares solution, in: Numerical Linear Algebra with Applications, 2009, vol. 16, no 7, pp. 517–533.
  • [54] M. Baboulin, J. Dongarra, J. Herrmann, S. Tomov.
    Accelerating linear system solutions using randomization techniques, in: ACM Trans. Math. Softw., 2013, vol. 39, no 2.
  • [55] M. Baboulin, J. Dongarra, R. Lacroix.
    Computing least squares condition numbers on hybrid multicore/GPU systems, in: Proceedings of the International Conference of Applied Mathematics, Modeling and Computational Science (AMMCS 2013), 2013.
  • [56] M. Baboulin, J. Dongarra, S. Tomov.
    Some Issues in Dense Linear Algebra for Multicore and Special Purpose Architectures, in: 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA'08), Lecture Notes in Computer Science, Springer-Verlag, 2008, vol. 6126-6127.
  • [57] M. Baboulin, S. Gratton.
    A contribution to the conditioning of the total least squares problem, in: SIAM J. Matrix Anal. and Appl., 2011, vol. 32, no 3, pp. 685–699.
  • [58] M. Baboulin, S. Gratton, R. Lacroix, A. J. Laub.
    Statistical estimates for the conditioning of linear least squares problems, in: 10th International Conference on Parallel Processing and Applied Mathematics (PPAM 2013), Heidelberg, R. Wyrzykowski (editor), Lecture Notes in Computer Science, Springer-Verlag, 2014, vol. 8384, pp. 124-133.
  • [59] M. Baboulin, X. S. Li, F.-H. Rouet.
    Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods, in: Proceedings of VECPAR 2014, 2014.
  • [60] J. C. Baez, M. Stay.
    Algorithmic thermodynamics, in: Mathematical Structures in Computer Science, 2012, vol. 22, no 5, pp. 771–787.
    http://dx.doi.org/10.1017/S0960129511000521
  • [61] M. Bahi, C. Eisenbeis.
    Spatial complexity of reversibly computable DAG, in: Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems, ACM, 2009, pp. 47–56.
  • [62] M. Bahi, C. Eisenbeis.
    Impact of Reverse Computing on Information Locality in Register Allocation for High Performance Computing, in: International Journal of Parallel Programming, 2012, pp. 1–28.
  • [63] D. Barthou, O. Brand-Foissac, O. Pene, G. Grosdidier, R. Dolbeau, C. Eisenbeis, M. Kruse, K. Petrov, C. Tadonki.
    Automated Code Generation for Lattice Quantum Chromodynamics and beyond, in: Journal of Physics: Conference Series, 2014, vol. 510, 012005, LPT-Orsay-13-142. [ DOI : 10.1088/1742-6596/510/1/012005 ]
    http://hal.inria.fr/hal-00926513
  • [64] D. Barthou, G. Grosdidier, C. Eisenbeis, P. Guichon, M. Kruse, O. Pene, K. Petrov, C. Tadonki.
    PetaQCD: En Route for the automatic code generation for lattice QCD, in: Proceedings of the 29th International Symposium on Lattice field theory (Lattice 2011), 2011, vol. 2011.
  • [65] P. Basu, S. Williams, B. V. Straalen, A. Venkat, L. Oliker, M. Hall.
    Compiler Generation and Autotuning of Communication-Avoiding Operators for Geometric Multigrid, in: High Performance Computing Conference (HiPC), December 2013.
  • [66] D. Becker, M. Baboulin, J. Dongarra.
    Reducing the amount of pivoting in symmetric indefinite systems, in: 9th International Conference on Parallel Processing and Applied Mathematics (PPAM 2011), Heidelberg, R. Wyrzykowski (editor), Lecture Notes in Computer Science, Springer-Verlag, 2012, vol. 7203, pp. 133–142.
  • [67] T. Betcke, N. J. Higham, V. Mehrmann, C. Schröder, F. Tisseur.
    NLEVP: A Collection of Nonlinear Eigenvalue Problems, in: ACM Trans. Math. Software, February 2013, vol. 39, no 2, pp. 7:1-7:28. [ DOI : 10.1145/2427023.2427024 ]
  • [68] L. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, R. Whaley.
    ScaLAPACK Users' Guide, SIAM, 1997, pp. 58–60.
  • [69] Blaze.
    The Blaze Library, 2014.
    https://code.google.com/p/blaze-lib/
  • [70] G. Bradski.
    The OpenCV Library, in: Dr. Dobb's Journal of Software Tools, 2000.
  • [71] L. Cabaret, L. Lacassagne.
    A Review of World’s Fastest Connected Component Labeling Algorithms: Speed and Energy Estimation, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2014, pp. 1-8.
  • [72] L. Cabaret, L. Lacassagne.
    What is the world's fastest Connected Component Labeling Algorithm?, in: IEEE International Workshop on Signal Processing Systems (SiPS), 2014, pp. 1-6.
  • [73] V. G. Cerf.
    Where is the science in computer science?, in: Communications of the ACM, 2012, vol. 55, no 10, pp. 5-5.
  • [74] M. O. Cheema, L. Lacassagne, O. Hammami.
    System-Platforms-Based SystemC TLM Design of Image Processing Chains for Embedded Applications, in: EURASIP Journal on Embedded Systems, 2007, pp. 1-14. [ DOI : 10.1155/2007/71043 ]
  • [75] P. Courbin, A. Pédron, T. Saidani, L. Lacassagne.
    Parallélisation d'opérateurs de TI: multi-coeurs, Cell ou GPU ?, in: GRETSI, 2009.
  • [76] K. Czarnecki, U. W. Eisenecker, R. Glück, D. Vandevoorde, T. L. Veldhuizen.
    Generative Programming and Active Libraries, in: Generic Programming, 1998, pp. 25-39.
  • [77] P. I. Davies, N. J. Higham.
    Numerically Stable Generation of Correlation Matrices and their Factors, in: BIT, 2000, vol. 40, no 4, pp. 640-651.
  • [78] J. W. Demmel, L. Grigori, M. Hoemmen, J. Langou.
    Communication-optimal parallel and sequential QR and LU factorizations, in: SIAM Journal on Scientific Computing, 2012, vol. 34, no 1, pp. 206–239.
  • [79] J. W. Demmel, A. McKenney.
    A Test Matrix Generation Suite, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA, March 1989, no MCS-P69-0389, 16 p, LAPACK Working Note 9.
  • [80] J. Dongarra et al.
    The International Exascale Software Project roadmap, in: Int. J. High Perform. Comput. Appl., February 2011, vol. 25, no 1, pp. 3–60.
    http://dx.doi.org/10.1177/1094342010391989
  • [81] A. Elouardi, S. Bouaziz, A. Dupret, L. Lacassagne, J. Klein, R. Reynaud.
    A smart sensor based vision system: implementation and evaluation, in: Journal of Applied Physics, 2006, vol. 39, pp. 1694-1705. [ DOI : 10.1088/0022-3727/39/8/033 ]
  • [82] A. Elouardi, S. Bouaziz, A. Dupret, L. Lacassagne, J. Klein, R. Reynaud.
    A Smart Architecture for Low-Level Image Computing, in: International Journal of Computer Sciences and Application, 2008, vol. 5,3, pp. 1-19.
  • [83] P. Esterie, J. Falcou, M. Gaunard, J.-T. Lapresté, L. Lacassagne.
    The numerical template toolbox: A modern C++ design for scientific computing, in: Journal of Parallel and Distributed Computing, 2014.
  • [84] P. Esterie, M. Gaunard, J. Falcou, J.-T. Lapresté.
    Exploiting Multimedia Extensions in C++: A Portable Approach, in: Computing in Science & Engineering, 2012, vol. 14, no 5, pp. 72–77.
  • [85] P. Estérie, M. Gaunard, J. Falcou.
    A proposal to add single instruction multiple data computation to the standard library, in: N3561, 2013.
  • [86] D. Etiemble, S. Piskorski, L. Lacassagne.
    Performance evaluation of Altera C2H compiler on image processing benchmarks, in: TCHA: Workshop on Tools And Compiler for Hardware Acceleration, 2006.
  • [87] J. Falcou, L. Lacassagne, S. Schaetz.
    Cell MPI: Mastering the Cell Broadband Engine architecture through a Boost based parallel communication library, in: Boost Conference, 2011.
  • [88] J. Falcou, T. Saidani, L. Lacassagne, D. Etiemble.
    Programmation par squelettes algorithmiques pour le processeur Cell, in: SYMPA, 2008.
  • [89] J. Falcou, J. Sérot, L. Pech, J.-T. Lapresté.
    Meta-programming applied to automatic SMP parallelization of linear algebra code, in: Euro-Par 2008–Parallel Processing, Springer Berlin Heidelberg, 2008, pp. 729–738.
  • [90] G. Fursin, C. Dubach.
    Experience report: community-driven reviewing and validation of publications, in: Proceedings of the 1st Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering (ACM SIGPLAN TRUST'14), ACM, 2014.
    http://dx.doi.org/10.1145/2618137.2618142
  • [91] G. Fursin.
    Collective Tuning Initiative: automating and accelerating development and optimization of computing systems, in: Proceedings of the GCC Developers' Summit, June 2009.
  • [92] G. Fursin, Y. Kashnikov, A. W. Memon, Z. Chamski, O. Temam, M. Namolaru, E. Yom-Tov, B. Mendelson, A. Zaks, E. Courtois, F. Bodin, P. Barnard, E. Ashton, E. Bonilla, J. Thomson, C. Williams, M. F. P. O'Boyle.
    Milepost GCC: Machine Learning Enabled Self-tuning Compiler, in: International Journal of Parallel Programming, 2011, vol. 39, pp. 296-327, 10.1007/s10766-010-0161-2.
  • [93] G. Fursin, R. Miceli, A. Lokhmotov, M. Gerndt, M. Baboulin, A. D. Malony, Z. Chamski, D. Novillo, D. D. Vento.
    Collective Mind: towards practical and collaborative auto-tuning, in: Special issue on Automatic Performance Tuning for HPC Architectures, Scientific Programming Journal, 2014.
  • [94] M. Gouiffès, F. Laguzet, L. Lacassagne.
    Color Connectedness Degree For Mean-Shift Tracking, in: IEEE International Conference on Pattern Recognition (ICPR), 2010.
  • [95] M. Gouiffès, F. Laguzet, L. Lacassagne.
    Projection Histogram For Mean-Shift Tracking, in: IEEE International Conference on Image Processing (ICIP), 2010.
  • [96] C. Grana, D. Borghesani, R. Cucchiara.
    Connected Component Labeling Techniques on Modern Architectures, in: ICIAP, IEEE, 2009, pp. 816-824.
  • [97] L. Grigori, J. Demmel, H. Xiang.
    CALU: a communication optimal LU factorization algorithm, in: SIAM J. Matrix Anal. and Appl., 2011, vol. 32, pp. 1317-1350.
  • [98] M. Gu, S. C. Eisenstat.
    Efficient Algorithms for Computing a Strong Rank-revealing QR Factorization, in: SIAM Journal on Scientific Computing, July 1996, vol. 17, no 4, pp. 848–869.
    http://dx.doi.org/10.1137/0917055
  • [99] S. Guelton, J. Falcou, P. Brunet.
    Exploring the vectorization of python constructs using pythran and boost SIMD, in: Proceedings of the 2014 Workshop on programming models for SIMD/Vector processing, ACM, 2014, pp. 79–86.
  • [100] G. Guennebaud, B. Jacob.
    Eigen v3, 2010.
    http://eigen.tuxfamily.org
  • [101] N. Halko, P. G. Martinsson, J. A. Tropp.
    Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions, in: SIAM Review, 2011, vol. 53, pp. 217–288.
  • [102] C. Harris, M. Stephens.
    A combined corner and edge detector, in: 4th ALVEY Vision Conference, Editions Hermes, Paris, 1988.
  • [103] L. He, Y. Chao, K. Suzuki.
    A run-based two-scan labeling algorithm, in: ICIAR, LNCS 4633, 2007, pp. 131-142.
  • [104] R. M. Heiberger.
    Algorithm AS 127: Generation of Random Orthogonal Matrices, in: J. Roy. Statist. Soc. Ser. C (Applied Statistics), 1978, vol. 27, no 2, pp. 199-206.
  • [105] N. J. Higham.
    J-Orthogonal Matrices: Properties and Generation, in: SIAM Rev., September 2003, vol. 45, no 3, pp. 504-519. [ DOI : 10.1137/S0036144502414930 ]
  • [106] G. E. Hinton, S. Osindero.
    A fast learning algorithm for deep belief nets, in: Neural Computation, 2006, vol. 18.
  • [107] S. Horowitz, T. Pavlidis.
    Picture segmentation by a tree traversal algorithm, in: Journal of the ACM, 1976, vol. 23, pp. 368-388.
  • [108] T. Ikegami, T. Sakurai, U. Nagashima.
    A filter diagonalization for generalized eigenvalue problems based on the Sakurai-Sugiura projection method, in: Journal of Computational and Applied Mathematics, 2010, vol. 233, no 8, pp. 1927–1936.
  • [109] Intel.
    Math Kernel Library.
    http://developer.intel.com/software/products/mkl/
  • [110] V. Jimenez, I. Gelado, L. Vilanova, M. Gil, G. Fursin, N. Navarro.
    Predictive runtime code scheduling for heterogeneous architectures, in: Proceedings of the International Conference on High Performance Embedded Architectures & Compilers (HiPEAC 2009), January 2009.
  • [111] C. S. Kenney, A. J. Laub.
    Small-sample statistical condition estimates for general matrix functions, in: SIAM J. Sci. Comput., 1994, vol. 15, pp. 36–61.
  • [112] A. Khabou, J. Demmel, L. Grigori, M. Gu.
    LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version, in: SIAM Journal on Matrix Analysis and Applications, 2013, vol. 34, no 3, pp. 1401-1429.
    http://epubs.siam.org/doi/abs/10.1137/120863691
  • [113] M. Kruse.
    Lattice QCD Optimization and Polytopic Representations of Distributed Memory, Université Paris-Sud 11, September 26th, 2014.
  • [114] T. Kunlin, L. Lacassagne, A. Mérigot.
    A Fast image segmentation scheme, in: International Conference on Information and Communication Technologies, IEEE, 2004.
  • [115] L. Lacassagne, D. Etiemble, A. Hassan Zahraee, A. Dominguez, P. Vezolle.
    High Level Transforms for SIMD and low-level computer vision algorithms, in: ACM Workshop on Programming Models for SIMD/Vector Processing (PPoPP), 2014, pp. 49-56.
  • [116] L. Lacassagne, D. Etiemble, S. Kablia.
    16-bit Floating Point Instructions for embedded Multimedia Applications, in: CAMP: Computer Architecture and Machine Perception, IEEE, 2005.
  • [117] L. Lacassagne, D. Etiemble.
    16-bit floating point operations for low-end and high-end embedded processors, in: ODES: Optimizations for DSP and Embedded Systems, IEEE/ACM, 2005.
  • [118] L. Lacassagne, A. Manzanera, J. Denoulet, A. Mérigot.
    High Performance Motion Detection: Some trends toward new embedded architectures for vision systems, in: Journal of Real Time Image Processing, October 2008, pp. 127-148. [ DOI : 10.1007/s11554-008-0096-7 ]
  • [119] L. Lacassagne, A. B. Zavidovique.
    Light Speed Labeling for RISC architectures, in: IEEE International Conference on Image Analysis and Processing (ICIP), 2009.
  • [120] L. Lacassagne, B. Zavidovique.
    Light Speed Labeling: efficient connected component labeling on RISC architectures, in: Journal of Real-Time Image Processing, 2011, vol. 6, no 2, pp. 117-135.
  • [121] F. Laguzet, M. Gouiffès, L. Lacassagne.
    Automatic color space switching for robust tracking, in: IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2011.
  • [122] F. Laguzet, A. Romero, M. Gouiffès, L. Lacassagne, D. Etiemble.
    Color tracking with contextual switching: Real-time implementation on CPU, in: Journal of Real-Time Image Processing, 2013, pp. 1-18.
  • [123] J. Lambert, L. Lacassagne, G. Rougeron, S. L. Berre, S. Chatillon.
    High Performance simulation of ultrasonic fields for Non Destructive Testing, in: International Symposium in Nuclear Application and Monte-Carlo, 2013.
  • [124] J. Lambert, A. Pédron, G. Gens, F. Bimbard, L. Lacassagne, E. Iakovleva, S. L. Berre.
    Analysis of multicore CPU and GPU toward parallelization of Total Focusing Method ultrasound reconstruction, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2012.
  • [125] J. Lambert, G. Rougeron, L. Lacassagne, S. Chatillon.
    A fast ultrasonic simulation tool based on massively parallel implementations, in: Review of Progress of Quantitative Nondestructive Evaluation, 2013.
  • [126] Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, A. Ng.
    Building high-level features using large scale unsupervised learning, in: International Conference in Machine Learning, 2012.
  • [127] W. Ledermann, C. Alexander, D. Ledermann.
    Random Orthogonal Matrix Simulation, in: Linear Algebra Appl., 2011, vol. 434, no 6, pp. 1444-1467. [ DOI : 10.1016/j.laa.2010.10.023 ]
  • [128] A. Leite, C. Tadonki, C. Eisenbeis, A. de Melo.
    A Fine-grained Approach for Power Consumption Analysis and Prediction, in: Procedia Computer Science, 2014, vol. 29, pp. 2260–2271.
  • [129] S. Liu, C. Eisenbeis, J.-L. Gaudiot.
    A theoretical framework for value prediction in parallel systems, in: Parallel Processing (ICPP), 2010 39th International Conference on, IEEE, 2010, pp. 11–20.
  • [130] M. W. Mahoney.
    Randomized algorithms for matrices and data, in: Foundations and Trends in Machine Learning, 2011, vol. 3, no 2, pp. 123–224.
  • [131] D. Menard, R. Serizel, R. Rocher, O. Sentieys.
    Accuracy Constraint Determination in Fixed-Point System Design, in: Journal on Embedded Systems (JES), 2008, vol. 2008, pp. 1-12. [ DOI : 10.1155/2008/242584 ]
  • [132] P. Monasse, F. Guichard.
    Fast computation of contrast-invariant image representation, in: Transaction on, 2000, vol. 9,5, pp. 860-872.
  • [133] S. Moufawad.
    Demmel type communication-avoiding generalized minimal residual method (CA-GMRES) on multicore hardwares: an application in QCD, American University of Beirut, Beirut, Lebanon, June 2011, defended on June 10th, 2010.
  • [134] M. Odersky.
    An Overview of the SCALA Programming Language, EPFL Lausanne, Switzerland, 2004, no IC/2004/64.
  • [135] D. S. Parker.
    Random Butterfly Transformations with Applications in Computational Linear Algebra, Computer Science Department, UCLA, 1995, no CSD-950023.
  • [136] D. Petcu.
    Consuming Resources and Services from Multiple Clouds, in: Journal of Grid Computing, 2014, pp. 1–25.
  • [137] M. Pharr, W. R. Mark.
    ISPC: A SPMD Compiler for High-Performance CPU Programming, in: Innovative Parallel Computing (InPar), 2012.
  • [138] S. Piskorski, L. Lacassagne, D. Etiemble.
    IPLG: un outil pour la fusion d'opérateurs en Traitement d'Images, in: SYMPA, 2009.
  • [139] S. Piskorski, L. Lacassagne, M. Kieffer, D. Etiemble.
    Efficient floating point interval processing for embedded systems and applications, in: SCAN - International Symposium of Scientific computing, Computer Arithmetic and Validated Numerics, 2006.
  • [140] S. Pop, A. Cohen, C. Bastoul, S. Girbal, G. A. Silber, N. Vasilache.
    GRAPHITE: Loop optimizations based on the polyhedral model for GCC, in: Proc. of the 4th GCC Developers' Summit, June 2006, pp. 179–198.
  • [141] A. Pédron, L. Lacassagne, V. Barbillon, F. Bimbard, G. Rougeron, S. L. Berre.
    Performance analysis of an ultrasound reconstruction algorithm for non destructive testing, in: IEEE International Conference on Parallel Computing (ParCo), 2011.
  • [142] A. Pédron, L. Lacassagne, F. Bimbard, S. L. Berre.
    Parallelization of an ultrasound reconstruction algorithm for non destructive testing on multicore CPU and GPU, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2011.
  • [143] A. Romero, M. Gouiffès, L. Lacassagne.
    Feature Points tracking adaptative to Saturation, in: IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2011.
  • [144] A. Romero, M. Gouiffès, L. Lacassagne.
    Covariance Descriptor Multiple Object Tracking and Re-Identification with Colorspace Evaluation, in: IEEE ACCV - Workshop on Detection and Tracking in Challenging Environments, 2012.
  • 145A. Romero, M. Gouiffès, L. Lacassagne.
    Enhanced Local Binary Covariance Matrices (ELBCM) for texture analysis and object tracking, in: ACM International Conference on Computer Vision / Computer Graphics Collaboration Techniques and Applications, 2013.
  • 146A. Romero, L. Lacassagne, M. Gouiffès.
    Real-time covariance tracking algorithm for embedded systems, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2013.
  • 147A. Rosenfeld, J. Pfaltz.
    Sequential operations in digital picture processing, in: Journal of the ACM, 1966, vol. 13, no 4, pp. 471-494.
  • 148A. Rémy, M. Baboulin, M. Sosonkina, B. Rozoy.
    Locality optimization on a NUMA architecture for hybrid LU factorization, in: International Conference on Parallel Computing (PARCO 2013), Advances in Parallel Computing, IOS Press, 2014, vol. 25, pp. 153-162.
  • 149T. Saidani, J. Falcou, C. Tadonki, L. Lacassagne, D. Etiemble.
    Algorithmic Skeletons within an Embedded Domain Specific Language for the Cell Processor, in: Parallel Architectures and Compilation Techniques, PACT, 2009, pp. 67-76.
  • 150T. Saidani, L. Lacassagne, S. Bouaziz, T. M. Khan.
    Parallelization Strategies for the Points of Interests Algorithm on the Cell Processor, in: Lecture Notes in Computer Science, Springer, 2007, pp. 104-112. [ DOI : 10.1007/978-3-540-74742-0 ]
  • 151T. Saidani, S. Piskorski, L. Lacassagne, S. Bouaziz.
    Parallelization Schemes for Memory Optimization on the Cell Processor: A Case Study of Image Processing Algorithm, in: PACT-MEDEA, 2007, pp. 15-19.
  • 152C. Sanderson.
    Armadillo: An open source C++ linear algebra library for fast prototyping and computationally intensive experiments, in: Report Version, 2010, vol. 2.
  • 153J. Siek, L.-Q. Lee, A. Lumsdaine.
    Boost Graph Library, June 2000.
    http://www.boost.org/libs/graph/
  • 154D. Spinellis.
    Notable design patterns for domain-specific languages, in: Journal of Systems and Software, 2001, vol. 56, no 1, pp. 91-99. [ DOI : 10.1016/S0164-1212(00)00089-3 ]
    http://www.sciencedirect.com/science/article/pii/S0164121200000893
  • 155G. W. Stewart.
    The Efficient Generation of Random Orthogonal Matrices With an Application to Condition Estimators, in: SIAM J. Numer. Anal., 1980, vol. 17, no 3, pp. 403-409.
  • 156A. K. Sujeeth, A. Gibbons, K. J. Brown, H. Lee, T. Rompf, M. Odersky, K. Olukotun.
    Forge: Generating a High Performance DSL Implementation from a Declarative Specification, in: 12th International Conference on Generative Programming: Concepts and Experiences, 2013.
  • 157A. K. Sujeeth, T. Rompf, K. J. Brown, H. Lee, H. Chafi, V. Popic, M. Wu, A. Prokopec, V. Jovanovic, M. Odersky, K. Olukotun.
    Composition and Reuse with Compiled Domain-Specific Languages, in: ECOOP'13: European Conference on Object-Oriented Programming, 2013.
  • 158V. Sundriyal, M. Sosonkina, A. Gaenko, Z. Zhang.
    Energy saving strategies for parallel applications with point-to-point communication phases, in: Journal of Parallel and Distributed Computing, 2013. [ DOI : 10.1016/j.jpdc.2013.03.006 ]
  • 159V. Sundriyal, M. Sosonkina, Z. Zhang.
    Automatic runtime frequency-scaling system for energy savings in parallel applications, in: The Journal of Supercomputing, 2014, vol. 68, no 2, pp. 777–797.
  • 160K. Suzuki, I. Horiba, N. Sugie.
    Linear-time connected component labeling based on sequential local operations, in: Computer Vision and Image Understanding, January 2003, vol. 89, no 1, pp. 1-23. [ DOI : 10.1016/S1077-3142(02)00030-9 ]
  • 161H. Tabia, M. Gouiffès, L. Lacassagne.
    Motion histogram quantification for human action recognition, in: IEEE International Conference on Pattern Recognition (ICPR), 2012.
  • 162H. Tabia, M. Gouiffès, L. Lacassagne.
    Motion modeling for abnormal event detection in crowd scenes, in: IEEE International Conference on Pattern Recognition (ISCIVC), 2012.
  • 163C. Tadonki, L. Lacassagne, T. Saïdani, J. Falcou, K. Hamidouche.
    The Harris algorithm revisited on the Cell processor, in: International Workshop on Highly-Efficient Accelerators and Reconfigurable Technologies (HEART), 2010.
  • 164S. Tomov, J. Dongarra, M. Baboulin.
    Towards dense linear algebra for hybrid GPU accelerated manycore systems, in: Parallel Computing, 2010, vol. 36, no 5&6, pp. 232–240.
  • 165University of Tennessee.
    PLASMA Users' Guide, Parallel Linear Algebra Software for Multicore Architectures, Version 2.3, 2010.
  • 166T. L. Veldhuizen.
    Active Libraries and Universal Languages, Indiana University Computer Science, May 2004.
    http://www.ubietylab.net/ubigraph/content/Papers/pdf/VeldhuizenThesis.pdf
  • 167H. Wang, H. Andrade, B. Gedik, K.-L. Wu.
    A Code Generation Approach for Auto-Vectorization in the Spade Compiler, in: LCPC'09, 2009, pp. 383-390.
  • 168Y. Wang, M. Baboulin, J. Dongarra, J. Falcou, Y. Fraigneau, O. Le Maître.
    A parallel solver for incompressible fluid flows, in: International Conference on Computational Science (ICCS 2013), Procedia Computer Science, Elsevier, 2013, vol. 18, pp. 439–448.
  • 169Y. Wang, M. Baboulin, K. Rupp, O. Le Maître, Y. Fraigneau.
    Solving 3D Incompressible Navier-Stokes Equations on Hybrid CPU/GPU Systems, in: Proceedings of the High Performance Computing Symposium, San Diego, CA, USA, HPC '14, Society for Computer Simulation International, 2014, pp. 12:1–12:8.
    http://dl.acm.org/citation.cfm?id=2663510.2663522
  • 170H. Ye, L. Lacassagne, D. Etiemble, L. Cabaret, J. Falcou, O. Florent.
    Impact of High Level Transforms on High Level Synthesis for motion detection algorithm, in: IEEE International Conference on Design and Architectures for Signal and Image Processing (DASIP), 2012, pp. 1-8.
  • 171H. Ye, L. Lacassagne, J. Falcou, D. Etiemble, L. Cabaret, O. Florent.
    High Level Transforms to reduce energy consumption of signal and image processing operators, in: IEEE International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), 2013, pp. 247-254.