Section: Partnerships and Cooperations

International Initiatives

Inria International Labs

In the framework of the Joint Laboratory for Extreme Scale Computing (JLESC), within a collaboration between Inria and Argonne National Laboratory, a new joint project studies how Krylov solvers can monitor lossy compression to significantly reduce the memory footprint when solving very large sparse linear systems. The resulting solvers will alleviate the I/O penalty paid when running large calculations that use either checkpoint mechanisms to address resiliency or out-of-core techniques to solve huge problems.

For the solution of large linear systems of the form $Ax = b$, where $A$ is an $n \times n$ matrix and $x$ and $b$ are vectors of size $n$, Krylov subspace methods are among the most commonly used iterative solvers. They have been further extended to cope with extreme scale computing by integrating features such as communication hiding, in variants referred to as pipelined Krylov solvers [7]. On the one hand, Krylov subspace methods such as GMRES tolerate some inexactness when computing the orthonormal search basis. More precisely, theoretical results [24], [25] show that the matrix-vector product involved in the construction of the new search directions can become more and more inexact as the iteration converges towards the solution. An inexact scheme of this form leads to a generalized Arnoldi equality

\[
[(A + E_1) v_1, \ldots, (A + E_k) v_k] = [v_1, \ldots, v_k, v_{k+1}] \, \bar{H}_k, \qquad (1)
\]

where the theory gives a bound on $\|E_k\|$ that depends on the residual norm $\|b - A x_k\|$ at step $k$, where $x_k$ is the $k$-th iterate. Such a result is of major interest in applications where the matrix is not formed explicitly, e.g., in the context of the fast multipole method (FMM) or of domain decomposition methods (DDM), where it allows one to drastically reduce the computational effort.
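As an illustration of the generalized Arnoldi equality (1), the following minimal sketch runs an Arnoldi process in which each matrix-vector product is perturbed, and stores the perturbed products so that the equality can be checked numerically. The function name `inexact_arnoldi`, the random rank-one perturbation model and the user-supplied tolerances `eps` are assumptions made for this example only; they do not reproduce the bounds of [24], [25] nor the implementation used in the project.

```python
import numpy as np


def inexact_arnoldi(A, b, k, eps, rng):
    """Arnoldi process with perturbed matrix-vector products (illustrative).

    At step j the product A v_j is replaced by (A + E_j) v_j, where
    E_j v_j is a random vector p_j scaled so that
    ||p_j|| = eps[j] * ||A v_j||. Since ||v_j|| = 1, one admissible
    choice is the rank-one matrix E_j = p_j v_j^T.
    """
    n = b.size
    V = np.zeros((n, k + 1))   # orthonormal basis v_1, ..., v_{k+1}
    H = np.zeros((k + 1, k))   # upper Hessenberg matrix \bar{H}_k
    W = np.zeros((n, k))       # columns (A + E_j) v_j
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        exact = A @ V[:, j]
        p = rng.standard_normal(n)
        p *= eps[j] * np.linalg.norm(exact) / np.linalg.norm(p)
        w = exact + p          # inexact matrix-vector product
        W[:, j] = w
        # Modified Gram-Schmidt orthogonalization against the current basis.
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H, W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k = 40, 8
    A = rng.standard_normal((n, n)) + n * np.eye(n)
    b = rng.standard_normal(n)
    # Hypothetical schedule: the perturbation grows along the iterations.
    eps = np.logspace(-6, -2, k)
    V, H, W = inexact_arnoldi(A, b, k, eps, rng)
    # Defect of equality (1); expected to be at the level of rounding errors.
    print(np.linalg.norm(W - V @ H))
```

In this sketch the equality holds by construction whatever the size of the perturbations; what the theory controls is how large they may be without degrading the convergence of the solver.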

On the other hand, novel agnostic lossy data compression techniques are being studied to reduce the I/O footprint of large applications that have to store snapshots of the calculation for a posteriori analysis, that implement out-of-core calculations, or that checkpoint data for resilience. These lossy compression techniques allow for precise control of the error introduced by the compressor, which ensures that the stored data remain meaningful for the considered application. In the context of Krylov methods, the basis $V_{k+1} = [v_1, \ldots, v_k, v_{k+1}]$ is the most demanding data in terms of memory footprint, so that, in a fault-tolerant or out-of-core context, storing it in a lossy form would allow for tremendous savings.
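To make the notion of an error-controlled lossy compressor concrete, here is a minimal sketch based on uniform scalar quantization followed by lossless `zlib` coding. It is only a stand-in for the compressors considered in the project, which are not named in this section; the function names `lossy_compress` and `lossy_decompress` and the absolute tolerance `tol` are assumptions made for this example.

```python
import zlib

import numpy as np


def lossy_compress(v, tol):
    """Quantize v on a uniform grid of step 2*tol, then deflate the integers.

    Rounding to the nearest grid point keeps the pointwise absolute error
    below tol (up to floating-point rounding).
    """
    step = 2.0 * tol
    q = np.round(v / step).astype(np.int64)
    return zlib.compress(q.tobytes()), v.shape


def lossy_decompress(blob, shape, tol):
    """Rebuild the approximation from the quantized integers."""
    q = np.frombuffer(zlib.decompress(blob), dtype=np.int64).reshape(shape)
    return q * (2.0 * tol)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    v = rng.standard_normal(10_000)
    v /= np.linalg.norm(v)          # unit-norm vector, like a basis vector
    tol = 1e-4
    blob, shape = lossy_compress(v, tol)
    v_hat = lossy_decompress(blob, shape, tol)
    print("max abs error:", np.max(np.abs(v - v_hat)))
    print("bytes before / after:", v.nbytes, "/", len(blob))
```

The tolerance `tol` plays the role of the user-controlled error bound: a looser tolerance yields fewer distinct integer values and hence a smaller compressed size.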

The objective of this work, developed within the post-doc of N. Schenkels, is to dynamically control the compression error of $V_{k+1}$ so that it complies with the inexact Krylov theory. The main difficulty is to translate the known theoretical inexactness on $E_k$ into a suitable lossy compression mechanism for $v_k$ with loss $\delta v_k$.