Section: New Results

High performance solvers for large linear algebra problems

Blocking strategy optimizations for sparse direct linear solvers on heterogeneous architectures

Ordering and block-symbolic factorization, the two major preprocessing steps of sparse direct solvers, reduce the amount of computation and memory needed and provide a task granularity suitable for reaching good performance with BLAS kernels. With the advent of GPUs, the granularity of the block computation became more important than ever. In this work, we present a reordering strategy that increases this block granularity. The strategy relies on the block-symbolic factorization to refine the ordering produced by tools such as METIS or Scotch, without affecting the number of operations required to solve the problem. We integrated this algorithm in the PaStiX solver and observed a significant reduction of the number of off-diagonal blocks on a large spectrum of matrices. This improvement leads to a performance gain of up to 20% on GPUs.
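
To give a flavor of what block granularity means here, the toy sketch below (in Python, our own illustration and not the PaStiX algorithm itself) counts the off-diagonal blocks of a supernode as the number of contiguous runs in its row-index set; renumbering the unknowns so that each contribution becomes contiguous lowers the block count while leaving the operation count untouched.

    # Minimal sketch: how reordering unknowns changes off-diagonal block
    # granularity. Hypothetical illustration, not the PaStiX implementation.

    def count_offdiag_blocks(rows):
        """Count maximal contiguous runs in a list of row indices.
        Each run corresponds to one off-diagonal block in the symbolic
        structure of a supernode's column block."""
        rows = sorted(rows)
        blocks = 1
        for prev, cur in zip(rows, rows[1:]):
            if cur != prev + 1:   # gap => a new block starts
                blocks += 1
        return blocks

    # Rows of two descendants contributing to the same supernode, under
    # the original ordering: their indices interleave, producing many
    # small blocks.
    contrib_a = [10, 12, 14, 16]
    contrib_b = [11, 13, 15, 17]
    print(count_offdiag_blocks(contrib_a), count_offdiag_blocks(contrib_b))  # 4 4

    # A permutation that renumbers the unknowns so each contribution is
    # contiguous (same unknowns, same fill-in, same flops):
    perm = {10: 10, 12: 11, 14: 12, 16: 13, 11: 14, 13: 15, 15: 16, 17: 17}
    print(count_offdiag_blocks([perm[r] for r in contrib_a]),
          count_offdiag_blocks([perm[r] for r in contrib_b]))  # 1 1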

These contributions have been published in SIAM Journal on Matrix Analysis and Applications [22].

Sparse supernodal solver using block low-rank compression

In the context of the FastLA associate team, we have been collaborating for the last four years with Eric Darve, professor in the Institute for Computational and Mathematical Engineering and the Mechanical Engineering Department at Stanford, on the design of new efficient sparse direct solvers. We have been working on applying fast direct solvers for dense matrices to the solution of sparse systems. We observed that the extend-add operation (performed during the sparse factorization) is the most time-consuming step, and we have therefore developed a series of algorithms to reduce this computational cost.
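
For readers unfamiliar with the extend-add operation: it scatter-accumulates a child front's contribution into its parent front through the mapping between their index sets. A minimal dense sketch follows, with hypothetical index sets chosen only for illustration.

    import numpy as np

    # Minimal sketch of the extend-add operation in a multifrontal-style
    # factorization: the child's contribution block is scattered into the
    # parent front via the mapping between their index sets.

    parent_idx = [4, 7, 9, 12, 15]           # global indices of the parent front
    child_idx = [7, 12, 15]                  # global indices of the child's update

    parent = np.zeros((5, 5))
    child_update = np.arange(9, dtype=float).reshape(3, 3)

    # Map each child index to its position in the parent front.
    pos = [parent_idx.index(i) for i in child_idx]

    # The extend-add itself: a scatter-accumulate on rows and columns.
    parent[np.ix_(pos, pos)] += child_update
    print(parent)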

We presented two approaches using a Block Low-Rank (BLR) compression technique to reduce the memory footprint and/or the time-to-solution of the sparse supernodal solver PaStiX. This flat, non-hierarchical compression method takes advantage of the low-rank property of the blocks appearing during the factorization of sparse linear systems arising from the discretization of partial differential equations. The first approach, called Minimal Memory, illustrates the maximum memory gain that can be obtained with the BLR compression method, while the second approach, called Just-In-Time, mainly focuses on reducing the computational complexity and thus the time-to-solution. Singular Value Decomposition (SVD) and Rank-Revealing QR (RRQR) are compared as compression kernels in terms of factorization time, memory consumption, and numerical behavior. Experiments on a single node with 24 threads and 128 GB of memory were performed to evaluate the potential of both strategies. On a set of matrices from real-life problems, we demonstrated a memory footprint reduction of up to 4 times with the Minimal Memory strategy and a speedup of up to 3.5 times with the Just-In-Time strategy. We then studied the impact of the configuration parameters of the BLR solver, which allowed us to solve a 3D Laplacian of 36 million unknowns on a single node, while the full-rank solver stopped at 8 million unknowns due to memory limitations.
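
The sketch below illustrates the two compression kernels on a toy low-rank block (our own example, not PaStiX code): a truncated SVD and a column-pivoted rank-revealing QR, both cut off at a given tolerance, with the resulting rank and storage reported.

    import numpy as np
    from scipy.linalg import qr

    rng = np.random.default_rng(0)
    tol = 1e-8

    # Toy low-rank block: the kind of off-diagonal block that appears when
    # factorizing matrices coming from PDE discretizations.
    A = rng.standard_normal((256, 8)) @ rng.standard_normal((8, 256))

    # SVD compression: truncate at the first singular value below tol.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r_svd = int(np.sum(s > tol * s[0]))
    U_k, V_k = U[:, :r_svd] * s[:r_svd], Vt[:r_svd, :]

    # RRQR compression: column-pivoted QR, truncated where the diagonal of R
    # drops below tol (a standard rank-revealing criterion).
    Q, R, piv = qr(A, mode='economic', pivoting=True)
    r_rrqr = int(np.sum(np.abs(np.diag(R)) > tol * np.abs(R[0, 0])))

    print("SVD rank:", r_svd, " RRQR rank:", r_rrqr)
    print("full storage:", A.size, " BLR storage:", U_k.size + V_k.size)
    assert np.allclose(A, U_k @ V_k, atol=1e-6)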

These contributions have been presented at the PDSEC workshop of the IPDPS'17 conference [30], and an extended version has been submitted to the Journal of Computational Science [48].

Towards a hierarchical symbolic factorization for data-sparse direct solvers

Hierarchical algorithms based on low-rank compression techniques led to a complete redesign of the methods for solving dense linear systems at the dawn of the twenty-first century, significantly reducing their computational cost. However, their application to sparse linear systems remains a major challenge that both the hierarchical-matrix community and the sparse-matrix community are tackling. A first class of approaches has been developed by the hierarchical-matrix community to exploit the sparse structure of the matrix. While the strong point of these methods is that the resulting algorithm remains hierarchical, they do not exploit some zeros as naturally as sparse solvers do. In contrast, since a sparse factorization can be seen as a sequence of smaller dense operations, the sparse community has exploited this property to introduce hierarchical techniques within these elementary operations. However, the resulting algorithm loses the fundamental property of hierarchical algorithms, since the compression hierarchy is only local. As part of this PhD, we introduce a new algorithm performing a sparse hierarchical symbolic factorization that exploits precisely the sparse structure of the matrix and of its factors, while preserving a global hierarchical structure to ensure effective compression. We have shown experimentally that this new approach obtains at the same time a reduced number of operations (thanks to its hierarchical character) and a number of non-zero elements as small as with a purely sparse method (through the use of a symbolic factorization).
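
As a toy illustration of the underlying idea (not the algorithm developed in the thesis), the sketch below recursively partitions the unknowns as a global cluster tree would, and prunes every off-diagonal block that the sparse pattern of the factors never touches, so the hierarchy remains global while structural zeros cost nothing.

    import numpy as np

    # Minimal sketch of a hierarchy-aware symbolic step: recursively split
    # the unknowns (as a global cluster tree would) and keep a block only
    # if the sparse pattern of the factors actually touches it.
    # Purely illustrative; not the thesis algorithm.

    def hierarchical_blocks(pattern, rows, cols, leaf=4):
        """Enumerate the blocks of a 2x2 recursive partition of
        pattern[rows, cols], pruning blocks that are entirely zero."""
        sub = pattern[np.ix_(rows, cols)]
        if not sub.any():
            return []                      # structural zero: pruned, no storage
        if len(rows) <= leaf or len(cols) <= leaf:
            return [(rows, cols)]          # dense leaf (or low-rank candidate)
        rm, cm = len(rows) // 2, len(cols) // 2
        out = []
        for r in (rows[:rm], rows[rm:]):
            for c in (cols[:cm], cols[cm:]):
                out += hierarchical_blocks(pattern, r, c, leaf)
        return out

    # A banded pattern as a stand-in for the symbolic factor of a sparse matrix.
    n = 16
    pattern = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) <= 2
    idx = list(range(n))
    blocks = hierarchical_blocks(pattern, idx, idx)
    print(len(blocks), "stored blocks out of", (n // 4) ** 2, "leaf blocks")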

This work is developed in A. Falco's PhD thesis; it led to a publication in a national conference [31] and will give rise to a submission to an international journal in 2018.