ROMA - 2015

Section: New Results

Combining backward and forward recovery to cope with silent errors in iterative solvers

Participants: Massimiliano Fasi [Univ. Manchester, UK], Julien Langou [Univ. Colorado Denver, USA], Yves Robert, Bora Uçar.

We proposed combining checkpointing and verification to cope with silent errors in iterative solvers. We used algorithm-based fault tolerance (ABFT) for both error detection and error correction, which enables forward recovery (with neither rollback nor re-execution) when a single error is detected. We introduced an abstract performance model to assess the performance of all schemes, and we instantiated it for the Conjugate Gradient (CG) algorithm. Finally, we validated the new approach through a set of simulations with both plain and preconditioned CG [48], [25], [47].
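To illustrate how ABFT can detect and correct a single error without rollback, here is a minimal sketch of the classic two-checksum scheme applied to a sparse matrix-vector product (typically the dominant kernel of a CG iteration). It is an assumption for illustration: the report does not specify the encoding actually used, and the names `spmv`, `column_checksums` and `verify_and_correct` are hypothetical. Two column checksums of A are precomputed, c1 = 1^T A and c2 = w^T A with w = (1, 2, ..., n). If one entry y[i] of y = A x is corrupted by an error e, the first syndrome equals e and the second equals (i+1)·e, so their ratio locates the faulty entry and the first syndrome gives the correction.

```python
from typing import List, Tuple

# Hypothetical minimal CSR storage: row_ptr, col_idx, vals.

def spmv(row_ptr: List[int], col_idx: List[int], vals: List[float],
         x: List[float]) -> List[float]:
    """Plain CSR sparse matrix-vector product y = A x."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

def column_checksums(row_ptr: List[int], col_idx: List[int], vals: List[float],
                     ncols: int) -> Tuple[List[float], List[float]]:
    """Precompute c1 = 1^T A and c2 = w^T A with weights w = (1, 2, ..., n)."""
    c1 = [0.0] * ncols
    c2 = [0.0] * ncols
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            c1[col_idx[k]] += vals[k]
            c2[col_idx[k]] += (i + 1) * vals[k]
    return c1, c2

def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def verify_and_correct(y: List[float], x: List[float], c1: List[float],
                       c2: List[float], tol: float = 1e-10) -> Tuple[List[float], str]:
    """Check y = A x against the checksums.

    Returns (y_fixed, status) with status in {"ok", "corrected", "rollback"}.
    Assumes the checksums themselves are not corrupted; tol is a hypothetical
    relative threshold for rounding noise.
    """
    n = len(y)
    d1 = sum(y) - dot(c1, x)                                        # e if y[i] was hit by e
    d2 = sum((i + 1) * yi for i, yi in enumerate(y)) - dot(c2, x)   # (i+1) * e
    s1 = 1.0 + sum(abs(yi) for yi in y)
    s2 = 1.0 + sum((i + 1) * abs(yi) for i, yi in enumerate(y))
    if abs(d1) <= tol * s1:
        if abs(d2) <= tol * s2:
            return list(y), "ok"
        return list(y), "rollback"      # inconsistent syndromes: not a single error
    pos = d2 / d1                        # should be the 1-based index i+1
    idx = round(pos) - 1
    if 0 <= idx < n and abs(pos - (idx + 1)) <= 1e-6:
        fixed = list(y)
        fixed[idx] -= d1                 # forward recovery: subtract the error
        return fixed, "corrected"
    return list(y), "rollback"           # multiple errors: fall back to checkpoint
```

In a CG driver built on this sketch, status "corrected" lets the iteration continue with the repaired vector (forward recovery), while "rollback" would trigger restoration of the last checkpoint and re-execution from there (backward recovery). Combining both is what avoids paying a rollback for every single detected error.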