Section: New Results
Floating-point arithmetic
Improved error bounds for complex floating-point arithmetic with a fused multiply-add
Assuming that a fused multiply-add (FMA) instruction is available, C.-P. Jeannerod, N. Louvet, and J.-M. Muller [22] obtained sharp error bounds for various alternatives to Kahan's FMA-based algorithm for 2 x 2 determinants, i.e., for expressions of the form ad - bc (an algorithm they had analyzed in [5]). They showed how to combine such variants with Kahan's original scheme in order to derive componentwise-accurate algorithms for complex floating-point division. Finally, they established sharp or reasonably sharp error bounds for each of these division algorithms.
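For illustration, the following is one possible C transcription of Kahan's FMA-based scheme for ad - bc (the function name and the test values are ours); the key point is that the FMA recovers the rounding error of the product bc exactly, so that it can be compensated for.

    #include <math.h>
    #include <stdio.h>

    /* Kahan's compensated evaluation of ad - bc (our transcription);
       requires a correctly rounded fma(), link with -lm.              */
    double kahan_det2x2(double a, double b, double c, double d)
    {
        double w = b * c;          /* w = RN(bc)                       */
        double e = fma(-b, c, w);  /* e = w - bc, computed exactly     */
        double f = fma(a, d, -w);  /* f = RN(ad - w)                   */
        return f + e;              /* RN(f + e), close to ad - bc      */
    }

    int main(void)
    {
        /* a toy case where the naive evaluation loses the last bit:
           a = d = 2^27, b = c = 2^27 + 1, so ad - bc = -(2^28 + 1).   */
        double a = 134217728.0, b = 134217729.0;
        double c = 134217729.0, d = 134217728.0;
        printf("naive: %.17g\n", a * d - b * c);             /* -268435456 */
        printf("Kahan: %.17g\n", kahan_det2x2(a, b, c, d));  /* -268435457 */
        return 0;
    }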
C.-P. Jeannerod, P. Kornerup (U. of Southern Denmark), N. Louvet, and J.-M. Muller [36] studied the impact of the FMA on the normwise relative accuracy of complex floating-point multiplication. They showed that the classical normwise relative error bound √5 u (u being the unit roundoff) can be reduced to 2u when an FMA is used, and that the bound 2u is asymptotically optimal.
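A scheme of the kind covered by this analysis is the straightforward FMA-based evaluation of the complex product (a+ib)(c+id) sketched below (our notation, not code from [36]).

    #include <math.h>

    /* Straightforward FMA-based complex product (a+ib)(c+id):
       each component is obtained with one product and one FMA,
       hence only two roundings per component.                   */
    void cmul_fma(double a, double b, double c, double d,
                  double *re, double *im)
    {
        *re = fma(a, c, -(b * d));  /* RN(ac - RN(bd)) */
        *im = fma(a, d,   b * c);   /* RN(ad + RN(bc)) */
    }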
J.-M. Muller analyzed in [41] another 2 x 2 determinant algorithm, due to Cornea, Harrison, and Tang, and showed that for radix 2 it admits a sharp relative error bound of the form 2u + O(u²).
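A possible transcription of the Cornea-Harrison-Tang scheme for ab + cd is given below (our transcription; see [41] for the precise algorithm and its bound).

    #include <math.h>

    /* Cornea-Harrison-Tang style evaluation of ab + cd:
       compensate the rounding error of one of the two products. */
    double cht_ab_plus_cd(double a, double b, double c, double d)
    {
        double w = c * d;          /* w = RN(cd)                    */
        double e = fma(c, d, -w);  /* e = cd - w, exact with an FMA */
        double f = fma(a, b,  w);  /* f = RN(ab + w)                */
        return f + e;              /* RN(f + e)                     */
    }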
Improved error bounds for numerical linear algebra
C.-P. Jeannerod and S. M. Rump (Hamburg University of Technology) [7] showed that when evaluating sums of n floating-point numbers, the constant (n-1)u/(1-(n-1)u) appearing in the classical relative error bound (u being the unit roundoff) can be replaced by the simpler and sharper constant (n-1)u. In [45] they extended these results to LU and Cholesky factorizations as well as to triangular linear system solving, by showing that the constants of the form nu/(1-nu) appearing in the classical backward error bounds can similarly be replaced by nu.
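The following small C experiment illustrates the kind of statement involved for sums (it is our own illustration, not code from [7] or [45], and it assumes that long double offers more precision than double, as on x86/glibc): the recursive sum of n doubles is compared with a higher-precision reference and with the bound (n-1)u sum|x_i|, u = 2^-53.

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    int main(void)
    {
        enum { N = 1000000 };
        double s = 0.0;              /* recursive summation in double  */
        long double ref = 0.0L;      /* higher-precision reference sum */
        long double abs_sum = 0.0L;  /* sum of |x_i|, for the bound    */
        srand(42);
        for (int i = 0; i < N; i++) {
            double x = (double)rand() / RAND_MAX - 0.5;
            s += x;
            ref += x;
            abs_sum += fabsl((long double)x);
        }
        double err   = (double)fabsl((long double)s - ref);
        double bound = (N - 1) * 0x1p-53 * (double)abs_sum;
        printf("error = %.3g  <=  bound = %.3g\n", err, bound);
        return 0;
    }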
On Ziv's rounding test
F. de Dinechin, J.-M. Muller, and S. Torres studied with C. Lauter (Univ. Paris 6) the rounding test introduced by Ziv in his libultim software [4]. This test determines whether an approximation to the value f(x) of an elementary function at some point x is accurate enough to guarantee that rounding it yields the floating-point number nearest f(x).
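In implementations of this kind, the approximation is typically delivered as an unevaluated sum th + tl of two floating-point numbers, and the test takes a form similar to the sketch below (our illustration; the value of the constant e depends on the a priori error bound of the approximation step).

    #include <stdbool.h>

    /* Schematic Ziv-style rounding test.  th + tl approximates f(x);
       e > 1 is a constant derived from the error bound of the
       approximation.  If the test succeeds, th can be returned as the
       correctly rounded result; otherwise f(x) must be recomputed with
       a larger intermediate precision.                                 */
    bool ziv_test(double th, double tl, double e)
    {
        return th == th + tl * e;
    }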
Various issues related to double roundings
Double rounding is a phenomenon that may occur when different floating-point precisions are available on the same system. Although double rounding is, in general, innocuous, it may change the behavior of some useful floating-point algorithms. G. Melquiond (Toccata team), E. Martin-Dorel (then in the Marelle team), and J.-M. Muller analyzed in [9] the potential influence of double rounding on the Fast2Sum and 2Sum algorithms (Fast2Sum is recalled below), on some summation algorithms, and on Veltkamp's splitting.

When performing divisions using Newton-Raphson (or similar) iterations on a processor with a floating-point fused multiply-add instruction, one must sometimes scale the iterations to avoid over/underflow and/or loss of accuracy. This may lead to double roundings, resulting in output values that may not be correctly rounded when the quotient falls in the subnormal range. J.-M. Muller showed in [13] how to avoid this problem.
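For reference, here is the Fast2Sum algorithm, one of the building blocks whose behavior under double rounding is studied in [9] (a textbook transcription; its exactness property assumes a single rounding to nearest per operation).

    /* Fast2Sum: if the exponent of a is at least that of b (in particular
       if |a| >= |b|) and each operation is rounded once to nearest, then
       s + t = a + b exactly.  Double rounding may invalidate this.       */
    void fast_two_sum(double a, double b, double *s, double *t)
    {
        *s = a + b;          /* rounded sum                    */
        double z = *s - a;   /* part of b absorbed into s      */
        *t = b - z;          /* rounding error of the addition */
    }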
Comparison between binary and decimal floating-point numbers
The IEEE 754-2008 standard for floating-point arithmetic specifies binary as well as decimal formats. N. Brisebarre, C. Lauter (Univ. Paris 6), M. Mezzarobba, and J.-M. Muller introduced in [17] an algorithm that allows one to quickly compare a binary64 floating-point number and a decimal64 floating-point number, assuming the "binary encoding" of the decimal formats specified by the IEEE 754-2008 standard is used. It is a two-step algorithm: a first pass, based on the exponents only, makes it possible to quickly eliminate most cases; then, when the first pass does not suffice, a more accurate second pass is required. They provided an implementation of several variants of their algorithm and compared them.
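The flavor of the exponent-only first pass can be conveyed by the following rough sketch (our own illustration, written in terms of a positive binary64 x and a positive decimal value M·10^e with 1 <= M < 10^16; it is not the algorithm of [17], which works on the encodings directly and decides exactly).

    #include <math.h>

    /* Coarse exponent-only comparison of a positive binary64 x with a
       positive decimal value M * 10^e, 1 <= M < 10^16.
       Returns +1 if x is certainly larger, -1 if certainly smaller,
       0 if the exponent information alone does not decide.            */
    int first_pass(double x, long long M, int e)
    {
        (void)M;                        /* ignored by this first pass   */
        int ex;
        frexp(x, &ex);                  /* 2^(ex-1) <= x < 2^ex         */
        double lo = e * 3.321928094887362;         /* ~ log2(10^e)      */
        double hi = (e + 16) * 3.321928094887362;  /* ~ log2(10^(e+16)) */
        if (ex - 1 >= hi + 1) return +1;   /* margin of 1 absorbs the   */
        if (ex     <= lo - 1) return -1;   /* rounding of log2(10)      */
        return 0;                          /* overlap: accurate second
                                              step needed               */
    }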
Conversions between binary and decimal floating-point numbers
Conversion between binary and decimal floating-point representations is ubiquitous. Floating-point radix conversion means converting both the exponent and the mantissa. O. Kupriianova and C. Lauter (Univ. Paris 6) and J.-M. Muller developed in [38] an atomic operation for floating-point radix conversion with a simple straight-line algorithm, suitable for hardware design. Exponent conversion is performed with a small multiplication and a lookup table; it yields the correct result, without error. Mantissa conversion uses a few multiplications and a small lookup table that is shared amongst all types of conversions. The accuracy of the mantissa conversion can be adjusted by changing the working precision.
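To give a flavor of the exponent-handling part (this is only a simplified sketch of a classic trick, not the operation defined in [38]): the decimal exponent floor(e·log10 2) associated with a power 2^e can be obtained from e with one small integer multiplication and a shift, and the validity of the constant can be checked exhaustively over the intended exponent range.

    #include <stdio.h>
    #include <math.h>

    /* floor(e * log10(2)) for 0 <= e <= 1100, using one small integer
       multiplication and a shift; 1262611 / 2^22 ~ log10(2).           */
    static int log10_pow2(int e)
    {
        return (int)(((long long)e * 1262611) >> 22);
    }

    int main(void)
    {
        /* check the constant exhaustively over the intended range */
        for (int e = 0; e <= 1100; e++) {
            if (log10_pow2(e) != (int)floor(e * log10(2.0))) {
                printf("mismatch at e = %d\n", e);
                return 1;
            }
        }
        puts("constant verified for 0 <= e <= 1100");
        return 0;
    }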
Table-maker's dilemma
Computing hardest-to-round cases of elementary functions is a key issue when one wants to develop an efficient and reliable implementation of such a function. The algorithms developed so far require a very large amount of computation and produce only a simple yes/no answer, which cannot easily be checked independently.
In [40], G. Hanrot, together with E. Martin-Dorel (Toccata team), M. Mayero (IUT Villetaneuse, LIPN), and L. Théry (Marelle team), developed a certificate-based approach to the SLZ algorithm, in which the execution produces certificates that can then be checked using the Coq proof assistant. This makes it possible to validate a posteriori the fact that, for a given function, a given input precision, and a given target accuracy, all hardest-to-round cases have indeed been found.