Section: New Results
Floating-point and Validated Numerics
Optimal bounds on relative errors of floating-point operations
Rounding error analyses of numerical algorithms are most often carried out via repeated applications of the so-called standard models of floating-point arithmetic. Given a round-to-nearest function $\mathrm{fl}$ and barring underflow and overflow, such models bound the relative errors $E_1(t) = |t - \mathrm{fl}(t)|/|t|$ and $E_2(t) = |t - \mathrm{fl}(t)|/|\mathrm{fl}(t)|$ by the unit roundoff $u$. In [10] we investigate the possibility and the usefulness of refining these bounds, both in the case of an arbitrary real $t$ and in the case where $t$ is the exact result of an arithmetic operation on some floating-point numbers. We show that $E_1(t)$ and $E_2(t)$ are optimally bounded by $u/(1+u)$ and $u$, respectively, when $t$ is real or, under mild assumptions on the base and the precision, when $t = x \pm y$ or $t = xy$ with $x, y$ two floating-point numbers. We prove that while this remains true for division in base $\beta > 2$, smaller, attainable bounds can be derived for both division in base $\beta = 2$ and square root. This set of optimal bounds is then applied to the rounding error analysis of various numerical algorithms: in all cases, we obtain significantly shorter proofs of the best-known error bounds for such algorithms, and/or improvements on these bounds themselves.
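As an illustration only, the two relative errors of rounding to nearest can be checked numerically in exact rational arithmetic. The sketch below assumes the binary64 format (unit roundoff 2^-53) and writes E1 = |t - fl(t)|/|t| and E2 = |t - fl(t)|/|fl(t)|; the names `relative_errors` and `bounds_hold` are hypothetical helpers, not taken from [10]. It relies on CPython converting a `Fraction` to `float` with correct rounding (ties to even).

```python
from fractions import Fraction
import random

U = Fraction(1, 2**53)  # unit roundoff of binary64 (assumed format)

def relative_errors(t):
    """Return (E1, E2) for a nonzero rational t, using exact arithmetic."""
    r = Fraction(float(t))  # float() rounds the Fraction to the nearest binary64
    err = abs(t - r)
    return err / abs(t), err / abs(r)

def bounds_hold(samples=2000, seed=0):
    """Check E1 <= u/(1+u) and E2 <= u on random rationals in the normal range."""
    rng = random.Random(seed)
    for _ in range(samples):
        t = Fraction(rng.randint(1, 10**30), rng.randint(1, 10**30))
        e1, e2 = relative_errors(t)
        if e1 > U / (1 + U) or e2 > U:
            return False
    return True

# The midpoint t = 1 + u rounds to 1 (ties to even), so both bounds are attained.
e1, e2 = relative_errors(1 + U)
print(bounds_hold(), e1 == U / (1 + U), e2 == U)
```

A random test of this kind cannot prove the bounds; it only shows that they are consistent with sampled data and that the input 1 + u attains them.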
On various ways to split a floating-point number
In [32] we review several ways to split a floating-point number, that is, to decompose it into the exact sum of two floating-point numbers of smaller precision. All the methods considered involve only a few IEEE floating-point operations, with rounding to nearest and possibly including the fused multiply-add (FMA). Applications range from the implementation of integer functions such as round and floor to the computation of suitable scaling factors aimed, for example, at avoiding spurious underflows and overflows when implementing functions such as the hypotenuse.
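One classical splitting of this kind is Veltkamp's algorithm, sketched below for binary64 as an example of the general idea; it is not claimed to be the exact formulation studied in [32]. The function names and the choice s = 27 are assumptions made for illustration, and the sketch assumes round-to-nearest with no overflow in the first multiplication.

```python
import math

def veltkamp_split(x, s=27):
    """Split a binary64 x into (xh, xl) with x == xh + xl exactly.

    Assumes round-to-nearest and no overflow when computing c * x.
    """
    c = float(2**s + 1)
    gamma = c * x        # rounded product
    delta = x - gamma    # rounded difference
    xh = gamma + delta   # high part, fits in 53 - s bits
    xl = x - xh          # low part, computed exactly
    return xh, xl

def fits_in(x, bits):
    """True if x is zero or its significand fits in the given number of bits."""
    m, _ = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1, or m == 0
    return (m * 2.0**bits).is_integer()

xh, xl = veltkamp_split(0.1)
print(xh + xl == 0.1, fits_in(xh, 26), fits_in(xl, 26))
```

With s = 27 both parts fit in 26 bits, so a product such as xh * xh is computed without rounding error, which is what makes this splitting useful in exact-product algorithms when no FMA is available.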
Algorithms for triple-word arithmetic
Triple-word arithmetic consists of representing high-precision numbers as the unevaluated sum of three floating-point numbers. In [45], we introduce and analyze various algorithms for manipulating triple-word numbers. For a comparable accuracy, our new algorithms are faster than what one would obtain by simply using the usual floating-point expansion algorithms in the special case of expansions of length 3.
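To make the representation concrete, the sketch below builds a triple-word approximation of a rational number by rounding successive residuals to binary64. It only illustrates what an unevaluated sum of three floating-point numbers looks like; it is not one of the algorithms of [45], and `to_triple_word` and `exact_sum` are hypothetical helper names.

```python
from fractions import Fraction

def to_triple_word(t):
    """Return floats (x0, x1, x2) whose exact sum approximates the rational t."""
    x0 = float(t)                                # leading word
    x1 = float(t - Fraction(x0))                 # rounded first residual
    x2 = float(t - Fraction(x0) - Fraction(x1))  # rounded second residual
    return x0, x1, x2

def exact_sum(words):
    """Exact rational value of the unevaluated sum."""
    return sum(Fraction(w) for w in words)

t = Fraction(1, 3)
tw = to_triple_word(t)
rel_err = abs(exact_sum(tw) - t) / t
# Each rounding has relative error at most 2**-53, hence at most 2**-159 overall.
print(tw, rel_err <= Fraction(1, 2**159))
```

Barring underflow, three words of 53 bits each give a relative accuracy of about 2^-159, compared with 2^-53 for a single binary64 number.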
Error analysis of some operations involved in the Fast Fourier Transform
In [44], we obtain error bounds for the classical FFT algorithm in floating-point arithmetic, for the 2-norm as well as for the infinity norm. For that purpose we also give some results on the relative error of the complex multiplication by a root of unity, and on the largest value that the real or imaginary part of one term of the FFT of a vector $x$ can take, assuming that all terms of $x$ have real and imaginary parts less than some value $b$.
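The relative error of a complex multiplication by a root of unity can be measured exactly on a given input, as in the sketch below. It assumes binary64, the textbook product with four multiplications and no FMA, and a root of unity obtained from `math.cos` and `math.sin`; the error is measured against the exact product of the floating-point inputs. The helper names and the sample values are illustrative assumptions and do not reproduce the analysis of [44].

```python
from fractions import Fraction
import math

U = Fraction(1, 2**53)  # unit roundoff of binary64 (assumed format)

def cmul(a, b, c, d):
    """Textbook floating-point product (a + ib)(c + id), four multiplications."""
    return a * c - b * d, a * d + b * c

def rel_error_squared(a, b, c, d):
    """Squared normwise relative error of cmul, computed exactly."""
    re, im = cmul(a, b, c, d)
    A, B, C, D = map(Fraction, (a, b, c, d))
    ex_re, ex_im = A * C - B * D, A * D + B * C  # exact product of the inputs
    num = (Fraction(re) - ex_re) ** 2 + (Fraction(im) - ex_im) ** 2
    return num / (ex_re**2 + ex_im**2)

n, k = 1024, 5  # hypothetical FFT size and twiddle index
w = (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
e2 = rel_error_squared(0.3, -1.7, *w)
print(math.sqrt(e2) / float(U))  # observed error, in units of u
```

Printing the error in units of u makes it easy to compare observed values with a proven bound of the form (constant) times u.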