Section: New Results
Floating-point Arithmetic
On the maximum relative error when computing integer powers by iterated multiplications in floating-point arithmetic
We improve the usual relative error bound for the computation of $x^n$ through iterated multiplications by $x$ in binary floating-point arithmetic. The obtained error bound is only slightly better than the usual one, but it is simpler. We also discuss the more general problem of computing the product of $n$ terms. [5]
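For concreteness, a minimal C sketch of the scheme under study follows (an illustration, not the paper's analysis): $x^n$ is obtained with $n-1$ rounded multiplications, each with relative error at most $u = 2^{-p}$, which is where the usual $(1+u)^{n-1}-1$ bound comes from.

```c
/* x^n by iterated multiplications, assuming n >= 1. Each of the
   n-1 multiplications is rounded to nearest, with relative error
   at most u = 2^-p, so the classical bound on the relative error
   of the result is (1+u)^(n-1) - 1. */
double power_iter(double x, unsigned n) {
    double r = x;
    for (unsigned i = 1; i < n; i++)
        r = r * x;   /* one rounding error per iteration */
    return r;
}
```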
Formally verified certificate checkers for hardest-to-round computation
In order to derive efficient and robust floating-point implementations of a given function $f$, it is crucial to compute its hardest-to-round points, i.e. the floating-point numbers $x$ such that $f(x)$ is closest to the midpoint of two consecutive floating-point numbers. Depending on the floating-point format one is aiming at, this can be highly computationally intensive. In this paper, we show how certificates based on Hensel's lemma can be added to an algorithm using lattice basis reduction so that the result of a computation can be formally checked in the Coq proof assistant. [7]
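For intuition, the toy sketch below (not the paper's method) shows what a hardest-to-round point is: it exhaustively scans the binary32 inputs of exp on a small interval and keeps the one whose image lies closest to a rounding breakpoint, with double precision standing in for the exact value. This naive search is precisely what the certified approach avoids: exhaustive scanning is hopeless for wider formats, and the "exact" reference itself would need rigorous guarantees.

```c
#include <math.h>
#include <stdio.h>

/* Toy search: among the binary32 numbers x in [1, 1.01), find the
   one for which exp(x) is closest to the midpoint of two consecutive
   binary32 numbers. Double precision plays the role of the exact value. */
int main(void) {
    float worst_x = 1.0f;
    double worst_dist = 0.5;                  /* in ulps; 0.5 is the maximum */
    for (float x = 1.0f; x < 1.01f; x = nextafterf(x, 2.0f)) {
        double y = exp((double)x);            /* "exact" reference value */
        float r = (float)y;                   /* nearest binary32 to y */
        float s = nextafterf(r, y > (double)r ? INFINITY : -INFINITY);
        double mid = 0.5 * ((double)r + (double)s);  /* rounding breakpoint */
        double dist = fabs(y - mid) / fabs((double)s - (double)r);
        if (dist < worst_dist) { worst_dist = dist; worst_x = x; }
    }
    printf("hardest-to-round x = %a, %g ulp from a breakpoint\n",
           worst_x, worst_dist);
    return 0;
}
```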
On the error of computing $ab + cd$ using Cornea, Harrison and Tang's method
In their book, Scientific Computing on the Itanium, Cornea et al. (2002) introduce an accurate algorithm for evaluating expressions of the form $ab + cd$ in binary floating-point arithmetic, assuming an FMA instruction is available. They show that if $p$ is the precision of the floating-point format and if $u = 2^{-p}$, the relative error of the result is of order $u$. We improve their proof to show that the relative error is bounded by $2u + 7u^2 + 6u^3$. Furthermore, by building an example for which the relative error is asymptotically (as $p \to \infty$ or, equivalently, as $u \to 0$) equivalent to $2u$, we show that our error bound is asymptotically optimal. [8]
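For reference, here is a C sketch of the scheme in the form in which it is commonly presented: the FMA recovers the rounding error of each product exactly, and three further rounded additions combine the four terms. The error bound above concerns this computation, not this particular code.

```c
#include <math.h>

/* Sketch of Cornea, Harrison and Tang's scheme for ab + cd (as
   commonly presented). Thanks to the FMA, the identities
   a*b = p1 + e1 and c*d = p2 + e2 hold exactly. */
double cht_ab_plus_cd(double a, double b, double c, double d) {
    double p1 = a * b;            /* RN(ab) */
    double e1 = fma(a, b, -p1);   /* ab - RN(ab), computed exactly */
    double p2 = c * d;            /* RN(cd) */
    double e2 = fma(c, d, -p2);   /* cd - RN(cd), computed exactly */
    return (p1 + p2) + (e1 + e2); /* three final rounded additions */
}
```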
Improved error bounds for floating-point products and Horner’s scheme
Let $u$ denote the relative rounding error of some floating-point format. Recently it has been shown that for a number of standard Wilkinson-type bounds the typical factors $\gamma_k = ku/(1-ku)$ can be improved into $ku$, and that the bounds are valid without restriction on $k$. Problems include summation, dot products and thus matrix multiplication, residual bounds for LU- and Cholesky-decomposition, and triangular system solving by substitution. In this note we show a similar result for the product $x_1 x_2 \cdots x_k$ of real and/or floating-point numbers, for computation in any order, and for any base $\beta \ge 2$. The derived error bounds are valid under a mandatory restriction on $u$. Moreover, we prove a similar bound for Horner's polynomial evaluation scheme. [9]
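As a concrete instance, Horner's scheme is the usual loop below: evaluating a degree-$n$ polynomial performs $2n$ rounded operations, which is where the classical factor $\gamma_{2n} = 2nu/(1-2nu)$ comes from, and it is factors of this type that the result above replaces by the simpler form $ku$.

```c
/* Horner's scheme for p(x) = a[0] + a[1]*x + ... + a[n]*x^n.
   Each of the n iterations performs one rounded multiplication and
   one rounded addition: 2n rounding errors in total, hence the
   classical Wilkinson-type factor gamma_{2n} = 2nu/(1-2nu). */
double horner(const double a[], int n, double x) {
    double r = a[n];
    for (int i = n - 1; i >= 0; i--)
        r = r * x + a[i];
    return r;
}
```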
Comparison between binary and decimal floating-point numbers
In collaboration with Christoph Lauter and Marc Mezzarobba (LIP6 laboratory, Paris), Nicolas Brisebarre and Jean-Michel Muller introduce an algorithm to compare a binary floating-point (FP) number and a decimal FP number, assuming the “binary encoding” of the decimal formats is used, and with a special emphasis on the basic interchange formats specified by the IEEE 754-2008 standard for FP arithmetic. It is a two-step algorithm: a first pass, based on the exponents only, quickly eliminates most cases, then, when the first pass does not suffice, a more accurate second pass is performed. They provide an implementation of several variants of their algorithm, and compare them [26].
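The two-pass structure can be illustrated by the toy C sketch below (a hypothetical helper, assuming positive inputs with normalized significands $m_2 \in [1,2)$ and $m_{10} \in [1,10)$; the paper's second pass is exact, unlike the naive long-double fallback used here): exponents alone settle most comparisons, and only the remaining cases pay for a more accurate test.

```c
#include <math.h>

/* Toy two-step comparison of x = m2 * 2^e (binary) against
   y = m10 * 10^f (decimal), both positive, with m2 in [1,2)
   and m10 in [1,10). First pass: exponents only. Second pass:
   a naive higher-precision stand-in for the exact test. */
int compare_bin_dec(double m2, int e, double m10, int f) {
    double lx = e;                          /* log2(x) lies in [e, e+1)   */
    double ly = f * 3.3219280948873623;     /* log2(y) in [ly, ly+3.33)   */
    if (lx >= ly + 3.5) return  1;          /* x >= 2^e > 10^(f+1) > y    */
    if (lx + 1.0 <= ly) return -1;          /* x < 2^(e+1) <= 10^f <= y   */
    long double vx = (long double)m2  * powl(2.0L,  e);
    long double vy = (long double)m10 * powl(10.0L, f);
    return (vx > vy) - (vx < vy);
}
```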