Interval arithmetic using round-to-nearest mode (part 1 of n)

27 Oct 2015 · CPOL · 6 min read
A faster method for performing interval arithmetic

Abstract

Interval arithmetic has traditionally been hampered by the low performance of programs written to take advantage of it. I suggest a method, based on IEEE Std 754-2008, which has the potential to be much faster.

Introduction

Interval arithmetic is one attempt to answer the question of computational accuracy: given a sequence of floating-point operations, how accurate are the results?

Interval arithmetic replaces a single floating-point number with a range of numbers which we know contains the value. For example, measuring a 3m x 4m room with a metric tape measure might give results accurate to within 0.5cm, which would be represented as [2.995, 3.005] x [3.995, 4.005]. Any operations on intervals are performed in a manner which preserves this property. For example,

add(a, b) == add([a_inf, a_sup], [b_inf, b_sup]) == [a_inf + b_inf, a_sup + b_sup], where the lower bound must be rounded towards -infinity and the upper bound towards +infinity when evaluated in floating point.

A basic implementation

The basic idea of interval arithmetic is to perform each operation twice – once in “round to -infinity” (RD) mode, and once in “round to +infinity” (RU) mode.  While this is guaranteed to work, it has the following disadvantages:

  • Each operation incurs three changes of the rounding mode – to RU, to RD, and back to the default (typically “round to nearest” or RN).

  • On most modern processors, each of these switches is extremely expensive. For example, on late-model Intel processors reading the SSE rounding mode is expensive, while setting it is a serializing operation!
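As an illustration, here is a minimal sketch of the basic approach in C++, assuming a simple interval struct (the type and function names are mine, not from any particular library) and a compiler that honours rounding-mode changes (e.g. /fp:strict):

// A minimal sketch of "basic" interval addition with directed rounding.
#include <cfenv>

struct interval { double inf, sup; };

interval add_basic(interval a, interval b)
{
    interval r;
    std::fesetround(FE_DOWNWARD);   // RD for the lower bound
    r.inf = a.inf + b.inf;
    std::fesetround(FE_UPWARD);     // RU for the upper bound
    r.sup = a.sup + b.sup;
    std::fesetround(FE_TONEAREST);  // restore the default (RN)
    return r;
}

Note the three rounding-mode switches incurred by this single addition.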

The RU approach

An approach that works nicely for some operations is to take advantage of the fact that RD( a ) == -RU( -a ). In order to do this, we store the interval as [-a, b] rather than [a, b]. It is obvious that addition/subtraction may be performed directly with only RU, while multiplication and division may be performed with some additional sign-bit flipping.

This has the advantages of:

  • Requiring only two rounding mode switches per operation

  • Enabling the use of SIMD instructions for addition/subtraction

However, it cannot be universally used. For example, sqrt(-a) == NaN, not -sqrt(a).
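As a sketch (reusing the illustrative naming above), RU-only addition under this representation looks like this; the negated lower bound lets both bounds share the same rounding mode:

// The "RU" representation: [a, b] is stored as {-a, b}.
#include <cfenv>

struct interval_ru { double neg_inf, sup; };    // holds {-inf, sup}

interval_ru add_ru(interval_ru a, interval_ru b)
{
    std::fesetround(FE_UPWARD);
    interval_ru r;
    r.neg_inf = a.neg_inf + b.neg_inf;  // RU(-a_inf + (-b_inf)) == -RD(a_inf + b_inf)
    r.sup     = a.sup + b.sup;          // RU(a_sup + b_sup)
    std::fesetround(FE_TONEAREST);
    return r;
}

Because both bounds are now computed by the same operation in the same rounding mode, the two additions map naturally onto a single SIMD instruction.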

The “round to nearest” approach

The problem of interval arithmetic boils down to an issue of selection. Given an infinitely-precise result A for one of the bounds, we need to round down if we are calculating the lower bound, or round up if we are calculating the upper bound of the interval.

We know that RN( A ) is equal to either RD( A ) or RU( A ), but which?

It turns out that this information can be made available. Furthermore, use of “round to nearest” mode is required in order to make this work.

Exact transforms

Given two floating-point numbers a, b, it is possible to calculate exact transforms as follows:

  • TwoSum() - (a, b) are transformed into (s, e) such that s = RN(a + b) and e = a + b - s
s = RN(a + b)
a' = RN(s - b)
b' = RN(s - a')
da = RN(a - a')
db = RN(b - b')
e = RN(da + db)

The TwoSum() algorithm requires 6 floating-point operations.
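A direct C++ transcription of TwoSum() (a sketch; the function name is mine) runs entirely in the default round-to-nearest mode:

// TwoSum: s = RN(a + b), e = (a + b) - s.
inline void two_sum(double a, double b, double& s, double& e)
{
    s = a + b;
    double a1 = s - b;      // a'
    double b1 = s - a1;     // b'
    double da = a - a1;
    double db = b - b1;
    e = da + db;            // the rounding error of s
}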

  • TwoProd - (a, b) are transformed into (p, e) such that p = RN(a * b) and e = a * b - p

Split(x, n)

Require: C = 2^n + 1, where n = (bits in mantissa + 1)/2

t1 = RN(C * x)
t2 = RN(x - t1)
xh = RN(t1 + t2)
xl = RN(x - xh)

TwoProd(x, y)

(xh, xl) = Split(x, n)
(yh, yl) = Split(y, n)
p = RN(x * y)
t1 = RN(-p + RN(xh * yh))
t2 = RN(t1 + RN(xh * yl))
t3 = RN(t2 + RN(xl * yh))
e = RN(t3 + RN(xl * yl))

The TwoProd() algorithm requires 17 floating-point operations.
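For IEEE double precision the mantissa has 53 bits, so n = 27 and C = 2^27 + 1. A C++ sketch (identifiers are mine; the error term is exact barring overflow or underflow):

// Veltkamp split and Dekker-style TwoProd for doubles.
inline void split(double x, double& xh, double& xl)
{
    const double C = 134217729.0;   // 2^27 + 1
    double t1 = C * x;
    double t2 = x - t1;
    xh = t1 + t2;
    xl = x - xh;
}

inline void two_prod(double x, double y, double& p, double& e)
{
    double xh, xl, yh, yl;
    split(x, xh, xl);
    split(y, yh, yl);
    p = x * y;
    double t1 = -p + xh * yh;
    double t2 = t1 + xh * yl;
    double t3 = t2 + xl * yh;
    e = t3 + xl * yl;               // e = x * y - p
}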

If e is negative, then RN( a + b ) == RU( a + b ), and RD( a + b ) == prev( s ). If e is positive, then RN( a + b ) == RD( a + b ), and RU( a + b ) == succ( s ). Similar arguments apply to TwoProd().
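Combining TwoSum() with prev()/succ() (std::nextafter() in C++) gives interval addition that never leaves round-to-nearest mode. A sketch, reusing the illustrative interval struct and two_sum() from above:

#include <cmath>
#include <limits>

interval add_rn(interval a, interval b)
{
    const double inf = std::numeric_limits<double>::infinity();
    interval r;
    double s, e;

    two_sum(a.inf, b.inf, s, e);                        // we need RD(a.inf + b.inf)
    r.inf = (e < 0.0) ? std::nextafter(s, -inf) : s;

    two_sum(a.sup, b.sup, s, e);                        // we need RU(a.sup + b.sup)
    r.sup = (e > 0.0) ? std::nextafter(s, +inf) : s;

    return r;
}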

Basic Functions

The TwoSum() and TwoProd() procedures are sufficient for performing interval addition/subtraction and multiplication. If an fma() operation is present (it is mandatory under IEEE 754-2008), it is possible to calculate the following functions:

  • TwoProdFMA(a, b)
p = RN(a * b)
e = fma(a, b, -p)

The TwoProdFMA() algorithm performs a TwoProd() with 2 floating-point operations, instead of 17.

  • Division() - q = RN(a / b); r = -fma(q, b, -a)
  • Reciprocal() - q = RN(1.0 / b); r = -fma(q, b, -1.0)
  • Sqrt() - q = RN( sqrt(a) ); r = -fma(q, q, -a)

If r is negative, then the result RN( A ) == RU( A ), and RD( A ) == prev( result ). If r is positive, then the result RN( A ) == RD( A ), and RU( A ) == succ( result ).
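As an example, a sketch of an interval square root along these lines, assuming a non-negative input interval and reusing the illustrative interval struct from above:

#include <cmath>
#include <limits>

interval sqrt_rn(interval a)    // requires a.inf >= 0
{
    const double inf = std::numeric_limits<double>::infinity();
    interval r;

    double q = std::sqrt(a.inf);                        // we need RD(sqrt(a.inf))
    double res = -std::fma(q, q, -a.inf);               // res = a.inf - q * q
    r.inf = (res < 0.0) ? std::nextafter(q, -inf) : q;

    q = std::sqrt(a.sup);                               // we need RU(sqrt(a.sup))
    res = -std::fma(q, q, -a.sup);
    r.sup = (res > 0.0) ? std::nextafter(q, +inf) : q;

    return r;
}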

Note that:

  • prev(x) and succ(x) are required by IEEE Std 754, and are available in C++11 as std::nextafter()
  • If an fma() operation is not present in hardware, it may be simulated in software using the TwoProd() and TwoSum() operations. This requires a total of 35 floating-point operations: 17 for TwoProd() and 18 for the three TwoSum() operations.

Testing

The following C++ implementations of interval arithmetic were tested:

  • “Basic” interval arithmetic – sets the rounding mode to RU, RD, and default as necessary

  • “RU” interval arithmetic – stores [a, b] as [-a, b], eliminating the requirement to set the rounding mode to RD in many cases.

  • “round to nearest” interval arithmetic – implements the algorithms defined in this paper.

In all cases, SIMD instructions were used if available and useful.

The test involves 100,000,000 operations with random operands, as follows:

  1. Addition
  2. Multiplication
  3. Division (0.0 not included in the denominator's interval)
  4. Square: sqr(x) = x * x
  5. Square root
  6. hypot(x, y) = sqrt(sqr(x) + sqr(y))

This hypot() function was chosen because it includes addition (which may be optimized by the RU method), squaring (which may also be optimized by the RU method), and square root (which may be optimized only by the "round to nearest" method). It is also easily calculated in straightforward C++ code.
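Composed from the earlier sketches, an interval hypot() might look as follows (restricted to intervals with non-negative bounds for brevity; a complete implementation must also handle bounds of mixed sign):

interval sqr_rn(interval a)     // assumes 0 <= a.inf <= a.sup
{
    const double inf = std::numeric_limits<double>::infinity();
    interval r;
    double p, e;

    two_prod(a.inf, a.inf, p, e);                       // we need RD(a.inf * a.inf)
    r.inf = (e < 0.0) ? std::nextafter(p, -inf) : p;

    two_prod(a.sup, a.sup, p, e);                       // we need RU(a.sup * a.sup)
    r.sup = (e > 0.0) ? std::nextafter(p, +inf) : p;

    return r;
}

interval hypot_rn(interval x, interval y)
{
    return sqrt_rn(add_rn(sqr_rn(x), sqr_rn(y)));
}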

The code was compiled using Visual Studio 2013 in x64 Release mode, using the /fp:strict compiler option. Note that use of this option is essential if accurate results are to be obtained.

The results (in nanoseconds per operation) on an Intel Core i7 2670QM processor are as follows:

 

Speed Test Results

Operation        Basic    RU    Round to nearest
Addition           104     79                 25
Multiplication     230    194                 61
Division           280    217                 94
Square             238    185                 42
Square Root        139     83                 44
hypot()            804    702                173

Notes:

  • The times are given after subtracting the loop overhead and the time required to generate random inputs.

  • Unlike the latest Intel processors, the 2670QM does not implement fma() in hardware, so it must be simulated in software. This significantly reduces the advantage of the "round to nearest" method for multiplication, division, square, and square root operations.

Summary

An interval arithmetic library that uses round-to-nearest mode is practical and, on current popular hardware, significantly (3+ times) faster than the directed-rounding approach.

Future directions

An interval math package that will include most of the forward functions required by the IEEE-1788 Standard for interval arithmetic is currently under development.

References

  1. IEEE Std 754-2008, IEEE Standard for Floating-Point Arithmetic

  2. IEEE Std 1788-2015, IEEE Standard for Interval Arithmetic

  3. Handbook of Floating-Point Arithmetic, Jean-Michel Muller et al., Birkhäuser

  4. Interval Arithmetic: from Principles to Implementation, T. Hickey, Q. Ju, M. H. van Emden

  5. Interval Operations in Rounding to Nearest, S. M. Rump, P. Zimmermann, S. Boldo, G. Melquiond

History

Version 1.0

 

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


