15,031,495 members
Articles / General Programming / Algorithms
Article
Posted 27 Oct 2015

12K views
6 bookmarked

# Interval arithmetic using round-to-nearest mode (part 1 of n)

Rate me:
A faster method for performing interval arithmetic

## Abstract

Interval arithmetic has traditionally been hampered by low performance of programs written to take advantage of it. I suggest a method based on IEEE Std 754-2008, which has the potential of being much faster.

## Introduction

Interval arithmetic is one attempt to answer the question of computation accuracy - given a set of floating-point operations, how accurate are the results?

Interval arithmetic replaces a single floating-point number with a range of numbers which we know contains the value. For example, measuring a 3m x 4m room with a metric tape measure might give results accurate to within 0.5cm, which would be represented as [2.995, 3.005] x [3.995, 4.005]. Any operations on intervals are performed in a manner which preserves this property. For example,

`add(a, b) == add([a_inf, a_sup], [b_inf, b_sup]) == [a_inf + b_inf, a_sup + b_sup].`

## A basic implementation

The basic idea of interval arithmetic is to perform each operation twice – once in “round to -infinity” (RD) mode, and once in “round to +infinity” (RU) mode.  While this is guaranteed to work, it has the following disadvantages:

• Each operation incurs three changes of the rounding mode – to RU, to RD, and back to the default (typically “round to nearest” or RN).

• On most modern processors, each of these switches is extremely expensive. For example, on late-model Intel processors reading the SSE rounding mode is expensive, while setting it is a serializing operation!

## The RU approach

An approach that works nicely for some operations is to take advantage of the fact that RD( a ) == -RU( -a ). In order to do this, we store the interval as [-a, b] rather than [a, b]. It is obvious that addition/subtraction may be performed directly with only RU, while multiplication and division may be performed with some additional sign-bit flipping.

• Requiring only two rounding mode switches per operation

• Enabling the use of SIMD instructions for addition/subtraction

However, it cannot be universally used. For example, sqrt(-a) == NaN, not -sqrt(a).

## The “round to nearest” approach

The problem of interval arithmetic boils down to an issue of selection. Given an infinitely-precise result A for one of the bounds, we need to round down if we are calculating the lower bound, or round up if we are calculating the upper bound of the interval.

We know that RN( A ) is equal to either RD( A ) or RU( A ), but which?

It turns out that this information can be made available. Furthermore, use of “round to nearest” mode is required in order to make this work.

### Exact transforms

Given two floating-point numbers a, b, it is possible to calculate exact transforms as follows:

• TwoSum() - (a, b) are transformed into (s,e) such that s = RN(a + b) and e = a + b - s
```s = RN(a + b)
a' = RN(s - b)
b' = RN(s - a')
da = RN(a - a')
db = RN(b - b')
e = RN(da + db)```

The TwoSum() algorithm requires 6 floating-point operations.

• TwoProd - (a, b) are transformed into (p, e) such that p = RN(a * b) and e = a * b - p

Split(x, n)

```Require: C = 2^n + 1, where n = (bits in mantissa + 1)/2

t1 = RN(C * x)
t2 =  RN(x - t1)
xh = RN(t1 + t2)
xl = RN(x - xh)```

TwoProd(x, y)

```(xh, xl) = Split(x, s)
(yh, yl) = Split(y, s)
p = RN(x * y)
t1 = RN(-p + RN(xh * yh))
t2  = RN(t1 + RN(xh * yl))
t3 = RN(t2 + RN(xl * yh))
e = RN(t3 + RN(xl * yl))```

The TwoProd() algorithm requires 17 floating-point operations

If e is negative, then RN( a + b ) == RU( a + b ), and RD( a + b ) == prev( s ). If e is positive, then RN( a + b ) == RD( a + b ), and RU( a + b ) == succ( s ). Similar arguments apply to TwoProd().

### Basic Functions

The TwoSum() and TwoProd() procedures are sufficient for performing interval addition/subtraction and multiplication. If an fma() operation is present (it is mandatory under IEEE 754-2008), it is possible to calculate the following functions:

• TwoProdFMA(a, b)
```p = RN(a * b)
e = fma(a, b, -p)```

The TwoProdFMA() algorithm performs a TwoProd() with 2 floating-point operations, instead of 17.

• Division() - q = RN(a / b); r = -fma(q, b, -a)
• Reciprocal() - q = RN(1.0 / b); r = -fma(q, b, -1.0)
• Sqrt() - q = RN( sqrt(a) ); r = -fma(q, q, -a)

If r is negative, then the result = RN( A ) == RU( A ) and RD( A ) == prev( result ). If r is positive, then the result = RN( A ) == RD( A ) and RU( A ) == succ( result ).

Note that:

• prev(x) and succ(x) are required by IEEE Std 754, and are implemented in C++ 2011 as std::nextafter()
• If an fma() operation is not present in hardware, it may be simulated in software by using the TwoProd() and TwoSum() operations. This requires a total of 35 floating-point operations – 17 for TwoProd(), and 18 for the 3 TwoSum() operations required.

## Testing

The following C++ implementations of interval arithmetic were tested:

• “Basic” interval arithmetic – sets the rounding mode to RU, RD, and default as necessary

• “RU” interval arithmetic – stores [a, b] as [-a, b], eliminating the requirement to set the rounding mode to RD in many cases.

• “round to nearest” interval arithmetic – implements the algorithms defined in this paper.

In all cases, SIMD instructions were used if available and useful.

The test involves 100,000,000 operations with random operands, as follows:

2. Multiplication
3. division (0.0 not included in the denominator’s interval)
4. square sqr(x) = (x*x)
5. square root
6. hypot(x, y) = sqrt(sqr(x), sqr(y)).

This hypot() function was chosen because it includes addition (which may be optimized by the RU method), squaring (which may also optimized by the RU method), and square root (which may be optimized only by the “round to nearest” method). It is also easily calculated in straightforward C++ code.

The code was compiled using Visual Studio 2013 in x64 Release mode, using the /fp:strict compiler option. Note that use of this option is essential if accurate results are to be obtained.

The results (in nanoseconds per operation) on an Intel Core i7 2670QM processor are as follows:

Speed Test Results

Basic

RU

Round to nearest

104

79

25

Multiplication

230

194

61

Division

280

217

94

Square

238

185

42

Square Root

139

83

44

hypot()

804

702

173

Notes:

• The times are given after subtracting the loop overhead and the time required to generate random inputs.

• Unlike the latest Intel processors, this 2670QM processor does not contain fma() in hardware, and must simulate it in software. This significantly reduces the advantage of the “round to nearest” method for multiplication, division, square, and square root operations.

## Summary

An interval arithmetic library that uses round-to-nearest mode is practical, and on current popular hardware – significantly (3+ times) faster than the directed rounding approach.

## Future directions

An interval math package that will include most of the forward functions required by the IEEE-1788 Standard for interval arithmetic is currently under development.

## References

1. IEEE Std 754-2008

2. IEEE Std 1788-2015

3. Handbook of Floating-Point Arithmetic, Jean-Michel Muller et al., Birkhäuser

4. Interval Arithmetic: from Principles to Implementation, T. Hickey, Q. Ju, M.H. van Emden

5. Interval Operations in Rounding to Nearest, S.M. Rump, P. Zimmerman, S. Boldo, G. Melquiond

Version 1.0

## Share

 Software Developer (Senior) Israel
B.Sc - Physics & Computer Science, Bar-Ilan University, Ramat Gan, Israel

Over thirty years experience as a programmer and software engineer.

Currently employed at Western Digital*.

* Any opinions expressed are strictly my personal opinions, and do not represent my employer's position.

 First Prev Next
 Problem with fma and underflow Member 1209640029-Oct-15 13:46 Member 12096400 29-Oct-15 13:46
 Re: Problem with fma and underflow Daniel Pfeffer29-Oct-15 20:49 Daniel Pfeffer 29-Oct-15 20:49
 Re: Problem with fma and underflow Member 1209640029-Oct-15 21:21 Member 12096400 29-Oct-15 21:21
 I could produce the following net speedups with your algorithms in the interval package for GNU Octave on my machine: sum 4x, prod 4x, sqrt 2.5x, sqr 1.5x The situation is a little bit more complicated there (interpreted language, use of the MPFR library, etc.) For example, this testcase (line 85) fails to produce a valid result, because the resulting interval does not contain the exact solution. It is nice to have speedups, but in interval arithmetics it is not affordable to loose correctness. Probably this can be solved with a fma function with rounding away from zero. However, if you did this with rounding-mode switches all benefits from round-to-nearest would be gone.
 Re: Problem with fma and underflow Daniel Pfeffer29-Oct-15 22:26 Daniel Pfeffer 29-Oct-15 22:26
 Re: Problem with fma and underflow Daniel Pfeffer31-Oct-15 5:51 Daniel Pfeffer 31-Oct-15 5:51
 Re: Problem with fma and underflow Member 1209640031-Oct-15 6:30 Member 12096400 31-Oct-15 6:30
 Square slower than Multiplication? Matt T Heffron28-Oct-15 11:15 Matt T Heffron 28-Oct-15 11:15
 Re: Square slower than Multiplication? Daniel Pfeffer28-Oct-15 21:27 Daniel Pfeffer 28-Oct-15 21:27
 Re: Square slower than Multiplication? Matt T Heffron29-Oct-15 6:31 Matt T Heffron 29-Oct-15 6:31
 Re: Square slower than Multiplication? Daniel Pfeffer29-Oct-15 20:40 Daniel Pfeffer 29-Oct-15 20:40
 Last Visit: 31-Dec-99 18:00     Last Update: 19-Sep-21 4:44 Refresh 1