Interval arithmetic using round-to-nearest mode (part 1 of n)

27 Oct 2015 · CPOL · 6 min read
A faster method for performing interval arithmetic

Abstract

Interval arithmetic has traditionally been hampered by the low performance of programs written to take advantage of it. I suggest a method, based on IEEE Std 754-2008, which has the potential to be much faster.

Introduction

Interval arithmetic is one attempt to answer the question of computational accuracy: given a sequence of floating-point operations, how accurate are the results?

Interval arithmetic replaces a single floating-point number with a range of numbers which we know contains the value. For example, measuring a 3m x 4m room with a metric tape measure might give results accurate to within 0.5cm, which would be represented as [2.995, 3.005] x [3.995, 4.005]. Any operations on intervals are performed in a manner which preserves this property. For example,

add(a, b) == add([a_inf, a_sup], [b_inf, b_sup]) == [a_inf + b_inf, a_sup + b_sup], where the lower bound must be rounded towards -infinity and the upper bound towards +infinity when evaluated in floating point.

A basic implementation

The basic idea of interval arithmetic is to perform each operation twice – once in “round to -infinity” (RD) mode, and once in “round to +infinity” (RU) mode.  While this is guaranteed to work, it has the following disadvantages:

  • Each operation incurs three changes of the rounding mode – to RU, to RD, and back to the default (typically “round to nearest” or RN).

  • On most modern processors, each of these switches is extremely expensive. For example, on late-model Intel processors reading the SSE rounding mode is expensive, while setting it is a serializing operation!
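As an illustration, here is a minimal sketch of the basic approach in C++, assuming a simple interval struct (the type and function names are mine, not from any particular library) and a compiler that honours rounding-mode changes (e.g. /fp:strict):

// A minimal sketch of "basic" interval addition with directed rounding.
#include <cfenv>

struct interval { double inf, sup; };

interval add_basic(interval a, interval b)
{
    interval r;
    std::fesetround(FE_DOWNWARD);   // RD for the lower bound
    r.inf = a.inf + b.inf;
    std::fesetround(FE_UPWARD);     // RU for the upper bound
    r.sup = a.sup + b.sup;
    std::fesetround(FE_TONEAREST);  // restore the default (RN)
    return r;
}

Note the three rounding-mode switches incurred by this single addition.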

The RU approach

An approach that works nicely for some operations is to take advantage of the fact that RD( a ) == -RU( -a ). In order to do this, we store the interval as [-a, b] rather than [a, b]. It is obvious that addition/subtraction may be performed directly with only RU, while multiplication and division may be performed with some additional sign-bit flipping.

This has the advantages of:

  • Requiring only two rounding mode switches per operation

  • Enabling the use of SIMD instructions for addition/subtraction

However, it cannot be universally used. For example, sqrt(-a) == NaN, not -sqrt(a).
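As a sketch (reusing the illustrative naming above), RU-only addition under this representation looks like this; the negated lower bound lets both bounds share the same rounding mode:

// The "RU" representation: [a, b] is stored as {-a, b}.
#include <cfenv>

struct interval_ru { double neg_inf, sup; };    // holds {-inf, sup}

interval_ru add_ru(interval_ru a, interval_ru b)
{
    std::fesetround(FE_UPWARD);
    interval_ru r;
    r.neg_inf = a.neg_inf + b.neg_inf;  // RU(-a_inf + (-b_inf)) == -RD(a_inf + b_inf)
    r.sup     = a.sup + b.sup;          // RU(a_sup + b_sup)
    std::fesetround(FE_TONEAREST);
    return r;
}

Because both bounds are now computed by the same operation in the same rounding mode, the two additions map naturally onto a single SIMD instruction.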

The “round to nearest” approach

The problem of interval arithmetic boils down to an issue of selection. Given an infinitely-precise result A for one of the bounds, we need to round down if we are calculating the lower bound, or round up if we are calculating the upper bound of the interval.

We know that RN( A ) is equal to either RD( A ) or RU( A ), but which?

It turns out that this information can be made available. Furthermore, use of “round to nearest” mode is required in order to make this work.

Exact transforms

Given two floating-point numbers a, b, it is possible to calculate exact transforms as follows:

  • TwoSum() - (a, b) are transformed into (s, e) such that s = RN(a + b) and e = a + b - s
s = RN(a + b)
a' = RN(s - b)
b' = RN(s - a')
da = RN(a - a')
db = RN(b - b')
e = RN(da + db)

The TwoSum() algorithm requires 6 floating-point operations.
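A direct C++ transcription of TwoSum() (a sketch; the function name is mine) runs entirely in the default round-to-nearest mode:

// TwoSum: s = RN(a + b), e = (a + b) - s.
inline void two_sum(double a, double b, double& s, double& e)
{
    s = a + b;
    double a1 = s - b;      // a'
    double b1 = s - a1;     // b'
    double da = a - a1;
    double db = b - b1;
    e = da + db;            // the rounding error of s
}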

  • TwoProd - (a, b) are transformed into (p, e) such that p = RN(a * b) and e = a * b - p

Split(x, n)

Require: C = 2^n + 1, where n = (bits in mantissa + 1)/2

t1 = RN(C * x)
t2 = RN(x - t1)
xh = RN(t1 + t2)
xl = RN(x - xh)

TwoProd(x, y)

(xh, xl) = Split(x, n)
(yh, yl) = Split(y, n)
p = RN(x * y)
t1 = RN(-p + RN(xh * yh))
t2 = RN(t1 + RN(xh * yl))
t3 = RN(t2 + RN(xl * yh))
e = RN(t3 + RN(xl * yl))

The TwoProd() algorithm requires 17 floating-point operations.
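For IEEE double precision the mantissa has 53 bits, so n = 27 and C = 2^27 + 1. A C++ sketch (identifiers are mine; the error term is exact barring overflow or underflow):

// Veltkamp split and Dekker-style TwoProd for doubles.
inline void split(double x, double& xh, double& xl)
{
    const double C = 134217729.0;   // 2^27 + 1
    double t1 = C * x;
    double t2 = x - t1;
    xh = t1 + t2;
    xl = x - xh;
}

inline void two_prod(double x, double y, double& p, double& e)
{
    double xh, xl, yh, yl;
    split(x, xh, xl);
    split(y, yh, yl);
    p = x * y;
    double t1 = -p + xh * yh;
    double t2 = t1 + xh * yl;
    double t3 = t2 + xl * yh;
    e = t3 + xl * yl;               // e = x * y - p
}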

If e is negative, then RN( a + b ) == RU( a + b ), and RD( a + b ) == prev( s ). If e is positive, then RN( a + b ) == RD( a + b ), and RU( a + b ) == succ( s ). Similar arguments apply to TwoProd().
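Combining TwoSum() with prev()/succ() (std::nextafter() in C++) gives interval addition that never leaves round-to-nearest mode. A sketch, reusing the illustrative interval struct and two_sum() from above:

#include <cmath>
#include <limits>

interval add_rn(interval a, interval b)
{
    const double inf = std::numeric_limits<double>::infinity();
    interval r;
    double s, e;

    two_sum(a.inf, b.inf, s, e);                        // we need RD(a.inf + b.inf)
    r.inf = (e < 0.0) ? std::nextafter(s, -inf) : s;

    two_sum(a.sup, b.sup, s, e);                        // we need RU(a.sup + b.sup)
    r.sup = (e > 0.0) ? std::nextafter(s, +inf) : s;

    return r;
}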

Basic Functions

The TwoSum() and TwoProd() procedures are sufficient for performing interval addition/subtraction and multiplication. If an fma() operation is present (it is mandatory under IEEE 754-2008), it is possible to calculate the following functions:

  • TwoProdFMA(a, b)
p = RN(a * b)
e = fma(a, b, -p)

The TwoProdFMA() algorithm performs a TwoProd() with 2 floating-point operations, instead of 17.

  • Division() - q = RN(a / b); r = -fma(q, b, -a)
  • Reciprocal() - q = RN(1.0 / b); r = -fma(q, b, -1.0)
  • Sqrt() - q = RN( sqrt(a) ); r = -fma(q, q, -a)

If r is negative, then the result RN( A ) == RU( A ), and RD( A ) == prev( result ). If r is positive, then the result RN( A ) == RD( A ), and RU( A ) == succ( result ).
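As an example, a sketch of an interval square root along these lines, assuming a non-negative input interval and reusing the illustrative interval struct from above:

#include <cmath>
#include <limits>

interval sqrt_rn(interval a)    // requires a.inf >= 0
{
    const double inf = std::numeric_limits<double>::infinity();
    interval r;

    double q = std::sqrt(a.inf);                        // we need RD(sqrt(a.inf))
    double res = -std::fma(q, q, -a.inf);               // res = a.inf - q * q
    r.inf = (res < 0.0) ? std::nextafter(q, -inf) : q;

    q = std::sqrt(a.sup);                               // we need RU(sqrt(a.sup))
    res = -std::fma(q, q, -a.sup);
    r.sup = (res > 0.0) ? std::nextafter(q, +inf) : q;

    return r;
}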

Note that:

  • prev(x) and succ(x) are required by IEEE Std 754, and are available in C++11 as std::nextafter()
  • If an fma() operation is not present in hardware, it may be simulated in software using the TwoProd() and TwoSum() operations. This requires a total of 35 floating-point operations: 17 for TwoProd() and 18 for the three TwoSum() operations.

Testing

The following C++ implementations of interval arithmetic were tested:

  • “Basic” interval arithmetic – sets the rounding mode to RU, RD, and default as necessary

  • “RU” interval arithmetic – stores [a, b] as [-a, b], eliminating the requirement to set the rounding mode to RD in many cases.

  • “round to nearest” interval arithmetic – implements the algorithms defined in this paper.

In all cases, SIMD instructions were used if available and useful.

The test involves 100,000,000 operations with random operands, as follows:

  1. Addition
  2. Multiplication
  3. Division (0.0 not included in the denominator's interval)
  4. Square: sqr(x) = x * x
  5. Square root
  6. hypot(x, y) = sqrt(sqr(x) + sqr(y))

This hypot() function was chosen because it includes addition (which may be optimized by the RU method), squaring (which may also be optimized by the RU method), and square root (which may be optimized only by the "round to nearest" method). It is also easily calculated in straightforward C++ code.
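Composed from the earlier sketches, an interval hypot() might look as follows (restricted to intervals with non-negative bounds for brevity; a complete implementation must also handle bounds of mixed sign):

interval sqr_rn(interval a)     // assumes 0 <= a.inf <= a.sup
{
    const double inf = std::numeric_limits<double>::infinity();
    interval r;
    double p, e;

    two_prod(a.inf, a.inf, p, e);                       // we need RD(a.inf * a.inf)
    r.inf = (e < 0.0) ? std::nextafter(p, -inf) : p;

    two_prod(a.sup, a.sup, p, e);                       // we need RU(a.sup * a.sup)
    r.sup = (e > 0.0) ? std::nextafter(p, +inf) : p;

    return r;
}

interval hypot_rn(interval x, interval y)
{
    return sqrt_rn(add_rn(sqr_rn(x), sqr_rn(y)));
}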

The code was compiled using Visual Studio 2013 in x64 Release mode, using the /fp:strict compiler option. Note that use of this option is essential if accurate results are to be obtained.

The results (in nanoseconds per operation) on an Intel Core i7 2670QM processor are as follows:

 

Speed Test Results

Operation        Basic    RU    Round to nearest
Addition           104     79                 25
Multiplication     230    194                 61
Division           280    217                 94
Square             238    185                 42
Square Root        139     83                 44
hypot()            804    702                173

Notes:

  • The times are given after subtracting the loop overhead and the time required to generate random inputs.

  • Unlike the latest Intel processors, the 2670QM does not implement fma() in hardware, so it must be simulated in software. This significantly reduces the advantage of the "round to nearest" method for multiplication, division, square, and square root operations.

Summary

An interval arithmetic library that uses round-to-nearest mode is practical and, on current popular hardware, significantly (3+ times) faster than the directed-rounding approach.

Future directions

An interval math package that will include most of the forward functions required by the IEEE-1788 Standard for interval arithmetic is currently under development.

References

  1. IEEE Std 754-2008, IEEE Standard for Floating-Point Arithmetic

  2. IEEE Std 1788-2015, IEEE Standard for Interval Arithmetic

  3. Handbook of Floating-Point Arithmetic, Jean-Michel Muller et al., Birkhäuser

  4. Interval Arithmetic: from Principles to Implementation, T. Hickey, Q. Ju, M. H. van Emden

  5. Interval Operations in Rounding to Nearest, S. M. Rump, P. Zimmermann, S. Boldo, G. Melquiond

History

Version 1.0

 

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


