
Floating Point and Integer Arithmetic Benchmark

5 Jun 2018
The performance gap between floating point and integer arithmetic has closed on modern CPUs

Introduction

This is not so much a tip as a posting of benchmark results comparing the timing of integer and floating point arithmetic. All integer and floating point types used in the benchmark are 64-bit, and each timing is based on looping 100 million times. Clarification: SmallInt and SmallDouble refer to small values (10 to 10,000) stored in int64_t and double, not to the type size. The big integer and double values range from 10,000 to 1,000,000; any bigger and there would be overflow in the 64-bit integer.

Hardware Specs

  • Processor: Intel i7-6700 CPU @ 3.40GHz, 3400 MHz, 4 Cores, 8 Logical Processors
  • RAM: 16 GB
  • Graphics Card: NVIDIA GeForce GTX 1060 6GB

CSharp x64 Benchmark

Note: An x86-32 executable typically has worse integer performance than floating point performance (not shown here). You can build the benchmark as an x86-32 executable and run it to see for yourself.

Multiplication and Division Benchmark
=====================================
MulBigDouble RunTime:00:00.186
MulBigInt RunTime:00:00.157
DivBigDouble RunTime:00:00.160
DivBigInt RunTime:00:00.776
MulSmallDouble RunTime:00:00.192
MulSmallInt RunTime:00:00.191
DivSmallDouble RunTime:00:00.205
DivSmallInt RunTime:00:00.933

Addition and Subtraction Benchmark
==================================
AddBigDouble RunTime:00:00.167
AddBigInt RunTime:00:00.154
SubBigDouble RunTime:00:00.151
SubBigInt RunTime:00:00.152
AddSmallDouble RunTime:00:00.204
AddSmallInt RunTime:00:00.187
SubSmallDouble RunTime:00:00.186
SubSmallInt RunTime:00:00.218

C++ x64 Benchmark

Multiplication and Division Benchmark
=====================================
       MulBigDouble:   57ms
          MulBigInt:   49ms
       DivBigDouble:   96ms
          DivBigInt:  636ms
     MulSmallDouble:   60ms
        MulSmallInt:   68ms
     DivSmallDouble:  118ms
        DivSmallInt:  823ms

Addition and Subtraction Benchmark
==================================
       AddBigDouble:   57ms
          AddBigInt:   49ms
       SubBigDouble:   64ms
          SubBigInt:   49ms
     AddSmallDouble:   69ms
        AddSmallInt:   59ms
     SubSmallDouble:   63ms
        SubSmallInt:   59ms

Most of the time, integer performance is on par with floating point. The exception is division, where the integer version takes roughly 5 to 7 times as long in the results above.

The performance of floating point arithmetic has caught up with integer arithmetic over the last 15 years. This largely removes the need for a custom fixed point type to wring the last drop of performance out of the processor. For those who are not familiar with it, fixed point is an arithmetic type that is like floating point except that its decimal point is fixed and does not move, hence the name. The main difference is that fixed point arithmetic is executed on the integer unit, not on the floating point unit. Fixed point types were relevant during the period when integer performance was well ahead of floating point. The source code download consists of C# and C++ versions of the same benchmark.

Any suggestions on how to improve the benchmark, or constructive criticism of what I have been doing wrong, are welcome.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
Singapore Singapore
Shao Voon is from Singapore. His interest lies primarily in computer graphics, software optimization, concurrency, security, and Agile methodologies.

In recent years, he shifted focus to software safety research. His hobby is writing a free C++ DirectX photo slideshow application which can be viewed here.

Comments and Discussions

 
Suggestion: Suggestions
Jochen Arndt, 5-Jun-18 23:03
Quote:
Any suggestion on how to improve the nature of benchmark or constructive criticism on what I have been doing wrong, is all welcome.
Your benchmarks are far from precise.

Execution of your benchmarking application is interrupted by other processes, which results in varying execution times and measurements. To take this into account, each test should be executed multiple times, reporting the average and minimum times (the minimum time is expected to be close to the "real" time). Ensure that the system is as calm as possible when running. I/O operations such as disk and network transfers are especially likely to interrupt your application.

Tip: Use the task manager and discard the results when there is other activity.

On the x86 architecture, the times for adding, subtracting, and (probably; I'm not quite sure) multiplying should not depend on the operand values. There should also be no significant or measurable difference between adding and subtracting.

You have to know (or control) which kinds of instructions are used. This depends on the compiler options and the CPU type. With the default /arch:SSE2 C++ option, floating point operations are probably executed using SSE instructions instead of x87 coprocessor instructions. The compiler may also use partial loop unrolling to generate SSE vector operations that perform multiple operations at once. Such vector operations can also be used for integer operations.

To make the results comparable, you would have to subtract the execution time of the other code (the loops and accessing the values). Because this is always the same, you can measure it with a function that has no math operation (e.g. one that just assigns a value). But ensure that the loops are not optimised away by the compiler.

I also suggest using a temporary variable in all functions, which makes measuring the overhead time more precise:
C++
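for (int k = 0; k < loop; ++k)
{
    // std::size_t matches the type returned by size() and avoids a signed/unsigned comparison
    for (std::size_t i = 0; i < bigIntList.size(); ++i)
    {
        // Read the outer operand once so the inner loop only loads bigIntList[j]
        const int64_t temp = bigIntList[i];
        for (std::size_t j = 0; j < bigIntList.size(); ++j)
        {
            // The operation under test
            result = temp * bigIntList[j];
            // Use this for measuring the overhead time
            //result = bigIntList[j];
        }
    }
}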

If you provide benchmark data, you should add information about the system (at least the CPU type), the settings used for optimisation and floating point, and, if possible, what kinds of instructions are used. For the latter, you can let the C++ compiler generate an assembly output file to check what is used.

Finally, never use debug builds for such benchmarks. Execution of the additional code might require more time than the operations themselves. This applies especially to your C++ code, where you use std::vector (each access includes a range check in debug builds).