Click here to Skip to main content
15,886,689 members
Articles / Programming Languages / C# 7.0
Tip/Trick

Real Case of Premature Micro-Optimization

Rate me:
Please Sign up or sign in to vote.
4.33/5 (3 votes)
31 Jan 2020CPOL2 min read 8.8K   19   1   3
Ternary Operator vs Lookup Table Benchmark

The benchmark is hosted at Github.

I discovered a real case of premature micro-optimization when you don't measure it. You know it is pretty bad when you read premature and micro in the same sentence. On page 44 of Optimizing C++ ebook by Agner Fog, it is written ...

In some cases it is possible to replace a poorly predictable branch by a table lookup. For example:

C++
float a;
bool b;
a = b ? 1.5f : 2.6f;

The ?: operator here is a branch. If it is poorly predictable then replace it by a table lookup:

C++
float a;
bool b = 0;
const float lookup[2] = {2.6f, 1.5f};
a = lookup[b];

If a bool is used as an array index then it is important to make sure it is initialized or comes from a reliable source so that it can have no other values than 0 or 1. In some cases the compiler can automatically replace a branch by a conditional move.

I was trying to implement this lookup table optimization (to replace ternary operator) on a floating-point value which the code is compiled with G++ and ran on Linux. I also ran the integer benchmark and on other compilers such like Visual C++ 2019 and Clang and also the Visual C# 7 to see their differences. Below is the C++ benchmark code. The lookup array is declared as a static local variable inside the function.

C++
int64_t IntTernaryOp(bool value)
{
    return value ? 3 : 4;
}

int64_t IntArrayOp(bool value)
{
    static const int64_t arr[2] = { 3, 4 };
    return arr[value];
}

double FloatTernaryOp(bool value)
{
    return value ? 3.0f : 4.0f;
}

double FloatArrayOp(bool value)
{
    static const double arr[2] = { 3.0f, 4.0f };
    return arr[value];
}

int main()
{
    const size_t MAX_LOOP = 1000000000;
    
    int64_t sum = 0;
    double sum_f = 0;

    timer stopwatch;

    stopwatch.start("IntTernaryOp");
    sum = 0;
    for (size_t k = 0; k < MAX_LOOP; ++k)
    {
        sum += IntTernaryOp(k % 2);
    }
    stopwatch.stop(sum);

    stopwatch.start("IntArrayOp");
    sum = 0;
    for (size_t k = 0; k < MAX_LOOP; ++k)
    {
        sum += IntArrayOp(k % 2);
    }
    stopwatch.stop(sum);

    stopwatch.start("FloatTernaryOp");
    sum_f = 0;
    for (size_t k = 0; k < MAX_LOOP; ++k)
    {
        sum_f += FloatTernaryOp(k % 2);
    }
    stopwatch.stop(sum_f);

    stopwatch.start("FloatArrayOp");
    sum_f = 0;
    for (size_t k = 0; k < MAX_LOOP; ++k)
    {
        sum_f += FloatArrayOp(k % 2);
    }
    stopwatch.stop(sum_f);

    return 0;
}

In Visual C# code, static local variable is not permitted so the lookup array is declared as a static member variable inside the class. C# does not allow casting integer bool to integer (to be used as array index), so I use an integer instead.

C#
static Int64[] arrInt64 = new Int64[] { 3, 4 };
static Double[] arrDouble = new Double[] { 3.0, 4.0 };

static Int64 IntTernaryOp(Int64 value)
{
    return (value==1) ? 3 : 4;
}

static Int64 IntArrayOp(Int64 value)
{
    return arrInt64[value];
}

static double FloatTernaryOp(Int64 value)
{
    return (value == 1) ? 3.0f : 4.0f;
}

static double FloatArrayOp(Int64 value)
{
    return arrDouble[value];
}

static void Main(string[] args)
{
    const int MAX_LOOP = 1000000000;

    Int64 sum = 0;
    double sum_f = 0.0;

    Stopwatch stopWatch = new Stopwatch();
    stopWatch.Start();
    sum = 0;
    for (Int64 k = 0; k < MAX_LOOP; ++k)
    {
        sum += IntTernaryOp(k % 2);
    }
    stopWatch.Stop();
    DisplayElapseTime("IntTernaryOp", stopWatch.Elapsed, sum);

    stopWatch = new Stopwatch();
    stopWatch.Start();
    sum = 0;
    for (Int64 k = 0; k < MAX_LOOP; ++k)
    {
        sum += IntArrayOp(k % 2);
    }
    stopWatch.Stop();
    DisplayElapseTime("IntArrayOp", stopWatch.Elapsed, sum);

    stopWatch = new Stopwatch();
    stopWatch.Start();
    sum_f = 0;
    for (Int64 k = 0; k < MAX_LOOP; ++k)
    {
        sum_f += FloatTernaryOp(k % 2);
    }
    stopWatch.Stop();
    DisplayElapseTime("FloatTernaryOp", stopWatch.Elapsed, sum_f);

    stopWatch = new Stopwatch();
    stopWatch.Start();
    sum_f = 0;
    for (Int64 k = 0; k < MAX_LOOP; ++k)
    {
        sum_f += FloatArrayOp(k % 2);
    }
    stopWatch.Stop();
    DisplayElapseTime("FloatArrayOp", stopWatch.Elapsed, sum_f);
}

These are the benchmark results on a computer with Intel i7-8700 at 3.2Ghz with 16GB RAM.

VC++ /Ox results
       IntTernaryOp:  562ms, result:3500000000
         IntArrayOp:  523ms, result:3500000000
     FloatTernaryOp: 3972ms, result:3.5e+09
       FloatArrayOp: 1030ms, result:3.5e+09
G++ 7.4.0 -O3 results
       IntTernaryOp:  306ms, result:3500000000
         IntArrayOp:  519ms, result:3500000000
     FloatTernaryOp: 1030ms, result:3.5e+09
       FloatArrayOp: 1030ms, result:3.5e+09
Clang++ 6.0.0 -O# results
       IntTernaryOp:  585ms, result:3500000000
         IntArrayOp:  523ms, result:3500000000
     FloatTernaryOp: 1030ms, result:3.5e+09
       FloatArrayOp: 1030ms, result:3.5e+09
C# 7 Release Mode, .NET Framework 4.7.2
       IntTernaryOp: 1311ms, result:3500000000
         IntArrayOp: 1038ms, result:3500000000
     FloatTernaryOp: 2448ms, result:3500000000
       FloatArrayOp: 1036ms, result:3500000000

For the integer benchmark, the lookup table got worse performance than lookup table with G++, while in Visual C++/C# and Clang++, there is improvement. As it turns out in the floating benchmark I am looking for (in G++), improvement is only seen in Visual C++/C# while G++ and Clang++ retained the same timing for both ternary and lookup table code. I did check the assembly code at the Godbolt Compiler Explorer, none of the ternary operator is converted to use conditional move as suggested by Agner Fog's book so I guess the short branch is still with the CPU cache, therefore it makes little difference in G++ and Clang++ floating-point test. The morale of the story is never take a book's word at its value and always measure to confirm your hypothesis. In my case, I forgo this lookup micro-optimization.

History

  • 31st January, 2020: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
Singapore Singapore
Shao Voon is from Singapore. His interest lies primarily in computer graphics, software optimization, concurrency, security, and Agile methodologies.

In recent years, he shifted focus to software safety research. His hobby is writing a free C++ DirectX photo slideshow application which can be viewed here.

Comments and Discussions

 
QuestionBenchmarking conditions Pin
YvesDaoust4-Feb-20 3:20
YvesDaoust4-Feb-20 3:20 
Timing this operation in isolation in a function might not be conclusive. The behavior might be very different in a real-code situation.

When the ? operator is implemented via a conditional branch, you can indeed see a performance degradation. But it can also be implemented by a conditional assignment operation.

I wouldn't draw a general conclusion from this experiment, except that you should only worry when this operation is used intensively.

QuestionNot sure if I got the purpose. Pin
Paulo Zemek31-Jan-20 16:44
mvaPaulo Zemek31-Jan-20 16:44 
AnswerRe: Not sure if I got the purpose. Pin
Shao Voon Wong2-Feb-20 11:53
mvaShao Voon Wong2-Feb-20 11:53 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.