Yesterday I stumbled across this[^], about C++20 adding [[likely]] and [[unlikely]] to give compilers hints about this kind of thing. The proprietary language in which I worked for many years introduced the tags <usual> and <rare> in the early '90s for this purpose.
---
I think that seems like it has more to do with how the compiler arranges its jump tables, than how the CPU predicts branches, but I could be wrong, or could be talking about the same thing without realizing it.
I'm still new to this myself. I did a little SIMD before, but only for DSP stuff, where you already don't need ifs.
Real programmers use butterflies
---
honey the codewitch wrote: I think that seems like it has more to do with how the compiler arranges its jump tables, than how the CPU predicts branches, but I could be wrong Yep, that's exactly what it does. With MSVC you would use the _mm_prefetch intrinsic to ask the CPU to cache a memory address.
Probably best to avoid it.
---
For my part, I'm avoiding those kinds of things, preferring to rely on optimizations already present (usually, and for good reasons) in the C stdlib, like string.h's strpbrk() and math.h's fmax() function.
However, in order to make those branchless implementations really pay off, I still have to avoid branching in my own code. I expect the compiler to take my branchless code and apply SIMD instructions to its implementation of it.
Real programmers use butterflies
---
I never got into the details but simply recall that, at least on that system, it allowed the mainline path to avoid branching, which made the code run faster.
---
likely & unlikely are used so the optimiser can reorder code around conditional branches to suit the CPU best - this blog post gives some background & some performance measurements.
Java, Basic, who cares - it's all a bunch of tree-hugging hippy cr*p
---
Yeah, that's what I thought. Basically, so it can build better jump tables and such.
Real programmers use butterflies
---
No, it affects the code sequences generated by the compiler. Consider these two functions:
unsigned count;

#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

unsigned f1(unsigned x) {
    if (likely(x > count)) {
        ++count;
    } else {
        --count;
    }
    return count;
}

unsigned f2(unsigned x) {
    if (unlikely(x > count)) {
        ++count;
    } else {
        --count;
    }
    return count;
}
The use of likely vs unlikely causes different code to be generated for the two functions (this is with clang 11, with the -O3 flag). In f1, there's a branch to the --count case, while in f2, there's a branch to the ++count case - in each case, there's a branch to the code considered 'unlikely', whereas control flow just falls through the branch instruction to the 'likely' case. No jump tables here.
f1(unsigned int):                       # @f1(unsigned int)
        mov     ecx, dword ptr [rip + count]
        cmp     ecx, edi
        setb    dl
        mov     eax, 1
        test    dl, 1
        je      .LBB0_1
        add     eax, ecx
        mov     dword ptr [rip + count], eax
        ret
.LBB0_1:
        mov     eax, -1
        add     eax, ecx
        mov     dword ptr [rip + count], eax
        ret
f2(unsigned int):                       # @f2(unsigned int)
        mov     ecx, dword ptr [rip + count]
        cmp     ecx, edi
        setb    al
        test    al, 1
        jne     .LBB1_1
        mov     eax, -1
        add     eax, ecx
        mov     dword ptr [rip + count], eax
        ret
.LBB1_1:
        mov     eax, 1
        add     eax, ecx
        mov     dword ptr [rip + count], eax
        ret
count:
        .long   0                       # 0x0
You can find this on Compiler Explorer.
Java, Basic, who cares - it's all a bunch of tree-hugging hippy cr*p
---
I mean, to me that's still a jump table, just with two entries, one of which falls through. I wonder how the compiler would generate code for more cases?
Admittedly, I'm using the term loosely, as I tend to think very abstractly, so if something feels like a jump table then I have no problem calling it that.
My point in making the distinction anyway is that in my comment to Greg Utas I was distinguishing between code like this (whether an actual "jump table" or just two possibilities) and what I was initially talking about, which is enabling branch prediction - a different animal. I don't think likely and unlikely directly impact branch prediction (although they likely do indirectly) - they affect the order the jumps come in, as your code basically demonstrates.
Adding: I probably should have said jump order, and part of the reason I didn't is that I first learned about it in the context of switch/case.
Real programmers use butterflies
modified 4-Jan-21 13:15pm.
---
I'm not really sure if he knows what a jump table is. There are two instructions in his example that would be added to the jump table.
But to be fair... programmers don't necessarily need to know these things.
---
Well I've only recently had occasion to learn about this stuff in a concrete sense. I mean, I understood some of the basic concepts of superscalar architectures with branch prediction and SIMD, but nothing specifically about actually coding against them effectively.
But it seems to me like the order of the jumps doesn't (or shouldn't?) impact the branch prediction much if at all.
That's why I said that likely/unlikely probably affect jump order rather than branch prediction (though maybe they impact it indirectly, if jump order actually matters to the CPU for something more than evaluation order).
I'm saying this to you now, because clearly you know about this stuff. So please correct me if I'm wrong.
Real programmers use butterflies
---
honey the codewitch wrote: But it seems to me like the order of the jumps doesn't (or shouldn't?) impact the branch prediction much if at all. The order in which your compiler emits jump instructions absolutely matters. The optimizing pass attempts to determine which branches are most likely to be taken and reorders instructions. The likely and unlikely keywords now allow the programmer to have some control over that.
honey the codewitch wrote: That's why I said that likely/unlikely probably affect jump order rather than branch prediction (though maybe it impacts it indirectly, if jump order actually matters to the CPU for something more than evaluation order) Yeah, by adjusting the jump order it has an effect on branch prediction.
---
That's good to know. I guess that means unoptimized jump orders in C++ code *really* hurt, instead of just kind of.
That's sort of unfortunate though, because there are many cases in a codepath where an if will be evaluated most often true with one use case, and most often false with another. Assuming what you're saying is true, the CPU is going to suck at one of these.
Real programmers use butterflies
---
honey the codewitch wrote: That's sort of unfortunate though, because there are many cases in a codepath where an if will be evaluated most often true with one use case, and most often false I mean... some of this is just common sense. Imagine that you have a switch statement with 100 cases. Together, 99 of those cases occur 0.1% of the time; the remaining case occurs 99.9% of the time. In the old days (on old hardware, before pipelining and branch prediction) we had to remember to optimize our switch statements so the most common case was at the top.
It's not as relevant today because the address of the most used case will most likely be present in the instruction pipeline. In fact it will have probably already been fetched and speculatively executed.
---
Ah, that makes sense. There's sure a lot of thinking around corners required when it comes to understanding what the CPU is doing with your instructions on a superscalar architecture.
Real programmers use butterflies
---
honey the codewitch wrote: Ah, that makes sense. Can you tell me why the most used case will already be present in the instruction pipeline? And also why that address will most likely have been speculatively executed? (The 100 case switch statement)
Yes, this is a quiz to check if you really understood.
---
If I understand your question and the problem correctly, then it's because of locality. The code would be in the cache. And based on what you said before, it uses history to predict, but most recently you also said it would rely on branch order - I am a little unclear on that, but right now I'm assuming you mean it uses both, which seems likely.
Real programmers use butterflies
---
honey the codewitch wrote: it uses history to predict Yeah, the 'jump table' would most likely have that address with a weighted score.
honey the codewitch wrote: but most recently you also said it would rely on branch order - i am a little unclear on that I think looking at executables as a visual graph[^] makes it easier to understand. The CPU does not have any control over branch ordering. Your compiler/linker controls this. This entire conversation was about how the likely and unlikely keywords affect that ordering, and how that influences branch prediction.
Anyway, I am falling behind in my work, gotta go
---
Randor wrote: The cpu does not have any control over branch ordering. Your compiler/linker controls this.
Yeah, I understand that. What I'm saying is you seemed to be saying that the CPU relies on the order those branches appear in the instruction stream in order to predict. I heard you as saying it would favor the first one it encountered. Maybe I misunderstood you.
Randor wrote: Anyway, I am falling behind in my work, gotta go
Don't work too hard.
Real programmers use butterflies
---
Stuart,
We are discussing jump tables in the CPU related to branch prediction. There are two address locations in the example you have given that would be added to the jump table. Both the je and jne would result in a jump table entry. It's actually extremely easy to understand:
Branch prediction basically works like this: (note that this is a simplification and each vendor does it differently)
- As your program is executing, the CPU tracks each and every time a jump instruction executes. It keeps a table of the addresses of these jumps and increments an internal counter each time a given jump instruction is taken.
- As executable code is being loaded and cached, the CPU attempts to predict which path will be executed. It uses that internal counter as a 'weight'; in other words, previously taken paths have a higher probability of being chosen. If it predicts a path incorrectly, that's known as a branch misprediction, and the CPU has to throw away everything it just speculatively computed and start down the correct path. (This is actually the core problem behind the Spectre vulnerability[^]: by measuring very small intervals of time, you can learn what is inside the CPU cache.)
The likely and unlikely keywords affect the ordering of the jump instructions in such a way that branch prediction will be more accurate.
Best Wishes,
-David Delaune
modified 5-Jan-21 3:48am.
---
Randor wrote: This is actually the core problem with the spectre vulnerability[^]
I just read about that now, thanks to your comment. I don't know whether to be impressed by the hack, or appalled at the amount of time the creator of it must have had on their hands.
Real programmers use butterflies
---
cool. i'll check it out
Real programmers use butterflies
---
For something a little bit different, try programming GPUs. You have to think in terms of using a few thousand processors that can all work at once. The challenge is to arrange your algorithm and data so they actually can all work at once. It's really quite fun.
"They have a consciousness, they have a life, they have a soul! Damn you! Let the rabbits wear glasses! Save our brothers! Can I get an amen?"
---
I've played a little bit with that, enough to at least have a general feel for the landscape. It dovetails with this kind of coding actually, because branchless coding tends to be parallelizable. It's not always true of course, but it just tends to lend itself to it.
Real programmers use butterflies
---
No offense intended, but I can think of some things that are a lot more fun, and less contrived.