The Circular Shift is a very commonly used operation used heavily in encryption algorithms and ciphers. For the most part, it has been implemented in code, however recently some processors have started having it implemented in the hardware. Here I would like to present an improved version of the algorithm, which is faster in code, and will almost certainly be much faster in hardware.
Using the Code
Okay, I'm just going to jump into the code, and then explain afterwards how it works, although I hope the more astute of you will figure it out before I do.
First, let's look at the classical Circular Shifts:
uint32_t ClassicRotateLeft(uint32_t, int);
uint32_t ClassicRotateRight(uint32_t, int);
uint32_t ClassicRotateLeft(uint32_t N, int S)
return (N >> (32 - S)) | (N << S);
uint32_t ClassicRotateRight(uint32_t N, int S)
return (N << (32 - S)) | (N >> S);
I really don't think I need to explain these. There are thousands of videos and Java applets available online that can explain this much better than I can. Not to mention it's pretty self explanatory.
Now to the juicy parts. Here are the new Circular Shifts / Bit Rotations:
uint32_t NewRotateLeft(uint32_t, int);
uint32_t NewRotateRight(uint32_t, int);
uint32_t NewRotateLeft(uint32_t N, int S)
return ((N >> (~S)) >> 1) | (N << S);
uint32_t NewRotateRight(uint32_t N, int S)
return ((N << (~S)) << 1) | (N >> S);
First, before you continue reading, I'd ask that you try to figure it out on your own. It's actually pretty simple, and the mental exercise is quite fun.
How It Works
So the first thing you will notice is that I got rid of the
(32 - S). It's uses no arithemtic operations, only bit manipulations. Anyone who has studied low level code optimizations will know that bitwise operators are much faster than complicated operations such as multiplication and division, and are a little bit faster than simpler operations such as addition and subtraction.
The only part that's different from the Classical Rotations are these:
(N >> (~S)) >> 1)
(N << (~S)) << 1)
This is the part that replaces
(32 - S). It uses a simple bit trick to replace the subtraction.
So, let's say we want to Rotate an integer
N by 21 bits. Again, the
(N << S) &
(N >> S) are the same, so I'll ignore them. Let's look at 21 as a series of 5 bits.
So, in the code above, we
NOT 21 (flip the bits), which gives us:
By flipping the bits of 21, it gives us 10. If we add 21 and 10 we get 31. This is the same for every other 5 bit number.
N + (~N) = 31
N | (~N) = 31
So by flipping the bits, we get 1 less than if we subtract (32 - S). So technically, we could use:
(N >> ((~S) + 1))
(N << ((~S) + 1))
But, instead of using an addition instruction (and thereby throwing out optimization out the window), we could just use an extra shift to replace the one we know needs to be there.
So technically this:
(N >> (~S)) >> 1)
(N << (~S)) << 1)
is the same as this:
(N >> ((31 - S) + 1))
(N << ((31 - S) + 1))
With just a tad more awesome.
How Much Faster?
1.21 giga Okay, in practice it really isn't that much faster. I had to do several trials runs with a larger and larger number of iterations each time before I noticed a significant difference. Running it at 10,000,000,000 (ten billion) iterations, there was a 1.2 second difference between the two. So yeah, don't call the press just yet. This is more of a fun little trick rather than the next stage of computing, but I do think that it still has some value. It's certainly much more efficient to implement this in hardware than the original one. And seeing as how the Circular Shifts are standard functions that are rarely changed (this being the exception), there's no reason not to use the one that is technically (albeit imperceptibly) faster.
Again, this is really more of a cool trick to show off rather than a huge achievement. I do think that this shows that even the most basic, standard operations can still be optimized. By itself, it's just a small change. But you never know; keep looking, and by optimizing a few more functions, those little changes can add up to make a big difference
Lastly, to answer what is undoubtedly going to be a common question:
Yes, when you
NOT a 32 bit variable, ALL 32 bits get flipped. In example:
uint32_t N = 21;
N = (~N);
Flipping 21 gives you 4,294,967,274.
However the processor will only look at the bits significant to the operation, which means for a 32 bit the first 5 bits, 64 bit the first 6, etc. Don't worry, the processor will handle it.
Well, thanks for taking the time to look at this. Please let me know what you think about this trick, as well as your experience using it.