C / C++ / MFC

 
Re: For loop
harold aptroot, 5-Oct-23 22:32
By the way, in general it is not right to think of variables as being allocated anywhere, neither in memory nor in registers. Variables are not the "thing" that is allocated, and any given variable may end up being in zero or more places at the same time, if you insist on looking at it like that. It's not a completely useless mental model, which is probably why it persists, but only as a lie-to-children.

If a variable is assigned to several times (in the static sense: not several times in a loop, but several times in straight-line code), those different "versions" of the variable may well end up in different places. SSA (static single assignment form) considers those different "versions" to be different variables altogether. Furthermore, even one "version" of a variable can be split into multiple live ranges. That's not just theoretical: there can be good reasons to split it and allocate the pieces to different places. For example, there are often restrictions on which registers can be used by some instructions, such as divisions and the "legacy" shift-by-variable instructions on x64.
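To make the "versions" point concrete, here is a minimal sketch of what SSA-style renaming does to a variable that is assigned several times in straight-line code. The functions `versions` and `versions_ssa` and the names `v1`, `v2`, `v3` are made up for illustration; a real compiler does this on its internal representation, not on the source.

```cpp
#include <cassert>

// One source-level variable `v`, assigned three times in straight-line code.
int versions(int a, int b)
{
    int v = a + b;  // first "version" of v
    v = v * 2;      // second "version"
    v = v - a;      // third "version"
    return v;
}

// The same computation with one name per assignment, roughly how SSA sees it.
// Each of v1, v2, v3 is assigned exactly once and can be placed independently.
int versions_ssa(int a, int b)
{
    const int v1 = a + b;
    const int v2 = v1 * 2;
    const int v3 = v2 - a;
    return v3;
}

// Small self-check: both forms compute the same value.
void check_versions()
{
    assert(versions(3, 4) == versions_ssa(3, 4));
}
```

Nothing forces `v1`, `v2` and `v3` to share a register or a stack slot, which is why asking where `v` lives has no single answer.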

For example, if we consider this code with a division and shift-by-variable:
int test(int x, int y)
{
    x = x / y;
    return y << x;
}
[MSVC compiles it like this, for x64](https://godbolt.org/z/T8dGWMzhc) (why doesn't this link linkify?)
Assembly
 0   x$ = 8
 1   y$ = 16
 2   int test(int,int) PROC                           ; test
 3           mov     r8d, edx
 4           mov     eax, ecx
 5           cdq
 6           idiv    r8d
 7           mov     ecx, eax
 8           shl     r8d, cl
 9           mov     eax, r8d
10           ret     0
11   int test(int,int) ENDP                           ; test

On lines 0 and 1 MSVC helpfully defines stack offsets for x and y, but they aren't used: x and y never end up on the stack. x is passed in via ecx, and y via edx.

x begins in ecx, then is copied to eax (line 4) because idiv takes the dividend in edx:eax. The division leaves the quotient in eax, and that quotient is the new x only because the code happens to assign the result of x / y back to x. To be clear, the output would be in eax either way, but eax could have represented some other variable otherwise. After the division on line 6 but before the mov on line 7, the original un-divided value of x is still in ecx, but we need the new value to be in ecx, because shl needs the shift count in cl, which is the lowest byte of ecx. Clearly, if we ask "where is x", the answer depends on which line of the assembly code (not even the C++ source code) we ask that question about.

y begins in edx, but it cannot stay there because idiv uses edx as input for the upper half of the dividend and as output for the remainder, so y is copied to r8d (line 3), and it stays there. The result of y << x is copied into eax (the return value needs to be in eax), but that's not really y itself. I could have written y <<= x; return y; and the same assembly code would result, but then eax does represent y.
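The walkthrough above can be summarized by giving every "version" its own name. This is only a sketch: `test_original` is the function from the post under a different name, `test_renamed` and `test_shift_assign` are hypothetical names, and the register comments restate what the MSVC listing above does, they are not something the C++ source controls.

```cpp
#include <cassert>

// The function from the post (called `test` there).
int test_original(int x, int y)
{
    x = x / y;
    return y << x;
}

// Same computation, one name per "version".
// Register comments refer to the MSVC x64 listing discussed above.
int test_renamed(int x0, int y0)   // x0 arrives in ecx, y0 in edx
{
    // y0 is moved to r8d because idiv clobbers edx;
    // x0 is copied to eax because idiv takes its dividend in edx:eax.
    const int x1 = x0 / y0;        // quotient lands in eax, then is moved to ecx
    const int result = y0 << x1;   // shl r8d, cl; then copied to eax for the return
    return result;
}

// The variant mentioned in the post: here the returned value *is* y.
int test_shift_assign(int x, int y)
{
    x = x / y;
    y <<= x;
    return y;
}

// Self-check with inputs that avoid division by zero and oversized shifts.
void check_test_variants()
{
    assert(test_original(20, 3) == test_renamed(20, 3));
    assert(test_original(20, 3) == test_shift_assign(20, 3));
}
```

All three functions compute the same value; they differ only in which source-level name you would attach to each register at each point.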

Let's turn things up a notch. I wrote that a variable may be in multiple places, so let's see it:
#include <stddef.h>

int test(size_t N, int *data)
{
    int sum = 0;
    for (size_t i = 0; i < N; i++)
        sum += data[i];
    return sum;
}

Compiler Explorer
Assembly
N$ = 8
data$ = 16
int test(unsigned __int64,int * __ptr64) PROC                       ; test
        xor     r8d, r8d
        mov     r11, rcx
        mov     r10d, r8d
        mov     eax, r8d
        cmp     rcx, 8
        jb      SHORT $LN9@test
        xorps   xmm2, xmm2
        and     rcx, -8
        movdqa  xmm1, xmm2
        npad    3
$LL4@test:
        movdqu  xmm0, XMMWORD PTR [rdx+rax*4]
        paddd   xmm0, xmm2
        movdqa  xmm2, xmm0
        movdqu  xmm0, XMMWORD PTR [rdx+rax*4+16]
        add     rax, 8
        paddd   xmm0, xmm1
        movdqa  xmm1, xmm0
        cmp     rax, rcx
        jb      SHORT $LL4@test
        paddd   xmm1, xmm2
        movdqa  xmm0, xmm1
        psrldq  xmm0, 8
        paddd   xmm1, xmm0
        movdqa  xmm0, xmm1
        psrldq  xmm0, 4
        paddd   xmm1, xmm0
        movd    r10d, xmm1
$LN9@test:
        mov     r9d, r8d
        cmp     rax, r11
        jae     SHORT $LN20@test
        mov     rcx, r11
        sub     rcx, rax
        cmp     rcx, 2
        jb      SHORT $LC14@test
        lea     rcx, QWORD PTR [r11-1]
        npad    1
$LL16@test:
        add     r8d, DWORD PTR [rdx+rax*4]
        add     r9d, DWORD PTR [rdx+rax*4+4]
        add     rax, 2
        cmp     rax, rcx
        jb      SHORT $LL16@test
$LC14@test:
        cmp     rax, r11
        jae     SHORT $LN15@test
        add     r10d, DWORD PTR [rdx+rax*4]
$LN15@test:
        lea     eax, DWORD PTR [r9+r8]
        add     eax, r10d
        ret     0
$LN20@test:
        mov     eax, r10d
        ret     0
int test(unsigned __int64,int * __ptr64) ENDP                       ; test

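Lots of stuff going on here, but here's the important part: the vectorized loop after $LL4@test keeps 8 partial sums, 4 in each of two vector registers, xmm2 and xmm1. Take xmm2: it usually holds its 4 sums, but after paddd xmm0, xmm2 it's really xmm0 that holds them, and then movdqa xmm2, xmm0 immediately copies them back to xmm2 (xmm1 gets the same treatment). And by the way, yes, I think that's a mildly silly way to do it. MSVC could have accumulated directly into xmm2 instead of using that movdqu \ paddd \ movdqa sequence: paddd xmm2, XMMWORD PTR [rdx+rax*4] if the load could be folded (the non-VEX encoding requires a 16-byte aligned memory operand, which isn't guaranteed here, but vpaddd under AVX has no such restriction), or otherwise movdqu xmm0, XMMWORD PTR [rdx+rax*4] followed by paddd xmm2, xmm0. While "number of instructions" is a poor metric, I do believe that would just be a better way to do it, especially on CPUs that do not have move-elimination. But whatever, MSVC does what it does.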

After the label $LL16@test there is a small unrolled-by-a-factor-of-2 loop where both r8d and r9d are used to calculate more sums, but are they sum? However you look at it, r8d and r9d each hold part of the sum. r10d also holds part of the sum at this point, namely the part that was calculated by the vectorized loop. Up to one extra element may also be summed into r10d, if one element is left over (i.e. if N is odd). After $LN15@test, a lea and an add are used to add up all 3 parts of the sum that exist at that point.
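As a rough model of the listing above, here is a C++ sketch that keeps the partial sums in separate variables, the way the generated code keeps them in separate registers. It is hand-written to mirror the structure of the assembly, not produced by any tool; `sum_simple`, `sum_split` and the local names are invented for illustration, and the inputs are assumed small enough that no int overflows.

```cpp
#include <cassert>
#include <cstddef>

// Straightforward version, equivalent to the source in the post.
int sum_simple(std::size_t n, const int* data)
{
    int sum = 0;
    for (std::size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

// Hand-written model of the generated code: "sum" lives in several places.
int sum_split(std::size_t n, const int* data)
{
    int lanes_a[4] = {0, 0, 0, 0};  // plays the role of xmm2
    int lanes_b[4] = {0, 0, 0, 0};  // plays the role of xmm1
    int part8 = 0, part9 = 0;       // play the roles of r8d and r9d
    int part10 = 0;                 // plays the role of r10d
    std::size_t i = 0;

    if (n >= 8) {
        const std::size_t vec_end = n & ~static_cast<std::size_t>(7);
        do {  // vectorized loop: 8 elements per iteration
            for (int k = 0; k < 4; k++) {
                lanes_a[k] += data[i + k];
                lanes_b[k] += data[i + 4 + k];
            }
            i += 8;
        } while (i < vec_end);
        for (int k = 0; k < 4; k++)  // horizontal sum, ends up in r10d
            part10 += lanes_a[k] + lanes_b[k];
    }
    if (i >= n)
        return part10;
    if (n - i >= 2) {
        do {  // loop unrolled by a factor of 2
            part8 += data[i];
            part9 += data[i + 1];
            i += 2;
        } while (i < n - 1);
    }
    if (i < n)  // at most one leftover element
        part10 += data[i];
    return part9 + part8 + part10;
}

// Self-check: the split version agrees with the simple one.
void check_sums()
{
    const int d[11] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
    for (std::size_t n = 0; n <= 11; n++)
        assert(sum_split(n, d) == sum_simple(n, d));
}
```

At no point in `sum_split` is there a single variable that is "the sum" until the final return, which matches what the registers are doing in the listing.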

modified 6-Oct-23 7:05am.

