|
Alignment
I thought this was quite a basic, simple and well-understood topic, but I must admit I cannot find any articles about it. I must write a short one myself to fill this void. Watch this space, and I will get something together soon.
Regards,
Bram van Kampen
|
|
|
|
|
Hi Bram van Kampen,
I am not saying that I do not understand what memory alignment is.
My question is why you mentioned that memory alignment matters to the performance of the equality comparison in my question. Could you provide more information, please?
regards,
George
|
|
|
|
|
If the item being compared is less than the natural word size in the CPU, the item may need to be shifted before the comparison takes place. That takes more time than if the item falls on a natural word boundary.
Judy
|
|
|
|
|
Thanks Judy,
I do not understand why loading one byte, rather than 4 bytes (32 bits), requires additional effort for alignment. For one byte, the system can load the contiguous 4 bytes of memory that contain it, which meets the 32-bit alignment requirement. That way only one CPU cycle is needed, and no additional cycle is needed for alignment, the same as for aligned 4-byte (32-bit) data, right?
Please correct me if there is anything wrong with my statements above.
regards,
George
|
|
|
|
|
Read Bram's post again more carefully. The difference comes in if the item to be compared is not one byte long and is not stored on a 4-byte boundary. If the item is 4 bytes long but is not aligned on the boundary, the CPU must do two fetches to get the item in its entirety before it can compare.
Judy
|
|
|
|
|
Thanks. At least you understand what I'm trying to explain. You confirmed I'm not writing gibberish after all. I am trying to close the thread, without success so far.
We were all learners once.
Regards and thanks
Bram van Kampen
|
|
|
|
|
Hi Judy,
In my original question, I am comparing one byte with another byte, so I do not think there are any alignment issues. Bram is talking about a WORD or something larger placed across the alignment boundary, where 2 CPU cycles are needed to fetch it. That is a different case. I am talking about a byte, not a WORD.
Please feel free to correct me if I am wrong.
If we need any additional alignment operations for one byte (not one WORD or DWORD), please also correct me, and I would be willing to learn.
regards,
George
|
|
|
|
|
You are correct for a byte. Your original post way back when also mentioned an int, which is not one byte, so you got the long discussion on alignment. This statement concerns me:
George_George wrote: If we need any additional alignment operations for one byte (not one WORD or DWORD), please also correct me, and I would be willing to learn.
You do not do anything to deal with alignment with respect to the CPU, it handles that itself. You asked a pretty low level performance question about the comparison of two one-byte numbers versus the comparison of two four-byte numbers. You got a low-level answer on how the CPU handles these comparisons which is where the answer to your original performance question lies.
The alignment Bram and I have been talking about is not the same as the "structure member alignment" option you can specify in the compiler options and override with #pragma pack. Two completely different beasts.
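For reference, here is a rough sketch of what the compiler-level option does (as opposed to the CPU-level fetch behaviour discussed earlier). The struct and function names are made up for illustration, and the asserted sizes assume a mainstream compiler (MSVC, GCC or Clang) that honours #pragma pack:

```cpp
#include <cassert>   // assert
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>   // std::int32_t

// Default layout: the compiler inserts padding after `tag` so that
// `value` starts on its natural alignment boundary.
struct NaturalRecord {
    char         tag;
    std::int32_t value;
};

// Packed layout: #pragma pack(1) removes that padding, so `value`
// starts directly after `tag`, at offset 1.
#pragma pack(push, 1)
struct PackedRecord {
    char         tag;
    std::int32_t value;
};
#pragma pack(pop)

std::size_t natural_value_offset() { return offsetof(NaturalRecord, value); }
std::size_t packed_value_offset()  { return offsetof(PackedRecord, value); }

// Sanity checks for the layouts described in the comments above.
void check_layouts() {
    assert(packed_value_offset() == 1);
    assert(sizeof(PackedRecord) == 5);
    assert(natural_value_offset() == alignof(std::int32_t));
    assert(sizeof(NaturalRecord) >= sizeof(PackedRecord));
}
```

Calling check_layouts() from main is enough to try it. The point is only that packing changes where the compiler places members; what the CPU then does with a misaligned member is the separate topic covered in the rest of this thread.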
The first answer when dealing with a question about low-level performance should always be: code it in a sane and reasonable manner without trying to optimize performance and see how it actually performs before tinkering with the code. Nine times out of ten, it performs fine. In the one case where it doesn't, do some profiling and see where the bottlenecks actually are. They are usually not where you were expecting them in the first place.
Judy
|
|
|
|
|
Thanks Judy!
I think my question is answered. I appreciate your help and patience all the time.
regards,
George
|
|
|
|
|
Thanks for your support. You understood what I tried to explain.
I sent the following to George_George to close the subject:
May the thread continue in virtual heaven. May those humans who contributed and have expired since the thread started go to their respective heavens, as may, in time, those that are still in the land of the living. For all those who contributed did not break new boundaries, but merely covered points which first-year university courses should have covered.
Claim bonus points for your respective heavens, whatever their religion.
Bottom line: education should not concentrate only on the virtual experience of how a compiler compiles. It should keep the new bucks down a bit by also teaching programming in assembly language, basic principles, and a basic understanding of how an I86, or whatever chip, works!
Bram van Kampen
|
|
|
|
|
The processor only loads data at 4-byte boundaries. If the alignment is off, it transparently carries out internal shifts and loads the 32 bits in two goes.
If you compare bytes, WORDs or DWORDs aligned on a 4-byte boundary, there is in all cases one 32-bit-wide fetch cycle for each operand. The compare cycle is also identical; it generates all three possible results in one go. The difference between them is which result gets stored in the flag register.
Now if you have a DWORD stored on a 2-byte boundary, that takes two fetch cycles, whereas a WORD stored on a 2-byte boundary takes only one fetch cycle. That means that a DWORD comparison can be slower than a WORD comparison, depending on alignment. You can fill in the rest yourself for the situation with bytes.
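As a minimal sketch of the counting described above, assuming the simplified model in this post (a bus that only reads aligned 4-byte words), the number of fetches for an operand can be worked out from its address and size. The function name fetches_needed is hypothetical, and this models the description only; real processors with caches and wider buses may behave differently:

```cpp
#include <cassert>   // assert
#include <cstddef>   // std::size_t

// Number of aligned bus-width words that an operand of `size` bytes,
// starting at byte address `address`, overlaps. Under the simplified
// model, each overlapped word costs one fetch cycle.
std::size_t fetches_needed(std::size_t address, std::size_t size,
                           std::size_t bus_width = 4) {
    assert(size > 0 && bus_width > 0);
    const std::size_t first_word = address / bus_width;              // word holding the first byte
    const std::size_t last_word  = (address + size - 1) / bus_width; // word holding the last byte
    return last_word - first_word + 1;
}
```

With this model, fetches_needed(2, 4) gives 2 for the DWORD on a 2-byte boundary and fetches_needed(2, 2) gives 1 for the WORD, matching the cases above. A single byte (size 1) always gives 1, whatever its address.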
BTW.
This is a more significant issue when you do things like RF digital signal processing. If I were to do something like that, I would definitely not start with a Pentium chip. Horses for courses, as they say.
It has never cropped up anywhere in my experience as an issue of major importance when writing C++ Windows/MFC code, which is what this forum is about.
Then again, there's nothing wrong with being curious.
Regards,
Bram van Kampen
|
|
|
|
|
Thanks Bram,
I agree memory is aligned on 32 bits if the machine is 32-bit. But I do not understand why alignment needs additional effort.
For one byte, the system can load the contiguous 4 bytes of memory that contain it, which meets the 32-bit alignment requirement. That way only one CPU cycle is needed, and no additional cycle is needed for alignment, the same as for aligned 4-byte (32-bit) data, right?
Please correct me if there is anything wrong with my statements above.
regards,
George
|
|
|
|
|
|
Hi Bram,
Sorry for any inconvenience. In my original question, I am comparing one byte with another byte (and comparing its performance with that of comparing one 32-bit integer with another), so I do not think there are any alignment issues for a byte. I think you are talking about a WORD or something larger placed across the alignment boundary, where 2 CPU cycles are needed to fetch it. That is a different case. I am talking about a byte, not a WORD or a DWORD.
Please feel free to correct me if I am wrong.
If we need any additional alignment operations for one byte (not one WORD or DWORD), please also correct me, and I would be willing to learn.
regards,
George
|
|
|
|
|
Please re-read what I wrote in the past. You seem to have some block of imagination between what you write in your file and how it's used after being compiled. Please explain further why these timing issues are so important. I tried to explain this before, but I'll now spell it out:
The idea of writing in Windows and MFC is that, whatever platform you write for, in essence your code will work. Believe it or not, if written prudently, your code will work on a Mac, on Windows 2000, or on Windows NT. Things work because the type of question you ask here, how long it takes to perform a core operation like a compare, or how that differs with the size of the operand, does not come into the equation. It is largely insignificant in most cases because of the nature of the user interface.
Please let me know WHY it is so important to know these timing differences.
Regards
Bram van Kampen
|
|
|
|
|
Thanks Bram,
It is my pure technical interest to learn how internal things work, like compare. I appreciate your help all the time.
regards,
George
|
|
|
|
|
George_George wrote: It is my pure technical interest to learn how internal things work, like compare. I appreciate your help all the time
I thought that all along; otherwise I might have dismissed you with a smart remark (not my style, though). I hope my comments were helpful to yourself and the community. You ask many basic questions, and that's GOOD!
Bram van Kampen
|
|
|
|
|
Thanks for your encouragement, Bram van Kampen!
regards,
George
|
|
|
|
|
May the thread continue in virtual heaven. May those humans who contributed and have expired since the thread started go to their respective heavens, as may, in time, those that are still in the land of the living. For all those who contributed did not break new boundaries, but merely covered points which first-year university courses should have covered.
Claim bonus points for your respective heavens, whatever their religion.
Bottom line: education should not concentrate only on the virtual experience of how a compiler compiles. It should keep the new bucks down a bit by also teaching programming in assembly language, basic principles, and a basic understanding of how an I86, or whatever chip, works!
Bram van Kampen
|
|
|
|
|
Hello everyone,
I have a few questions about heaps on Windows after reading the MSDN documentation on the heap functions.
1. Default heap. Each process has a default heap, but the default heaps of different processes are different, right? For example, if process 1 has default heap A and process 2 has default heap B, then A and B should be different heaps, right?
2. Why would a process need to allocate a private heap? Is there any practical use?
3. Is there any default global heap which different processes could share?
thanks in advance,
George
|
|
|
|
|
1. A and B should be different heaps - Yes.
2. Practical uses - Yes.
3. Global heap - No.
...cmk
The idea that I can be presented with a problem, set out to logically solve it with the tools at hand, and wind up with a program that could not be legally used because someone else followed the same logical steps some years ago and filed for a patent on it is horrifying.
- John Carmack
|
|
|
|
|
Thanks cmk,
Could you show some practical uses of creating a private heap, please?
regards,
George
|
|
|
|
|
2. A process needs a heap to allocate dynamic things (e.g. malloc function, and new keyword).
1. When a process exits, it must release its memory. How could it release memory that
contains its heap intertwined with some other process's heap?
3. no, see 1.
Luc Pattyn [Forum Guidelines] [My Articles]
this months tips:
- use PRE tags to preserve formatting when showing multi-line code snippets
- before you ask a question here, search CodeProject, then Google
|
|
|
|
|
Thanks Luc,
Could you show some practical uses of creating a private heap, please?
regards,
George
|
|
|
|
|
Hi,
Having more than one heap may improve the memory situation (less fragmentation)
and the performance.
PERFORMANCE
You could take advantage of a private heap like this:
- create a private heap
- allocate a lot of objects on that heap
- when done, rather than freeing each of these objects, and then the heap itself,
you can free the heap directly without caring about the objects it contains; this of course
assumes you don't need the objects any longer.
FRAGMENTATION
Example: a program needs a lot of type1 objects for a long time and a lot of type2 objects
for a short time; these objects are created at the same time.
First scenario: using a single heap, the objects would get interleaved somehow, so freeing the type2 objects would leave the one heap fragmented, effectively not yielding free memory pages
at all.
Second scenario: allocating type1 objects on heap1 and type2 objects on heap2, when done
with type2, heap2 can be freed, effectively freeing all its memory pages whereas heap1
continues to be used (with less, at best no, fragmentation).
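The PERFORMANCE pattern above might look roughly like this with the Win32 heap functions HeapCreate, HeapAlloc and HeapDestroy. This is only a sketch: allocate_then_drop is a hypothetical helper name, the block count and size are arbitrary, and the code is Windows-only, hence the _WIN32 guard:

```cpp
#ifdef _WIN32
#include <windows.h>  // HeapCreate, HeapAlloc, HeapDestroy
#include <cassert>    // assert
#include <cstddef>    // std::size_t

// Allocates `count` blocks of `block_size` bytes on a private heap, then
// releases all of them at once by destroying the heap.
// Returns the number of blocks that were successfully allocated.
std::size_t allocate_then_drop(std::size_t count, std::size_t block_size) {
    assert(block_size > 0);

    // Options 0, initial size 0, maximum size 0: a growable private heap.
    HANDLE heap = HeapCreate(0, 0, 0);
    if (heap == NULL) {
        return 0;
    }

    std::size_t allocated = 0;
    for (std::size_t i = 0; i < count; ++i) {
        void* block = HeapAlloc(heap, 0, block_size);
        if (block == NULL) {
            break;  // out of memory: stop early
        }
        ++allocated;
    }

    // No HeapFree per block: destroying the heap releases everything it holds.
    HeapDestroy(heap);
    return allocated;
}
#endif
```

For the FRAGMENTATION scenario the idea is the same with two heaps: the short-lived type2 objects would go on their own heap created this way, and the long-lived type1 objects would stay on a separate heap (or the default one returned by GetProcessHeap).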
Luc Pattyn [Forum Guidelines] [My Articles]
|
|
|
|
|