|
Could this be a possible workaround to avoid those extra JIT compiler arguments?
var cur = this.current;
if (cur >= 'A' && cur <= 'Z' || cur >= 'a' && cur <= 'z') {
}
|
|
|
|
|
Not in the instance I'm using it in without a rework. I'd have to change the structure of the code, which is made more complicated by the fact that it's a CodeDOM tree instead of real code.
Before I do that, I want to make sure I'm not (A) doing something for nothing, and more importantly (B) introducing clutter or extra overhead in an attempt to optimize.
I've included a chunk of the state machine runner code which I hope illustrates the issue.
int p;
int l;
int c;
ch = -1;
this.capture.Clear();
if ((this.current == -2)) {
this.Advance();
}
p = this.position;
l = this.line;
c = this.column;
if (((((this.current >= 9)
&& (this.current <= 10))
|| (this.current == 13))
|| (this.current == 32))) {
this.Advance();
goto q1;
}
if ((((((((((this.current >= 65)
&& (this.current <= 90))
|| (this.current == 95))
|| (this.current == 104))
|| ((this.current >= 106)
&& (this.current <= 107)))
|| (this.current == 109))
|| (this.current == 113))
|| (this.current == 120))
|| (this.current == 122))) {
this.Advance();
goto q2;
}
if ((this.current == 97)) {
this.Advance();
goto q3;
}
if ((this.current == 98)) {
this.Advance();
goto q22;
}
q1:
if (((((this.current >= 9)
&& (this.current <= 10))
|| (this.current == 13))
|| (this.current == 32))) {
this.Advance();
goto q1;
}
return FAMatch.Create(2, this.capture.ToString(), p, l, c);
q2:
if ((((((this.current >= 48)
&& (this.current <= 57))
|| ((this.current >= 65)
&& (this.current <= 90)))
|| (this.current == 95))
|| ((this.current >= 97)
&& (this.current <= 122)))) {
this.Advance();
goto q2;
}
return FAMatch.Create(0, this.capture.ToString(), p, l, c);
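For what it's worth, if I did rework the generator, the hoisted-local form of a state like q2 would come out roughly like this - just a sketch, assuming current stays the same int field and Advance() refreshes it as above:
// Sketch only: read the field once per loop iteration instead of once per comparison.
int cur;
q2:
cur = this.current;
if (((cur >= 48) && (cur <= 57))
    || ((cur >= 65) && (cur <= 90))
    || (cur == 95)
    || ((cur >= 97) && (cur <= 122))) {
    this.Advance();
    goto q2;
}
return FAMatch.Create(0, this.capture.ToString(), p, l, c);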
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
Don't expect to see any optimizations in the MSIL code, even in a Release configuration. They are done by the JIT compiler, and may be more effective there, since the exact CPU type is known at run time.
You may try to look at the optimized native assembly code, but that is a difficult task, since there is a huge distance from the source C# code and MSIL to the machine language instructions.
|
|
|
|
|
I'm aware of that. I am generating MSIL instructions using Reflection Emit as part of my project.
The other part generates source code. I would like to ensure that this source code generates IL that will then be optimized appropriately by the JITter. If not, I will generate the source code differently, but my interest is in the post-jitted code, not the IL.
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
"Quote: Running the code through a debugger and dropping to assembly. The only way I can do that reliably is with debug info, which may change how the JITter drops native instructions. I can't rely on it.
Probably, the answer is here: Do PDB Files Affect Performance?
Generally, the answer is: No. Debugging information is just an additional file that helps the debugger match the native instructions to the source code - if implemented correctly, of course. The article is written by John Robbins.
|
|
|
|
|
I think that's about unmanaged code, not the JITter.
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
Well, buzzwords like .NET, VB .NET, C#, JIT compiler, ILDASM are used in this article only by accident. You are right.
|
|
|
|
|
I'm tired and I only read the first bit of it. Sorry. It's 3am here and I shouldn't be awake.
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
Wouldn't the Roslyn compiler stuff be a good place to look? It's open source afaik.
|
|
|
|
|
Probably not, since at best it uses Emit facilities and has nothing to do with the final JITter output.
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
The JIT (as well as the rest of the runtime) is also open source - there's an optimizer.cpp in that directory, which might be of interest.
Also in that directory is a file (viewing-jit-dumps.md) which talks about looking at disassembly, and also mentions a Visual Studio plugin, Disasmo, that simplifies this process.
[Edit] Another option - use Godbolt - it supports C#! [/Edit]
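If I recall the doc correctly, the quickest route is a couple of environment variables that make the runtime print the native code for the methods you name - roughly like this (the method name below is just a placeholder):
// Set before running (viewing-jit-dumps.md lists the exact names; older runtimes use the COMPlus_ prefix):
//   DOTNET_JitDisasm=NextIsLetter    -> dump the jitted native code for this method
//   DOTNET_TieredCompilation=0       -> so you get the optimized code, not the tier-0 version
using System.Runtime.CompilerServices;

class Program
{
    [MethodImpl(MethodImplOptions.NoInlining)]   // keep it a separate method so it shows up in the dump
    static bool NextIsLetter(int current) =>
        (current >= 'A' && current <= 'Z') || (current >= 'a' && current <= 'z');

    static void Main() => System.Console.WriteLine(NextIsLetter('Q'));
}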
Java, Basic, who cares - it's all a bunch of tree-hugging hippy cr*p
|
|
|
|
|
Oh wow. I learned two new things from your post. Thanks! Will check that out.
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
Even if it did, I wouldn't assume that it always would and would do so on all systems.
I would code explicitly and not use behaviour that isn't part of the doco.
|
|
|
|
|
Well, I didn't ask you what you would do.
And this isn't bizdev.
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
If you ask a question about some super-fine peephole optimization, an answer that says "Trying to do anything like that is a waste of your time" is an appropriate answer.
Years ago, I could spend days timing and fine-tuning code, testing out various inline assembly variants. Gradually, I came to realize that the compiler would beat me almost every time. Instruction sequences that "looked" inefficient actually ran faster when I timed them.
Since those days, CPUs have gotten even bigger caches, more lookahead, hyperthreading and what have you, all confusing tight timing loops to the degree of making them useless. Writing (or generating) assembler code to save single instructions was meaningful in the days of true RISCs (including pre-1975 architectures, when all machines were RISCs...) running at 1 instruction/cycle with (almost) no exceptions. Today, we are in a different world.
I really should have spent the time to write assembler code for the example you bring up, with and without the repeated register load, and time it for you. But I have a very strong gut feeling about what it would show. I am so certain that I will not spend the time to do that for you.
Religious freedom is the freedom to say that two plus two make five.
|
|
|
|
|
I guess I just don't see looking at a new (to me) tech for code generation to see if it's doing what I expect in terms of performance as a waste of time.
To be fair, I also look at the native output of my C++ code. I'm glad I have - even, if not especially, the times when it ruined my day, like when I realized how craptastic the ESP32 floating-point coprocessor was.
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
If you are working on a jitter for one specific CPU, or a gcc code generator for one specific CPU, and your task is to improve the code generation, then you would study common methods for code generation and peephole optimization.
If you are not developing or improving a code generator (whether gcc or a jitter), the only reason for studying the architecture of one specific generator is curiosity. Not for modifying your source code, not even with "minor adjustments".
It can be both educating and interesting to study what resides a couple of layers below the layer you are working on. But you should remember that it is a couple of layers down. You are not working at that layer, and should not try to interfere with it.
(I should mention that I grew up in an OSI protocol world. Not the one where all you know is that some people have something they call 'layers', but one where layers were separated by solid hulls, and service/protocol were as separated as oil and water. An application entity should never fiddle with TCP protocol elements or IP routing, shouldn't even know that they are there! 30+ years of OO programming, interface definitions, private and protected elements -- and still, developers have not learned to keep their fingers out of lower layers, neither in protocol stacks nor in general programming!)
Religious freedom is the freedom to say that two plus two make five.
|
|
|
|
|
Why not download ILSpy and have a nosey at the produced IL code? Just compile your application in release mode and take a look at the produced IL to see whether it's been optimised. I would hazard a guess that it probably doesn't optimise something like that, but I could be wrong!
|
|
|
|
|
Because I'm not interested in the IL code, but in the post-jitted native code.
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
Maybe you ought to be more interested in the IL code.
The optimizations that are really significant to your application are not concerned with register loads, but with techniques such as moving invariant code out of loops, doing arithmetic simplifications, etc. Peephole optimization (done at code generation) is a combination of "already done, always" and "no real effect on execution time".
I have had a few surprises with C# performance, but they were typically related to data structures, and they were discovered by timing at application level. To pick one example: I suspected that moving a variable out of a single-instance class, making it a static, would make address calculation simpler and faster, compared to addressing a variable within an object instance. I was seriously wrong; that slowed down the application significantly. I could have (maybe should have) dug into the binary code to see what made addressing a static location significantly slower, but as I knew the effect already, I didn't spend the time when I was working on that application.
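To illustrate the kind of source-level change I mean (a contrived example, names made up):
// Hoisting invariant work out of a loop - the kind of change that shows up
// in timings, as opposed to shaving a single register load in the loop body.
static void ScaleAll(double[] values, double[] results, double factor)
{
    double scale = factor * System.Math.PI / 180.0;   // computed once, not per element
    for (int i = 0; i < values.Length; i++)
    {
        results[i] = values[i] * scale;
    }
}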
Religious freedom is the freedom to say that two plus two make five.
|
|
|
|
|
I'm intimately familiar with the IL code already. I both generate code that then gets compiled to it, and I Reflection Emit it directly.
I get that you don't want me to be concerned about the things that I am concerned about. Get that I am anyway.
I already optimized at the application level.
I should add, I inlined one method and got a 20% performance increase. That's strictly jit manipulation. You don't think it's worth it. My tests say otherwise.
And one more thing - not paying attention to this? That along with some broken benchmarks (which shielded me from seeing the performance issues) led me into a huge mess.
Sure, if you're writing an e-commerce site you don't have to be concerned with inner loop performance and "performance critical codepaths" because, to the degree that you have them, they are measured in seconds to complete or longer.
Lexing, or regex searching, is not that. If you don't think manipulating the jitter is worth it, then why don't you ask Microsoft why they mark up their generated regex code with attributes specifically designed to manipulate the jitted code?
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
"I should add, I inlined one method and got a 20% performance increase."
You are not saying 20% of what: the entire application, or that specific function call?
And: Inlining is not peephole optimization. The jitter didn't do that. The compiler generating the IL did.
Inlining isn't quite at the same level as changing the algorithm, but much closer to that than to register allocation. In another post, I mentioned my experience with trying to make a variable static, rather than local to the single instance. Inlining is more at that level.
I am saying that modifying your source code to affect peephole optimization is a waste of energy. Inlining is at a different level, and might be worth it, especially if the method is small and called only in a few places.
Religious freedom is the freedom to say that two plus two make five.
|
|
|
|
|
20% of my total execution time, lexing a document end to end.
> The jitter didn't do that. The compiler generating the IL did.
Sorry, but that's just categorically false. The method is created and called in the IL code. It's only inlined when jitted.
[System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.AggressiveInlining)]
Feel free to play around and mark your code up with that attribute. Watch the compiled results, and the jitted results. You'll see the compiler still drops your method in the assembly, and still drops the callvirt opcode to call it. THE JITTER is what inlines it.
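For example, something like this (hypothetical method names, but the attribute is the real one):
using System.Runtime.CompilerServices;

class Lexer   // hypothetical type, just to have something to mark up
{
    int current;

    // The C# compiler still emits this as a separate method and still emits a
    // call/callvirt at each call site; it is the JIT that folds the body into
    // the caller's native code.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    bool IsAsciiLetter() =>
        (current >= 'A' && current <= 'Z') || (current >= 'a' && current <= 'z');

    public int CountLetters(string text)
    {
        int count = 0;
        foreach (char ch in text)
        {
            current = ch;
            if (IsAsciiLetter())   // inlined in the native code, not in the IL
            {
                count++;
            }
        }
        return count;
    }
}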
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
Yes, I guess I was wrong about that. In C#, inlining is not absolute, just a proposal. The compiler/jitter is free to ignore the proposal. That depends on the cost of a call, which varies a lot with CPU architecture. I would guess that on an ARM, a lot more functions are not inlined, even if proposed by the developer, as calls are significantly cheaper on ARM than on, say, x86/x64.
Nevertheless, even if the code generator makes the final decision based on that CPU's specific instruction set and instruction timing, inlining is something you relate to at the source code level. Compare it to unrolling a tight loop with a small, fixed number of iterations, or the use of #define expressions in C/C++. It is not at the level of which instructions are generated. (Well, of course all source code has an effect on the code generated, but not at the level of selecting specific coding techniques.) If a method is inlined on both architecture X and architecture Y, that is the same structural code change, regardless of the X and Y instruction sets.
I saw the inlining option a generation ago, when it was a new concept. Then it was a directive to be honored, not a meek proposal. That was at a time when you could also direct a variable to reside in a specific register for its entire lifetime. Experience showed that the compiler might know better ... (So we started trusting the compilers!).
Note that leaving the decision whether to inline or not to the code generator might restrict the freedom of the higher-level optimizer: if it takes care of the inlining above the code generator level, it can e.g. combine common expressions in the inlined code with other code before or after the (inlined) call. While a code generator in principle could do a similar analysis of surrounding code, don't expect it to be prepared to! The code to be inlined will be inlined in extenso, even if identical expressions were calculated before or after the call. The task of a (code-independent) compiler is to discover such common expressions even when it takes responsibility for inlining functions, while the code generator does not have a similar responsibility for restructuring the parse tree before generating code.
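A toy illustration of the point (made-up names): if the compiler itself does the inlining, it can notice that the caller already computes the same subexpression as the inlined body; a code generator doing the inlining at a lower level is far less likely to look outside the inlined code for that.
static double Area(double r) => r * r * System.Math.PI;

static double Ratio(double r)
{
    double square = r * r;      // same subexpression as inside Area()
    return square / Area(r);    // after compiler-level inlining: square / (square * Math.PI)
}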
Religious freedom is the freedom to say that two plus two make five.
|
|
|
|
|
Maybe I didn't choose the best example in the OP, but it was the one most readily in front of me.
I'll say this about it - not knowing whether the JITter would "know" that a repeated access off argument zero could be registerized is a fair question. I already know the answer for a traditional compiler. Here (given my other most recent response to you, with an eye toward #1), the difference in performance would be significant if my fear about the actual generated code were realized. I predict substantially more than a 20% difference in execution speed, given how often I hit that field in my code. I can't easily test that, because I can't make the jitter do the wrong thing. So admittedly it's a bit post hoc ergo propter hoc, but I wouldn't say it's a wild guess either.
But finding that out was significant. It wasn't about the CPU making adjustments to the microcode. It was higher than that level. The CPU can't figure that out. It requires at the very least peephole optimization, or better. I know a traditional compiler will do it, but I don't know the cost/benefit calculus Microsoft engaged in to decide whether that optimization was worth doing in the JITter for most purposes - my purposes being somewhat different than most purposes here.
I stand by that the question was worth knowing, and that the code would have been worth modifying in that worst case.
Because of the kind of code that it is. I'm not arguing general purpose development here. You know as well as I do that generalized rules aren't meant to cover every specific scenario - that's where the experience to know when to step outside those general rules is worth the cost and the potential risks.
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|