|
trønderen wrote: If you have any reason at all to relate to the generated code. Trading readability and maintainability for "slightly faster code" is generally a bad move. . I generally agree with you. However, as we both know there are exceptions, which is why you used the word "generally" I'm sure. This is one of those cases, as lexing is always in a critical code path, and a generalized lexer must be able to handle bulk input as efficiently as possible.
My input to visual fa is one or more regular expressions. Literally just that. Here is the full input for that generated lexer, in my .rl Rolex lexer format, but it should be easy enough to discern the grammar below without knowing the format.
Object = "{"
ObjectEnd = "}"
Array = "["
ArrayEnd = "]"
FieldSeparator = ":"
Comma = ","
Number = '-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?'
Boolean = 'true|false'
Null = "null"
String = '"([^\n"\\]|\\([btrnf"\\/]|(u[0-9A-Fa-f]{4})))*"'
WhiteSpace = '[ \t\r\n]+'
The table driven code is run on a flat array of integers. It might be more efficient to unflatten it in this case - maybe? I used to run a more complicated array of structs for this, and I don't remember there being a performance difference. But anyway, there is also an array of int arrays for a feature called block ends, which simulate lazy matching on a DFA. (I have the details of all of it documented in my Visual FA series). It's also simpler in operation than it looks. I do actually use gotos in a couple of places here to restart the state machine. It was much less complicated than orchestrating a while with breaks. I should state that I didn't comment the code here because it wouldn't help me. It may help others, but I didn't really care about that. This pattern is burned into my brain after writing more than half a dozen lexers that follow the same. It honestly would just clutter it for me, as the code makes immediate sense to me despite how it looks, and I didn't write it for a team.
private FAMatch _NextImpl(
#if FALIB_SPANS
ReadOnlySpan<char> s
#else
string s
#endif
)
{
int tlen;
int tto;
int prlen;
int pmin;
int pmax;
int i;
int j;
int state = 0;
int acc;
if (position == -1)
{
++position;
}
int len = 0;
long cursor_pos = position;
int line = this.line;
int column = this.column;
int ch = -1;
Advance(s, ref ch, ref len, true);
start_dfa:
acc = _dfa[state];
++state;
tlen = _dfa[state];
++state;
for (i = 0; i < tlen; ++i)
{
tto = _dfa[state];
++state;
prlen = _dfa[state];
++state;
for (j = 0; j < prlen; ++j)
{
pmin = _dfa[state];
++state;
pmax = _dfa[state];
++state;
if (ch < pmin)
{
state += ((prlen - (j + 1)) * 2);
j = prlen;
}
else if (ch <= pmax)
{
Advance(s, ref ch, ref len, false);
state = tto;
goto start_dfa;
}
}
}
if (acc != -1)
{
int sym = acc;
int[] be = (_blockEnds != null && _blockEnds.Length > acc) ? _blockEnds[acc] : null;
if (be != null)
{
state = 0;
start_be_dfa:
acc = be[state];
++state;
tlen = be[state];
++state;
for (i = 0; i < tlen; ++i)
{
tto = be[state];
++state;
prlen = be[state];
++state;
for (j = 0; j < prlen; ++j)
{
pmin = be[state];
++state;
pmax = be[state];
++state;
if (ch < pmin)
{
state += ((prlen - (j + 1)) * 2);
j = prlen;
}
else if (ch <= pmax)
{
Advance(s, ref ch, ref len, false);
state = tto;
goto start_be_dfa;
}
}
}
if (acc != -1)
{
return FAMatch.Create(sym,
#if FALIB_SPANS
s.Slice(unchecked((int)cursor_pos), len).ToString()
#else
s.Substring(unchecked((int)cursor_pos), len)
#endif
, cursor_pos, line, column);
}
if (ch == -1)
{
return FAMatch.Create(-1,
#if FALIB_SPANS
s.Slice(unchecked((int)cursor_pos), len).ToString()
#else
s.Substring(unchecked((int)cursor_pos), len)
#endif
, cursor_pos, line, column);
}
Advance(s, ref ch, ref len, false);
state = 0;
goto start_be_dfa;
}
return FAMatch.Create(acc,
#if FALIB_SPANS
s.Slice(unchecked((int)cursor_pos), len).ToString()
#else
s.Substring(unchecked((int)cursor_pos), len)
#endif
, cursor_pos, line, column);
}
while (ch != -1)
{
var moved = false;
state = 1;
tlen = _dfa[state];
++state;
for (i = 0; i < tlen; ++i)
{
++state;
prlen = _dfa[state];
++state;
for (j = 0; j < prlen; ++j)
{
pmin = _dfa[state];
++state;
pmax = _dfa[state];
++state;
if (ch < pmin)
{
state += ((prlen - (j + 1)) * 2);
j = prlen;
}
else if (ch <= pmax)
{
moved = true;
}
}
}
if (moved)
{
break;
}
Advance(s, ref ch, ref len, false);
}
if (len == 0)
{
return FAMatch.Create(-2, null, 0, 0, 0);
}
return FAMatch.Create(-1,
#if FALIB_SPANS
s.Slice(unchecked((int)cursor_pos), len).ToString()
#else
s.Substring(unchecked((int)cursor_pos), len)
#endif
, cursor_pos, line, column);
}
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
trønderen wrote: To me, the SM table is not the result delivered by a generator - it is a modeling tool for the human developer. The driver is typically a score code statements, independent of the model (a.k.a. transition table). I certainly agree that an editor tailored to transition table editing is a great thing to have. I have tried to maintain a compile time initialized C++ array for a transition table, using Np++. Even for trivial SMs, that is almost impossible (unless you just copy the table from e.g. a standard document and it will be carved in stone).
I didn't address this part of your post. I should. I don't expect transition tables to be especially readable.
Visual FA is called "Visual" because it can produce images - directed graphs of the state machines that match 1:1 with the generated tables and code. For example, q0 in the graph corresponds to the q0 : label in the goto table. It's perspicuous enough and yet concise enough that I've used the graphs as a guide to hand roll lexers when i needed the lexer to perform additional work beyond what a strict DFA could provide.
I've also used them to debug. Since the correlation is 1:1 between the graphs and the compiled code, it's easier to follow along with than the tables, but both can be followed with the graphs.
The graphs effectively become in part, documentation for the state machine, and for that they work pretty well.
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
Generally you are right, usually the codewitch works on small embedded systems where performances and code footprint can be extremely stringent.
I found myself doing things I would have abhorred only a few scant years ago...
GCS/GE d--(d) s-/+ a C+++ U+++ P-- L+@ E-- W+++ N+ o+ K- w+++ O? M-- V? PS+ PE Y+ PGP t+ 5? X R+++ tv-- b+(+++) DI+++ D++ G e++ h--- r+++ y+++* Weapons extension: ma- k++ F+2 X
The shortest horror story: On Error Resume Next
|
|
|
|
|
Table driven implementations usually reduces the control code to typically a couple hundred bytes (or even less), at the expense of data space for the table.
By using data elements no larger than required in the transition table entries, each entry can be kept to a very moderate size.
One possible issue is the number of states and events. It takes some experience to control both, to keep the table size (#states * #events) under control. A common trick is to introduce 'state variables'.
Some times you can use 2+ small tables rather than a huge one, e.g. if you implement a communication protocol: One table for the connect phase, one for the data transfer phase.
Many transition tables are rather sparse anyway, but a lot of methods for space efficient storage of sparse matrices are basic data structure knowledge. E.g. non-empty entries may be factored out to a packed, linear array, and the large table contains indexes to this array. Often, several transitions are identical (typically in one state, for different events, or for one event in several different states) - then a linear table need to hold only a single copy.
Certainly, really old embedded processors (such as 8051) had very little data space; expanding code space was far easier (e.g. through banking hardware). While we would usually call the transition table 'data', it is 100% read-only, and may very well be burnt in ROM (ok, call it 'flash' nowadays) together with the driver code.
If you consider CLR for an embedded CPU (don't try that on an 8051!), then you definitely can fit a packed transition table. My guess is that the total code+data size would be significantly smaller than an equivalent implementation with switch cases and/or if/else-sequences. And faster, even if a packed table will lead to a couple more indirections.
I will maintain that table driven state machines can be a very good solution for embedded processors.
Religious freedom is the freedom to say that two plus two make five.
|
|
|
|
|
trønderen wrote: If I were given the responsibility for a state machine implementation like that, I would immediately run to my boss asking for permission to rewrite the whole thing as a table driven machine.
... or as a state machine that returns function pointers instead of using tables and state variables:
#include <stdio.h>
#include <conio.h>
typedef void (*RT)( int input );
typedef RT (*TER)( int input );
extern TER state1( int input );
extern TER state2( int input );
extern TER state3( int input );
TER state1( int input )
{
printf( "one\t" );
return input < 10 ? (TER)&state2 : (TER)NULL;
}
TER state2( int input )
{
printf( "two\t" );
return (TER)&state3;
}
TER state3( int input )
{
printf( "three\t" );
return (TER)&state1;
}
int main(int argc, char* argv[])
{
int n;
TER state = (TER)&state1;
for ( n = 0 ; state ; ++n ) {
state = (TER)( state( n ) );
}
printf( "\n\nPress any key\n" );
getch();
return 0;
}
Type casts are useful because in C it's impossible to declare function pointers that return function pointers that return function pointers that return function pointers...
Regards
|
|
|
|
|
Wow, too much with the GOTO's already - they put it in the language for a reason. It will always be in certain languages for the same reasons. We are just debating normal human failings that have nothing to do with GOTO.
We are engineers, we should know that ALL humans a fallible and can make a mess of anything.
Careless use of GOTO helps us make a mess faster, careful use of GOTO makes our code run faster.
|
|
|
|
|
Fair enough. Implement a DFA state machine without gotos, achieving comparable performance. I'll wait.
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
I was agreeing with you - my point was, nobody should care if you are using GOTO's, they should only care if you are making a mess with them.
I have never seen you produce a mess, quite the opposite in fact.
|
|
|
|
|
I clearly misunderstood you. Sorry.
And thanks!
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
honey the codewitch wrote: I clearly misunderstood you. Sorry.
Np, irony is easily missed in short messaging!
|
|
|
|
|
My CP sig used to be something like, "If you think GOTO's are bad try coding in Assembler without using JMP."
A programmer I once worked with had the opinion that subroutines (that's what methods were called back then) should only have one exit point. Since you couldn't RETURN from where the routine might need to exit and you couldn't use GOTO his longer methods tended to have dozens of embedded IF blocks. Yuck.
There are no solutions, only trade-offs. - Thomas Sowell
A day can really slip by when you're deliberately avoiding what you're supposed to do. - Calvin (Bill Watterson, Calvin & Hobbes)
|
|
|
|
|
When all you have is a hammer, every problem looks like a nail.
Goto's have their place, the problem is when they're being abused because all the developer sees is nails and can't come up with a better solution.
That being said, I'm looking glancing at your code and clearly I'm in no position to criticize.
|
|
|
|
|
We're running a heavy process that's causing some of our servers to creak a little. Sorry about any slowness you might experience today. It'll be over soon, I hope.
cheers
Chris Maunder
|
|
|
|
|
No problem, I had to attend a funeral today
(no, not the funeral of the CodeProject website)
|
|
|
|
|
Sorry to hear that, mate.
cheers
Chris Maunder
|
|
|
|
|
Thanks Chris, it was my mothers funeral, everything went well, the weather was perfect, and I'm sure she would have appreciated all the kind words said by the four speakers. I concluded with some lines from an old poem:
Quote: Farewell to thee, farewell to thee
Thou charming one who dwells in shaded bowers
One fond embrace ere I depart
Until we meet again.
|
|
|
|
|
I will do my part to alleviate the server burden by not posting this. Oops.
|
|
|
|
|
Cheers Chris it wasn't slow for me - it was unavailable for hours
In a closed society where everybody's guilty, the only crime is getting caught. In a world of thieves, the only final sin is stupidity. - Hunter S Thompson - RIP
|
|
|
|
|
If a page doesn't load for a few hours can we still claim that as slow loading times?
(Sorry about that)
cheers
Chris Maunder
|
|
|
|
|
No worries Chris I’m sure it couldn’t be helped
In a closed society where everybody's guilty, the only crime is getting caught. In a world of thieves, the only final sin is stupidity. - Hunter S Thompson - RIP
|
|
|
|
|
I thought the server hamsters seemed a little haggard this morning.
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
I'm a little haggard this morning...
cheers
Chris Maunder
|
|
|
|
|
Honestly, I thought it was the solar flares.
|
|
|
|
|
The introduction to The Lounge says:
Quote: The Lounge is rated Safe For Work. If you're about to post something inappropriate for a shared office environment, then don't post it. No ads, no abuse, and no programming questions. Trolling, (political, climate, religious or whatever) will result in your account being removed.
I am focused on the sentence: Trolling, (political, climate, religious or whatever) will result in your account being removed.
However, is bashing and trolling M$, Apple, Google, Meta, etc. when they, as companies, act stupid with their technology/programming/API offerings, OK?
Of course it is
|
|
|
|
|
It's not bashing. It's discussing, in painful detail, the issues that we face using our tools as part of a thoughtful dialogue.
To be honest, the state of software today is doing my head in.
(and it's been a while - how are you?)
cheers
Chris Maunder
|
|
|
|
|