|
Stuart Dootson wrote: They're letting you have the real code
Now fix it.
(well, maybe I'd be more insistent if you worked for Rockstar...the more I play GTA Online, the more I'm convinced there's not a single aspect of this game that doesn't have some bug.)
|
|
|
|
|
Did you see those ladies just walk all over New Zealand (43-12)? I wonder if the men could take some lessons from them?
|
|
|
|
|
Did they moon the haka?
|
|
|
|
|
Edit: Aha! I knew there was an anti-pattern in what I was doing.
The simple solution is to modify the lexer rules, adding a new rule with a symbol ID of #ERROR (-1) and making it match EVERYTHING. Since it is added to the state machine last, every other match overrides it. This lets the state machine building code do the heavy lifting of crafting the logic to gather error tokens.
I've actually done this before. I can't believe it didn't occur to me. I can't believe my code didn't already do it. It's one of those simple rules of lexing that I seemed to have learned at one point and then forgotten. Never again. This took me hours.
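A minimal sketch of the idea, for illustration only. This is not Reggie's code: the rule table, the symbol ids other than #ERROR (-1), and the function names are hypothetical. It mimics how a DFA builder usually resolves an accepting state claimed by several rules. The longest match wins, and on a tie the rule defined earlier wins, so a catch-all appended last only fires when nothing else matches. Here the catch-all matches a single character.

```python
import re

ERROR = -1  # the #ERROR symbol id from the post

# Hypothetical rule table: (symbol_id, pattern). Order matters: when two rules
# match the same length, the one defined earlier wins.
RULES = [
    (3, r"[0-9]+"),
    (6, r"[A-Za-z_][A-Za-z0-9_]*"),
    (7, r"[ \t\r\n]+"),
    (ERROR, r"[\s\S]"),  # catch-all added last: any single character
]
COMPILED = [(sid, re.compile(p)) for sid, p in RULES]

def next_token(text, pos):
    """Return (symbol_id, lexeme) for the longest match at pos; ties go to the earlier rule."""
    best_id, best_len = None, 0
    for sid, rx in COMPILED:
        m = rx.match(text, pos)
        # Strictly longer only, so an earlier rule keeps a tie.
        if m and len(m.group()) > best_len:
            best_id, best_len = sid, len(m.group())
    return best_id, text[pos:pos + best_len]

def tokenize(text):
    pos = 0
    while pos < len(text):
        sid, lexeme = next_token(text, pos)  # the catch-all guarantees length >= 1
        yield sid, lexeme
        pos += len(lexeme)
```

With this version, `list(tokenize("a..b"))` yields one error token per dot, which is exactly the one-error-per-character behavior the post wants to avoid. If the catch-all were `[\s\S]+` instead, maximal munch would let it swallow the rest of the input, which is the problem Edit 2 runs into.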
Edit 2: Aaand that didn't work because it needs to lazy match for that to be effective.
You can do lazy matching with DFAs using a little-known technique. Constructing Fast Lexical Analyzers with RE/flex - Why Another Scanner Generator? does it, and the author promised to release a paper on how that worked, but I haven't seen anything further on it, and the source code is ... creative: everything is done in constructors, just for starters, so I haven't been able to make heads or tails of it.
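Python's `re` is a backtracking engine rather than a DFA, so the sketch below is not the RE/flex technique; it only shows the behavior a lazy error rule should have. The error pattern consumes as few characters as possible and stops just before any position where a real rule could start, or at the end of input. The rules and names are hypothetical.

```python
import re

ERROR = -1

# Hypothetical token rules, highest priority first.
RULES = [
    (3, r"[0-9]+"),
    (6, r"[A-Za-z_][A-Za-z0-9_]*"),
    (7, r"\s+"),
]

# Matches wherever some real rule could start a token.
ANY_RULE = "|".join(f"(?:{p})" for _, p in RULES)

# Lazy catch-all: take as few characters as possible, stopping right before
# a position where a real rule matches, or at end of input.
LAZY_ERROR = re.compile(rf"[\s\S]+?(?=(?:{ANY_RULE})|\Z)")

def tokenize(text):
    compiled = [(sid, re.compile(p)) for sid, p in RULES]
    pos = 0
    while pos < len(text):
        best_id, best_len = None, 0
        for sid, rx in compiled:
            m = rx.match(text, pos)
            if m and m.end() - pos > best_len:
                best_id, best_len = sid, m.end() - pos
        if best_id is None:
            # Only consulted when no real rule matches here; otherwise the
            # lazy error could still out-munch a short real token.
            m = LAZY_ERROR.match(text, pos)
            best_id, best_len = ERROR, m.end() - pos
        yield best_id, text[pos:pos + best_len]
        pos += best_len
```

Here `list(tokenize("a..b"))` produces a single `".."` error token between the two identifiers.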
Edit 3: I think I finally solved it without hacking anything too badly.
Edit 4: Solved it and have one of the template generators for the new code (targeting C# so far) implemented:
This is what victory looks like.
Tokenizing: ... /* ...a*/ baz ... 12343 foo 123.22 bar....
AbsolutePosition: 0, AbsoluteLength: 3, Position: 0, Length: 3, SymbolId: -1, Value: ..., Line: 1, Column: 1
AbsolutePosition: 5, AbsoluteLength: 9, Position: 5, Length: 9, SymbolId: 40, Value: /* ...a*/, Line: 1, Column: 9
AbsolutePosition: 15, AbsoluteLength: 3, Position: 15, Length: 3, SymbolId: 6, Value: baz, Line: 1, Column: 19
AbsolutePosition: 20, AbsoluteLength: 3, Position: 20, Length: 3, SymbolId: -1, Value: ..., Line: 1, Column: 24
AbsolutePosition: 24, AbsoluteLength: 5, Position: 24, Length: 5, SymbolId: 3, Value: 12343, Line: 1, Column: 28
AbsolutePosition: 30, AbsoluteLength: 3, Position: 30, Length: 3, SymbolId: 6, Value: foo, Line: 1, Column: 34
AbsolutePosition: 34, AbsoluteLength: 6, Position: 34, Length: 6, SymbolId: 4, Value: 123.22, Line: 1, Column: 41
AbsolutePosition: 41, AbsoluteLength: 3, Position: 41, Length: 3, SymbolId: 6, Value: bar, Line: 1, Column: 48
AbsolutePosition: 44, AbsoluteLength: 4, Position: 44, Length: 4, SymbolId: -1, Value: ...., Line: 1, Column: 51
___ snip ___
Trying to match tokens such that runs of error characters get reported as one error, rather than one error for each rejected character.
This is shockingly difficult. I've given up on it in the past, like with my Rolex project, but Reggie is to replace Rolex, among other things, and I'm not willing to ship the latest without that sorted out.
The reason it's such a big deal is that reporting multiple errors for what is really one error run can mess with "panic mode" error recovery in parsers built on top of a lexer like this: the parser gets confused about how many bad tokens there actually are in the text, which makes an already bad situation worse when parsing a document with errors in it.
The thing is it seems so bloody simple, but every similarly simple approach I've taken with it has fallen flat on its face.
This is getting in the way of me releasing code.
Real programmers use butterflies
modified 31-Oct-21 18:08pm.
|
|
|
|
|
Your statement confirms what a great job the IntelliSense developers are doing. They have been busy with it for decades, and it is a team of many (hundreds?) developers and basic researchers.
So don't be so hard on yourself! From what I read in your articles, you are doing a great job.
modified 29-Nov-21 21:01pm.
|
|
|
|
|
Thanks!
It is, and they are, especially since it is pluggable, particularly with VS Code. It's really quite amazing, but I think they do it mostly using regex. I used to parse in order to get syntax highlighting, but that's an anti-pattern. Comments and (often) whitespace need to be stripped so the parser doesn't trip over them, and reporting those elements "out of band" to the highlighter complicates things, because they must "pass through" the parser without the parser choking on them. It's easier to discard them. Furthermore, parsers are way more finicky about errors and choke far more easily on bad input.
I think VS Code (I refuse to call it Visual Studio Code) and Visual Studio (in at least one of its two modes) use a variation of PEG parsing, which is heavily regular expression and substitution oriented. I can't be sure because I haven't dug into it, but either way, I think regex or a similar set of constructions (possibly some superset) runs the show.
With parsers, you typically use "substitution grammars" and they require the text to be far more structured, but unlike with the kind of thing syntax highlighters use you get a nice clean, unambiguous tree back, which is more suitable for a compiler to consume.
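One way to get the "out of band" behavior described above without the parser choking is to keep comments and whitespace in the token stream but flag them, so the highlighter sees everything while the parser gets a filtered view. This is a hypothetical sketch (the `Token` fields and function names are made up for illustration), similar in spirit to the hidden channels some parser generators, such as ANTLR, offer.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class Token:
    kind: str
    value: str
    hidden: bool = False  # True for comments/whitespace: highlight them, don't parse them

def for_highlighter(tokens: Iterable[Token]) -> List[Token]:
    # The highlighter colors every token, including the hidden ones.
    return list(tokens)

def for_parser(tokens: Iterable[Token]) -> List[Token]:
    # The parser only sees significant tokens, so comments can't trip it up.
    return [t for t in tokens if not t.hidden]

stream = [
    Token("id", "x"),
    Token("ws", " ", hidden=True),
    Token("comment", "/* c */", hidden=True),
    Token("id", "y"),
]
```

The cost is that the lexer has to classify every comment and whitespace run correctly, which is the part that stays awkward when the input is full of errors.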
|
|
|
|
|
Quote: "...but they do it I think using regex mostly."
I don't think so, but of course I don't know for sure either...
To me it looks more like an 'error tolerant parser' with some strategies (trial and error scenarios?) to recover and get back in sync.
Sorry, my English is too poor to explain it more precisely. In my crazy brain I would compare it to a file comparison tool, which has to decide to skip 'a certain part' to get synchronized again... and for me parsing is a similar thing: skipping the parts which are not OK.
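The file-comparison analogy is close to what parsers call panic-mode recovery (the term used earlier in the thread): on an error, report once, then discard input until a synchronizing token shows up. A minimal sketch, assuming a toy grammar of `NAME ;` statements where `;` is the sync token. The grammar and function are hypothetical, and this is not a claim about how IntelliSense actually works.

```python
def parse_statements(tokens, sync=";"):
    """Parse 'NAME ;' statements; on error, report once and skip to the next sync token."""
    statements, errors = [], []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i].isidentifier() and tokens[i + 1] == sync:
            statements.append(tokens[i])
            i += 2
            continue
        errors.append(i)  # one report for the whole bad stretch
        while i < len(tokens) and tokens[i] != sync:
            i += 1        # panic mode: discard tokens until we can resynchronize
        i += 1            # consume the sync token itself
    return statements, errors
```

For `["a", ";", "1", "+", "2", ";", "b", ";"]` this returns `(["a", "b"], [2])`: the bad statement is reported once, at token index 2, and parsing resumes at `b`.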
modified 29-Nov-21 21:01pm.
|
|
|
|
|
I'm mostly thinking of VS Code, but if I understand correctly, Visual Studio has two major ways of doing it. I think you might be talking about the one I haven't touched.
Either that, or my information is bad or old.
|
|
|
|
|
Yup. In the products that I worked on, mainline paths were probably less than 10% of the code.
|
|
|
|
|
I'm not looking forward to redoing the database code. I had it almost flawless last night except for this error handling bit. (at some point I need to generate tests to compare the output between the different renditions)
At least the code is very structured. It's basically copying and pasting SQL chunks that do things like advance the cursor and fetch the next UTF32 codepoint so it's straightforward to port from C# to SQL.
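For the "fetch the next UTF32 codepoint" step, here is a hypothetical sketch of the logic, assuming the input arrives as UTF-16 code units the way .NET strings store text. It is not the author's C# or SQL. A high surrogate followed by a low surrogate combines into one code point; anything else passes through as a single unit.

```python
def next_codepoint(units, pos):
    """Return (codepoint, new_pos), reading UTF-16 code units starting at pos."""
    hi = units[pos]
    if 0xD800 <= hi <= 0xDBFF and pos + 1 < len(units):
        lo = units[pos + 1]
        if 0xDC00 <= lo <= 0xDFFF:
            # Surrogate pair: 10 bits from each half, offset by 0x10000.
            return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00), pos + 2
    # BMP character, or an unpaired surrogate passed through as-is.
    return hi, pos + 1
```

For instance, the pair `0xD83D 0xDE00` decodes to U+1F600 and advances the cursor by two units.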
The problem is that I lose all IntelliSense because I'm rendering using ASP-like templates, and I forget my SQL a lot of the time. It's just such a bizarre, thrown-together syntax.
That's the worst part about this all. I've made the code very maintainable but it still generates multiple types of output each to multiple targets (sql and C#) so it's necessarily a lot of template code.
|
|
|
|
|
Isn't that always the case? It certainly is true for my code (with the exception of trivially simple functions such as memset()).
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
It is, and yet this is particularly troublesome because error handling can typically get away with being "dumb": flag early and often is basically the best case.
In this case, flagging early and often is the problem. I need to hold off reporting until I've started matching again, so that I get the entire run of error characters as a single error. The hard part is delaying the report, figuring out how far to "reach back" into my capture buffer for the error, and anchoring the initial error position so I can report it. It's maddening because I'm running into corner cases, and that screams "anti-pattern" to me, but I don't know if there's a "clean" way to do this.
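The "hold off, then reach back" approach can be sketched as a lexer that anchors the start of an error run at the first bad character and only emits the error once a real rule matches again, or the input ends. This is an illustrative sketch with hypothetical rules and names, not the Reggie implementation. It keeps the whole text in memory, so "reaching back" is just a slice, which sidesteps the capture-buffer bookkeeping the post describes.

```python
import re

ERROR = -1
# Hypothetical rules for illustration only: (symbol_id, compiled pattern).
RULES = [
    (3, re.compile(r"[0-9]+")),
    (6, re.compile(r"[A-Za-z_]\w*")),
    (7, re.compile(r"\s+")),
]

def tokenize(text):
    """Yield (symbol_id, position, lexeme), reporting each run of unmatched characters once."""
    pos = 0
    error_start = None  # anchor: where the current error run began, if any
    while pos < len(text):
        best = None
        for sid, rx in RULES:
            m = rx.match(text, pos)
            if m and (best is None or m.end() > best[1]):
                best = (sid, m.end())  # longest match; earlier rule keeps ties
        if best is None:
            if error_start is None:
                error_start = pos  # first bad character: anchor it, but don't report yet
            pos += 1
            continue
        if error_start is not None:
            # Matching resumed, so now we know how far the error run reaches back.
            yield ERROR, error_start, text[error_start:pos]
            error_start = None
        sid, end = best
        yield sid, pos, text[pos:end]
        pos = end
    if error_start is not None:  # input ended inside an error run
        yield ERROR, error_start, text[error_start:]
```

The trailing flush after the loop is the corner case that is easiest to forget: without it, an error run at the very end of the input is silently dropped.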
|
|
|
|
|
You're referring to the "else" part of the equation ... the part most do not like to think about because it is always hard.
It was only in wine that he laid down no limit for himself, but he did not allow himself to be confused by it.
― Confucian Analects: Rules of Confucius about his food
|
|
|
|
|
I have no knowledge of parser types or how parsers work but, whenever I have to get inputs (example: reading structs from a file) and I find an erroneous token/word/data, I put it in a list along with the position of the input where it was found, move forward one byte and repeat again until I get valid input again.
Basically:
1- test token/word/data
2- if valid goto 3 else goto 5
3- process
4- if there is more token/word/data move to next token/word/data and goto 1 else goto 7
5- if invalid put token/word/data in list with position
6- if there is more input move forward one byte (not one token/word/data) and goto 1 else goto 7
7- process invalid token/word/data list (here you can merge them into blocks of successive errors)
8- end program and report errors (if any)
Can't something like this work? The invalid token/word/data are just ignored.
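The eight steps above can be sketched roughly like this. The validity check is passed in as a callback, `is_valid_at`, which is a hypothetical name: it returns the length of a valid item at a position, or 0. Invalid positions are collected while scanning, and step 7 merges consecutive positions into (start, length) blocks.

```python
def scan(data, is_valid_at):
    """Steps 1-6: valid items are consumed whole; on invalid input, advance one byte."""
    items, bad = [], []
    pos = 0
    while pos < len(data):
        n = is_valid_at(data, pos)  # hypothetical callback: length of a valid item, or 0
        if n > 0:
            items.append(data[pos:pos + n])  # step 3: process
            pos += n                         # step 4: move to the next item
        else:
            bad.append(pos)                  # step 5: remember the bad position
            pos += 1                         # step 6: advance one byte, not one item
    return items, bad

def merge_runs(bad):
    """Step 7: merge successive bad positions into (start, length) blocks."""
    blocks = []
    for p in bad:
        if blocks and blocks[-1][0] + blocks[-1][1] == p:
            start, length = blocks[-1]
            blocks[-1] = (start, length + 1)
        else:
            blocks.append((p, 1))
    return blocks
```

For example, with a callback that accepts runs of ASCII digits, scanning `"12ab3"` collects the items `"12"` and `"3"` and the bad positions `[2, 3]`, which merge into the single block `(2, 2)`.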
|
|
|
|
|
It can, sort of, and I considered keeping an extra StringBuilder around in the C# rendition, but it's less workable to do so in SQL. It can still be done; it just makes the code a lot nastier.
It would actually be easier at the parser level to concatenate error tokens coming in off the lexer, and in fact that's probably what I'll do. I'm sick of this.
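Concatenating error tokens at the parser level could look something like this: a filter sitting between the lexer and the parser that merges adjacent error tokens (symbol id -1, as in the tokenizer output earlier in the thread) into one. The `Token` fields loosely mirror the Position, SymbolId, and Value columns of that output, but the class and function are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

ERROR = -1

@dataclass
class Token:
    symbol_id: int
    position: int
    value: str

def coalesce_errors(tokens: Iterable[Token]) -> Iterator[Token]:
    """Merge runs of adjacent error tokens into a single error token."""
    pending: Optional[Token] = None  # error token being accumulated
    for tok in tokens:
        if tok.symbol_id == ERROR:
            if pending is not None and pending.position + len(pending.value) == tok.position:
                # Directly adjacent: extend the pending error instead of emitting a new one.
                pending = Token(ERROR, pending.position, pending.value + tok.value)
                continue
            if pending is not None:
                yield pending  # a gap separates the two errors
            pending = tok
            continue
        if pending is not None:
            yield pending
            pending = None
        yield tok
    if pending is not None:
        yield pending
```

Because it only needs one token of lookbehind, this works as a streaming wrapper around any lexer, including one whose error rule only matches a single character.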
|
|
|
|
|
Error handling separates the men from the boys, pardon the expression. So many times I've heard "but that will never fail", and, ashamedly, I've thought the same myself. It's a lie from the pit of hell. It *will* fail, and it will bite your weekend in the butt if you don't handle it.
Defensive programming is part of the art.
In embedded systems, it gets even worse. Every single failure point you have to ask yourself, "how do I keep running?" It takes a completely different mindset and design approach. A guy I work with has schooled me to the point of embarrassment, because his approach is obviously better.
Charlie Gilley
Stuck in a dysfunctional matrix from which I must escape...
"Where liberty dwells, there is my country." B. Franklin, 1783
“They who can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety.” BF, 1759
|
|
|
|
|
With IoT it has been robbing Peter to pay Paul, so I try to make my code fail, but fail nicely.
I don't have the cycles or the memory to do full error checking in many cases, so what you're left with is a kind of situation where if you can, you want the machine to reboot straight to the landing page after dumping a log. Obviously it shouldn't do that in production but in situations like this it's always a matter of "if someday comes where it does"
I can't harden an IoT device against network attacks, for example, because I don't have the cycles to do overrun checks and well-formedness checks, even on my HTTP headers. I parse just enough to make it work, which means a malformed header could easily be eaten by my device.
Such is the lay of the land, and it *is* a different error handling ballgame. It always makes me feel a little dirty.
It's not like industrial embedded work, where the opposite is true: despite running on hardware that is "just good enough", the code has to be solid rather than fancy. With IoT, the reverse is typically true.
|
|
|
|
|
honey the codewitch wrote: The thing is it seems so bloody simple, but every similarly simple approach I've taken with it has fallen flat on its face.
I feel this to my core. Off and on for a while I've been working on a project that's the same way. It seems so simple, and yet every approach runs into a tiny little problem that completely invalidates the entire approach. I think I may have finally cracked the code last night, though. Either that, or I'll have another design that was infinitesimally close but fell on its face at the last hurdle.
I'm sure if you keep at it you'll find your answer
|
|
|
|
|
I think I found my solution. I hope you find yours.
Whenever I run into a situation like that - where it feels like trying to push the air bubbles out of a waterbed - I strongly consider the possibility that what I'm doing is an "anti-pattern" and look to replan my approach. That helps some.
Good luck.
|
|
|
|
|
I work on automation systems and error handling is immensely important with them. Especially where I am right now, bringing up a new machine for the first time with a whole bunch of new software. It's been quite a battle and we are winning the war.
"They have a consciousness, they have a life, they have a soul! Damn you! Let the rabbits wear glasses! Save our brothers! Can I get an amen?"
|
|
|
|
|
Absolutely. That's a whole different ball of wax. Even the software development takes on more of a flavor of hardware engineering in terms of the rigor involved.
It's not my favorite. I prefer to use tools to produce as much of that code as possible, to shrink my test matrix.
|
|
|
|
|
|
Two things:
1) Read the stuff at the top of the page: the Lounge is not for coding questions. Post it here instead: Ask a Question
Ignoring the rules and annoying people you want free help from is not a good idea ...
2) While we are more than willing to help those that are stuck, that doesn't mean that we are here to do it all for you! We can't do all the work, you are either getting paid for this, or it's part of your grades and it wouldn't be at all fair for us to do it all for you.
So we need you to do the work, and we will help you when you get stuck. That doesn't mean we will give you a step by step solution you can hand in!
Start by explaining where you are at the moment, and what the next step in the process is. Then tell us what you have tried to get that next step working, and what happened when you did.
Just posting your homework and expecting us to give you code you can hand in as your own isn't going to work.
If you are having problems getting started at all, then this may help: How to Write Code to Solve a Problem, A Beginner's Guide
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
When I saw this post earlier my instant thought was 'Ah - a homework question', and got the popcorn ready for a flame-throwing session - but then you come back all polite and reasonable, spoiling all the fun!
|
|
|
|
|
He has many of those as copy+paste solutions, which keeps them polite and reasonable.
If he had to write them out every time... they would sound different.
M.D.V.
If something has a solution... why do we have to worry about it? If it has no solution... for what reason do we have to worry about it?
Help me to understand what I'm saying, and I'll explain it better to you
Rating helpful answers is nice, but saying thanks can be even nicer.
|
|
|
|
|