
Rant: Error handling is taking me more time than the functionality :(
honey the codewitch, 31-Oct-21 5:16
Edit: Aha! I knew there was an anti-pattern in what I was doing.

The simple solution is to modify the lexer rules, adding a new rule with a symbol ID of #ERROR (-1) and making it match EVERYTHING. Since it is added to the state machine last, every other match overrides it. This lets the state machine building code do the heavy lifting of crafting the logic to gather error tokens.

I've actually done this before. I can't believe it didn't occur to me. I can't believe my code didn't already do it. It's one of those simple rules of lexing that I seem to have learned at one point and then forgotten. Never again. This took me hours.

Edit 2: Aaand that didn't work, because the catch-all rule needs to match lazily for that to be effective.
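One plausible reading of why the greedy catch-all fails, sketched in Python rather than the C# that Reggie targets: if the lexer picks the longest match and breaks ties by rule order, a rule that matches everything out-lengths every real rule and swallows the whole input, while a one-character catch-all goes back to reporting one error per character. The rule set, the symbol IDs other than -1, and the regex-based matcher below are all assumptions for illustration; none of this is Reggie's actual code.

```python
import re

# Hypothetical rule sets as (symbol_id, pattern) pairs. Only the -1 error ID
# comes from the post; the other IDs and all patterns are made up here.
BASE_RULES = [
    (3, re.compile(r"[0-9]+")),       # integer
    (6, re.compile(r"[A-Za-z_]+")),   # identifier
]
# Catch-all added last, matching as much as it can (greedy).
GREEDY_RULES = BASE_RULES + [(-1, re.compile(r".+", re.S))]
# Catch-all added last, matching exactly one character.
PER_CHAR_RULES = BASE_RULES + [(-1, re.compile(r".", re.S))]

def longest_match(text, pos, rules):
    """Return (symbol_id, lexeme) for the longest match at pos, or None.

    Ties go to the earliest rule, so a catch-all listed last only wins
    when it matches strictly more text than every other rule.
    """
    best = None
    for symbol_id, pattern in rules:
        m = pattern.match(text, pos)
        if m and m.end() > pos and (best is None or len(m.group()) > len(best[1])):
            best = (symbol_id, m.group())
    return best

def tokenize(text, rules):
    """Tokenize text with the longest-match policy sketched above."""
    pos, out = 0, []
    while pos < len(text):
        tok = longest_match(text, pos, rules)
        if tok is None:
            break  # cannot happen with a catch-all, kept as a safety net
        out.append(tok)
        pos += len(tok[1])
    return out

greedy_tokens = tokenize("foo...42", GREEDY_RULES)
per_char_tokens = tokenize("foo...42", PER_CHAR_RULES)
```

With these assumed rules, `greedy_tokens` is a single error token covering all of `foo...42`, and `per_char_tokens` keeps `foo` and `42` but reports each `.` as its own error. Neither is the desired result of one error token for the run `...`, which is where lazy matching comes in.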

You can do lazy matching with DFAs using a little-known technique. RE/flex (see the article "Constructing Fast Lexical Analyzers with RE/flex - Why Another Scanner Generator?") does it. The author promised to release a paper on how it works, but I haven't seen anything further on it, and the source code is ... creative. Everything is done in constructors, just for starters, so I haven't been able to make heads or tails of it.

Edit 3: I think I finally solved it without hacking anything too badly.

Edit 4: Solved it and have one of the template generators for the new code (targeting C# so far) implemented:

This is what victory looks like.

Terminal
Tokenizing: ...         /* ...a*/ baz  ... 12343 foo    123.22 bar....

AbsolutePosition: 0, AbsoluteLength: 3, Position: 0, Length: 3, SymbolId: -1, Value: ..., Line: 1, Column: 1
AbsolutePosition: 5, AbsoluteLength: 9, Position: 5, Length: 9, SymbolId: 40, Value: /* ...a*/, Line: 1, Column: 9
AbsolutePosition: 15, AbsoluteLength: 3, Position: 15, Length: 3, SymbolId: 6, Value: baz, Line: 1, Column: 19
AbsolutePosition: 20, AbsoluteLength: 3, Position: 20, Length: 3, SymbolId: -1, Value: ..., Line: 1, Column: 24
AbsolutePosition: 24, AbsoluteLength: 5, Position: 24, Length: 5, SymbolId: 3, Value: 12343, Line: 1, Column: 28
AbsolutePosition: 30, AbsoluteLength: 3, Position: 30, Length: 3, SymbolId: 6, Value: foo, Line: 1, Column: 34
AbsolutePosition: 34, AbsoluteLength: 6, Position: 34, Length: 6, SymbolId: 4, Value: 123.22, Line: 1, Column: 41
AbsolutePosition: 41, AbsoluteLength: 3, Position: 41, Length: 3, SymbolId: 6, Value: bar, Line: 1, Column: 48
AbsolutePosition: 44, AbsoluteLength: 4, Position: 44, Length: 4, SymbolId: -1, Value: ...., Line: 1, Column: 51


___ snip ___
Trying to match tokens such that runs of error characters get reported as one error, rather than one error for each rejected character.

This is shockingly difficult. I've given up on it in the past, like with my Rolex project, but Reggie is to replace Rolex, among other things, and I'm not willing to ship the latest without that sorted out.

The reason it's such a big deal is that reporting multiple errors for what is really one error can mess with "panic mode" error recovery in parsers built on top of a lexer like this. The parser gets confused about how many bad tokens there actually are in the text, which makes an already bad situation worse when parsing a document with errors in it.
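As a reference for the behavior being described (not the DFA-based solution from the edits above), here is a minimal Python sketch that coalesces a run of rejected characters into one error token. The rules, the symbol IDs other than -1, the `(symbol_id, position, value)` token shape, and the choice to let whitespace end an error run are all assumptions made for illustration.

```python
import re

ERROR = -1  # the #ERROR symbol ID mentioned in the post

# Hypothetical rules; the IDs echo the sample output, the patterns are guesses.
TOKEN_RULES = [
    (4, re.compile(r"[0-9]+\.[0-9]+")),   # float
    (3, re.compile(r"[0-9]+")),           # integer
    (6, re.compile(r"[A-Za-z_]+")),       # identifier
]
WHITESPACE = re.compile(r"\s+")           # skipped, never reported

def match_at(text, pos):
    """Return (symbol_id, end) for the longest rule match at pos, or None."""
    best = None
    for symbol_id, pattern in TOKEN_RULES:
        m = pattern.match(text, pos)
        if m and m.end() > pos and (best is None or m.end() > best[1]):
            best = (symbol_id, m.end())
    return best

def tokenize_coalescing(text):
    """Yield-free tokenizer: returns a list of (symbol_id, position, value).

    Characters that no rule accepts are accumulated and flushed as a
    single ERROR token when a real token, whitespace, or end of input
    is reached.
    """
    tokens = []
    error_start = None
    pos = 0

    def flush_error(end):
        nonlocal error_start
        if error_start is not None:
            tokens.append((ERROR, error_start, text[error_start:end]))
            error_start = None

    while pos < len(text):
        ws = WHITESPACE.match(text, pos)
        if ws:
            flush_error(pos)       # whitespace ends the current error run
            pos = ws.end()
            continue
        hit = match_at(text, pos)
        if hit:
            flush_error(pos)       # a real token ends the current error run
            symbol_id, end = hit
            tokens.append((symbol_id, pos, text[pos:end]))
            pos = end
        else:
            if error_start is None:
                error_start = pos  # first rejected character of a new run
            pos += 1
    flush_error(len(text))         # trailing errors at end of input
    return tokens

sample_tokens = tokenize_coalescing("... baz ... 12343 foo 123.22 bar....")
```

On that sample, each of the three runs of dots comes back as exactly one token with ID -1, so a parser sitting on top sees three bad tokens rather than ten. The sketch retries every rule at every rejected character, which is fine as a behavioral reference but is not how a generated state machine would do it.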

The thing is it seems so bloody simple, but every similarly simple approach I've taken with it has fallen flat on its face.

This is getting in the way of me releasing code.
Real programmers use butterflies


modified 31-Oct-21 18:08pm.

