|
I'm dying to make this update to Reggie. It's worth at least two articles - one for the SQL targeting alone.
It's just this bloody error handling, and then backporting any changes I make to this C# code to the relevant templates used to generate it (both for it and for SQL, and there are two implementations for each target, one for tables and one for compiled, meaning 2x2 = 4 different places I need to alter the code templates).
Even with my tools to ease maintenance this project is getting a little too big for me.
Just wait till I make your RDBMS normalize structured text like JSON or XML submitted to stored procedures. Parsing's coming next, once I have a good tokenizer. The parser's called Norm, because it's my data "normalizer".
Eventually I intend to target JS, C++, Python, PHP and maybe Java or something, but it depends on whether I can get any help.
Reggie and Norm will make triple tier validation for all kinds of content possible, and then also so much more than that.
Real programmers use butterflies
modified 31-Oct-21 13:19pm.
|
Python, eh?
Keep me posted.
cheers
Chris Maunder
|
Oh, you know I will. I don't even like Python, but a lot of people do, so I figure it's probably worth targeting, and worth teaching myself a little more of it. Right now I can read it but not write it.
I think in the end what I want is something you can use to generate validation code for any kind of middleware platform, as well as front ends and back ends.
Real programmers use butterflies
|
So, um, probably a dumb question. Why not use SQL's built in regex capability?
SELECT * FROM #Sample WHERE Field LIKE '%[^a-z0-9 .]%'
|
It's not full regex, and it works very badly, plus its practical Unicode support (such as having character classes for letters and numbers) is dodgy, if it has any at all.
It's not really regex. LIKE is simple pattern matching, closer to glorified DOS filename wildcards than anything regex-like, though I haven't checked whether they've improved on it since, say, SQL Server 2000.
Also, with this tool you can have a single spec file that holds the same regexes for your C# and SQL code (and potentially other targets like JS).
Real programmers use butterflies
|
Can it return a table? Can it return XML?
I have a few Regex CLR functions, from a simple IsMatch to one which is table-valued and returns all the groups and such.
|
The procedure returns resultsets based on matches. You can insert those into a table with INSERT INTO ... EXEC. You can do tokenization too, which returns tokens. That would give you something like your "groups", where it can distinguish between an int field and a literal string, for example. IsXXXX indicates that the entire target field matches the expression. MatchXXXX finds all occurrences. Tokenize tokenizes (returning all tokens), and you typically use that for parsing.
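To make the IsXXXX / MatchXXXX distinction concrete, here is a rough Python sketch that uses the standard `re` module rather than Reggie itself. The names `is_integer` and `match_integers` and the `INTEGER` pattern are made up for illustration; they only mirror the idea described above, not the generated code.

```python
import re

# Hypothetical pattern standing in for one expression from a spec file.
INTEGER = re.compile(r"[0-9]+")

def is_integer(field):
    """IsXXXX-style check: True only if the ENTIRE field matches."""
    return INTEGER.fullmatch(field) is not None

def match_integers(field):
    """MatchXXXX-style scan: one (position, value) row per occurrence."""
    return [(m.start(), m.group()) for m in INTEGER.finditer(field)]

print(is_integer("12345"))
print(is_integer("12a45"))
print(match_integers("12a45"))
```

The whole-field check rejects "12a45", while the scan still reports the two runs of digits inside it, which is roughly the difference between validating a column and searching it.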
Since the data is flat, there's no sense in returning XML. With a big *however*:
When I build out Norm, my latest parser generator that can also target SQL, it will return hierarchical resultsets in the format consumable by SQLXML (constructed with OUTER JOINs, special column names, and a parent id to impose a hierarchy), so you can get full parse tables as XML. Parse JSON from your database if you like. This will do it, and without using any particularly fancy features. It's "flavored" for MSSQL right now, but the templates can be extended easily to support MySQL, PostgreSQL and Oracle, since I've been careful to avoid most of the extended database features.
Real programmers use butterflies
|
Here's an example:
SELECT * FROM Library.dbo.RegEx ( 'foo=bar fizz=buzz' , '(?''Name''\w+)=(?''Value''\w+)' )
Index Match Groups
0 foo=bar <Groups><Group Name="0" Success="True" Offset="0" Length="7">foo=bar</Group><Group Name="Name" Success="True" Offset="0" Length="3">foo</Group><Group Name="Value" Success="True" Offset="4" Length="3">bar</Group></Groups>
1 fizz=buzz <Groups><Group Name="0" Success="True" Offset="8" Length="9">fizz=buzz</Group><Group Name="Name" Success="True" Offset="8" Length="4">fizz</Group><Group Name="Value" Success="True" Offset="13" Length="4">buzz</Group></Groups>
modified 30-Oct-21 21:42pm.
|
Yeah, that's poor man's tokenizing. To be honest, .NET's regex engine is crap at it, just because they didn't do the minor work necessary to implement the feature. It's a slight "hack", or rather a "twist", on an a|b|c alternation, such that each expression a, b, and c has a symbol id associated with it, and the engine will tell you which one it matched. It goes through the text beginning to end, reporting all matches like that, one row in the table for each match.
Your input spec might look something like this (.rl format):
VerbatimStringLiteral= '@"([^"]|"")*"'
StringLiteral='"([^"]|\\.)*"'
CharacterLiteral= '[\']([^\']|\\.)([\'])'
IntegerLiteral= '(0x[0-9A-Fa-f]{1,16}|([0-9]+))([Uu][Ll]?|[Ll][Uu]?)?'
FloatLiteral= '(([0-9]+)(\.[0-9]+)?([Ee][+-]?[0-9]+)?[DdMmFf]?)|((\.[0-9]+)([Ee][+-]?[0-9]+)?[DdMmFf]?)'
// the following takes a long time to generate
//Keyword = 'abstract|as|base|bool|break|byte|case|catch|char|checked|class|const|continue|decimal|default|delegate|do|double|else|enum|event|explicit|extern|false|finally|fixed|float|for|foreach|goto|if|implicit|in|int|interface|internal|is|lock|long|namespace|new|null|object|operator|out|override|params|private|protected|public|readonly|ref|return|sbyte|sealed|short|sizeof|stackalloc|static|string|struct|switch|this|throw|true|try|typeof|uint|ulong|unchecked|unsafe|ushort|using|virtual|void|volatile|while'
Whitespace<hidden>='[\t\r\n\v\f ]+'
Identifier='[_[:IsLetter:]][_[:IsLetterOrDigit:]]*'
CommentBlock<id=40,blockEnd="*/">="/*"
//Bar="bar"
Forgive the word wrapping but it's a line based grammar.
So if you tokenize something by calling Tokenize, you get back a row for each match, saying which Symbol it was, where it was in the document, and its actual value (like CommentBlock at position 3, value "/* bar */"), and you get many of those rows for a given string.
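As a rough illustration of that "a|b|c with a symbol id per branch" idea, here is a small Python sketch built on the standard `re` module. It is not Reggie's implementation: the `SPEC` table is a hypothetical, cut-down stand-in for the .rl file above, and Python's alternation picks the first branch that matches rather than the longest match, so branch order matters here in a way it may not in a generated lexer.

```python
import re

# Hypothetical, cut-down version of the spec: (symbol, pattern, hidden?)
SPEC = [
    ("CommentBlock", r"/\*.*?\*/", False),
    ("IntegerLiteral", r"[0-9]+", False),
    ("Identifier", r"[_A-Za-z][_A-Za-z0-9]*", False),
    ("Whitespace", r"[\t\r\n\v\f ]+", True),
]

# One big alternation a|b|c where every branch is a named group, so the
# engine can report which symbol matched via m.lastgroup.
MASTER = re.compile(
    "|".join("(?P<%s>%s)" % (name, pattern) for name, pattern, _ in SPEC),
    re.DOTALL,
)
HIDDEN = {name for name, _, hidden in SPEC if hidden}

def tokenize(text):
    """Return one (symbol, position, value) row per non-hidden match."""
    rows = []
    for m in MASTER.finditer(text):
        if m.lastgroup not in HIDDEN:
            rows.append((m.lastgroup, m.start(), m.group()))
    return rows

for row in tokenize("foo/* bar */ 42"):
    print(row)
```

Note that `finditer` silently skips characters no branch matches; a real tokenizer would more likely report those as error tokens.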
Real programmers use butterflies
|
I am working on a Kotlin app that needs to load a large file at startup. Because it can take up to 20 seconds it must be done in a separate thread so it won't block the UI thread. When it is loaded I need to send a signal to the main UI thread, so that the main thread will know the file is ready for further processing.
Being a Kotlin greenhorn, I battled for hours to send a signal from the background thread to the main thread, with no luck. Then I remembered reading something about a built-in method (it belongs to Android's Activity class rather than to Kotlin itself): runOnUiThread(). "Surely it cannot be that simple", I thought. But in desperation I tried adding this statement to the end of the background thread:
runOnUiThread { someMethodOnUiThread() }
And voila! It worked. Apparently this little gem inserts the requested method into the execution queue of the UI thread.
Sometimes we look for complicated solutions when a perfect simple solution is right under our noses.
By the way: Please don't tell me I should have used a Kotlin coroutine. I tried that, but when you use a file reading class like ObjectInputStream the coroutine starts blocking the UI thread.
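For anyone curious what "inserting a method into the execution queue of the UI thread" amounts to, here is a toy Python model of the mechanism. It is only a sketch: `ui_queue`, `run_on_ui_thread`, `ui_loop` and `load_large_file` are invented names, and on Android the framework owns the real queue and loop.

```python
import queue
import threading

# Stand-in for the UI thread's execution queue (hypothetical).
ui_queue = queue.Queue()
events = []

def run_on_ui_thread(callback):
    """Ask the 'UI' thread to run callback later by enqueueing it."""
    ui_queue.put(callback)

def load_large_file():
    """Background work; the list is a placeholder for the slow file read."""
    data = list(range(5))
    run_on_ui_thread(
        lambda: events.append(
            ("loaded", len(data), threading.current_thread().name)
        )
    )
    run_on_ui_thread(None)  # sentinel: tell the toy loop to stop

def ui_loop():
    """The 'UI' thread drains its queue and runs each posted callback."""
    while True:
        callback = ui_queue.get()
        if callback is None:
            break
        callback()

worker = threading.Thread(target=load_large_file, name="loader")
worker.start()
ui_loop()       # runs on the thread that started the worker
worker.join()
print(events)
```

The callback is created on the "loader" thread but executes on whichever thread runs `ui_loop`, which is the whole point of the pattern.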
Get me coffee and no one gets hurt!
modified 30-Oct-21 15:09pm.
|
I thought that's what postMessage does - posts a message onto the UI thread.
Window.postMessage() - Web APIs | MDN[^]
But then again, I don't write apps in Kotlin, so I needed something more native: the Worker class (Using Web Workers - Web APIs | MDN[^]) and various machinations to deal with the fact that I didn't want external files for the source for the worker.
|
Thanks. I may just try that for fun!
Get me coffee and no one gets hurt!
|
Quote: coroutine starts blocking the UI thread
I believe that you can set the dispatcher context of the coroutine so that it runs on a thread other than the UI thread.
Dispatchers in Kotlin Coroutines - GeeksforGeeks[^]
That said, if your solution works then that's fine as far as I am concerned (I am from the school of "software needs to work" rather than "it needs to be beautifully architected but not work"). Kotlin still seems to have a few quirks with coroutines.
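In Kotlin the usual way to do that is to wrap the blocking call in `withContext(Dispatchers.IO)`. The same idea can be sketched in Python's asyncio, purely as an analogy and not as Kotlin code: the blocking function is handed to a thread pool so it does not run on the thread that drives the coroutines. `blocking_read` is a made-up placeholder for the file load.

```python
import asyncio
import threading

def blocking_read():
    """Placeholder for a blocking call such as reading a large file."""
    return threading.get_ident()

async def main():
    loop = asyncio.get_running_loop()
    loop_thread = threading.get_ident()
    # Run the blocking call on the default thread pool instead of the
    # event-loop thread, then resume here when it finishes.
    worker_thread = await loop.run_in_executor(None, blocking_read)
    return loop_thread, worker_thread

loop_thread, worker_thread = asyncio.run(main())
print(loop_thread != worker_thread)
```

Calling `blocking_read()` directly inside `main` would instead run it on the loop's own thread, which is the asyncio counterpart of a coroutine stalling the UI thread.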
“That which can be asserted without evidence, can be dismissed without evidence.”
― Christopher Hitchens
|
My experience with coroutines is that they run in non-blocking mode until you start doing disk IO. Then they block the UI thread. I have also seen a number of posts confirming this behavior. I think it was on Stack Overflow.
Get me coffee and no one gets hurt!
|
Go with what works.
I would have thought that disk IO would occupy the IO thread but, as you hint, coroutines can be a bit weird.
“That which can be asserted without evidence, can be dismissed without evidence.”
― Christopher Hitchens
|
*sigh* - (half serious here) what even is the point of UTF16? You still have the possibility of encountering surrogate characters, which means if you want to support Unicode streams you have to handle that possibility.
MSSQL supports UTF16, but not UTF32.
Consequently, here's me fetching the next character of an NVARCHAR(x) or NTEXT stream, with UTF32 support. Forgive the grotty code, as there are no unsigned data types, no bit shifts, etc.
DECLARE @valueEnd INT = DATALENGTH(@value)/2+1
DECLARE @index INT = 1
DECLARE @ch BIGINT
DECLARE @tch BIGINT
...
SET @ch = UNICODE(SUBSTRING(@value,@index,1))
SET @tch = @ch - 0xd800
IF @tch < 0 SET @tch = @tch + 2147483648
IF @tch < 2048
BEGIN
SET @ch = @ch * 1024
SET @index = @index + 1
IF @index >= @valueEnd RETURN -1
SET @ch = @ch + UNICODE(SUBSTRING(@value,@index,1)) - 0x35fdc00
END
This is hateful and slow. I haven't even tested it yet. I know it doesn't gracefully handle any and all invalid Unicode streams, but making it do that is even worse. To heck with this.
Why UTF16 with no functions to convert a surrogate pair to UTF32? It's ridic.
Someone asked me the other day why I don't like Microsoft SQL Server. Here's reason #1359
Bad MSSQL! BAD!
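For reference, the arithmetic in the T-SQL above can be checked with a short Python sketch. It mirrors the same steps under the same assumptions (it does not verify that the second unit really is a low surrogate), and the names `next_code_point` and `FOLD` are made up here. The magic constant 0x35fdc00 is just the three offsets of the standard surrogate formula folded into one subtraction.

```python
HIGH_START = 0xD800   # first high (lead) surrogate
LOW_START = 0xDC00    # first low (trail) surrogate

# hi*1024 + lo - FOLD == 0x10000 + (hi - 0xD800)*1024 + (lo - 0xDC00)
FOLD = HIGH_START * 1024 + LOW_START - 0x10000   # equals 0x35FDC00

def next_code_point(units, index):
    """Decode one code point from a list of UTF-16 code units.

    Returns (code_point, next_index). If a pair is cut off at the end of
    the input, the code point is -1, like the RETURN -1 in the T-SQL.
    """
    ch = units[index]
    # Same range test as "@tch < 2048": any unit in 0xD800..0xDFFF.
    if 0 <= ch - HIGH_START < 2048:
        index += 1
        if index >= len(units):
            return -1, index
        ch = ch * 1024 + units[index] - FOLD
    return ch, index + 1

# U+1F600 is encoded in UTF-16 as the pair D83D DE00.
print(hex(next_code_point([0xD83D, 0xDE00], 0)[0]))
```

Python needs no trick for the unsigned comparison, since the chained test covers the negative case that the `+ 2147483648` line handles in the T-SQL.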
Real programmers use butterflies
|
Yep, it is something of a compromise. UTF16 is usually enough, but we also need to handle the unusual.
Surrogate characters are only one beast.
Expressing accented characters such as 'é' in different ways is another. OK, here at least we have the possibility to normalize it (https://stackoverflow.com/questions/7811976/normalize-unicode-string-in-sql-server).
modified 27-Nov-21 21:01pm.
|
That isn't an issue for me, as this is a sqlized regex engine.
All matching conditions are specified using UTF32 stored as BIGINT values (since SQL has no unsigned ints).
All incoming characters are resolved using the code I posted.
Accent marks will be matched properly.
Displaying them is firmly in "Somebody else's problem" territory.
Real programmers use butterflies
|
UTF-16 is the result of developers underestimating how many characters would be needed for a universal encoding and lingers like the stench of a soiled diaper because MS rushed it into production in the 90s before anyone realized it was a mistake.
Did you ever see history portrayed as an old man with a wise brow and pulseless heart, weighing all things in the balance of reason?
Is not rather the genius of history like an eternal, imploring maiden, full of fire, with a burning heart and flaming soul, humanly warm and humanly beautiful?
--Zachris Topelius
|
Just a vent. 30 minutes plus wasted, my guess is an MS update, although I didn't think they updated on Fridays.
Wacom tablet (old) was working just fine this morning. Doing a lot of work in Blender. 30 minutes ago (+) I went to use it and the pen totally stopped working in Blender after starting it from scratch. But the pen worked as usual on the desktop, so it was communicating with the OS.
It was one of those WTH??? puzzles. I feel sorry for anyone who faces such a situation who is not a good troubleshooter. I feel sorry for myself, because of the headache!
Clicked around and deleted and reinstalled the Blender options in the Wacom program. Didn't work. Finally found that Wacom had released an update for their drivers. So out of curiosity and prayer, downloaded it and installed. Everything started working again.
Why would the OS stop sending signals to specific programs when a third party updates their driver? Did MS send some update that triggered the situation? Unless Wacom has a time bomb in their code, nothing else makes sense. (Unless I have a virus, but my antivirus says no, and it started working with the updated drivers.)
Grrrr..... F whoever is responsible for such brain-dead decisions, or coding!
PS - anyone know why my laptop monitor brightness would change when Blender becomes active? I've got a 24" monitor set up as the primary screen, and when I open and close Blender the laptop brightness changes (not the external screen). I've disabled Game Mode, and also killed the Dell display program, so I don't know what else it could be. Thought the two items were related, but they don't appear to be.
modified 29-Oct-21 20:31pm.
|
Hi,
David O'Neil wrote: Why would the OS stop sending signals to specific programs when a third party updates their driver?
Blender is using DirectX, which probably means that it's using Raw Input[^] and opening your HID devices with exclusive device access mode[^]. Exclusive access will fail if another program already has exclusive access. Most DirectX software (such as video games) does this.
David O'Neil wrote: PS - anyone know why my laptop monitor brightness would change when Blender becomes active?
Just a guess here... but Blender could be setting the ICC color profile[^] and doing calibration. It could also be a third-party library doing that; I think Blender uses an open source library, OpenColorIO[^], for color management. Maybe you can ask about it on their GitHub issue forum.
The events that occur on your workstation aren't always caused by an operating system update.
Best Wishes,
-David Delaune
|
Randor wrote: The events that occur on your workstation aren't always caused by an operating system update.
I did not update Blender or the Wacom drivers, or do a manual Windows update, but Blender's response changed. The only realistic option I see is that a Windows update changed how it interacted with Blender. It may be related to the Raw Input you pointed out, but it only started working again after I updated Wacom's drivers. Unless Wacom has a time bomb check in their drivers that stops certain interactions under certain conditions.
Randor wrote: Just a guess here... but Blender could be setting the ICC color profile[^]
You could very well be right.
Thanks for the input, and have a great weekend!
|
PS - I even rebooted Windows and tried running Blender only, and the shenanigans happened.
|
David O'Neil wrote: Grrrr..... F whoever is responsible for such brain-dead decisions, or coding!
(Putting on my Mickey Mouse hat) M-I-C-K-E-Y-S-O-F-T
I have lived with several Zen masters - all of them were cats.
His last invention was an evil Lasagna. It didn't kill anyone, and it actually tasted pretty good.
|