|
|
It depends on what you are doing. For column validation in most cases you wouldn't need it.
However, if you intended to say, store chunks of JSON in your database (a surprisingly common practice these days) you could use a tokenizer (along with a parser generated by Norm targeting SQL) to validate or even normalize the JSON content so that invalid JSON does not get stored in the database but rather is rejected.
Typically this is done in the middleware, but such validation should in theory (where possible) take place in each of the "three tiers", which would include the database, for maximum assurance of data integrity..
Another use for it might be to allow you to search long structured text for particular things, like search an HTML document for all <B> tags and yield those as a table, which could be useful for things like CMS systems.
Real programmers use butterflies
|
|
|
|
|
Do. Not. Do. String manipulation. In. SQL.
Having to do so is a sign that you have done something wrong upstream.
You know this.
|
|
|
|
|
There's a very legitimate use case for it when you want to validate denormalized string fields like phone numbers if they're not stored numerically just as an example - in practice databases are messy in the real world often with fields that are denormalized lists of ints, json data, xml, etc.
Doing string manipulation in a database function or stored procedure allows you to reject data that comes in in an invalid format before it gets into the database.
Particularly these days, it's popular to store denormalized JSON in the database, for better or worse.
With Norm and Reggie, you can target parsing code to SQL that will allow you to validate or normalize your string data at the database level.
The reason is not for performance, but for security and data integrity. You wouldn't want to run something like this as part of a query. You primarily use it on data update where performance can be less of an issue depending on the scenario.
Real programmers use butterflies
|
|
|
|
|
But you should do that up-front, before it hits the database. Or, more likely, when populating a landing/staging table. With what I'm doing now (mostly ETL), we land the data and then go through doing stuff like that, making IP addresses binary, splitting FQDNs into their parts, checking dates, etc.
One of the things I challenge you to do in SQL is take big-endian DNS names and reformat them as little-endian FQDNs. Examples:
.com.acme.southwest.anvils -- anvils.southwest.acme.com.
.com.widgetco.sales -- sales.widgetco.com.
This is something I need to do and I use a CLR function to do it. It's more complex than these examples.
(This is not a programming question.)
|
|
|
|
|
Of course you "should" - there are a lot of things that people "should" and "shouldn't" do that don't actually happen the way they "should" in the real world.
The result is almost always a database that accepts denormalized data for some of its content.
Over a long enough timeline / large enough project this is an inevitability.
That means in an enterprise sized application of any significant age, you're dealing with denormalized data.
Furthermore, in some cases, with an RDBMS it is more efficient to accept denormalized data and normalize it in a stored proc to get results based on that. A good example is passing an array of integer cache ids so you only get particular rows from the database (such as updating a stale list) - there is simply no good (standard SQL) way to pass a small array of ints to the database, so 9 times out of 10, you'll see a procedure like this that takes a string or even a varbinary that has the data encoded it in. This is far more efficient than updating some kind of local session table you use to talk to your routines with (the other way to get a list of rows by ids)
With a tool like this you can harden that, and the bottleneck is network, not DB CPU, and not row locks (unless you're using the table based version, which I actually don't recommend)
I'm not sure about reformatting your domain names. They're easy enough to parse, but I don't know enough about the allowable formats to discern the logic for it without delving into RFCs.
Real programmers use butterflies
|
|
|
|
|
PIEBALDconsult wrote: But you should do that up-front, before it hits the database. True, we should. But in the real world, this doesn't necessarily happen. Like when dealing with a legacy Oracle system written in C++ (appears to be VS97 or VS6), where the DB design defies rational logic. Or any logic.
My queries have as many as 20 nested REGEXP_REPLACE statements to make sense out of the mess that some fields present. Ugly as sin, but it gets the job done. I'm counting the days until the replacement system goes online!
|
|
|
|
|
honey the codewitch wrote: There's a very legitimate use case for it when you want to validate denormalized string fields like phone numbers if they're not stored numerically just as an example - in practice databases are messy in the real world often with fields that are denormalized lists of ints, json data, xml, etc. In the real world, normalization means something different. I go to BCNF.
honey the codewitch wrote: Doing string manipulation in a database function or stored procedure allows you to reject data that comes in in an invalid format before it gets into the database. Means you storing it wrong. There's no format in storage.
"Normalization" might be different from your idea. Go look it up
--edit
I didn't see your name; I didn't intend to challenge you. Yet, explain me, if that is normal (ization) to you?
Bastard Programmer from Hell
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
|
|
|
|
According to a cursory google search:
> Normalization or normalisation refers to a process that makes something more normal or regular.
That's what I'm referring to.
Normal, as in regular, as in consistently structured, as in i can look impose a structure on it.
Further googling it sounds like there's a technical use for it that has to do with databases specifically, and how to impose some sort of notion of a "standard" order over the mess that is an RDBMS. In practice I can see why nobody cares.
Which is probably also why I don't care about that, as nobody has ever paid me to care about it. If I can star cluster my DB and get it to perform, nobody cares about BCNF, at least they never have in my 20+ years of development.
Real programmers use butterflies
|
|
|
|
|
honey the codewitch wrote: Which is probably also why I don't care about that, as nobody has ever paid me to care about it. If I can star cluster my DB and get it to perform, nobody cares about BCNF, at least they never have in my 20+ years of development. The fact that you had to look it up says enough
And, well, I do.
Bastard Programmer from Hell
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
|
|
|
|
If someone pays me to care about it I will. I promise.
Real programmers use butterflies
|
|
|
|
|
Avoid using user-defined sql functions in a sql query. It absolutly destroys the performance of the query. Just get the data and return it to the calling app, and let the app do the validation/correction (C#/C++ is much better for that than is SQL). If you want to get really fancy, you can perform the validation, and re-save the corrected data so that the next time you do the query, it will return fewer invalid records.
".45 ACP - because shooting twice is just silly" - JSOP, 2010 ----- You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010 ----- When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013
|
|
|
|
|
I didn't implement them as functions, but I could have. They are stored procedures in order to discourage their use in views and such.
This isn't for performance. This is for data integrity. Often those two things are somewhat at odds, meaning there are many situations where you have to rob Peter to pay Paul.
Real programmers use butterflies
|
|
|
|
|
I don't know SQL but loved your rant nevertheless. And a too.
|
|
|
|
|
Greg Utas wrote: I don't know SQL
You're a fortunate soul. I've worn many hats in my development lifetime. "DBA" was one of my least favorite.
Real programmers use butterflies
|
|
|
|
|
I think you just outed where in the US you're originally from with that title. Ssshhh, don't leak our secret
|
|
|
|
|
If you're gauging based on the title, I'm probably not where you think I'm from but it means what you think it does. I also use various britishism's like "whilst"
I steal vernacular from wherever. I'm a thief like that.
Real programmers use butterflies
|
|
|
|
|
It's a shame you're targeting MS SQL, it is weak in many areas compared to other SQL offerings. Postgres supports regex natively, as well as having a bunch more capability with string manipulation functions.
Is there an option for you to use .Net assemblies in MS SQL, or perhaps the R or Python environments? It seems MS is ignoring extending the base SQL capability, but offering these extensions.
Regardless, kudos for the tenacity required to do what your doing.
How are you finding the performance of the regex proc's in SQL?
Also, remember SQL is designed way back, before syntax highlighting/intellisense and many other modern expectations that would influence its design were around.
It's time to create SQLNext, and I've seen the odd concept put forward, but it would be really hard to shift the masses now.
Perhaps it'd best be done with a translation from SQLNext to the various SQL variants.
But ideally would have to exist closer to the database than the client application and eventually become a native database feature. Unfortunately, would be easiest to place in the client application - I'd hate to get into any DB Driver level stuff. Remember ODBC - yuck.
|
|
|
|
|
I intend to target others. I had to start somewhere, and I already had MSSQL, it's fairly popular, I'm familiar with it, and its many limitations, which actually made it a good place to start.
After all, I stuck as close to the SQL spec as a could, so I can port it to other databases, and I will be adding some sort of "flavor" switch to the app to specify which SQL to target. It's impossible to target SQL92 for this (i think) or I would.
Real programmers use butterflies
|
|
|
|
|
I didn't even respond to the rest of your questions! Whoops, let me try again.
I haven't tested the performance specifically. I wasn't impressed with the execution plan for the table version though. The compiled ones should perform "okay" once MSSQL gets done putting them into native code, but it won't be near as fast as outside of a DB.
I wouldn't use .NET's engine because it's not available everywhere. With Reggie you write the spec once, and it runs identically across any supported target. That's important for business schemas or even just application behavior consistency, if you're doing like, triple tier validation for example. Also I don't think you can use the CLR in the database without turning it on explicitly, and I could see a lot of people (if not most) leaving it off.
The takeaway here is Reggie will probably run where a .NET solution wouldn't, and it won't perform as well in the DB, unless you take the C# code reggie generates and run *that* in the database, which i suppose is doable as well.
So for a DB, this is best used for validation during updates that aren't intended to be super fast, full metal writes, though you can probably use this modestly in such a scenario for normalizing snatches of JSON and such.
I think I may solve my SQL problems with having structured chunks of sql that can render to multiple targets (like "advance the input position and move along a transition, with capture")
I've never heard of SQLNext.
Real programmers use butterflies
|
|
|
|
|
NeverJustHere wrote: It's a shame you're targeting MS SQL, it is weak in many areas compared to other SQL offerings. Postgres supports regex natively, as well as having a bunch more capability with string manipulation functions.
MS SQL supports that too.
NeverJustHere wrote: Is there an option for you to use .Net assemblies in MS SQL, or perhaps the R or Python environments? It seems MS is ignoring extending the base SQL capability, but offering these extensions.
Yes, you can do that (CLR), but local security restrictions may prevent you from doing so (like us - we can't do CLR because the security nazis said we couldn't).
NeverJustHere wrote: Also, remember SQL is designed way back, before syntax highlighting/intellisense and many other modern expectations that would influence its design were around.
Syntax highlighting is implemented by the editor, not the language. Syntax highlighting as been available in SSMS since at least 2007.
ANSI SQL has been around for years, but as corporate implementations are wont to have, extensions and variations to the language creep in (hence the "almost compatible" nature of many common functions - t's kinda like browsers). SQL variants are also updated at different times, so they tend to leap-frog each other in capabilities.
There's nothing really wrong with any of the SQL variants. You may have your preferences, but in the end, you work in the one that is responsible for your paycheck.
I think there are more important things to be concerned about - like why can't we use c# as a scripting language instead of javascript in ASP.NET web pages?
".45 ACP - because shooting twice is just silly" - JSOP, 2010 ----- You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010 ----- When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013
|
|
|
|
|
Quote: Syntax highlighting is implemented by the editor, not the language. Syntax highlighting as been available in SSMS since at least 2007.
Yep, syntax highlighting is fine. It's intellisense where it gets messed up. Select can't list relevant columns because the referenced tables haven't been entered yet. It's why LINQ is the other way around. FROM .... SELECT ...
Quote: like why can't we use c# as a scripting language instead of javascript in ASP.NET web pages?
You can - it's just called Blazor rather than ASP. Just starting exploring it myself.
|
|
|
|
|
NeverJustHere wrote: It's intellisense where it gets messed up. Select can't list relevant columns because the referenced tables haven't been entered yet. It's why LINQ is the other way around. FROM .... SELECT ...
That's not SSMS, it's the nature of the language. They can't do SQL radically different from the ANSI standard because nobody would use the product. Yeah, it's annoying, but you learn to deal with it, like with any other tool.
I personally think that SQL does stuff backwards from the way your typical developer thinks about coding. It's unfortunate that a lot of DBAs are not really qualified to write SQL code - and I've actually been told by a DBA (when asked about a SQL Server-specific technique) that he "wasn't a developer and couldn't help me". As a result, developers are forced into the SQL dev role as a result.
".45 ACP - because shooting twice is just silly" - JSOP, 2010 ----- You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010 ----- When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013
|
|
|
|
|
honey the codewitch wrote: It's impossible to remember it without using it every day
I USE SQL every day a week (Ok 5 out of 7)... Still there are things I have to look-up...
"The only place where Success comes before Work is in the dictionary." Vidal Sassoon, 1928 - 2012
|
|
|
|
|
honey the codewitch wrote: The syntax is inexplicable
honey the codewitch wrote: It's impossible to remember it without using it every day, As a long time web developer I feel that's true for most all languages. As a web developer I do html, css, javascript, jquery, xml, json, c#, and t-sql. I often have to google for syntax. But SQL is the one I probably look up least often.
|
|
|
|
|