|
It's due to the fact I took off the side and Newbe's saw it and went 'whats that?'...
|
|
|
|
|
I don't know where to post this as it's not a programming question as such, but more of a question about general interest of a certain little concept i cooked up.
Background: I built a thing that can run tokenization from inside a T-SQL database (and i'd support others but i want to try to use their builtin regexes if they have them, and that requires solving a math problem I can't solve yet - Ardens theorem or the state removal method of turning a state machine into a regex). In the future I'll support them efficiently.
anyway, point is, you can "type", validate and detect different types of fields in a text stream.
so what you can do with it is create "typed CSVs" you could submit to stored procedures to update a whole mess of data in one call, with validation and rejection of invalid CSVs.
The upshot is you don't need to grant a webserver access to your DB tables - just the stored procs - that becomes practical - at least in some situations - because you can update a whole CSV** worth of data at once.
**The CSV could only be as long as either NTEXT or NVARCHAR(MAX) but whatever.
I haven't performance tested it in MSSQL - yet. However, I stand by the general algorithm as over 3 times faster than the one the Microsoft .NET Regex engine uses though running it in a database is iffy performancewise. Still, I don't think it's bad enough to be unrealistic unless you need very rapid turnaround times on your updates.
Real programmers use butterflies
|
|
|
|
|
Not me. Better to do that on the client rather than the server.
|
|
|
|
|
How do you do it in the server without giving the server access to your tables, or perhaps using SQLXML diffgrams, which frankly i think is worse?
Answer me that.
PS: I've worked in more than one environment where direct access to DB tables is literally not allowed in production, but approved stored procs could be installed and run.
Real programmers use butterflies
|
|
|
|
|
I don't see how that's an issue.
Loading CSV is a back-end process, not a user process.
In my case we have servers dedicated to running SSIS which then connect to various database servers. We do as much as we can on the SSIS servers because the database servers get bogged down.
|
|
|
|
|
I think you're not hearing me. This isn't about users. It's about overcoming a situation where you're literally not allowed to do direct access to core tables in a DB from an external server like a webserver - from anything in the "DMZ" for that matter
So what bogs down a DB server more? Running EXEC 50 times or running EXEC 1 time and processing a bunch of data in batch, even if doing so requires string ops? Furthermore, what of maintaining concurrency and doing things atomically in that situation?
I'm not sure you've ever run into the situation where you need to send multiple keys to a stored procedure even. You write as though you've never had to deal with that.
Real programmers use butterflies
|
|
|
|
|
I have always been in the back-end, never even in the DMZ.
Were I to dabble farther out, I would have a Web Service performing all of that stuff.
And still, it wouldn't deal with CSV, that would be handled by something else -- such as SSIS.
It always comes down to something has to access the tables and adding needless complexity is a Very Bad Idea.
It sounds like you are misusing Procedures, like so many who think they must.
If I recall correctly SQL Server (v6 and v7*) prior to 2000 (?) -- or maybe it was just the ODBC driver, I don't fully remember -- didn't support transactions, so using Procedures to handle multi-statement transactions was critical. But that simply is not the case anymore.
-- PTSD over converting Oracle PRO*C code to T-SQL with ODBC in the late 90s. *shudder*
What I find most irksome is that so many developers think that even single-statement transactions must be performed by Procedures and that doing so somehow magically provides "security".
* My copy of the "SQL Server 7 Developer's Guide" (1999) seems to indicate that transactions are supported by DAO and ODBCDirect, but maybe by not ODBC. In the late 90s were using SQL Server 6 on Windows-NT.
modified 11-Nov-21 12:37pm.
|
|
|
|
|
What doing that does is allow the DBA to restrict access from the servers to the database tables, so if a box in the DMZ is compromised, they have limited access to the database, and usually read only, aside from specific functions.
There's nothing magic about it.
Real programmers use butterflies
|
|
|
|
|
So far I have not had a use-case that would justify this, and I will fight hard to keep it like that.
My validation belongs in the middle tier (so whatever host your REST API or whatever). The database maintains relational integrity, null checks - that type of stuff. My web server is the ONLY system with connection to my tables. Some validation is also present client side for usability (fail fast), but then it is always duplicated in the middle tier - it needs to be as the REST API can of course be called from anything, not just your own UI.
So the only real use case I can see, is when you need to stream large datasets where the validation code need access to a lot of data already in the database - as this would be too slow in my model as the middle tier would have to do a lot of reads while validating the data being streamed through it. Or am I misunderstanding the benefit of this?
|
|
|
|
|
I agree, but wouldn't you still land the incoming data in a staging table before performing such checks and moving the data to its destination?
|
|
|
|
|
Most likely yes. And if tokens needs to be extracted I would try to do it in the middle tier before streaming the data into temp tables. But I can't rule out there are use-cases where that is just too slow - and if you are locked into a relational database this might be a solution in that case... my solution would probably be to change job
|
|
|
|
|
Have you ever worked in an environment where your servers are not allowed to do direct table access?
If so, how do you update multiple records from a stored procedure? Or even do a request for multiple results using multiple keys from a stored proc *while maintaining atomic operations?*
I've run into this situation several times, where at the very least I needed to send a delimited list of ints to a stored proc, and parse it in the proc. That can be done with splits of course, but to do it cleanly you actually need to do more than that, to check for multiple runs of whitespace and such (or commas if you're doing that)
What then? Have you ever run into that?
Real programmers use butterflies
|
|
|
|
|
Nope. Stored procedures for security was the norm back when VB applications accessed the database directly. Modern .NET or Java makes it a lot easier to write secure code than stored procedures - and if you want security you want the code to be as simple as possible.
I have run into people who questioned this (and typically their security knowledge was... well... not as up to date as one could wish), but I have never had any serious issue deploying into pretty secure environments. I am sure those environment exists as all it takes is a CTO stuck in the past. I would avoid a work environment where this is the case - and the product I work on now will not be sold to a customer that have this requirement.
|
|
|
|
|
That's good to know. It has been years since I coded for SQL professionally. I didn't realize that had changed so much.
Real programmers use butterflies
|
|
|
|
|
One thing I forgot to mention - if the rule is "no machine with direct internet access is allowed to connect to the database", then standard reverse proxy between the internet exposed endpoint and the "real" webserver will do the trick.
|
|
|
|
|
No, I've only worked for worthwhile employers.
|
|
|
|
|
Yes. The standard framework I developed and use for pretty much all my web-based apps is access via SPs only; no direct table access at all. Generally I don't need to update multiple rows but I certainly can do, though I prefer to avoid it. If there are multiple updates required then I wrap the lot in a transaction persisted across SP calls. (For one thing, if something fails the webserver then has a much easier time of working out what failed).
And yes, there are also occasions when I need to send delimited lists to an SP (typically keys to match an IN clause) but then I just use a UDF to parse it and return a table result. (And the webserver is responsible for ensuring the list is "clean" wrt whitespace etc).
Conceptually, I'm not sure I'd feel comfortable loading an unverified CSV straight into the database using generic type validation routines. Generally there's more to checking input than just checking field type, and that sort of stuff belongs in the business layer, well away from the database. It feels a bit like this is a workaround to gain direct table access by the backdoor, bypassing the strong typing that SP parameters at least partially enforce.
|
|
|
|
|
I don't disagree with what you wrote, I'd just like to clarify that "typed" is my term.
What's literally happening is each field is being regex checked for consistency, such that Integers are matched with one, strings with the other. The text is "tokenized"
Real programmers use butterflies
|
|
|
|
|
A big thank you to all those who have served or are serving.
I see even has a red poppy.
"the debugger doesn't tell me anything because this code compiles just fine" - random QA comment
"Facebook is where you tell lies to your friends. Twitter is where you tell the truth to strangers." - chriselst
"I don't drink any more... then again, I don't drink any less." - Mike Mullikins uncle
|
|
|
|
|
For a second I thought got shot, but then I have no evidence he would even bleed red.
|
|
|
|
|
Shirley would simply curve the space around him, bullets shmullets.
"the debugger doesn't tell me anything because this code compiles just fine" - random QA comment
"Facebook is where you tell lies to your friends. Twitter is where you tell the truth to strangers." - chriselst
"I don't drink any more... then again, I don't drink any less." - Mike Mullikins uncle
|
|
|
|
|
It's actually Armistice Day over here.
|
|
|
|
|
Indeed it is.
"the debugger doesn't tell me anything because this code compiles just fine" - random QA comment
"Facebook is where you tell lies to your friends. Twitter is where you tell the truth to strangers." - chriselst
"I don't drink any more... then again, I don't drink any less." - Mike Mullikins uncle
|
|
|
|
|
And Remembrance Day in Canada, and surprised that an American didn't ask why was wearing a flower!
|
|
|
|
|
Not knowing that it is Remembrance Day in Canada, and recalling that does US-centric themes from time to time, like 4th of July. That's why I noted it, but didn't ask.
"the debugger doesn't tell me anything because this code compiles just fine" - random QA comment
"Facebook is where you tell lies to your friends. Twitter is where you tell the truth to strangers." - chriselst
"I don't drink any more... then again, I don't drink any less." - Mike Mullikins uncle
modified 11-Nov-21 15:50pm.
|
|
|
|