|
I'm trying to use regex to determine if a given stored proc body contains destructive sql clauses.
Here's my regex pattern (I want it to match if it contains a "select...from", but not match if it contains any of the destructive sql keywords shown):
"^(?=.*SELECT.*FROM)(?!.*(?:CREATE|DROP|UPDATE|INSERT|ALTER|DELETE|ATTACH|DETACH|TRUNCATE)).*$"
Here's the string I'm passing into the regex:
PROCEDURE [dbo].[GetAlertsForApp]
(
@appShortName nvarchar(16)
)
AS
BEGIN
SET NOCOUNT ON;
declare @appID int = (SELECT ID FROM dbo.Applications WITH(NOLOCK) WHERE AppShortName = @appShortName);
select [ID],[AlertName],[AlertTitle],[AlertMsg],[AlertStartDate],[AlertExpireDate],[AlertType],[AlertClassLevel]
,[AlertApps],[ActionName],[ControllerName],[DateAdded],[AddedByUMSUserID]
FROM [dbo].[Alerts] WITH(NOLOCK)
WHERE (UPPER(AlertApps) LIKE '%' + UPPER(@appShortName) + '%' OR UPPER(AlertApps) = 'ALL')
AND AlertExpireDate IS NULL OR AlertExpireDate > GETDATE() ;
END
My problem is that the regex does not result in a match. The string above does not contain any of the prohibited words, and does include a select statement. I'm confused, and maybe just a little bench blind (and not real experienced with regex)...
".45 ACP - because shooting twice is just silly" - JSOP, 2010 ----- You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010 ----- When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013
|
|
|
|
|
Try setting the RegexOptions.Multiline option to true: Without it the "$" will stop at the end of a line of input rather than the whole string.
Certainly if I set that on Expresso, it matches; without it it doesn't.
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
That doesn't always return the expected result. I did this and it works as expected:
string pattern1 = @"\b(SELECT)\b";
string pattern2 = @"\b(CREATE|DROP|UPDATE|INSERT|ALTER|DELETE|ATTACH|DETACH|TRUNCATE)\b";
Regex regex1 = new Regex(pattern1, RegexOptions.IgnoreCase);
Regex regex2 = new Regex(pattern2, RegexOptions.IgnoreCase);
bool match1 = regex1.IsMatch(query);
bool match2 = regex2.IsMatch(query);
result = match1 && !match2;
Regex hs always kinda baffled me, and I actively avoid using it unlesss it makes a lot more sense to use it.
".45 ACP - because shooting twice is just silly" - JSOP, 2010 ----- You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010 ----- When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013
|
|
|
|
|
They are powerful, but it's too easy to get carried away!
Since you have a two regex solution, I'd probably stick with that - it's a load more understandable and maintainable than the equivalent single regex would be.
But ... I'd personally use a user ID with only SELECT permissions and say "stuff it - that'll do".
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
In this particular case, I'm using that code to evaluate both a user entered query, and an existing stored proc that was pulled from the database.
It's for my entity factory article.
".45 ACP - because shooting twice is just silly" - JSOP, 2010 ----- You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010 ----- When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013
|
|
|
|
|
|
The sql contains: SELECT ... FROM ... select ... FROM
Maybe it doesn't even see it as a SELECT.
It was only in wine that he laid down no limit for himself, but he did not allow himself to be confused by it.
― Confucian Analects: Rules of Confucius about his food
|
|
|
|
|
Some random comments.
1. Just noting that your database should probably provide the ability to preclude the db user from executing DDL commands, regardless of where they originate. So set up your application to only use users with appropriate permissions.
2. I suspect your regex should also use a boundary check ('\b') around the excluded word phrase.
3. If you allow dynamic SQL execution then someone could circumvent your check like: 'CR' + 'EATE'. Item 1 prevents that possibility also.
4. Allowing users to use SQL directly means you cannot protect the database absolutely. For example are users allowed to run "delete users" (no where clause)? Any protections added to protect against this can generally be circumvented.
5. Another problem with allowing user SQL is if the database contains large amounts of data users can end up writing very inefficient queries. Which can impact the entire enterprise. Not as much of a problem if each customer (and their users) are in a silo but a major problem if multiple customers exist on the same stack.
6. What about 'use'? Do you want users switching databases? Again something that 1 can prevent.
|
|
|
|
|
Stored procs are not an issue. If they're in the database, whatever is in them must be okay. The point of my app is to determine whether or not a given stored proc returns a dataset. That is all the regex is for. If a stored proc contains a DDL statement, even if it's in dynamic sql, it will be detected by the regex.
1) The regex string I'm using already checks for DDL commands.
2) I provided an amended version that I came up with (that works as intended) that in fact includes \b markers.
3) See my prelude.
4) User-provided queries MUST start with SELECT . That pretty much prevents the user from creating vars or executing dynamic sql/procs.
5) Not a concern. I modify the specified select command to start with "SELECT TOP(0)", which causes an empty data set to be returned.
6) Not a concern. For user-specified queries, I already insert a "USE" statement for the selected database.
7) Only one of your points had anything at all to do with my actual question. I guess that's kinda my fault for posting the actual regex strings I was using instead of using randomly selected words in my example.
".45 ACP - because shooting twice is just silly" - JSOP, 2010 ----- You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010 ----- When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013
|
|
|
|
|
|
sms sending free on fix price on desktop aplication
|
|
|
|
|
Your query does not make sense. Please go through the forum guidelines.
Quote: Be specific! Don't ask "can someone send me the code to create an application that does 'X'. Pinpoint exactly what it is you need help with.
Try out and ask for specific questions if you get stuck. People would try to help you unblock.
|
|
|
|
|
This is a more 'philosophical' question. As I will handle it in C#/WPF, I guess that it isn't that off topic top here:
I have this friend asking me to write software for trying out one of his pet theories, an engineering field far from mine. Essentially, I will program a simulation model for him. My task is as a coding monkey - he knows enough about software to understand what I am doing, but cannot contribute himself beyond providing formulas (which I code, but to not understand). We are close friends, that part causes little problems.
Both input data and results are essentially tabular data - time series and the like. He asks me: Can't you make both input and output as text files, so I can modify them using npp or notepad or something like that?
Even though my friend knows several (natural) languages, he never studied formal grammars. So I am scared, thinking of the input I might have to handle
I proposed: You either input interactively (mostly single-value parameters) interactively, or you supply data as an Excel file. Not like a free-format CSV file! I certainly do consider CSV a 'free-format' file if you let non-computerists in to edit it directly (and maybe even with computerists!).
My friend seemed was open to my suggestion to use Excel tables - but his version is Excel 2003... Yet, I do believe that it better to have him funnel his data through Excel 2003, so that I know it to be in that format, rather to assume that he will be able supply data in some more liberal format (such as one of the numerous variations of CSV, each with its quoting rules etc.)
Is this the right thing to do - forcing my friend to supply all input data as .xls/.xlsx spreadsheet file, and providing results in a similar format? Or is there a better alternative? That must be one that to a similar degree relieves me having to parse arbitrary input formats with arbitrary syntax errors
My friend is a 'common man', so *nix solutions are ... nix. We view the world through Windows. Then, is Excel what I should try to push as The Format for both input and output data? Or should I push him in a different direction, towards another alternative?
|
|
|
|
|
To be honest, I'd say that using Excel is probably a bit overkill - particularly the "providing results" part, given that XLS is a binary file format: Microsoft Office Excel 97-2003 Binary File Format (.xls, BIFF8)[^] and pretty easy to muck up when you are reading and generating it.
I'd suggest CSV as a "simple" input / output format that is remarkably easy for a non-computer-technical user to create by hand - remember that Excel (even 2003 version) can import and export it.
Certainly, it's easier to process (particularly if you use something like A Fast CSV Reader[^] which will do all the "donkey work" for you).
Don't complicate data entry any more than you have to: remember that Excel can include a lot of odd stuff in it's files if it wants to!
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
The question is which is easier to muck up, as seen from my point of view: If the file is produced by (some version of) Excel, I can have a certain confidence that Excel has made a file according to its own format rules.
It is not for me: I can easily handle any .xls file or CSV file, as long as it is in the proper format. The issue is 'in the proper format. That you cannot assume from non-computerese people, editing CSV files.
Assuming CSV as an I/O format assumes that the (very unqualified) user knows all the conventions of character sets, quoting, all sorts of escapes even within the quoting rules, ... If I tell that 'primitive' to read his CSV file into Excel and give tat .xls file to me, the csv syntax, character encoding and escaping and syntax are checked before it reaches me.
I sure could take one of the umpteen CSV variants as inputs. Maybe I will at some future time. But what is in it for me? Will it give me any benefit, beyond that of the ability to handle input files any crazy format that might require me to do a lot of error handling and report processing?
Wouldn't I be better off assuming that an .xls(x) file honors the .xls(x) format, so that I do not have to worry about escapes and character sets and such things?
|
|
|
|
|
trønderen wrote: It is not for me: I can easily handle any .xls file or CSV file, as long as it is in the proper format. The issue is 'in the proper format. That you cannot assume from non-computerese people, editing CSV files.
Assuming CSV as an I/O format assumes that the (very unqualified) user knows all the conventions of character sets, quoting, all sorts of escapes even within the quoting rules, ... If I tell that 'primitive' to read his CSV file into Excel and give tat .xls file to me, the csv syntax, character encoding and escaping and syntax are checked before it reaches me. I do not find your descriptions of what you think you know credible.
Until you and the other party agree on a formal spec, I believe you will waste even more time being vague.
«One day it will have to be officially admitted that what we have christened reality is an even greater illusion than the world of dreams.» Salvador Dali
|
|
|
|
|
I believe that the syntax of Excel sheets is not "vague", but rather well defined. Actually, that is why I suggest using Excel syntax. There may be even better alternatives that are suitable for a non-computer guy to supply input data. I see CSV as a significantly poorer alternative.
The semantics of each column in the the input tables are well defined. In fact, my friend supplies all the engineering formulas and other processing rules, with unambiguous references to the input sources; I just do the coding of the processing. It could be that his specification of his method is so vague that I misunderstand what to code - but that would be on the processing level, not on the input data format level.
|
|
|
|
|
Nothing about using Excel as a data-entry tool, and saving the data in a standard format, is vague
trønderen wrote: It could be that his specification of his method is so vague that I misunderstand what to code - but that would be on the processing level, not on the input data format level. Now that is vague, and the remedy is communication and interaction with the client, with the goal of making an unambiguous specification.
Your ability to philosophise about non-specifics is impressive
«One day it will have to be officially admitted that what we have christened reality is an even greater illusion than the world of dreams.» Salvador Dali
|
|
|
|
|
What on earth makes you think that Excel will error check a CSV? It'll make assumptions about the data that you will have no idea about and you can easily end up with worse data than you started with!
If you are seriously worried about handling user errors, then bite the bullet and produce your own simple data entry app which does validation for you and stores in whatever format you like - CSV, EXcel, JSON, XML, DB - it doesn't matter if you are in control of the user input as well as the later file processing.
But assuming that "Excel will make it all right" is a recipe for an upset user ...
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
I really do not want to relate to CSV at all. My friend is both producer and consumer of the data - if he messes up, he messes up for himself. Whichever way he gets data into the tables: He should not have to know about or worry about low level syntax. I want to help him avoiding problems caused by low level syntax errors. An Excel user does not manipulate data at the coding level. If he, for some reason, gets raw data from CSV format, and lets his Excel read it, I wouldn't know (or care). But he would see it immediately if the data are not read correctly, before he puts them into the program that I make for him.
The data are simple numeric tables - not huge, but more than what you can supply interactively at run time. Semantics of each column is well defined. In Excel, he will see that the input tables have the entries that he intends, and in the right places. My program will know that in the .xls file, there are no missing commas, quoting errors or other low level syntax errors. Some input is suitable for curve plotting: A plot would quickly reveal suspect input values such as one digit dropped out or a misplaced decimal point. (He is familiar with Excel, and knows how to generate a plot.)
Using Excel, he handles information at application level, in a way meaningful to him, like a printed table in a technical report. He is an engineer, but not all engineers know programming. If he must learn JSON or XML or even CSV, and have to manipulate his data sets in such formats, I am sure he would be bitching like crazy over the commas and quotes and how difficult it is to verify that the data set is correct before he it into the program. And I am sure that there would still be a lot of errors in the data sets.
Also for the results: He wants my program to present simple on-screen plots, but if he gets output a spreadsheet, he can easily plot other properties, make other statistical analyses etc.
Making my own data entry app - would it really be worth it, when I've got Excel to do the job for me? What could I get from my own data entry app that wouldn't be done much better by Excel? Most certainly as a whole, e.g. with Excel's plot facilities for easily detecting "out-of-line" input values.
I expected reactions of the kind "Excel is fine for non-programming users, but maybe an even better alternative is ...". The almost unison condemnation of .xsl grammar for input (/output) data sets comes as a great surprise to me. I see no alternate proposal that I consider viable: Telling a non-computer guy that to prepare tabular data for an application, learning JSON or XML syntax is significantly better than than typing the table into an Excel sheet, and will cause less errors and/or have other great advantages ... I don't know which arguments to use to make it appear as a better alternative. I am sure that I am not going to give my friend a JSON course to train him for giving input to my program, rather than using an Excel sheet!
|
|
|
|
|
You are confusing "Data Entry" with "Data Format" - and assuming that a generic Data Entry system (such as Excel, the application) will provide perfect data (in the form of Excel data files).
It won't. It will provide perfect Excel data, but that can have all the errors, omissions, additions, and plain mistakes that the user can make - because Excel the app does not validate anything: it just enters data into cells.
Excel is not a Data Entry system: they validate and only store "correct" information. Excel doesn't care what the user enters where and has no idea what data is "valid" and what isn't.
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
You are confusing different needs.
It may seem horrible to you, but I do my household accounting using Excel.
I do not use a Data Entry System to verify that my spending on a liter of milk, a loaf of bread or a kilogram of apples are "valid" prices for those products (except that a negative spending will stand out in red).
There may be (temporary) "omissions": Sometimes I am late in copying a receipt into the Excel, and I may discover it by the money in my billfold or bank account not matching what Excel claims there should be. Detecting it is a manual process.
I may be mistyping an amount, and overlook that the check column summing the line (i.e. all the sources - cash, account, card, ... - and the drains, which in my case is the various types of spending: food, transport, health, ...) is not zero.
Maybe I should develop an elaborate Data Entry system to validate every single amount against current product prices retrieved from a network service. Maybe it should flag as invalid my buying of a kilogram of Lapsang Suchong, because I bought another kilogram less than a week ago. Maybe it should raise an exception if I haven't bought a new toothbrush for half a year.
I am not going to do that, not even to make sure that "errors, omissions, additions, and plain mistakes" are caught. I never asked for a Household Spending Data Entry and Validator System. My friend never asked for a Data Entry and Validator System with lots of built-in domain knowledge and AI logic to detect missing or duplicated or illegal data.
He suggested a plain text file where he could edit the input data like a printed table, something resembling an CSV file (but fixed column layout rather than commas), and similar for results. I replied: "Wouldn't it be much better to do it as an Excel sheet? Then you can plot the input data. You can set up check columns to verify that values are within range. Values are nicely lined up, and do not end up in the wrong column due to a missing comma. You can have the output the same way, too, so you can use Excel to post-process them." He cheered: Great idea. Much better. The improvement was comparable to when I moved my household account from notebook and pencil to Excel.
Not at all do I take for granted that Excel is the ultimate solution for such needs. So I posted my question here, waiting for alternative proposals. You may argue that with unlimited resources, I could have a 100% perfect solution by developing a highly domain specific, AI based, dedicated Data Entry System for my household accounting, or for my friend's program. I think the cost would be far, far higher than the real benefits in practical use.
Such a 100% perfect Data Entry System may be the right answer to some questions, but not to mine.
|
|
|
|
|
(You can save an existing Excel file as a .csv)
I would just do it manually, if it's "once in a while". (Open Workbook / sheet in Excel and "Save As...")
If you want to use Interop ...
How to Convert Excel to CSV using Interop
It was only in wine that he laid down no limit for himself, but he did not allow himself to be confused by it.
― Confucian Analects: Rules of Confucius about his food
|
|
|
|
|
Having banged my head against this issue for many years your ONLY really good option is to write your own data entry application. Resistance to using such a solution is monolithic, even a well written DE solution often gets rejected for Excel. He is your friend, the only user of the application, write the DE app and make him use that.
Never underestimate the power of human stupidity -
RAH
I'm old. I know stuff - JSOP
|
|
|
|
|
trønderen wrote: to parse arbitrary input formats with arbitrary syntax errors
Are you supposing that Excel would prevent that? I certainly wouldn't presume that.
Columnar data, to me, is one of the easiest formats to parse and validate. I would actually prefer text files since then I do not need to code an Excel API. When I do one shots with Excel I always just manually export it to a tab delimited file and then parse that. And validate it.
If the data is being manually entered then the volume would not be of concern so I would just load the entire data set into memory and parse it and validated it as I went. I would output errors for either parsing (like missing columns) or validation (a number that isn't a number) and would probably read the entire set putting out errors.
If there were any errors then do no more processing. One might continue though depending on the goal.
|
|
|
|
|