|
Gotcha! Thank you very much
Clean-up crew needed, grammar spill... - Nagy Vilmos
|
|
|
|
|
Let me start by saying that I'm not the greatest with Regular Expressions, but from research it seems that they are the best way to go for what I need; I just don't fully understand how to use them with matching groups.
In my application I need to parse a boolean string that can have any number of parameters. The following are some examples:
Name=John or Name=Jane
Name=John or Name=Jane and Date=Now
I need to be able to parse any variation of these strings ('and' and 'or' are the only boolean operators that can be present) so that I get a list such as:
Name=John
or
Name=Jane
and
Date=Now
I don't need to parse the strings with the '=' sign, but I do need the 'or' and 'and' operators so that when I build my query in code I know how to connect each statement to the next (i.e. Name=John should be ORed with Name=Jane, which should be ANDed with Date=Now). So far I have only been able to get a list such as
Name=John
Name=Jane
Date=Now
by using the following code:
Dim Pattern As String = "(\S+\x3D\S+)"
' Collect every parameter=value pair in a single pass over the query
Dim oMatches As MatchCollection = Regex.Matches(query, Pattern)
For Each oMatch As Match In oMatches
    ' Groups(0) is the whole match, e.g. "Name=John"
    Dim Operand1 = oMatch.Groups(0).Value
Next
but I lose the boolean operators in the process. If anyone could please help me with a regular expression so I would get the groups I have now, but with the operators in between the appropriate expressions, it would be greatly appreciated.
Thanks.
|
|
|
|
|
I'd suggest doing the matches in phases so that you can track operator precedence: and has higher precedence than or.
In fact you probably don't need Regex; String.Split should be able to do the job.
First divide the input based on 'and':
Dim andSeparators() As String = {" and "}
Dim andTerms() As String
andTerms = query.Split(andSeparators, StringSplitOptions.None)
Next, for each of the andTerms, do a similar String.Split on " or ".
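As a rough illustration of the two-phase split described above, here is a minimal sketch in Python rather than VB.NET. The function name split_query is made up for this example. Each inner list holds the terms that were joined by " or ", and the outer list holds the pieces that were joined by " and ".

```python
def split_query(query):
    """Split on ' and ' first, then split each piece on ' or '.

    Returns a list of lists: inner lists are OR-connected terms,
    the outer list is the AND-connected pieces.
    """
    return [piece.split(" or ") for piece in query.split(" and ")]


if __name__ == "__main__":
    print(split_query("Name=John or Name=Jane and Date=Now"))
```

Note that this simple split assumes the operators are always lower case and surrounded by single spaces.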
|
|
|
|
|
You can use the alternation syntax "(a|b)". A match in your case is either an expression of the form "parameter=value" or an operator. An operator can be "or" or "and" and must have a space before and after it.
The following pattern produces the results you wanted. For the input string "Name=John or Name=Jane and Date=Now" you get the matches "Name=John", " or ", "Name=Jane", " and ", "Date=Now":
"(\w+=\w+|\s(or|and)\s)"
If you want to use regex for validation only, you can do it this way (note that you get only a single match with this pattern):
Regex.IsMatch(inputStr, "^(\w+=\w+|\s(or|and)\s)+$");
And you can go even further with validation. The following pattern uses positive lookahead and lookbehind to ensure that each operator sits between two valid expressions:
Regex.IsMatch(inputStr, "^(\w+=\w+|(?<=\w+=\w+)\s(or|and)\s(?=\w+=\w+))+$");
Gabriel Szabo
modified 8-Nov-13 3:57am.
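For anyone who wants to try the alternation idea outside .NET, here is a minimal sketch using Python's re module. The names tokenize and is_valid are made up for this example. Python's re does not accept the variable-length lookbehind used in the .NET validation pattern, so the check is restated here as "one expression, followed by any number of operator-plus-expression pairs", which is an assumption about the intended grammar.

```python
import re

# An expression (parameter=value) or an operator surrounded by whitespace.
TOKEN = re.compile(r"\w+=\w+|\s(?:or|and)\s")

# One expression, then zero or more "<space>operator<space>expression" pairs.
VALID = re.compile(r"\w+=\w+(?:\s(?:or|and)\s\w+=\w+)*")


def tokenize(query):
    """Return expressions and operators in order, with operator spaces trimmed."""
    return [m.group(0).strip() for m in TOKEN.finditer(query)]


def is_valid(query):
    """True only if the whole string alternates expression, operator, expression."""
    return VALID.fullmatch(query) is not None


if __name__ == "__main__":
    print(tokenize("Name=John or Name=Jane and Date=Now"))
    print(is_valid("Name=John or "))
```

The token list keeps the operators in position, so each expression can be connected to the next one while building the query.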
|
|
|
|
|
Given this input pattern \[.+" and this string w2rddddd["oQookkkkkk"]rrrrrrrrrrr the pattern returns ["oQookkkkkk" that is, everything up to and including the second double-quote. What I'd like to return is up to and including the first double-quote, like ["
Clearly my pattern is incorrect but I can't see what I'm doing wrong. Anybody see it?
If there is one thing more dangerous than getting between a bear and her cubs it's getting between my wife and her chocolate.
|
|
|
|
|
I get ["oQookkkkkk" when I try that, which is what I expected. Maybe you want .+\[" ?
|
|
|
|
|
Oops. My bad...
I screwed up my original question. What I meant was, given
yakkety [ "123"]
I want to get
[ "
That is, everything from the opening square bracket up to and including the first double quote only. There could be any amount of white space between the bracket and the quote. That's the only bit I'm after: square bracket through double quote. I can't work out the pattern that will give me just that.
|
|
|
|
|
How about \[.*?"
Or, since only white space can sit between the bracket and the quote, use \s rather than the dot: \[\s*"
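To see the difference between the greedy, lazy and whitespace-only versions side by side, here is a small sketch in Python's re module (not .NET, though these three patterns use only syntax common to both). The helper name first_match is made up for this example.

```python
import re


def first_match(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


if __name__ == "__main__":
    text = 'yakkety [ "123"]'
    print(first_match(r'\[.+"', text))    # greedy: runs on to the last quote
    print(first_match(r'\[.*?"', text))   # lazy: stops at the first quote
    print(first_match(r'\[\s*"', text))   # only white space allowed in between
```

The lazy dot will also accept other characters between the bracket and the quote, whereas the \s version refuses to match if anything other than white space sits there.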
|
|
|
|
|
Thanks, that worked just perfick!
If the solution had been a snake it could have bitten me. It was so obvious it slid straight past me. (slaps self in face).
|
|
|
|
|
|
Just to complete things, I also wanted to pick up the other bit of text at the other end. That is, given:
yapyapyap ["fred"] moreyapyap I want to get the closing double quote through the closing square bracket whether there's white space between them or not. I did that using alternation as in this regular expression:
("]|"\s*])
It probably doesn't teach anyone anything they don't know already but I thought it would be worth a mention if only to complete what I need to do.
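A side note on the pattern above: because \s* also matches zero characters, the second alternative on its own already covers the no-white-space case. A minimal sketch in Python's re module (not .NET), with the made-up helper name closing_part:

```python
import re

# Closing double quote, optional white space, closing square bracket.
CLOSE = re.compile(r'"\s*\]')


def closing_part(text):
    """Return the closing quote-through-bracket text, or None if absent."""
    m = CLOSE.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    print(closing_part('yapyapyap ["fred"] moreyapyap'))
    print(closing_part('yapyapyap ["fred"  ] moreyapyap'))
```

The opening quote is skipped automatically because it is followed by a letter rather than by white space or a bracket.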
|
|
|
|
|
Given an input string like this:
Some text containing % complete and not much else
If I use a regex of \bcomplete\b I can find the whole word complete just fine. What I want to do is find % complete. I've tried \b% complete\b and \b\% complete\b and \b\x25 complete\b and other variations I can think of but I can never get it to select % complete. Does anyone know if it's possible to do it and how? I can find nothing anywhere that says % cannot participate in an expression.
I've tried two different apps, RegexBuilder and RegexBuddy, but I can't get it working in either. Does anyone have any ideas? Thanks.
|
|
|
|
|
The % character won't work with the \b anchor. From MSDN:
The \b anchor specifies that the match must occur on a boundary between a word character (the \w language element) and a non-word character (the \W language element). Word characters consist of alphanumeric characters and underscores; a non-word character is any character that is not alphanumeric or an underscore. The match may also occur on a word boundary at the beginning or end of the string.
If you're using .NET, you could try something like:
((?<=\W)|^)%\s+\bcomplete\b
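The boundary rule quoted above can be checked with a short sketch in Python's re module, which treats \b the same way for this input. The helper name find is made up for this example, and the outer group of the suggested pattern is written as non-capturing purely for tidiness.

```python
import re

TEXT = "Some text containing % complete and not much else"


def find(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


if __name__ == "__main__":
    # A space followed by % is two non-word characters: no boundary, no match.
    print(find(r"\b% complete\b", TEXT))
    # Require a non-word character (or the start of the string) before the %.
    print(find(r"(?:(?<=\W)|^)%\s+\bcomplete\b", TEXT))
    # Here a digit precedes the %, so the lookbehind rejects it.
    print(find(r"(?:(?<=\W)|^)%\s+\bcomplete\b", "100% complete"))
```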
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
Richard, I will try your example. I did though find the answer and indeed, the % has to be outside the \b symbols.
So, %~\bcomplete\b worked but \b%~complete\b did not. I use the tilde ~ to illustrate where a space character would be but it's not part of the expression itself.
|
|
|
|
|
Have you tried:
%\scomplete
The universe is composed of electrons, neutrons, protons and......morons. (ThePhantomUpvoter)
|
|
|
|
|
OG, that partly works, but it will also detect % completehorses, whereas I must find only the whole word, % complete.
|
|
|
|
|
Sorry!
Did you try combining it with your solution?
%\s\bcomplete\b
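A quick way to confirm that the combined pattern rejects the longer word, sketched in Python's re module rather than .NET (the helper name find is made up for this example):

```python
import re


def find(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


if __name__ == "__main__":
    # Without the trailing \b the longer word is still accepted.
    print(find(r"%\scomplete", "% completehorses"))
    # With \b on both sides of the word, only the whole word matches.
    print(find(r"%\s\bcomplete\b", "% completehorses"))
    print(find(r"%\s\bcomplete\b", "Some text containing % complete and not much else"))
```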
|
|
|
|
|
OG, the background is I've been assigned the wonderful job of localising the strings in our app. In total, we have about 23,000 split over several projects. I'm using an extractor tool called lingobit and it lets you put in filters to eliminate specific strings. Once I've harvested the strings and the regex format you run the scan and it discards all strings that match the regex. In theory, you are then left with the strings to be localised and it will assign names to them and change the code while creating the resx file. There are a number of projects to be converted. Once I've eliminated the strings we don't want to convert I can then reuse the regex pattern on another project and add any additional patterns as I go along.
In effect then, the regex string is input to a third-party app. It just takes a while to harvest all the things that mustn't be converted.
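The filtering step described above can be pictured with a small sketch. This is only a stand-in written in Python for illustration, not how the extractor tool itself works; the function name keep_for_localisation and the sample strings are made up.

```python
import re


def keep_for_localisation(strings, exclude_patterns):
    """Drop every string that matches any exclusion regex; keep the rest."""
    compiled = [re.compile(p) for p in exclude_patterns]
    return [s for s in strings if not any(c.search(s) for c in compiled)]


if __name__ == "__main__":
    harvested = ["Progress: % complete", "Enter the credit card number", "SELECT * FROM t"]
    filters = [r"%\s\bcomplete\b", r"^SELECT\b"]
    print(keep_for_localisation(harvested, filters))
```

Because the filters are just a list of patterns, the same list can be reused on the next project and extended as new cases turn up.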
|
|
|
|
|
By the way, the thing I found most terrible when doing globalization/localization was string concatenation. I.e. something like
label1.Text = "Progress: " + n + " % complete";
instead of
label1.Text = string.Format("Progress: {0} % complete", n);
The translators need the whole phrase for a useful translation, not single words. I had to change a few hundred such concatenations.
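The same point can be sketched in Python, where str.format plays the role of string.Format. The function names and the reworded template are hypothetical and only show that a whole-phrase template lets a translator move the placeholder, which concatenated fragments do not allow.

```python
TEMPLATE = "Progress: {0} % complete"


def progress_concat(n):
    """Concatenation: translators would only see the separate fragments."""
    return "Progress: " + str(n) + " % complete"


def progress_formatted(n, template=TEMPLATE):
    """Format string: the whole phrase is one translatable resource."""
    return template.format(n)


if __name__ == "__main__":
    print(progress_concat(40))
    print(progress_formatted(40))
    # A hypothetical reworded template with the placeholder in a new position.
    print(progress_formatted(40, "{0} % of the work is complete"))
```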
|
|
|
|
|
F*** off! Tell me it ain't so, Bernard.
Seriously, I have the same problem. There are few string.Format calls and 98% is concatenation. The context of the strings is the problem. I mean, there are sql statements containing strings like " and " and elsewhere there are message boxes with " and ". There was one bastard of a string set to "Enter the " + something else and further on was something like reply.StartsWith("Enter the credit"). What a***hole would produce such awful stuff? And no, it wasn't me.
|
|
|
|
|
"Nice" to hear that those problems are so common.
Apart from SQL statements, we excluded the log messages from localization - they'd be more than the "normal" texts. That meant quite some extra work. That's a point to consider - or do you want to analyse log files in e.g. Hungarian?
|
|
|
|
|
Bernhard Hiller wrote: or do you want to analyse log files in e.g. Hungarian?
We're starting with Italian and trying to restrict the text to what the users would expect or want to see. It's tricky trying to identify the context of a string, so if we later end up writing Simplified Chinese to a log file then so be it. I don't think our users would want to see a message box containing an sql statement. Oh wait (slapping forehead) they already do get some of those.
|
|
|
|
|
Programming is one industry where everyone is constantly attempting to specialize in generalizing!!!!
|
|
|
|
|
BTW: Get a copy of Expresso - it's free, and it examines, explains and generates Regular expressions. I wish I'd written it!
|
|
|
|
|
OG, thanks for the heads-up on the app. I already use RegexBuilder and RegexBuddy but you can't have too many regex tools in your toolbox. They bring different toys to the party and they're all helpful. I will download Expresso post-haste.
Update: Downloaded and quickly tried it out and the design mode tab looks very promising indeed.
|
|
|
|