|
Then there's either a bug with the regular expression engine you're using, or there's something wrong with your code.
[^\s]+ explicitly excludes any whitespace. That pattern will match the first run of non-whitespace characters to appear after the prefix "Name is: ".
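For illustration, here is a minimal Python sketch of that behaviour. The sample string is made up; only the "Name is: " prefix comes from the discussion, and your engine may differ in details.

```python
import re

# Hypothetical input; only the prefix "Name is: " comes from the thread.
text = "Name is: Homer Simpson"

# [^\s]+ (equivalent to \S+) stops at the first whitespace character,
# so only the first word after the prefix is captured.
match = re.search(r"Name is: ([^\s]+)", text)
first_word = match.group(1) if match else None
print(first_word)  # Homer
```

If a pattern like this also returns text after a space, the cause is more likely in the surrounding code or the engine options than in the character class.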
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
Hi,
I'm using Notepad++ to do some Regex replacements and have come up against a problem I can't solve.
I'm looking to match a region between two fixed strings inclusive (called <startstring> and <endstring> for the sake of example). The problem I have is these strings appear multiple times throughout, but I want to match EACH instance, rather than one match from the very first <startstring> to the last <endstring>. They may or may not be over many lines. For example:
<startstring>
...text body...
<endstring>
...other text...
<startstring>
...text body...
<endstring>
...other text...
<startstring>
...text body...
<endstring>
I need my Regex to make 3 separate matches, rather than 1 big match which includes all the ..other text... which I need to be left intact.
The above needs to work irrespective of whether the <startstring> and <endstring> are on the same line or many lines apart.
Thanks in advance.
|
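Try:
using System.Text.RegularExpressions;
// .*? is lazy, so each match ends at the nearest <endstring>.
// Singleline lets '.' match newlines, so a region may span several lines.
public static Regex regex = new Regex(
    "\\<startstring\\>.*?\\<endstring\\>",
    RegexOptions.Singleline
    | RegexOptions.CultureInvariant
    | RegexOptions.Compiled
);
...
MatchCollection ms = regex.Matches(InputText);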
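To see why the lazy quantifier matters, here is a small Python sketch comparing `.*?` with `.*` on made-up text shaped like the example in the question. `re.DOTALL` plays the role of `RegexOptions.Singleline`; the sample bodies are assumptions.

```python
import re

# Made-up sample with three regions, some spanning several lines.
sample = (
    "<startstring>\nbody one\n<endstring>\n"
    "other text\n"
    "<startstring>body two<endstring>\n"
    "other text\n"
    "<startstring>\nbody three\n<endstring>"
)

# Lazy: stops at the nearest <endstring>. Greedy: runs to the last one.
lazy = re.compile(r"<startstring>.*?<endstring>", re.DOTALL)
greedy = re.compile(r"<startstring>.*<endstring>", re.DOTALL)

lazy_matches = lazy.findall(sample)
greedy_matches = greedy.findall(sample)
print(len(lazy_matches), len(greedy_matches))  # 3 1
```

In Notepad++ itself the equivalent is probably the unescaped pattern `<startstring>.*?<endstring>` with ". matches newline" ticked. That is an assumption, since the thread does not show the final pattern that was used.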
Sent from my Amstrad PC 1640
Never throw anything away, Griff
Bad command or file name. Bad, bad command! Sit! Stay! Staaaay...
AntiTwitter: @DalekDave is now a follower!
|
Thanks, that's led me to the correct solution for all eventualities
|
You're welcome!
|
Hello, I am new to using Regex and hope someone can help with my expression.
I am using VB.NET Regex.Match() to extract info.
I cannot figure out how to include the text at the end of a line in my expression.
I'm using Expresso for testing my expression.
Expression without double quotes:
"(.*?)(\\~zs\\~ls)(.*?)(\\~le\\~ks)(.*?)(\\~ke\\~hs)(.*?)(\\~he\\~ze)(.*?)"
Source Data1:
test \~zs\~lsMt+22:21\~le\~ksMt+22:211\~ke\~hs\~he\~ze end of line
Returns:
[test \~zs\~lsMt+22:21\~le\~ksMt+22:211\~ke\~hs\~he\~ze]
1: [test ]
2: [\~zs\~ls]
3: [Mt+22:21]
4: [\~le\~ks]
5: [Mt+22:211]
6: [\~ke\~hs]
7: []
8: [\~he\~ze]
9: []
I do not get a reference to [end of line]
Source Data2:
\~zs\~lsMark+1:1\~le\~ksMark+1:11\~ke\~hs\~he\~ze some data \~zs\~lsMark+2:1\~le\~ksMark+2:11\~ke\~hs\~he\~ze end of line
Returns 2 matches:
Match1:
[\~zs\~lsMark+1:1\~le\~ksMark+1:11\~ke\~hs\~he\~ze]
1: []
2: [\~zs\~ls]
3: [Mark+1:1]
4: [\~le\~ks]
5: [Mark+1:11]
6: [\~ke\~hs]
7: []
8: [\~he\~ze]
9: []
Match2:
[ some data \~zs\~lsMark+2:1\~le\~ksMark+2:11\~ke\~hs\~he\~ze]
1: [ some data ]
2: [\~zs\~ls]
3: [Mark+2:1]
4: [\~le\~ks]
5: [Mark+2:11]
6: [\~ke\~hs]
7: []
8: [\~he\~ze]
9: []
Again, I do not get a reference to [end of line]
I've tried ending the expression with (.*$), which works for Source Data 1, but
Source Data 2 then returns just one match with 9:[ some data \~zs....]
Can someone help me with extracting the [end of line] text to item 9:?
Thank you
|
I'm not a regex expert, but I recall that '.' matches anything EXCEPT a newline, so try replacing (.*$) with ((.|\n)*$). You may also need to switch on multiline matching.
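Another possible cause, offered as a guess: a lazy (.*?) at the very end of an expression is satisfied by the empty string, while a greedy (.*$) swallows every remaining record. One way around both is to let the last group run lazily up to a lookahead for the next record or the end of the input. The sketch below is in Python rather than VB.NET, and it assumes the delimiters are the literal tokens ~zs~ls, ~le~ks, ~ke~hs and ~he~ze (that is, that the backslashes in the post are only escaping).

```python
import re

# Assumption: delimiters are the literal tokens ~zs~ls, ~le~ks, ~ke~hs, ~he~ze.
pattern = re.compile(
    r"(.*?)(~zs~ls)(.*?)(~le~ks)(.*?)(~ke~hs)(.*?)(~he~ze)"
    r"(.*?)(?=~zs~ls|$)"  # group 9: lazy, but must reach the next record or the end
)

data1 = "test ~zs~lsMt+22:21~le~ksMt+22:211~ke~hs~he~ze end of line"
data2 = (
    "~zs~lsMark+1:1~le~ksMark+1:11~ke~hs~he~ze some data "
    "~zs~lsMark+2:1~le~ksMark+2:11~ke~hs~he~ze end of line"
)

# Collect group 9 of every match in each sample.
trailing1 = [m.group(9) for m in pattern.finditer(data1)]
trailing2 = [m.group(9) for m in pattern.finditer(data2)]
print(trailing1)  # [' end of line']
print(trailing2)  # [' some data ', ' end of line']
```

Note the side effect: text between two records now lands in group 9 of the earlier match instead of group 1 of the later one.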
|
Hi, we are having trouble with checking a text field. What we are trying to do is check whether the default text has been changed. If it has not, a warning appears at the end of the form.
Everything works fine until the text field contains a line break (Enter). Even when the default text is gone, the line break causes the regex to fail.
Here is the code I have used:
^((?!\bThis contains the default text).)*$
Please let me know if you can help me out here.
What I have tried:
Several patterns... Sorry, I don't have them anymore. We also checked the code with a validator.
|
You can use (C#) .Trim() on a string to remove leading and trailing white space, carriage returns, etc.
I tend to do that automatically any time I "save" a string (in a database, for example).
"(I) am amazed to see myself here rather than there ... now rather than then".
― Blaise Pascal
|
It sounds like you need to add an option to your regular expression to make it work in "multiline" mode. Have you got some code you can show us?
Have a look at https://regexr.com/3maco
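As a rough illustration in Python (the form's actual regex engine is not stated in the thread, so this is an assumption), the pattern from the question fails on a line break because '.' does not match a newline by default. Letting '.' match newlines makes the same pattern work. The `is_changed` helper and the sample values are hypothetical.

```python
import re

DEFAULT = "This contains the default text"
pattern = r"^((?!\bThis contains the default text).)*$"

def is_changed(value: str, flags: int = 0) -> bool:
    """Return True when the pattern matches, i.e. the default text is absent."""
    return re.search(pattern, value, flags) is not None

edited = "My own text\nwith a second line"
untouched = DEFAULT + "\nplus a line"

print(is_changed(edited))                # False: '.' stops at the line break
print(is_changed(edited, re.DOTALL))     # True
print(is_changed(untouched, re.DOTALL))  # False
```

Multiline mode on its own changes the meaning: ^ and $ then anchor to each line, so the pattern succeeds as soon as any single line lacks the default text. In engines without a "dot matches newline" option, replacing the '.' with [\s\S] usually has the same effect.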
Now is it bad enough that you let somebody else kick your butts without you trying to do it to each other? Now if we're all talking about the same man, and I think we are... it appears he's got a rather growing collection of our bikes.
modified 31-Aug-21 21:01pm.
|
The code I have used is in the question. It is no longer
|
You're welcome!
|
Hi all,
I'm new to regular expressions and what I want to do seems a bit advanced for me.
I'd like to create a regular expression to locate valid Australian tax file numbers.
Here's the regular expression I've come up with so far:
(\d{8,9})|(\d\d\d[ ]\d\d\d[ ]\d\d\d)|(\d\d\d[-]\d\d\d[-]\d\d\d)
Tax file numbers can be either 8 or 9 digits and this string successfully finds them; however, it also picks up numbers like mobile phone numbers.
I also tried to incorporate a few different ways people generally type out tax file numbers, which is why I've added in a - and also white space.
There is a formula to detect if a tax file number is valid and this is what I'd like to add to the string to remove the false positives.
From Wikipedia:
Tax file number - Wikipedia
As is the case with many identification numbers, the TFN includes a check digit for detecting erroneous numbers. The algorithm is based on simple modulo 11 arithmetic per many other digit checksum schemes.
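Example
The validity of the example TFN '123456782' can be checked by the following process: each digit is multiplied by the weight for its position (1, 4, 3, 7, 5, 8, 6, 9, 10) and the products are added together.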
The sum of the numbers is 253 (1 + 8 + 9 + 28 + 25 + 48 + 42 + 72 + 20 = 253). 253 is a multiple of 11 (11 × 23 = 253). Therefore, the number is valid.
Can it be done?
Can someone assist?
|
You'd be pushing it uphill with a sharp stick to write a regex to validate the check digit. Best to use a regex to get the basic format right, then feed it into a bit of code to do the checksum.
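A minimal sketch of that two-step approach in Python, covering 9-digit TFNs only. The weights are inferred from the products in the quoted Wikipedia example (1, 8, 9, 28, 25, 48, 42, 72, 20), and the function name and format pattern are illustrative, not an official algorithm.

```python
import re

# Weights inferred from the products in the quoted example; 9-digit TFNs only.
WEIGHTS = (1, 4, 3, 7, 5, 8, 6, 9, 10)

# Step 1: basic format, with optional space or dash separators.
TFN_FORMAT = re.compile(r"\d{3}[- ]?\d{3}[- ]?\d{3}")

def is_valid_tfn(candidate: str) -> bool:
    """Check the format first, then the modulo 11 checksum."""
    if not TFN_FORMAT.fullmatch(candidate.strip()):
        return False
    # Step 2: weighted sum of the digits must be a multiple of 11.
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    total = sum(d * w for d, w in zip(digits, WEIGHTS))
    return total % 11 == 0

print(is_valid_tfn("123456782"))    # True (weighted sum is 253)
print(is_valid_tfn("123 456 782"))  # True
print(is_valid_tfn("123456789"))    # False
```

The regex only narrows down candidates; the arithmetic is what rejects look-alikes such as phone numbers.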
Software rusts. Simon Stephenson, ca 1994. So does this signature. me, 2012
|
Peter_in_2780 wrote: You'd be pushing it uphill with a sharp stick to write a regex to validate the check digit
Very true
|
I agree with Peter_in_2780 that you should separate the matching of the number and the validation calculation.
Just create two methods, one where you check the format and the other to calculate and validate the check digit.
You didn't specify any variants of the text you want to match, so I just guessed what it could look like.
For the actual regular expression you could do it like this:
Input: TFN '123456782'
Regex: ^TFN\s*(')?(?<number>\d{3}\s*\d{3}\s*\d{3})(')?\s*$
It will get these variants:
TFN '123456782'
TFN'123 456 782'
TFN123456782
TFN 123 456 782
Explanation:
^ Start of the string
$ End of the string
\s* Consumes 0 or more white space characters. It will make sure you match TFN123 and TFN 123
(')? Optional quotation mark
(?<number> ...) Named group, makes it easier to extract the actual number
If necessary, you will have to remove the spaces in a second step.
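For anyone trying this outside .NET, here is a rough Python equivalent. Python spells the named group (?P<number>...) instead of (?<number>...); the `extract_tfn` helper is hypothetical, and the second step strips the spaces as described above.

```python
import re

# Same pattern as above, with Python's named-group syntax.
TFN_LINE = re.compile(r"^TFN\s*(')?(?P<number>\d{3}\s*\d{3}\s*\d{3})(')?\s*$")

def extract_tfn(line: str):
    """Return the nine digits without spaces, or None if the line does not match."""
    match = TFN_LINE.match(line)
    if match is None:
        return None
    return re.sub(r"\s+", "", match.group("number"))

variants = ["TFN '123456782'", "TFN'123 456 782'", "TFN123456782", "TFN 123 456 782"]
results = [extract_tfn(v) for v in variants]
print(results)  # ['123456782', '123456782', '123456782', '123456782']
```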
Hope it helps.
|
Thanks everyone, really appreciate the responses.
Unfortunately I think the only way I can do it is via a regular expression, as I am applying it to a pre-defined field within a cloud-based email security gateway.
Sorry, I should have been more detailed in my post.
In answer to one of your questions, the email gateway supports two types of regex syntax: Java and Perl.
In the regex I don't need to include looking for the words "TFN" or "Tax File Number" as I can do this via the word / phrase match list on the email gateway.
https://community.mimecast.com/docs/DOC-1613#jive_content_id_Regular_Expressions_Text_Matches
In summary, it will match the regex string defined and match the words TFN or Tax file number and then flag it for the user to review.
I assigned a value for the trigger, otherwise known as an activation score, which is currently configured as "2". The regex is worth 1 activation point, and the words “TFN” and “Tax File Number” are each worth another activation point, so the rule triggers when the regex matches and either of the words is present.
From the email gateway.
# search for TFNs
1 regex (\d{8,9})|(\d\d\d[ ]\d\d\d[ ]\d\d\d)|(\d\d\d[-]\d\d\d[-]\d\d\d)
# search for words "TFN" and/or "Tax File Number"
1 "Tax File Number"
1 "TFN"
The three formats I’ve configured to look for TFNs are
123 456 782
123456782
123-456-782
|
Java/Perl provide the following
\b
That represents a 'boundary'; however, you should read up on it to ensure that is what you really want.
Also keep in mind that the Java and Perl regex engines backtrack: when one path through the pattern fails, they go back and try the alternatives until a match is found or every possibility is exhausted. That can result in a lot of processing, and a badly constructed pattern can take a very long time to finish. Your current formats should not do that, though.
Anchoring to anything will optimize the search.
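A quick sketch of what \b buys you here, written in Python (the \b syntax is the same in Java and Perl, although a Java string literal would need the backslashes doubled). The sample sentences are made up, and the pattern merges the dash and space variants into one alternative.

```python
import re

# Word boundaries stop the pattern from matching inside a longer run of digits.
tfn = re.compile(r"\b(?:\d{8,9}|\d{3}[- ]\d{3}[- ]\d{3})\b")

samples = [
    "TFN 123456782 on file",
    "TFN 123 456 782 on file",
    "TFN 123-456-782 on file",
    "call 0412345678 today",  # 10-digit mobile number
]
hits = [bool(tfn.search(s)) for s in samples]
print(hits)  # [True, True, True, False]
```

Without the boundaries, \d{8,9} would happily match the first nine digits of the mobile number. Boundaries do not help with an unrelated number that happens to be exactly 8 or 9 digits long.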
|
Member 13555386 wrote: TFN includes a check digit for detecting erroneous numbers. ... Can it be done?
No.
Although if I were using Perl, one can create a "regex" that calls a method as part of the regular expression check itself.
But I still wouldn't suggest doing that.
There is no real standard for "regular expressions", so first you would need to define exactly what regular expression engine you are using.
If you are using Perl or Java then there is a boundary match which might or might not be appropriate for your actual content.
Member 13555386 wrote: (\d{8,9})|(\d\d\d[ ]\d\d\d[ ]\d\d\d)|(\d\d\d[-]\d\d\d[-]\d\d\d)
The following provides a single range of 8-9 digits and then a range of nine digits with spaces or dashes.
(\d{8,9})|(\d\d\d[- ]\d\d\d[- ]\d\d\d)
You could make one of those digits optional by adding a '?' after it. I do not know which one that should be.
Other suggestions really require knowing what regex engine you are using (rather than going through every possibility.)
|
See my response to George Jonsson above
|
Hi all,
I've been fighting with this for hours now, but don't get it to work. I want to strip out some information from an ini-file, which basically looks like this:
[CameraDefinition.1]
Title=Linke Seite
Guid={0ae3f864-da10-4e5a-977c-b9bba47d6f7a}
Description=Ansicht nach links
Origin=Center
It's a standard Windows text file; the sections are separated with two new lines (\r\n\r\n). My regex currently looks like this:
"\[CameraDefinition\.(?<camnumber>\d+)\][.|\s]*Guid=(?<guid>\{[0-9A-F\-]*\})"
While the first (CamNumber) and the last part (Guid) return correct results as 'partial match', the critical part seems to be the expression for "everything between the top and the Guid" ([.|\s]*), which might span several lines.
I'd be happy if someone of you helps me solve this... Thank you in advance!
Regards
Mick
|
Probably easier just to read it line by line and look for the keyword you are interested in.
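Something along these lines, sketched in Python. The function name is made up and the sample is the section from the question; adapt it to whatever language you are actually using.

```python
import re

SECTION = re.compile(r"\[CameraDefinition\.(\d+)\]")

def read_camera_guids(lines):
    """Map each CameraDefinition number to the Guid found in its section."""
    guids = {}
    current = None
    for raw in lines:
        line = raw.strip()
        header = SECTION.fullmatch(line)
        if header:
            current = int(header.group(1))  # entering a camera section
        elif line.startswith("["):
            current = None  # some other section, ignore its keys
        elif current is not None and line.startswith("Guid="):
            guids[current] = line[len("Guid="):]
    return guids

sample = """[CameraDefinition.1]
Title=Linke Seite
Guid={0ae3f864-da10-4e5a-977c-b9bba47d6f7a}
Description=Ansicht nach links
Origin=Center
""".splitlines()

camera_guids = read_camera_guids(sample)
print(camera_guids)  # {1: '{0ae3f864-da10-4e5a-977c-b9bba47d6f7a}'}
```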
|
Hi Richard, thank you for your response. It's exactly what I did meanwhile... still I hope someone knows an answer, so that this seemingly simple task wouldn't annoy me again
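For what it's worth, a likely culprit in the original pattern is [.|\s]*. Inside square brackets the '.' and '|' are literal characters, so that class only matches dots, pipes and whitespace, never the "Title=..." text. The sample Guid also uses lowercase hex, which [0-9A-F\-] rejects unless case-insensitive matching is on. Below is a hedged Python sketch of one possible fix; it uses (?P<name>...) for the named groups, and the second section in the sample is invented for the demonstration.

```python
import re

pattern = re.compile(
    r"\[CameraDefinition\.(?P<camnumber>\d+)\]"
    r"[^\[]*?"  # anything, newlines included, without entering the next section
    r"Guid=(?P<guid>\{[0-9A-Fa-f-]*\})"
)

# First section is from the question; the second one is made up.
ini_text = (
    "[CameraDefinition.1]\r\n"
    "Title=Linke Seite\r\n"
    "Guid={0ae3f864-da10-4e5a-977c-b9bba47d6f7a}\r\n"
    "Description=Ansicht nach links\r\n"
    "Origin=Center\r\n\r\n"
    "[CameraDefinition.2]\r\n"
    "Title=Rechte Seite\r\n"
    "Guid={11111111-2222-3333-4444-555555555555}\r\n"
)

found = [(m.group("camnumber"), m.group("guid")) for m in pattern.finditer(ini_text)]
print(found)
```

The negated class [^\[] matches line breaks as well, so no "dot matches newline" option is needed, and it keeps a section without a Guid from borrowing the Guid of the next section.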
|
It's not clear exactly what you are trying to extract. But either way, my suggestion is much easier.
|