|
I need to find strings like the ones below:
mTimerManager
mAutolockManager
but not like
mv this is comment timerManager
anand sunku
|
|
|
|
|
And?
What have you tried, and where are you stuck?
"I need..." is not a question.
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
You did not say which regex flavor you are working with.
Your definition of a 'word' is anything except a space, which is not really what 'word' generally means, but that is what I used.
In perl.
m[^ ]+
The above will match the following (because that is the definition of 'word')
m&extra---stuff.
It is also limited in the following case since, again given that definition of 'word', it is not clear what might be expected
mFirst,mSecond
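To make the behaviour above concrete, here is a small Python sketch (Python's `re` rather than Perl, but this pattern behaves the same way in both). The `strict` pattern is purely an assumption about what the original poster may have wanted (an "m" followed by an uppercase letter); it is not from the thread.

```python
import re

# The pattern from the answer: "m" followed by one or more non-space characters.
loose = re.compile(r"m[^ ]+")

# Hypothetical stricter alternative (an assumption, not from the thread):
# a whole word that starts with "m" followed by an uppercase letter.
strict = re.compile(r"\bm[A-Z]\w*")

samples = [
    "mTimerManager",
    "mAutolockManager",
    "mv this is comment timerManager",
    "m&extra---stuff.",
    "mFirst,mSecond",
]

for text in samples:
    print(f"{text!r}: loose={loose.findall(text)} strict={strict.findall(text)}")
```

With the loose pattern, "mFirst,mSecond" comes back as one match; the stricter sketch splits it into two words and ignores the comment line.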
|
|
|
|
|
I am trying to write some regex to pull fields out of a set of web pages. The pages vary: each can have all or only some of the fields (I think I have identified all the possibilities). I think I can deal with this by including every potential field and returning data only when the field is present, as long as I can figure out how to make them absolute references. The other challenge is that some fields contain bullet lists with one or more items, which I don't know how to handle. An example is below; I am trying to extract the details associated with "Type of surveyor", "Works for", "Business type", "Surveying services", "Partners and directors", "Accreditations", and "Registered valuer". If anyone can help, that would be greatly appreciated.
<div class="office inner grid">
<!-- Office title -->
<h1 class="office__title grid__col grid__col--md-16 grid__push--md-8">Patterson Surveying</h1>
<!-- Office information -->
<div class="office__content grid__col grid__col--md-16">
<p class="office__about">Patterson Surveying is an independent surveying firm run by Paul Patterson</p>
<section class="office__info">
<div class="office-info__row">
<h3 class="office-info__heading">Type of surveyor</h3>
<div class="office-info__content">
<ul class="bullet-list">
<li class="bullet-list__item">Chartered Valuation Surveyor</li>
</ul>
</div>
</div>
<div class="office-info__row">
<h3 class="office-info__heading">Works for</h3>
<div class="office-info__content">
<ul class="bullet-list">
<li class="bullet-list__item">Residential customers</li>
<li class="bullet-list__item">Commercial contracts</li>
</div>
</div>
<div class="office-info__row">
<h3 class="office-info__heading">Business type</h3>
<div class="office-info__content">
Private Practice
</div>
</div>
<div class="office-info__row">
<h3 class="office-info__heading">Surveying services</h3>
<div class="office-info__content">
<ul class="bullet-list bullet-list--2col">
<li class="bullet-list__item">Building surveying</li>
<li class="bullet-list__item">RICS Home Survey – Level 2</li>
</ul>
</div>
</div>
<div class="office-info__row">
<h3 class="office-info__heading">Partners and Directors </h3>
<div class="office-info__content">
<ul class="bullet-list bullet-list--2col">
<li class="bullet-list__item">Mr P M Patterson MRICS</li>
</ul>
</div>
</div>
<div class="office-info__row">
<h3 class="office-info__heading">Accreditations</h3>
<div class="office-info__content">
<h3>Registered Valuer</h3>
<ul class="bullet-list">
<li class="bullet-list__item">Mr P M Patterson MRICS</li>
</ul>
</div>
</div>
<div class="section section--shaded">
<a name="Contact"></a>
<h3 class="office__title">Contact Patterson Surveying</h3>
|
|
|
|
|
Basically, don't use a Regex: HTML is notorious for being difficult to process effectively if you treat it as text - it pretty much needs a browser engine to mostly render the page before it can be parsed effectively as it contains so many different ways to do anything.
Instead, I'd suggest you use an HTML parser (I use HTMLAgilityPack in C#, but your language may need a different one) and scrape the sites that way - it's a lot easier to work with, and a whole load easier to change when the site admin alters the format, which happens a lot as features are added, removed, modified, or bugs are fixed.
Doing it with a regex means it might work for a week, and then fail - and then the whole regex has to be re-written, re-tested, fixed, and released.
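As a rough illustration of the parser approach, here is a minimal sketch using Python's standard-library `html.parser` instead of HTMLAgilityPack (a swapped-in tool, not the one named above). The class name `OfficeInfoParser` and the trimmed `sample` markup are assumptions for the example; the CSS class names come from the page fragment in the question.

```python
from html.parser import HTMLParser


class OfficeInfoParser(HTMLParser):
    """Collects {heading: [text, ...]} from office-info rows (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._heading = None      # heading of the row being read
        self._in_heading = False  # inside <h3 class="office-info__heading">
        self._content_depth = 0   # nesting depth inside the content <div>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h3" and "office-info__heading" in classes:
            self._in_heading = True
            self._heading = ""
        elif tag == "div":
            if self._content_depth:
                self._content_depth += 1
            elif "office-info__content" in classes:
                self._content_depth = 1

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_heading:
            self._in_heading = False
            self._heading = self._heading.strip()
            self.fields[self._heading] = []
        elif tag == "div" and self._content_depth:
            self._content_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._in_heading:
            self._heading += data
        elif self._content_depth and text and self._heading is not None:
            self.fields[self._heading].append(text)


sample = """
<div class="office-info__row">
<h3 class="office-info__heading">Works for</h3>
<div class="office-info__content">
<ul class="bullet-list">
<li class="bullet-list__item">Residential customers</li>
<li class="bullet-list__item">Commercial contracts</li>
</ul>
</div>
</div>
<div class="office-info__row">
<h3 class="office-info__heading">Business type</h3>
<div class="office-info__content">
Private Practice
</div>
</div>
"""

parser = OfficeInfoParser()
parser.feed(sample)
print(parser.fields)
```

Bullet lists of any length and plain-text fields both end up as a list of strings under their heading, and a missing field is simply absent from the dictionary.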
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
Thanks Original Griff,
That is beyond my technical know-how at this point, but I am looking to learn. I am using this within Octoparse, which from what I have learnt to date can only use regex to make the fields absolute / more accurate. So I think I am stuck with trying to make it work using regex, unless anyone knows differently or can help with the regex, please?
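If regex really is the only option, one possible shape is a two-step match: first grab the content block that follows a given heading, then pull the bullet items out of that block. The sketch below is in Python and only shows the patterns; whether Octoparse accepts them as written is an assumption I have not checked. The function name `field_items` and the trimmed `sample_html` are made up for the example, and the approach breaks if a content block ever contains a nested `<div>`.

```python
import re


def field_items(html, heading):
    """Return the list of values found under one heading (hypothetical helper)."""
    block = re.search(
        r'<h3 class="office-info__heading">\s*' + re.escape(heading) + r'\s*</h3>\s*'
        r'<div class="office-info__content">(.*?)</div>',
        html,
        flags=re.DOTALL | re.IGNORECASE,
    )
    if block is None:
        return []  # field not present on this page
    content = block.group(1)
    # One or more bullet items, if the field is a list.
    items = re.findall(r"<li[^>]*>(.*?)</li>", content, flags=re.DOTALL)
    if items:
        return [item.strip() for item in items]
    # Otherwise treat whatever text is left as a single value.
    text = re.sub(r"<[^>]+>", " ", content).strip()
    return [text] if text else []


sample_html = """
<div class="office-info__row">
<h3 class="office-info__heading">Works for</h3>
<div class="office-info__content">
<ul class="bullet-list">
<li class="bullet-list__item">Residential customers</li>
<li class="bullet-list__item">Commercial contracts</li>
</ul>
</div>
</div>
<div class="office-info__row">
<h3 class="office-info__heading">Business type</h3>
<div class="office-info__content">
Private Practice
</div>
</div>
"""

print(field_items(sample_html, "Works for"))
print(field_items(sample_html, "Business type"))
print(field_items(sample_html, "Accreditations"))
```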
|
|
|
|
|
This is for my VMware vCenter servers where I am trying to clean out extra log files which are no longer required. The type of files for this example are:
Quote:
/storage/log/vmware/eam/web/localhost_access_log..2020-12-06.txt
/storage/log/vmware/eam/web/localhost_access_log..2020-09-13.txt
/storage/log/vmware/eam/web/localhost_access_log..2020-10-31.txt
/storage/log/vmware/eam/web/localhost_access_log..2020-12-13.txt
/storage/log/vmware/eam/web/localhost_access_log..2020-10-03.txt
/storage/log/vmware/eam/web/localhost_access_log..2020-09-08.txt
/storage/log/vmware/eam/web/localhost_access_log..2020-07-21.txt
/storage/log/vmware/eam/web/localhost_access_log..2020-08-03.txt
/storage/log/vmware/eam/web/localhost_access_log..2020-11-30.txt
/storage/log/vmware/eam/web/localhost_access_log..2020-11-08.txt
/storage/log/vmware/eam/web/localhost_access_log..2020-11-27.txt
/storage/log/vmware/eam/web/localhost_access_log..2020-12-14.txt
/storage/log/vmware/eam/web/localhost_access_log..2020-09-28.txt
/storage/log/vmware/eam/web/localhost_access_log..2020-10-01.txt
/storage/log/vmware/eam/web/localhost_access_log..2020-11-29.txt
/storage/log/vmware/eam/web/localhost_access_log..2020-10-19.txt
/storage/log/vmware/eam/web/localhost_access_log..2020-12-05.txt
The expression which works for me in my Linux file system is this one:
find /storage/log/vmware/ -mtime +10 -type f -name "localhost_access_log..2020-[0-9][0-9]-[0-9][0-9].txt"
It uses the Linux "find" command to find the files; "-mtime +10" restricts it to files last modified more than 10 days ago. I would like to shorten the pattern to simplify it, and using RegExr as my tester, I found that the following regex works:
localhost_access_log\.\.2020-[0-9]{2}-[0-9]{2}\.txt
However when I try it on my Linux filesystem, it fails to produce results. I get nothing returned.
|
+-- JDMils
|
+-- VB6
+-- VB Dot Net
|
|
|
|
|
|
Go back to the version that does work.
|
|
|
|
|
Unless told otherwise, find's -name test uses file globs, not regular expressions. To match with a regex you need the -regex test, and you can change the regex engine using -regextype, e.g. -regextype posix-extended. find will tell you which regex engines it knows about if you say -regextype help. Possibly one of the engines knows how to parse your regex expression to your liking.
Keep Calm and Carry On
|
|
|
|
|
Thanks K5054, that was the clincher. To find the files I need, I had to perform the following:
* State the Regex Engine as 'posix-extended'
* Put the expression '.*' at the start of the filename as the files are treated as fully qualified filenames (file path & filename).
Thus, I can now use the following:
find /storage/log/vmware/ -type f -regextype posix-extended -regex '.*vpxd-svcs-access-.2022-[0-9]{2}-[0-9]{2}.log.gz'
And....
find /storage/log/vmware/ -type f -regextype posix-extended -regex '.*sps.log.[0-9]{2}.gz'
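The two points above (glob versus regex, and matching against the whole path) can be illustrated with a small Python sketch. It uses `fnmatch` as a stand-in for find's `-name` globbing and `re.fullmatch` as a stand-in for `-regex`; Python's engine is not posix-extended, so treat this as an approximation of the behaviour rather than a reproduction of find itself.

```python
import fnmatch
import posixpath
import re

path = "/storage/log/vmware/eam/web/localhost_access_log..2020-12-06.txt"

glob_pattern = "localhost_access_log..2020-[0-9][0-9]-[0-9][0-9].txt"
regex_pattern = r"localhost_access_log\.\.2020-[0-9]{2}-[0-9]{2}\.txt"

name = posixpath.basename(path)

# -name style: a glob compared against the file name only.
glob_hit = fnmatch.fnmatchcase(name, glob_pattern)

# The regex handed to a glob matcher: "{2}" and the backslashes are not
# understood as regex syntax, so nothing matches.
regex_as_glob_hit = fnmatch.fnmatchcase(name, regex_pattern)

# -regex style: the pattern must match the whole path, directories included.
bare_regex_hit = re.fullmatch(regex_pattern, path) is not None
prefixed_regex_hit = re.fullmatch(".*" + regex_pattern, path) is not None

print(glob_hit, regex_as_glob_hit, bare_regex_hit, prefixed_regex_hit)
```

Note that the dots in the final commands above are unescaped, so each one matches any character; escaping them as `\.` would make the match stricter.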
|
|
|
|
|
|
|
Regular Expression to find parts of a <script/img src=''> or <link href=''> attribute value
Been using my go-to regex101.com editor to work this out, but I always have problems with URLs and filesystem paths. I generally have the 'https' URL/resource in order.
I am trying to read and parse the link 'href' and img/script 'src' attribute values from the elements extracted in the markup.
The groupings/captures I want are
- (i) The "path provider" (PowerShell terminology), basically the drive
- (ii) The path leading to the file part. I prefer groupings between the path separators "\" or "/" (both must be accounted for), but will accept one long string.
Thus, suppose D:\a\b\c\file.ext
This part can be grouped as '\a\b\c', but if it can be multiple groups '\a', '\b', '\c', even better.
One or more path separators are required.
- (iii) The file basename without path separator
- (iv) The file extension with the leading '.', which is the last '.' of the path
My working pattern/RE is: ^([a-zA-Z]:)?(([/\\]?[^/\\]+)*)[/\\]([^.]+)\.(\S+)$
The pattern might be more specific regular expressions separated by the alternative separator (|) instead of trying to match the strings with a single expression.
I specifically include the '^' and '$' start and end assertions for the markup attribute value.
Test string #1: ${SPREST_JS_FolderPath}/SPListREST.js
- No path provider/drive, so no Group 1 - OK
- Group 2: ${SPREST_JS_FolderPath} # Item (ii)
- Group 3: ${SPREST_JS_FolderPath} # repeat of Group 2 -- not wanted
- Group 4: SPListREST # file basename Item (iii)
- Group 5: js # file type/extension Item (iv)
Test string #2 D:\dev\SharePoint\SPTools\src\pagestyle.css
- Group 1: D: # Item (i)
- Group 2: \dev\SharePoint\SPTools\src # Item (ii) exactly as required if groupings by '\pathseg' not possible
- Group 3: \src # the last path segment--unwanted
- Group 4: pagestyle # file basename Item (iii)
- Group 5: css # file type/extension Item (iv)
Test string #3 ./js/SPREST/SPRestEmail.js
- No path provider/drive, so no group 1
- Group 2: ./js/SPREST # Item (ii) exactly as required if groupings by '\pathseg' not possible
- Group 3: /SPREST # the last path segment--unwanted
- Group 4: SPRestEmail # file basename Item (iii)
- Group 5: js # file type/extension Item (iv)
[composed in Markdown, so presentation affected by your settings/stylings]
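One way to get rid of the unwanted repeat group is to make the inner repetition non-capturing and split the directory part into segments afterwards. The following is a sketch in Python, not a tested regex101 pattern; the group names (`drive`, `dir`, `base`, `ext`) and the function `split_path` are invented for the example, and the extension here keeps its leading '.' as requested in item (iv).

```python
import re

PATH_RE = re.compile(
    r"^(?P<drive>[a-zA-Z]:)?"               # (i) optional drive, e.g. "D:"
    r"(?P<dir>[^/\\]*(?:[/\\][^/\\]+)*)"    # (ii) directory part, no inner capture
    r"[/\\]"                                # separator before the file name
    r"(?P<base>[^/\\]+)"                    # (iii) file basename
    r"(?P<ext>\.[^./\\]+)$"                 # (iv) extension from the last "."
)


def split_path(value):
    """Split an attribute value into drive, segments, basename, extension."""
    match = PATH_RE.match(value)
    if match is None:
        return None
    parts = match.groupdict()
    # Break the directory part on either separator, dropping empty pieces.
    parts["segments"] = [s for s in re.split(r"[/\\]", parts["dir"]) if s]
    return parts


for test in (
    "${SPREST_JS_FolderPath}/SPListREST.js",
    r"D:\dev\SharePoint\SPTools\src\pagestyle.css",
    "./js/SPREST/SPRestEmail.js",
):
    print(split_path(test))
```

Regex engines generally keep only the last repetition of a repeated capture group, which is why Group 3 showed just the final segment; splitting the directory string in a second step sidesteps that.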
|
|
|
|
|
I am trying to create a formula in Data Studio using RegEx that extracts the channel from the URLs below. All channels are passed as the first value, before the first _
ppcnonbrand_sports_uk_google_ourwebsite_b5g5_onlinebetting
crm_sports_uk_email_ourwebsite_b5g5_welcomeemail4b_signin_0_0_0
crm_sports_uk_email_ourwebsite_superboost_221206crmuksportstuenewsletterb_bethere_221206_rnd_0
socialbrand_sports_uk_facebook_ourwebsite_none_carouselawareness_23852435274560534_23852435274610534_0_Facebook_Desktop_Feed_23852435274780534
So from the above I want to return:
ppcnonbrand
crm
socialbrand
I have tried the following, but it only returns "null" for everything. I'm all out of ideas; can anyone suggest something?
REGEXP_EXTRACT(Session Campaign, "^[^_]*")
|
|
|
|
|
I found a solution that worked:
REGEXP_EXTRACT(Session campaign, "^([^_]+)")
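For anyone wanting to check the pattern outside Data Studio, here is the same capture-group expression in a short Python sketch. Python's `re` is not the engine Data Studio uses, so this only confirms what the pattern itself captures on the sample values.

```python
import re

campaigns = [
    "ppcnonbrand_sports_uk_google_ourwebsite_b5g5_onlinebetting",
    "crm_sports_uk_email_ourwebsite_b5g5_welcomeemail4b_signin_0_0_0",
    "socialbrand_sports_uk_facebook_ourwebsite_none_carouselawareness",
]

# Everything from the start of the string up to (not including) the first "_".
channel_re = re.compile(r"^([^_]+)")

channels = [channel_re.match(c).group(1) for c in campaigns]
print(channels)
```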
|
|
|
|
|
I own a few virtual pinball files that need to be cleaned up.
I was planning to use RegEx in Excel to clean them up; I'm OK with VBA programming.
I know what RegEx is and what it can do, but I'm not familiar with how to write one.
Can you help me with that?
Here is an example of the files and the suggested RegEx "\(.*\).*(?=\[0-9a-zA-Z\.]+\.vpx)":
F:\vPinball\VisualPinball\Tables\Cross Town (Gottlieb 1966).vpx
F:\vPinball\VisualPinball\Tables\Wild Fyre (Stern 1978) 1.1.vpx
F:\vPinball\VisualPinball\Tables\Pirates of the Caribbean Siggis Mod 1.0.vpx
F:\vPinball\VisualPinball\Tables\Punk wip 12.vpx
F:\vPinball\VisualPinball\Tables\Attack from Mars (Bally1995) g5k 1.1.vpx
F:\vPinball\VisualPinball\Tables\Flying Chariots (Gottlieb 1963).vpx
F:\vPinball\VisualPinball\Tables\Lady Luck (Bally 1986) 1.1.vpx
F:\vPinball\VisualPinball\Tables\Fathom (Bally 1981) 1.1a.vpx
F:\vPinball\VisualPinball\Tables\Conan (Rowamet 1983) v1.1a.vpx
F:\vPinball\VisualPinball\Tables\Black Knight - Sword Of Rage.vpx
F:\vPinball\VisualPinball\Tables\Heat Wave (Williams 1964).vpx
F:\vPinball\VisualPinball\Tables\Full (Recreativos Franco 1977).vpx
Ideally the expected name would be, for example, Back To The Future.vpx
Someone suggested the RegEx mentioned earlier, which did most of the cleaning, but after running the VBA there are still a few that weren't cleaned.
Here they are:
Back To The Future 1.4 PF Mod.png
Back To The Future 1.4 PF Mod_thumb.png
Back To The Future 1.4 PF Mod_thumb_sm.png
Back to the Future 1.05 .png
Back to the Future 1.06 .png
Back to the Future 1.14.png
Back to the Future 1.15.png
back to the future 2 mf.png
back to the future mf.png
Back To The Future Starlion MoD.png
Back To The Future Starlion MoD_thumb.png
Back To The Future Starlion MoD_thumb_sm.png
Ideally I am looking for Back To The Future.png
Can you help me, please?
Thanks
|
|
|
|
|
The reason they are left behind is that your Regex only matches filenames that contain the text ".vpx" - which your .png files don't, so they can't be matched or removed.
Do two things:
1) Get a copy of Expresso - it's free, and it examines and generates Regular expressions.
2) Learn how regexes work: Regular Expression Tutorial - Learn How to Use Regular Expressions
I'm not going to suggest a regex at all - I have no idea what else is in the folder(s) or how you collected them and no way to find that out - a "random" regex would be likely to delete a lot more than you actually wanted with no way to test it before you used it!
|
|
|
|
|
This works without the $ but I need it to work from the end of the line not the start of the line.
Working:
(\b[A-Za-z]{1}[A-Za-z]+\b)
Not Working:
(\b[A-Za-z]{1}[A-Za-z]+\b)$
What I'm trying to do is grab the street type from an address. That works fine using just
[a-zA-Z]+$
when the address is like "123 Easy Street", but it doesn't work well when there's a street direction afterwards, like "123 Easy Street N". I'm trying to skip over the single-character direction indicator and match "Street" whether the single character is there or not. It's also not always "Street": it could be avenue, crescent, drive, highway, road, etc. I think my pattern above will do what I need, but I can't get it to work with the $ so that it matches from the end of the line. Any help would be appreciated!
|
|
|
|
|
A couple of questions. What regex flavor are you using: POSIX, PCRE, .NET, Java, etc.?
Is this part of a compiled/interpreted program (e.g. C++, C#, Python), or is it part of something more like a shell script?
|
|
|
|
|
I'm trying to use it within Pabbly Connect, which is an integration system like Zapier or Make. I can use the spreadsheet-type formulas found here https://www.pabbly.com/spreadsheet-formulas/ which are usually similar to Google Sheets formulas.
|
|
|
|
|
Andrew St. Hilaire wrote:
it doesn't work well when there's a street direction afterwards like "123 Easy Street N"
Your pattern explicitly requires at least two letters at the end of the line, with a non-word character before them. "123 Easy Street N" only has one letter at the end of the line, and is therefore not a match for your pattern.
NB: Your pattern could be simplified to:
(\b[A-Za-z]{2,})$
You need to consider the data you are trying to match, and come up with a pattern to match it. Given your example, you could try:
(\b[A-Za-z]{2,}\b)(\s+[A-Za-z])?$
which would capture "Street" in the first group for both "123 Easy Street" and "123 Easy Street N".
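A quick way to try the suggested pattern is the Python sketch below. Pabbly Connect's own regex support may differ, so this only demonstrates what the pattern captures; the helper name `street_type_of` and the extra "Avenue" address are made up for the example.

```python
import re

# Last word of two or more letters, optionally followed by a single-letter
# direction, anchored at the end of the string.
street_type_re = re.compile(r"(\b[A-Za-z]{2,}\b)(\s+[A-Za-z])?$")


def street_type_of(address):
    """Return the captured street type, or None when nothing matches."""
    match = street_type_re.search(address)
    return match.group(1) if match else None


for address in ("123 Easy Street", "123 Easy Street N", "45 Oak Avenue W"):
    print(address, "->", street_type_of(address))
```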
|
|
|
|
|
Andrew St. Hilaire wrote: I'm trying to skip over the single character direction indicator and match "Street" whether the single character is there or not,
So the pattern needs to return the last full word in the first group, skipping a trailing single letter if there is one: 'Main' for 1 and 3, 'View' for 2 and 4
1. 123 Main
2. 123 Main View
3. 123 Main N
4. 123 Main View S
(\b[A-Za-z]+)((\s+[A-Za-z])?)$
First clause matches the street.
Second clause optionally matches a single last character.
Why the extra parens in the second clause? Because I prefer to always have a match for optionals. So in this case first match is street name and second match (always there) is either something like 'S' or it is empty/null. If the parens were not there then there might or might not be a second match (you would need to test for it.)
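The difference the extra parentheses make can be seen in a short Python sketch (Python reports an unmatched optional group as `None`; other engines may report it differently). Note that for the two-word cases the first group holds the last full word, so "123 Main View" gives 'View'.

```python
import re

# Pattern from the post: the optional part is wrapped in an extra group,
# so group 2 always takes part in the match (possibly as an empty string).
always = re.compile(r"(\b[A-Za-z]+)((\s+[A-Za-z])?)$")

# Same pattern without the extra parentheses, for comparison.
maybe = re.compile(r"(\b[A-Za-z]+)(\s+[A-Za-z])?$")

results = {}
for address in ("123 Main", "123 Main View", "123 Main N", "123 Main View S"):
    m = always.search(address)
    results[address] = (m.group(1), m.group(2))
    print(f"{address!r}: word={m.group(1)!r} direction={m.group(2)!r}")

# Without the extra group, a missing direction has to be tested for.
print(maybe.search("123 Main").group(2))
```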
|
|
|
|
|
Hello everyone!
In more than two decades of programming with VC++ I have never used regular expressions.
But now I need to... The solution in VS2022 contains some thousands of lines looking like:
some_object.MyMethod(comma_separated_parameter_list, &SomeClass(argument)optional_parameter-list)
What I need is to remove the ampersand (&) from these code expressions.
I found a way to search for such an expression. It is something like
.*MyMethod\(.*, &SomeClass\(
But I could not find a working expression for Replace.
So I need your help, guys!
|
|
|
|
|
Typically, one would use capture groups and replacement expressions. In your case you might do
(.*MyMethod(.*, )&(SomeClass()/s\1\2
If there are many SomeClasses that you would like to replace, you might be able to use
(.*MyMethod(.*, )&(\w*()/s\1\2
Be cautious! I have not tested either of these, and any time you're experimenting with regular expression replacements, Bad Things can happen. Back up early! Back up often!
This all assumes that you actually want to make replacements using Visual Studio, more information for which can be found here:
|
|
|
|
|
Thank you for pointing me to capture groups! Very interesting.
However, your suggestion (.MyMethod(., )&(SomeClass()/s\1\2 doesn't work.
Example:
Original line:
qwerty.MyMethod(param1, param2, &SomeClass(param3), param4);
Searching for the pattern
.*MyMethod\(.*, &SomeClass\(
finds the substring of the original line up to
SomeClass(
But trying to replace it (to remove the &) using your suggestion results in
(.MyMethod(., )&(SomeClass()/s\1\2param3), param4);

|
|
|
|
|
Visual Studio's "Replace in Files" option (Ctrl+Shift+H) allows you to use regular expressions for both the find and the replace parts, and even offers a drop-down of selections for the more common situations.
|
|
|
|
|
Thank you Richard.
I know about these options in Visual Studio.
What I don't know is how to remove an & from a code line that I already found (using regular expressions) but leave all the other texts before and after this & unchanged!
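The usual technique for this is to capture the text on either side of the & and write only the captured groups back. Below is a sketch of that idea in Python, using the example line from earlier in the thread. In Python the groups are referenced as \1 and \2 in the replacement; Visual Studio's replace box uses .NET syntax, where the equivalent would be $1$2, which I have not tested in VS2022 here.

```python
import re

# Group 1: "MyMethod(" and everything up to the ", " before the ampersand.
# Group 2: "SomeClass(" right after the ampersand.
# The "&" sits between the groups, so it is the only thing dropped.
pattern = re.compile(r"(MyMethod\(.*, )&(SomeClass\()")

line = "qwerty.MyMethod(param1, param2, &SomeClass(param3), param4);"

fixed = pattern.sub(r"\1\2", line)
print(fixed)
```

Text before MyMethod and after SomeClass( is never part of the match, so it stays untouched. As with any bulk replace, it is worth trying on a copy or a single file first.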
|
|
|
|