|
Hello,
I am trying to learn regex on my own, but I got stuck selecting from an unordered list.
At the moment I have managed to list the categories, but I can't figure out how to:
1) capture the node ID for each category (for example, 560884 for the first one)
2) keep the ">" divider (shown as "›" in the output) from being listed as a category.
Here is my code:
select div#wayfinding-breadcrumbs_feature_div li >>> category_tree {
select span.a-list-item >> category_name;
select div#wayfinding-breadcrumbs_container .a-link-normal >> attr(href) >> capture "[node=\\d+]" >> node_id;
}
Here is part of the output:
"category_tree": [{
"category_name": "Portable Sound & Video",
"node_id": null
}, {
"category_name": "›",
"node_id": null
}, {
"category_name": "Accessories",
"node_id": null
Here is the source code:
<div id="wayfinding-breadcrumbs_feature_div" class="a-subheader a-breadcrumb feature" data-feature-name="wayfinding-breadcrumbs" data-cel-widget="wayfinding-breadcrumbs_feature_div">
<ul class="a-unordered-list a-horizontal a-size-small">
<li>
<a class="a-link-normal a-color-tertiary" href="/mp3-ipod-headphones-DAB-radio/b/ref=dp_bc_aui_C_1?ie=UTF8&node=560884">
Portable Sound & Video
</a>
</li>
<li class="a-breadcrumb-divider">
›
</li>
<li>
<a class="a-link-normal a-color-tertiary" href="/Accessories-Portable-Sound-Vision-Tapes/b/ref=dp_bc_aui_C_2?ie=UTF8&node=560910">
Accessories
</a>
</li>
<li class="a-breadcrumb-divider">
›
</li>
<li>
<a class="a-link-normal a-color-tertiary" href="/b/ref=dp_bc_aui_C_3?ie=UTF8&node=16700222031">
Portable Speakers & Docks
</a>
</li>
</ul>
</div>
Thank you for your help
|
|
|
|
|
Jukec wrote: "[node=\\d+]"
I have no idea what language that is, but all of the major ones use largely the same regex semantics.
The square brackets should not be there: in regex they define a character class, which matches a single character from the set rather than the literal text 'node='. Presumably the rest of the code is actually going to 'capture' what is matched; that is a specific term in regex.
If so, the match will look like 'node=16700222031', which means you would need to parse it again to get the number out.
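As a rough illustration of the point about the square brackets, here is a minimal sketch in Python. Python's `re` module is only a stand-in, since the scraping language in the question is not identified; the helper name `node_id` is hypothetical, and the hrefs are copied from the HTML in the question. It assumes the engine supports capturing groups, in which case a group around `\d+` returns the number directly.

```python
import re

# What the original pattern does: the brackets form a character class, so
# each match is a single character out of n, o, d, e, =, + or a digit.
print(re.findall(r"[node=\d+]", "node=56"))  # ['n', 'o', 'd', 'e', '=', '5', '6']

# Without the brackets, and with a capturing group around the digits,
# the group holds just the number.
NODE_ID = re.compile(r"node=(\d+)")

def node_id(href):
    """Return the digits after 'node=' in href, or None if there are none."""
    m = NODE_ID.search(href)
    return m.group(1) if m else None

# hrefs taken from the breadcrumb HTML in the question
HREFS = [
    "/mp3-ipod-headphones-DAB-radio/b/ref=dp_bc_aui_C_1?ie=UTF8&node=560884",
    "/Accessories-Portable-Sound-Vision-Tapes/b/ref=dp_bc_aui_C_2?ie=UTF8&node=560910",
    "/b/ref=dp_bc_aui_C_3?ie=UTF8&node=16700222031",
]
print([node_id(h) for h in HREFS])  # ['560884', '560910', '16700222031']
```

Whether the `capture` step in the original tool returns the whole match or the first group is not something this thread confirms, so treat that part as an assumption to check against the tool's documentation.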
|
|
|
|
|
I am trying to match the following sample patterns:
- 1
- 23-4B
- 2,1-12
- 15A
- 2-5,12
- 12A-4
so I tried the following regex, but it seems to match only some of the above patterns:
^\d+[A-Z]?(-|,)?\d+[A-Z]?(-|,)?\d+[A-Z]?$
For example, the above matches all the samples except
- 1
- 15A
- 2,5
What did I miss here...
|
|
|
|
|
Hi!
Try this: ^\d*(-|,)?\w*(-|,)?\w*$
They all start with zero or more digits at the beginning of the string: ^\d*
In the middle there is an optional separator: (-|,)?
Ending with zero or more word characters (digits or letters): \w*
Hope it helps.
Nice day!
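To compare candidate patterns against the samples from the question, a small harness like the following may help. It is a sketch in Python (the thread does not say which regex flavor is in use). `ORIGINAL` and `SUGGESTED` are the two patterns quoted in this thread; `STRICTER` is my own assumption about the intended format (a number with an optional capital letter, repeated with `-` or `,` between items), not something the original poster confirmed.

```python
import re

# Samples from the question, plus "2,5" which the poster also expected to match.
SAMPLES = ["1", "23-4B", "2,1-12", "15A", "2-5,12", "12A-4", "2,5"]

# Pattern from the question: three mandatory \d+ parts, so at least 3 digits.
ORIGINAL = re.compile(r"^\d+[A-Z]?(-|,)?\d+[A-Z]?(-|,)?\d+[A-Z]?$")
# Pattern suggested in the reply: every part is optional.
SUGGESTED = re.compile(r"^\d*(-|,)?\w*(-|,)?\w*$")
# Hypothetical stricter alternative: one item, then any number of
# separator + item repetitions.
STRICTER = re.compile(r"^\d+[A-Z]?(?:[-,]\d+[A-Z]?)*$")

def failures(pattern, samples):
    """Return the samples that the pattern does not match."""
    return [s for s in samples if not pattern.match(s)]

print(failures(ORIGINAL, SAMPLES))   # ['1', '15A', '2,5']
print(failures(SUGGESTED, SAMPLES))  # []
print(failures(STRICTER, SAMPLES))   # []

# Because every part of SUGGESTED is optional, it also accepts inputs
# that are probably unwanted, such as the empty string or plain letters.
print(bool(SUGGESTED.match("")), bool(SUGGESTED.match("abc")))  # True True
print(bool(STRICTER.match("")), bool(STRICTER.match("abc")))    # False False
```

The trade-off is the usual one: the looser pattern accepts all the samples but also a lot of junk, while the stricter one depends on my guess about the format being right.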
|
|
|
|
|
Thank you for the reply.
However, the above expression doesn't match the following test cases:
-2C-5,3
-2,1-12
-2-5,12
|
|
|
|
|
Hi!
You did not list those at the beginning.
But now that you mention them, this one matches:
^\d*(\w+)?(-|,)?\w*(-|,)?\w*$
Have a nice day!
|
|
|
|
|
You have six different patterns to match so regex is probably not the optimum choice of tool.
|
|
|
|
|
Hello, sir!
Your comment above intrigued me and made me curious.
Would you care to share something, or a link, that would help me understand what the optimum choice of tool would be for more than six different patterns? I want to learn to optimize my work.
Thank you, sir!
|
|
|
|
|
Member 15245024 wrote: share something or a link that would help me
Yes, but I have no idea what problem you are trying to solve.
|
|
|
|
|
Hello, sir!
Yes. I have a case that I have hopefully defined accurately: various, simultaneous, multiple replacements. I do not even know whether it is possible.
I have a long text, and sprinkled all over it (at the beginning of a string, in the middle, at the end, in parentheses or not) are expressions of these 8 kinds:
Abc. 22
Defg 22
Hijkl 22
Mnoprs 22
2 Tuvx 22
2 Zcb 22
2 Fh 22
C.C. 22
I need to replace each of the above with these below:
Abc 22
Def 22
Hij 22
Mno 22
2Tu 22
2Zc 22
2Fh 22
Cal 22
This is the kind of situation I have, but not with exactly these letters and digits; they are dummy text used as an example.
Is it possible to do all of these replacements simultaneously, in parallel?
What would you use?
THANK YOU, sir.
|
|
|
|
|
Because of all the differences I think you probably need to use a translate table. Create two lists of strings, the first is the strings to search for, and the second is the matching replacement. You would then need to go through each table for each line of text looking for strings in table 1. For each match you need to replace it with the appropriate entry in table 2. I am not sure how you could do this in parallel processing as it is really a sequential process.
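A translate table along these lines could be sketched in Python as follows. The table entries are the dummy examples from the question, and the names `TRANSLATE` and `translate` are hypothetical. Note that this is a variation on the description above: instead of looping over the table for every line, it builds one alternation from the search strings and does the lookup in a single pass. It also assumes each expression is followed by a space and a digit, as in the examples.

```python
import re

# Search string -> replacement, taken from the dummy examples in the question.
TRANSLATE = {
    "Abc.": "Abc",
    "Defg": "Def",
    "Hijkl": "Hij",
    "Mnoprs": "Mno",
    "2 Tuvx": "2Tu",
    "2 Zcb": "2Zc",
    "2 Fh": "2Fh",
    "C.C.": "Cal",
}

# Build one alternation from the escaped keys, longest first so that a longer
# key wins over a shorter key that is a prefix of it.
_keys = sorted(TRANSLATE, key=len, reverse=True)
_PATTERN = re.compile(
    r"(?<!\w)(" + "|".join(re.escape(k) for k in _keys) + r")(?= \d)"
)

def translate(text):
    """Replace every table entry that is followed by a space and a digit."""
    return _PATTERN.sub(lambda m: TRANSLATE[m.group(1)], text)

print(translate("See Abc. 22 and (2 Tuvx 22), C.C. 22"))
# See Abc 22 and (2Tu 22), Cal 22
```

Adding or changing an abbreviation then only means editing the table, not the regex.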
|
|
|
|
|
If this matches too much, I would need more samples to know what 'should' and 'should not' be matched...
^\d{1,2}[-,A-Z\d]{0,5}$
modified 5-Oct-21 21:01pm.
|
|
|
|
|
Hi to all! My name is Dumitru, and I am a newbie to regex.
I have studied some regex on my own, but I realized I have some gaps, so I have to ask people who really understand regex.
An example of my target text:
Abc. 2:5a; 24:51d, 53; 1:9b, 22-23c; 1:22-23, 9; 1:22-23, 24-25;
An example of my insufficient regex:
((((\d)?\w*(\.)?\s\d*:\d*)((-|: |,)?\s)?)(((-)?|(\d*))?(,)?(\s)?){5})
# Case 01. My regex does not match the lowercase letters that directly follow digits, such as a, d, b, c.
# Case 02. My regex does not match the ";" character.
# Case 03. When replacing, I have to add "Abc." after each ";" and before the next series of digits, like this: Abc. 2:5a; Abc. 24:51d, 53; Abc. 1:9b, 22-23c; etc.
Note: The lowercase letter directly after the digits may or may not be there.
And I need to match only those that are not at the start of a string, that is, only those in the middle or at the end of a string.
I would really appreciate any help. Thank you!
|
|
|
|
|
Instead of saying "it doesn't do this", tell us what it is expected to do, and what it actually does.
So if you want it to break out 5 matches delimited by semicolons, and within that split them into smaller matches, then tell us that - and show us examples of the output you want.
Regexes can be a pain to read at best, and just showing us a "bad regex" and saying "it doesn't do this" isn't really very helpful!
And are you using a tool to help you design and test Regexes? If so, which one?
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
I am using Notepad++
Thank you for pointing that out. I was not aware, and I will comply with the forum's common rules.
What is expected is:
1. A string like this should be matched:
Abc 2:5a; 24:51d, 53; 1:9b, 22-23c; 1:22-23, 9; 1:22-23, 24-25;
2. This string should not be matched if it is found at the start of a line (a new row), only in the middle or at the end of a row.
3. In the middle or at the end of a string/row, it should match only if it starts with a capital letter, e.g. Abc, not abc.
4. After matching, the replacement I need looks like this:
Abc 2:5a; Abc 24:51d, 53; Abc 1:9b, 22-23c; Abc 1:22-23, 9; Abc 1:22-23, 24-25;
I hope I was clearer this time.
I welcome any improvements, because reading regex is indeed painful enough, so I should be very accurate.
Thank you, sir!
-- modified 14-Jun-21 5:31am.
|
|
|
|
|
You're welcome - it's not "forum common rules", it's just common sense - we only get exactly what you type, so the more accurate your question, the better the answer.
To be honest, I wouldn't faff too much with a regex - use it to extract the basics, and then use your presentation language string handling to break the rest up. You'll end up with much more readable code, and given the "vagueness" of your description, it's very likely that something will change or have been forgotten.
It's a lot easier to maintain PL code than a complicated regex!
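To make the "regex for the basics, string handling for the rest" idea concrete, here is a minimal Python sketch based on the four requirements listed earlier in the thread. Python is my choice for illustration only (the poster is working in Notepad++), and the names `_ITEM`, `_GROUP`, `_REF` and `expand` are hypothetical. It assumes items look like `2:5a`, `53` or `22-23c`, and it approximates "not at the start of a line" as "preceded by a non-space character and a space".

```python
import re

# One item such as "2:5a", "53", "22-23c" or "1:22-23".
_ITEM = r"\d+(?::\d+)?[a-z]?(?:-\d+[a-z]?)?"
# A comma-separated run of items, e.g. "24:51d, 53".
_GROUP = rf"{_ITEM}(?:, {_ITEM})*"
# A capitalised abbreviation (optional dot), then ';'-separated groups.
# The lookbehind rejects matches at the very start of a line.
_REF = re.compile(rf"(?<=\S )([A-Z][a-z]+\.?) ({_GROUP}(?:; {_GROUP})*;?)")

def expand(text):
    """Repeat the abbreviation in front of every ';'-separated group."""
    def repl(m):
        name, body = m.group(1), m.group(2)
        # Plain string handling from here on: split, prefix, re-join.
        groups = [g.strip() for g in body.split(";") if g.strip()]
        out = "; ".join(f"{name} {g}" for g in groups)
        return out + (";" if body.endswith(";") else "")
    return _REF.sub(repl, text)

line = "See Abc 2:5a; 24:51d, 53; 1:9b, 22-23c; 1:22-23, 9; 1:22-23, 24-25;"
print(expand(line))
# See Abc 2:5a; Abc 24:51d, 53; Abc 1:9b, 22-23c; Abc 1:22-23, 9; Abc 1:22-23, 24-25;
```

The regex only locates the abbreviation and the run of references; the splitting and prefixing is ordinary string code, which is easier to adjust if the format turns out to have cases nobody mentioned.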
I'd also suggest that you get a copy of Expresso - it's free, and it examines and generates regular expressions.
|
|
|
|
|
Greetings Dumitru, I'm no expert, because there seem to be many different flavors, so I'm just guessing on this.
But it does seem like each "semicolon-comma pair" could be represented by something like ...
(\d{1,2}(?:[-:]\d{1,2}[a-z]?)?; )(\d{1,2}:\d{1,2}(?:[a-z]|-\d\d)?, )
So depending on your flavor, since the sample has 4-1/2 pairs, you might have to type a VERY long pattern like...
^([\w]+\. )(\d{1,2}(?:[-:]\d{1,2}[a-z]?)?; )(\d{1,2}:\d{1,2}(?:[a-z]|-\d\d)?, )(\d{1,2}(?:[-:]\d{1,2}[a-z]?)?; )(\d{1,2}:\d{1,2}(?:[a-z]|-\d\d)?, )(\d{1,2}(?:[-:]\d{1,2}[a-z]?)?; )(\d{1,2}:\d{1,2}(?:[a-z]|-\d\d)?, )(\d{1,2}(?:[-:]\d{1,2}[a-z]?)?; )(\d{1,2}:\d{1,2}(?:[a-z]|-\d\d)?, \d{1,2}(?:[-:]\d{1,2}[a-z]?)?;)$
With a replacement like...
\1\2\1\3\4\1\5\6\1\7\8\1\9
modified 5-Oct-21 21:01pm.
|
|
|
|
|
I want to match exact strings with a vertical bar and square brackets, but I want to match the shortest possible.
Meaning, I want to match the string
[aaa|bbb]
not anything in this
[aaa] | [bbb]
or this
[aaa] | [aaa|bbb] (in this example, I only want [aaa|bbb], and want the first [aaa] to be ignored, i.e. not matched)
so I can't use the regex:
\[.*?\|(.*?)]
It can have non-alphabetical chars, like this [aaa|(ccc)bbb], or be in different languages.
What can I do?
|
|
|
|
|
How about:
\[[^\]]*\|[^\]]*\]
This will match:
It will not match:
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
Doesn't seem to work.
Using this engine it only matched the first line:
a [aaa|bbb] b
a [aaa|bbb] b
[|bbb]
[aaa|]
[|]
[aaa]
[aaa]|[bbb]
|
|
|
|
|
That site finds all five matches for me.
Demo
Screenshot
|
|
|
|
|
Needed to turn on the global flag...
Thank you!
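For anyone who wants to reproduce this without an online tester, the pattern and the test lines from this thread can be checked in Python. This is only a sketch; `re.findall` returns every match in a string, which plays roughly the role of the global flag in web-based testers.

```python
import re

# Pattern from the answer: '[', anything except ']', a literal '|',
# anything except ']', then ']'.
PATTERN = re.compile(r"\[[^\]]*\|[^\]]*\]")

# Test lines as posted in this thread.
LINES = [
    "a [aaa|bbb] b",
    "a [aaa|bbb] b",
    "[|bbb]",
    "[aaa|]",
    "[|]",
    "[aaa]",
    "[aaa]|[bbb]",
]

matches = [m for line in LINES for m in PATTERN.findall(line)]
print(matches)
# ['[aaa|bbb]', '[aaa|bbb]', '[|bbb]', '[aaa|]', '[|]']

# The case from the question: only the second bracket pair should match.
print(PATTERN.findall("[aaa] | [aaa|bbb]"))  # ['[aaa|bbb]']
```

Because `[^\]]*` cannot run past a closing bracket, a match can never start in one bracket pair and end in another, which is what the lazy `.*?` version could not guarantee.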
|
|
|
|
|
Hello,
I would like to write a regex that finds only the internal links containing rel="noopener".
For example, it should find this link:
<a href="https://www.linkinterno.it/2018/10/titolo/" target="_blank" rel="noopener"> anchor text </a>
In this case the regex should be:
href="https://www.linkinterno.it(.*?)rel="noopener"
and it should work, as I checked it with the following regex tester:
https://www.freeformatter.com/regex-tester.html
However, the search does not correctly restrict itself to internal links; it seems to find any link with rel="noopener". How can I solve this?
Thank you
|
|
|
|
|
Try this
href="https:\/\/www.linkinterno.it.*rel="noopener"
I had to escape the forward slashes '/' to try it in some of the regex testers so depending on the flavour of regex you are using you may need to remove them.
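One thing worth checking with either pattern is that `.*` (or `.*?`) is free to run past the end of the `<a ...>` tag, so an internal link followed later on the same line by an external link with rel="noopener" can still produce a match. A possible tightening, sketched here in Python, is to use `[^>]*` so the match stays inside one tag. This is my own assumption, not something tested in ScreamingFrog; it also assumes that `href` appears before `rel` inside the tag, and the name `INTERNAL_NOOPENER` is hypothetical.

```python
import re

# Stay inside a single <a ...> tag: [^>]* cannot cross the closing '>'.
# Dots in the host name are escaped so they match only a literal dot.
INTERNAL_NOOPENER = re.compile(
    r'<a\b[^>]*href="https?://www\.linkinterno\.it[^"]*"[^>]*\brel="noopener"[^>]*>'
)

# Sample link from the question.
internal = ('<a href="https://www.linkinterno.it/2018/10/titolo/" '
            'target="_blank" rel="noopener"> anchor text </a>')
# Illustrative counter-examples (made up for this sketch).
external = '<a href="https://www.example.com/" rel="noopener">x</a>'
mixed = ('<a href="https://www.linkinterno.it/a/">x</a> '
         '<a href="https://www.example.com/" rel="noopener">y</a>')

print(bool(INTERNAL_NOOPENER.search(internal)))  # True
print(bool(INTERNAL_NOOPENER.search(external)))  # False
print(bool(INTERNAL_NOOPENER.search(mixed)))     # False
```

The `mixed` line is the interesting case: the greedy `.*` version from the answer would match it, even though neither link is an internal link with rel="noopener".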
|
|
|
|
|
Thanks a lot
So, the idea is to check whether an href contains the site name (so that I know it is an internal link) and whether the link also contains rel="noopener".
I have adapted the regex as best I could, modifying it like this:
www.sito.it.*rel="noopener"
and removing the https:// protocol, which can generate errors.
Unfortunately, something is still wrong. I should mention that I need the regex for an SEO spider (ScreamingFrog), in which it is possible to set targeted regexes for searches within a website.
I await your clarification on this.
Thanks a lot
|
|
|
|
|