|
This is fantastic. However, my platform uses Java SE, and the following regex was able to pick out the strings without leading 0's, e.g. 121 and 122. However, the regex "(9, '0')" doesn't replace 121 and 122 with 000000121 and 000000122. Nevertheless, this is great.
\b[1-9]\d*\b
|
Does it have to be regex?
There's StringUtils.leftPad() if you want to pad with leading 0s, as long as you know the total length you want. Or use a format string to do it.
String paddedStr = String.format("%09d", originalVal); (I think)
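If the numbers do have to be found with a regex first, the two ideas can be combined. A minimal sketch, assuming Java 9 or later (for the Matcher.replaceAll overload that takes a function) and assuming the numbers fit in a long; the class and method names are made up for illustration:

```java
import java.util.regex.Pattern;

class PadNumbers {
    // Same pattern as in the post above: whole numbers without a leading zero.
    private static final Pattern NUMBER = Pattern.compile("\\b[1-9]\\d*\\b");

    // Replaces every match with the same value left-padded with zeros to 9 digits.
    static String pad(String input) {
        return NUMBER.matcher(input)
                .replaceAll(m -> String.format("%09d", Long.parseLong(m.group())));
    }

    public static void main(String[] args) {
        System.out.println(pad("ids: 121, 122")); // prints "ids: 000000121, 000000122"
    }
}
```

Values that are already padded (such as 000000121) do not match the pattern and are left alone.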
|
I need to get the values from the HTML snippet below. So far I have come up with the two regex statements shown here, which trim it down to the value I need, but to automate this I need to join them into one statement that returns "18", and that is where I am stuck. Alternatively, please suggest a better method to get the values.
I am using the WebHarvey scraping tool. The program is based on .NET but it doesn't support inserting .NET code, so I can only use a regex.
First Regex Statement
(?s)(?<=attribute bathroom).+?(?=\/span)
Result:
" title="Bathrooms" style=" ">
<span class="value" style=" ">18<
Second Regex Statement
(?s)(?<=<span class="value" style=" ">).+?(?=<)
Result: 18
HTML Snippet
<ul class="iconContainer" style=" ">
<li class="attribute propertyId">
xxx1
</li>
<li class="attribute propertyType">
Factory
</li>
<li class="attribute bathroom" title="Bathrooms" style=" ">
<span class="value" style=" ">18</span>
</li>
<li class="attribute carspace" title="Car Spaces" style=" ">
18
</li>
<li class="attribute landArea">
<span title="Land Area">
5,010<span class="unit">mclass="superscript"></span>
</span>
<span>|</span>
<span title="Floor Area">
9,270<span class="unit">m^__b class="superscript">2</span>
</span>
</li>
</ul>
|
Please do not repost the same question. You can easily edit your own questions if you need to add more details.
|
Don't try to use Regex to parse an HTML document. You'll end up with an extremely fragile solution, where even the slightest change to the source document will cause it to break.
Use a proper HTML parsing library instead - for example, AngleSharp.
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
In my question I mentioned: "I am using WebHarvey scraping tool. The program is based on .net but it doesn't support inserting .net code so I need only regex command."
I cannot use anything except regex in this tool. Since my two regex statements together produce the result I want, I am fairly sure a single regex can do it, but I don't know enough to combine them.
Parsing HTML with regex is not best practice, but I am willing to take the risk. Please suggest a solution.
|
He was suggesting that you use AngleSharp instead of WebHarvey.
|
I'd suggest getting a better scraping tool, or writing your own.
Given the sample input, this regex should match:
(?<=class="attribute bathroom"[^>]*>\s*<span[^>]*>)[^<]+
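For anyone who ends up writing their own scraper, the same match can be expressed with a capturing group instead of a lookbehind. That matters outside .NET: as far as I know, Java's java.util.regex rejects an unbounded quantifier such as * inside a lookbehind. A minimal sketch, assuming the bathroom item contains a span as the first regex result in the question suggests; the class and method names are hypothetical:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BathroomValue {
    // Group 1 captures the text of the first <span> inside the bathroom <li>.
    private static final Pattern BATHROOM = Pattern.compile(
            "class=\"attribute bathroom\"[^>]*>\\s*<span[^>]*>\\s*([^<]+)");

    // Returns the trimmed value, or an empty Optional when the item is missing.
    static Optional<String> extract(String html) {
        Matcher m = BATHROOM.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }
}
```

Like any regex over HTML, this breaks as soon as the markup changes, so it is only a stopgap.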
|
Hi, I'd like to substitute the string Alfa("Beta") with Alfa(Gamma("Beta")) using regular expressions in Visual Studio.
The first part is simple: the search string will be Alfa\("(.*)"\)
But how do I specify the replacement string? I used Alfa\(Gamma\("(.*)"\)\) , but the result was Alfa(Gamma("(.*)")) and not the requested Alfa(Gamma("Beta"))
Thank you in advance for your advice.
|
Thank you!
|
Hi Richard,
I used your advice and it worked perfectly. The search string was alfa\("(.*)"\) and the replacement string was alfa(gamma("$1")), which gave the desired result alfa(gamma("beta"))
One more question: for the input string alfa("beta","delta") the desired result is alfa(gamma("beta"),"delta"), but I got alfa(gamma("beta","delta"))
How should I change the regexp to achieve this?
Thank you, best regards,
Michael
|
Maybe so?
alfa\("([^"]*)"(.*)
replace with
alfa(gamma("$1")$2
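The difference is that (.*) is greedy and runs to the last quote on the line, while [^"]* stops at the first closing quote. A small Java sketch of both replacements (Java uses the same $1 / $2 group references; the class and method names are only for illustration):

```java
class WrapFirstArgument {
    // Original search pattern: the greedy (.*) swallows every argument.
    static String wrapGreedy(String s) {
        return s.replaceAll("alfa\\(\"(.*)\"\\)", "alfa(gamma(\"$1\"))");
    }

    // Suggested pattern: [^"]* stops at the first closing quote,
    // and $2 puts the rest of the line back unchanged.
    static String wrapFirst(String s) {
        return s.replaceAll("alfa\\(\"([^\"]*)\"(.*)", "alfa(gamma(\"$1\")$2");
    }
}
```

With the input alfa("beta","delta"), wrapGreedy gives alfa(gamma("beta","delta")) and wrapFirst gives alfa(gamma("beta"),"delta"). Because $2 takes the remainder of the line, only the first alfa(...) call on a line is rewritten.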
|
Hi
I have a line in my csv file as below
""|*"I have delimiter |* and an escaped \" quote in me"|*100|*200|*300|*"am a string"|*""
I have to interpret " quote as text-qualifier and |* as delimiter. I have to ignore escaped quote \" and consider it part of the string. 100, 200, 300 are integer data fields, so, they are not surrounded by text-qualifier.
The expected result is an array of strings.
a[0] = "" which is a Null string
a[1] = "I have delimiter |* and an escaped \" quote in me"
a[2] = "100"
a[3] = "200"
a[4] = "300"
a[5] = "am a string"
a[6] = "" which is a Null string
The code is below. It looks like \" is not getting escaped properly. Could you please let me know how to fix this? Thanks.
The regular expression code is from here: Split Function that Supports Text Qualifiers
using System;
using System.Text.RegularExpressions;

public string[] Split(string expression, string delimiter,
    string qualifier, bool ignoreCase)
{
    // Match the delimiter only when it is followed by an even number of
    // qualifiers, i.e. when it lies outside a qualified (quoted) section.
    string _Statement = String.Format(
        "{0}(?=(?:[^{1}]*{1}[^{1}]*{1})*(?![^{1}]*{1}))",
        Regex.Escape(delimiter), Regex.Escape(qualifier));

    RegexOptions _Options = RegexOptions.Compiled | RegexOptions.Multiline;
    if (ignoreCase) _Options = _Options | RegexOptions.IgnoreCase;

    Regex _Expression = new Regex(_Statement, _Options);
    return _Expression.Split(expression);
}
|
Your function works fine for me if you pass in the correct values.
const string input = "\"\"|*\"I have delimiter |* and an escaped \\\" quote in me\"|*100|*200|*300|*\"am a string\"|*\"\"";
string[] result = Split(input, "|*", @"\""", false);
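An alternative is to make the pattern itself aware of backslash escapes, so the qualifier can be passed as a plain quote. The following is only a sketch in Java, not the linked article's method: it hard-codes " as the qualifier and \ as the escape character (both assumptions), and splits at a delimiter only when an even number of unescaped quotes follows it. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class QualifiedSplit {
    // One "ordinary" character: a backslash escape (backslash plus any character)
    // or anything that is neither a quote nor a backslash.
    private static final String CHAR = "(?:\\\\.|[^\"\\\\])";

    static String[] split(String line, String delimiter) {
        // The lookahead succeeds only when the rest of the line contains
        // complete pairs of unescaped quotes, i.e. the delimiter is outside a field.
        Pattern p = Pattern.compile(
                Pattern.quote(delimiter)
                + "(?=" + CHAR + "*(?:\"" + CHAR + "*\"" + CHAR + "*)*$)");
        return p.split(line);
    }
}
```

For the sample line this yields seven elements, with the qualifiers still attached to the quoted fields (as with the original Regex.Split approach).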
|
Hi,
I have a 9-million-row Excel file and I need to join it with another Excel file based on numeric values.
Issue:
Column A is supposed to contain only unique numbers, but unfortunately some of the values contain non-numeric characters.
When I try to join the two Excel files I get an error: DataFormat.Error: We couldn't convert to Number.
How can I:
1. delete the rows that have alphanumeric values in the column
2. filter out the alphanumeric values - in Power BI, Excel or any other software
Thank you
|
I can find plenty of examples of how to find a number in a string, but not how to find a number preceded by a specific string.
Like...
[2020-07-24_184102][ACF11B2Y90][CCT]_Cycle0001.csv
How can I say "give me the number that follows the text 'Cycle'?"
|
Thanks Richard. I finally ended up working out the following:
(?<=Cycle)\d{4}
This returns the '0001' like I want. I actually came close to working this out before posting, but failed to notice that I had a space before the \d. Duh.
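For reference, here is the same lookbehind used from Java; the pattern is the one above, and the class and method names are just for illustration:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CycleNumber {
    // Four digits that come directly after the literal text "Cycle".
    private static final Pattern CYCLE = Pattern.compile("(?<=Cycle)\\d{4}");

    // Returns the digits, or an empty Optional when "Cycle" is not followed by four digits.
    static Optional<String> find(String fileName) {
        Matcher m = CYCLE.matcher(fileName);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }
}
```

The lookbehind only checks that "Cycle" is there; it is not part of the returned match, so the result for the file name above is 0001. A stray space between "Cycle" and the digits, as in the failed attempt, makes the match fail.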
|
Hello.
I am an editor at English Wikipedia. I am adding WikiProject banner templates to the talk pages of articles that come under the scope of the WikiProject. Basically, I am adding
{{WikiProject Espionage |importance=Mid |class=Start}}
to the talk pages of the articles. For that I use a tool called AutoWikiBrowser (AWB). Wikipedia:AutoWikiBrowser - Wikipedia
Currently, I am using following module:
public string ProcessArticle(string ArticleText, string ArticleTitle, int wikiNamespace, out string Summary, out bool Skip)
{
    Regex header = new Regex(@"\{\{{WikiProject Espionage|{{WikiProject Espionage |{{WP Espionage|{{WikiProject Intelligence |", RegexOptions.IgnoreCase);
    Summary = "Added banner for [[WP:WikiProject Espionage|WikiProject Espionage]]";
    Skip = (header.Match(ArticleText).Success || !Namespace.IsTalk(ArticleTitle));
    if (!Skip)
        ArticleText = "{{WikiProject Espionage|class=|importance=|listas=}} \r" + ArticleText;
    return ArticleText;
}
In the module, "WP Espionage" and "WikiProject Intelligence" are shortcuts/redirects to WikiProject Espionage. The module tells AWB to add the template to the talk page, and to skip the page if any of these templates is already present. But the problem is that there is another WikiProject named "Wikiproject Intelligence Agency", so my module also skips pages where Wikiproject Intelligence Agency is already present.
So my question is: is there a way to tell AWB to skip only if {{WikiProject Espionage, {{WP Espionage, or {{WikiProject Intelligence is present, and not to skip merely because {{Wikiproject Intelligence Agency is present?
Any help will be appreciated a lot. Regards, —usernamekiran.
|
if ( ArticleText.Contains( "Wikiproject Intelligence Agency" ) ){
} else {
}
It was only in wine that he laid down no limit for himself, but he did not allow himself to be confused by it.
― Confucian Analects: Rules of Confucius about his food
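Another option is to keep a single regex and add a negative lookahead, so that "WikiProject Intelligence" only counts when it is not followed by "Agency". The sketch below is in Java, but the (?!...) construct also exists in .NET regular expressions, so the pattern should carry over to an AWB module. Note that the pattern as posted ends with |, which adds an empty alternative that matches any text; the sketch leaves that out. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class BannerCheck {
    // Matches the Espionage banner or one of its redirects, but not
    // "WikiProject Intelligence Agency" (excluded by the negative lookahead).
    private static final Pattern BANNER = Pattern.compile(
            "\\{\\{\\s*(?:WikiProject Espionage|WP Espionage"
            + "|WikiProject Intelligence(?!\\s+Agency))",
            Pattern.CASE_INSENSITIVE);

    // True when the page already carries the Espionage banner (or a redirect).
    static boolean hasBanner(String text) {
        return BANNER.matcher(text).find();
    }
}
```

A page that has only {{Wikiproject Intelligence Agency}} is then not skipped, while a page that has both banners still is.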
|
I need to check, using a regular expression, whether a given string contains a run of 5 consecutive letters of the alphabet.
The run can be any of ABCDE, BCDEF, DEFGH, EFGHI, ..., PQRST, ..., VWXYZ.
E.g.
1) ABCDECODEPROJECT - Match (because "ABCDE" exists in the string)
2) TESTSTRING - No match (because there is no run of 5 consecutive letters)
3) SAMPLEQRSTUEND - Match (because "QRSTU" exists in the string)
4) WEBSITELMNOPEND - Match (because "LMNOP" exists in the string)
5) QUESTIONANSWER - No match (because there is no run of 5 consecutive letters)
Can you help me build a regular expression that checks all possible runs?
|
This doesn't look like a suitable case for regular expressions. You'd have to build a regex pattern containing every possible sequence to test for, because regex doesn't have a construct for comparing the difference between two characters within a match.
It would probably be easier to implement this using a loop. Check each five-character substring from the input to see if it's contained within a string containing the letters in sequence, remembering to account for case and accents if required.
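The loop described above could look roughly like this in Java. It is a sketch that assumes plain A-Z input: case is folded with toUpperCase, but accented letters are not handled. The class and method names are made up for the example.

```java
import java.util.Locale;

class LetterRun {
    private static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // True when some substring of the given length appears in the alphabet,
    // i.e. the input contains that many consecutive letters in order.
    static boolean hasRun(String input, int length) {
        String upper = input.toUpperCase(Locale.ROOT);
        for (int i = 0; i + length <= upper.length(); i++) {
            if (ALPHABET.contains(upper.substring(i, i + length))) {
                return true;
            }
        }
        return false;
    }
}
```

Calling hasRun(text, 5) on the five examples from the question gives match, no match, match, match, no match.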
|