Hello all. This is another Wikipedia-related question. Wikipedia has piped wikilinks, which display one title but link to another article. For example, "[[metro station|station]]" is displayed as "station" but takes you to the article "metro station", and "[[Wyn Jones (rugby union)|Wyn Jones]]" is displayed as "Wyn Jones" but takes you to "Wyn Jones (rugby union)".
I am doing a find-and-replace task using the following regex:
Match m = Regex.Match(ArticleText, @"\[\[[W]yn Jones]]");
if (m.Success) ArticleText = ArticleText.Replace("Wyn Jones", "Wyn Jones (rugby union)");
With the code above, my tool (AutoWikiBrowser, AWB) replaces only "Wyn Jones" with "Wyn Jones (rugby union)".
What I want to do is convert instances like [[Wyn Jones|W. Jones]] to "[[Wyn Jones (rugby union)]]". For that particular example, I can use something like:
Match m = Regex.Match(ArticleText, @"\[\[[W]yn Jones\|W\. Jones\]\]");
if (m.Success) ArticleText = ArticleText.Replace("[[Wyn Jones|W. Jones]]", "[[Wyn Jones (rugby union)]]");
But how do I replace whatever is between "|" and "]]"? In this example it is "W. Jones", but in the next one it might be "Wyn J." I think we could use a lookahead such as
abc(?=xyz)
but I am not sure how. Basically, we have to tell the program to look for "[[Wyn Jones" followed by "|" and replace everything up to "]]".
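As far as I understand lookarounds, the idea could be sketched roughly as below. This is only an illustration and has not been run inside AWB; the class name LabelLookaroundSketch and the method ReplaceLabel are made up for this example. The lookbehind and lookahead only check the surrounding text without consuming it, so only the label between "|" and "]]" is replaced.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LabelLookaroundSketch
{
    // Hypothetical helper, not part of AWB: replaces only the text between
    // "[[Wyn Jones|" and "]]" and leaves both delimiters untouched.
    public static string ReplaceLabel(string text, string newLabel)
    {
        // (?<=...) is a lookbehind and (?=...) is a lookahead.
        // [^\]]* matches any run of characters that are not "]".
        const string pattern = @"(?<=\[\[Wyn Jones\|)[^\]]*(?=\]\])";
        return Regex.Replace(text, pattern, newLabel);
    }
}
```

Because the lookarounds do not consume "[[Wyn Jones|", this form cannot change the link target itself; for that, the whole link would have to be part of the match.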
Any feedback would be much appreciated.
What I have tried:
The following is the entire code that I have written. Note that the first 17 lines of the code are embedded in AWB itself, so my code begins at line 18. I also ran the first module alone, without the second, and it worked fine; apparently module 2 has no effect. Module 1 changed [[Wyn Jones]] to [[Wyn Jones (rugby union)]], and left [[Wyn Jones (rugby union)]], [[Wyn Jones (rugby union)|Wyn Jones]], and [[Wyn Jones (rugby union)|rugby player]] unchanged. All of this is intended, but [[Wyn Jones|rugby player]] remained unchanged as well; it should have been replaced with [[Wyn Jones (rugby union)|Wyn Jones]].
In very simple words, I want to find every
[[Wyn Jones]], and
[[Wyn Jones|(any variable words)]], and replace each of them with "[[Wyn Jones (rugby union)|Wyn Jones]]".
Currently, [[Wyn Jones]] is being handled, but not the variable form. To achieve that, we first have to find
[[Wyn Jones|, and then tell the program to replace [[Wyn Jones|, the closing
]], and whatever comes between the two with "[[Wyn Jones (rugby union)|Wyn Jones]]". To put it in even simpler terms, we have to find term A, and then replace term A, term B, and whatever comes between them with term C.
AWB also supports programs written in C#, so the solution does not have to be pure regex; a C# program would work as well.
public string ProcessArticle(string ArticleText, string ArticleTitle, int wikiNamespace, out string Summary, out bool Skip)
{
    Skip = false;
    Summary = "fixed disamb link(s)";
    // Run both modules in sequence on the article text.
    ArticleText = CustomModule1(ArticleText, ArticleTitle, wikiNamespace, ref Summary, ref Skip);
    ArticleText = CustomModule2(ArticleText, ArticleTitle, wikiNamespace, ref Summary, ref Skip);
    return ArticleText;
}

// Module 1: handles the plain link [[Wyn Jones]]. This part works as intended.
public string CustomModule1(string ArticleText, string ArticleTitle, int wikiNamespace, ref string Summary, ref bool Skip)
{
    if (!Skip)
    {
        Summary = "fixed disamb link(s)";
        Match m = Regex.Match(ArticleText, @"\[\[[W]yn Jones]]");
        if (m.Success) ArticleText = ArticleText.Replace("[[Wyn Jones]]", "[[Wyn Jones (rugby union)]]");
        else ArticleText += ""; // no-op: the text is left unchanged
    }
    return ArticleText;
}

// Module 2: my attempt at the piped form [[Wyn Jones|...]]. This is the part that has no effect.
public string CustomModule2(string ArticleText, string ArticleTitle, int wikiNamespace, ref string Summary, ref bool Skip)
{
    if (!Skip)
    {
        Summary = "fixed disamb link(s)";
        Match m = Regex.Match(ArticleText, @"\[\[[W]yn Jones\|");
        // The pattern below is the one I am unsure about; it never matches the piped links.
        if (m.Success) ArticleText = Regex.Replace(ArticleText, @"[[Wyn Jones\[[Wyn Jones|, ]]}", "[[Wyn Jones|Wyn Jones (rugby union)]]", RegexOptions.IgnoreCase);
    }
    return ArticleText;
}
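
For comparison, here is a minimal sketch of what a single combined replacement might look like, assuming the goal stated above (both [[Wyn Jones]] and [[Wyn Jones|anything]] become [[Wyn Jones (rugby union)|Wyn Jones]]). It is written as a standalone class rather than an AWB module and has not been tested in AWB; the names DisambigSketch and FixLinks are made up for this example. The optional group (?:\|[^\]]*)? covers the "|label" part when it is present.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DisambigSketch
{
    // Matches [[Wyn Jones]] and [[Wyn Jones|any label]].
    // [[Wyn Jones (rugby union)...]] is not matched, because "Wyn Jones"
    // must be followed directly by "|" or "]]".
    private static readonly Regex Link =
        new Regex(@"\[\[Wyn Jones(?:\|[^\]]*)?\]\]");

    // Hypothetical helper: rewrites every matched link to the
    // disambiguated target with a fixed display label.
    public static string FixLinks(string articleText)
    {
        return Link.Replace(articleText, "[[Wyn Jones (rugby union)|Wyn Jones]]");
    }
}
```

Inside an AWB module, the body of FixLinks would presumably take the place of the two Regex calls in CustomModule1 and CustomModule2, but that wiring is an assumption on my part.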