
I need help with a regex that will find a "@[..[...]]" pattern.

I will try to explain.
----------------------
The text contains placeholders that are replaced with values when that same text is displayed.

A placeholder has three parts:
- an open tag, which starts with "@[", followed by a dot-delimited text, and ends with "[",
- a property list, a comma-separated list of quoted (double-quote) values,
- a close tag, "]]".

Property list items can contain one or more (nested) placeholders, as well as escaped double quotes and brackets.

The regex must cope with nested placeholders by recognizing where the "outer" placeholder ends, and it must also handle any escaped quotes and brackets.
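These rules can also be checked by scanning the text directly rather than with a single regex. The following Python sketch is an illustration only, not the question's .NET code: the helper names (`find_placeholders`, `parse_placeholder`, `scan_props`, `scan_quoted`) are hypothetical, the open tag reuses `[._\w]+` from the regex, and a backslash is assumed to escape the next character inside a quoted value. Whenever a nested `@[Name[` appears inside a quoted value, the whole nested placeholder is skipped, so its quotes and its `]]` cannot end the outer value or the outer placeholder.

```python
import re

# Open tag from the question's regex: "@[" + dot-delimited name + "[".
OPEN_TAG = re.compile(r"@\[([._\w]+)\[")


def parse_placeholder(text, pos):
    """If a placeholder starts at pos, return (name, props, end), else None.

    end is the index just past the closing "]]".
    """
    m = OPEN_TAG.match(text, pos)
    if m is None:
        return None
    close = scan_props(text, m.end())
    if close == -1:
        return None
    return m.group(1), text[m.end():close], close + 2


def scan_props(text, pos):
    """Return the index of the "]]" that closes a property list, or -1."""
    while pos < len(text):
        if text.startswith("]]", pos):
            return pos
        if text[pos] == '"':
            pos = scan_quoted(text, pos)
            if pos == -1:
                return -1
        else:
            pos += 1  # comma, whitespace or other text between values
    return -1


def scan_quoted(text, pos):
    """pos is at an opening quote; return the index just past its closing quote, or -1."""
    pos += 1
    while pos < len(text):
        if text[pos] == "\\":
            pos += 2  # an escaped character such as \" belongs to the value
        elif text[pos] == '"':
            return pos + 1
        elif text.startswith("@[", pos):
            # Skip a whole nested placeholder so its quotes and "]]"
            # cannot end the current value or the outer placeholder.
            nested = parse_placeholder(text, pos)
            pos = nested[2] if nested else pos + 1
        else:
            pos += 1
    return -1


def find_placeholders(text):
    """Return (name, props) for every outermost placeholder in text."""
    found = []
    pos = text.find("@[")
    while pos != -1:
        result = parse_placeholder(text, pos)
        if result:
            name, props, end = result
            found.append((name, props))
            pos = text.find("@[", end)  # continue after the outer "]]"
        else:
            pos = text.find("@[", pos + 1)
    return found


SAMPLE = "\n".join([
    "Linklist",
    r'@[Link.AppText["[startpage]", "startpage"]]',
    r'@[Link.Text["[startpage] loggedin", "The \"@[Text.AppText["startpage"]]\" for users"]]',
    r'@[Link.Text["@[Link["startpage"]]", "@[Text.AppText["startpage"]]"]]',
])

for name, props in find_placeholders(SAMPLE):
    print(name, "|", props)
```

On the sample fragment it returns the three outer placeholders with the group 1 and group 2 values listed under the expected matches. A doubled `""` happens to work as well, because the nested placeholder then sits in a quoted run of its own.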

Sample
------
Consider the following text fragment:
Linklist
@[Link.AppText["[startpage]", "startpage"]]
@[Link.Text["[startpage] loggedin", "The \"@[Text.AppText["startpage"]]\" for users"]]
@[Link.Text["@[Link["startpage"]]", "@[Text.AppText["startpage"]]"]]


The text fragment match should look like this:
match 1  =  @[Link.AppText["[startpage]", "startpage"]]
   Gr.1  =  Link.AppText
   Gr.2  =  "[startpage]", "startpage"

match 2  =  @[Link.Text["[startpage] loggedin", "The \"@[Text.AppText["startpage"]]\" for users"]]
   Gr.1  =  Link.Text
   Gr.2  =  "[startpage] loggedin", "The \"@[Text.AppText["startpage"]]\" for users"

match 3  =  @[Link.Text["@[Link["startpage"]]", "@[Text.AppText["startpage"]]"]]
   Gr.1  =  Link.Text
   Gr.2  =  "@[Link["startpage"]]", "@[Text.AppText["startpage"]]"


With a suggestion from @ridgerunner (in another community), I got this far:
@\[([._\w]+)\[([^[\]""]*(?:""[^""]*""[^[\]""]*)*)\]\]

@\[                                # Outer open delimiter.
([._\w]+)                          # Group 1.
\[                                 # Inner open delimiter.
(                                  # Start of group 2.
[^[\]""]*(?:""[^""]*""[^[\]""]*)*  # Contents.
)                                  # End of group 2.
\]\]                               # Close delimiter.


This gives the following result:
match 1  =  @[Link.AppText["[startpage]", "startpage"]]
   Gr.1  =  Link.AppText
   Gr.2  =  "[startpage]", "startpage"

match 2  =  @[Text.AppText["startpage"]]
   Gr.1  =  Text.AppText
   Gr.2  =  "startpage"

match 3  =  @[Link.Text["@[Link["startpage"]]", "@[Text.AppText["startpage"]]"]]
   Gr.1  =  Link.Text
   Gr.2  =  "@[Link["startpage"]]", "@[Text.AppText["startpage"]]"


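As you can see, this doesn't produce the desired result: match 2 is wrong. Instead of the whole outer placeholder, only the nested @[Text.AppText["startpage"]] is matched.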
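The failure can be reproduced with Python's `re` module. This sketch assumes that each `""` in the pattern is C# verbatim-string escaping for a single `"`, which is consistent with the results shown; `[` and `]` are also escaped inside the character classes. Because `"[^"]*"` takes the `"` of `\"` as a closing quote, the outer value ends at `"The \"`, the following `[` is not allowed between quoted values, the outer match fails, and the engine moves on to the nested placeholder.

```python
import re

# ridgerunner's pattern, reading each "" as one " (assumed C# verbatim-string
# escaping); [ and ] are escaped inside the character classes for clarity.
PATTERN = re.compile(r'@\[([._\w]+)\[([^\[\]"]*(?:"[^"]*"[^\[\]"]*)*)\]\]')

SAMPLE = "\n".join([
    "Linklist",
    r'@[Link.AppText["[startpage]", "startpage"]]',
    r'@[Link.Text["[startpage] loggedin", "The \"@[Text.AppText["startpage"]]\" for users"]]',
    r'@[Link.Text["@[Link["startpage"]]", "@[Text.AppText["startpage"]]"]]',
])

matches = [(m.group(1), m.group(2)) for m in PATTERN.finditer(SAMPLE)]
for name, props in matches:
    # The second match is only the nested Text.AppText placeholder.
    print(name, "|", props)
```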

However, if I change the escaped quotes from \" to "", I get this result:
match 1  =  @[Link.AppText["[startpage]", "startpage"]]
   Gr.1  =  Link.AppText
   Gr.2  =  "[startpage]", "startpage"

match 2  =  @[Link.Text["[startpage] loggedin", "The ""@[Text.AppText["startpage"]]"" for users"]]
   Gr.1  =  Link.Text
   Gr.2  =  "[startpage] loggedin", "The ""@[Text.AppText["startpage"]]"" for users"

match 3  =  @[Link.Text["@[Link["startpage"]]", "@[Text.AppText["startpage"]]"]]
   Gr.1  =  Link.Text
   Gr.2  =  "@[Link["startpage"]]", "@[Text.AppText["startpage"]]"


What remains to solve is how to make it work with both escaped (\") and doubled ("") quotes.
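One possible direction, again sketched in Python and assuming `""` in the original pattern stands for a single `"`: replace the quoted-value part `"[^"]*"` with the common backslash-escape idiom `"(?:[^"\\]|\\.)*"`, so that `\"` stays inside the value. Doubled quotes need no special case, because `""` simply closes one quoted run and opens the next, which is why the original pattern already accepted that form. The pattern is checked here only against the sample lines, and it still relies on quotes pairing up rather than on real nesting. .NET accepts the same syntax; in a C# verbatim string each `"` would again be written as `""`.

```python
import re

UNQUOTED = r'[^\[\]"]*'          # text between quoted values: no [, ] or "
QUOTED = r'"(?:[^"\\]|\\.)*"'    # "..." where a backslash escapes the next char
PATTERN = re.compile(rf'@\[([._\w]+)\[({UNQUOTED}(?:{QUOTED}{UNQUOTED})*)\]\]')

escaped = r'@[Link.Text["[startpage] loggedin", "The \"@[Text.AppText["startpage"]]\" for users"]]'
doubled = '@[Link.Text["[startpage] loggedin", "The ""@[Text.AppText["startpage"]]"" for users"]]'

for line in (escaped, doubled):
    m = PATTERN.search(line)
    print(m.group(1), "|", m.group(2))
```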

As discussed with Sergey (see the comments below), I have updated this question. Of course, a combination of a regex and a parser could make things more fail-safe, and .NET's balancing groups might be an alternative as well.
Posted; updated 27-Nov-13 21:02pm (v2)
Comments
Sergey Alexandrovich Kryukov 27-Nov-13 13:10pm    
One little problem is that regular expressions are fairly easy to write but surprisingly hard to understand. If you need help, you probably need to formally describe what you want to match and/or transform, not just give examples...
—SA
LGSon2 27-Nov-13 14:09pm    
I am very well aware of its complexity, and I will try to explain:
- In a text I have placeholders which will be replaced with values when that very same text is displayed.
- A placeholder has 3 parts: an open tag, a property list and a closing tag.
- The open tag starts with "@[", ends with "[" and contains a dot-delimited text.
- The property list is in the form of a comma-separated list with quoted values.
- The close tag ends with "]]".
So far this was quite easy to solve with regex, but when property list items can contain placeholders and single double quotes, it became more difficult.
The regex must overcome these issues by knowing when it has reached the end of each outer placeholder, as well as by taking care of single quotes. As for quotes, I would prefer that they don't have to be escaped, since they can be added dynamically and that is difficult to control.
Sergey Alexandrovich Kryukov 27-Nov-13 15:14pm    
It needs some thinking. One problem could be quote mark nesting, right? The problem is that the same character is used for opening and closing quotation marks (a major problem of using " or '). This is something like a context-sensitive rule, not something Regex is well designed for. I once solved a similar problem and found that direct scanning of the text solved it, whereas Regex did not...
—SA
LGSon2 27-Nov-13 15:55pm    
I have considered writing a parser, though regex looked like a quicker solution. Maybe a combo, where the regex finds the placeholder and the parser handles the property list. Then it will be easy to deal with both quotation marks and nested placeholders. A concern, of course, is performance: whether the regex or the parser is faster.
Sergey Alexandrovich Kryukov 27-Nov-13 15:57pm    
Exactly, a combo. Maybe this is a good idea...
—SA
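For the regex + parser combo, the parser half could look like the following Python sketch. `split_props` is a hypothetical helper that takes group 2 (the property list) and returns the unquoted values. At the top level it turns both `\"` and `""` into `"` (CSV-style, so an empty value `""` still works), while nested placeholders are copied verbatim so they can be expanded later. It assumes a nested placeholder's own values never contain a literal `]]`, since it only matches `@[Name[` openings against `]]` closings.

```python
import re

# Start of a (possibly nested) placeholder: "@[" + dot-delimited name + "[".
OPEN_TAG = re.compile(r"@\[[._\w]+\[")


def split_props(props):
    """Split a property list (group 2) into its unquoted values."""
    values = []
    i, n = 0, len(props)
    while i < n:
        if props[i] != '"':
            i += 1                       # skip commas and whitespace between values
            continue
        i += 1                           # step over the opening quote
        value, depth = [], 0             # depth = nested placeholders still open
        while i < n:
            m = OPEN_TAG.match(props, i)
            if m:                        # a nested placeholder opens: copy its tag
                depth += 1
                value.append(m.group())
                i = m.end()
            elif depth and props.startswith("]]", i):
                depth -= 1               # a nested placeholder closes
                value.append("]]")
                i += 2
            elif depth:                  # inside a nested placeholder: copy verbatim
                value.append(props[i])
                i += 1
            elif props[i] == "\\" and i + 1 < n:
                value.append(props[i + 1])   # \" (or any \x) becomes " (or x)
                i += 2
            elif props.startswith('""', i):
                value.append('"')        # doubled "" becomes "
                i += 2
            elif props[i] == '"':
                i += 1                   # closing quote ends this value
                break
            else:
                value.append(props[i])
                i += 1
        values.append("".join(value))
    return values


print(split_props(r'"[startpage] loggedin", "The \"@[Text.AppText["startpage"]]\" for users"'))
print(split_props('"[startpage] loggedin", "The ""@[Text.AppText["startpage"]]"" for users"'))
```

Both calls yield the same two values, so the escaped and the doubled forms end up identical once the regex has isolated the property list.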

1 solution

After struggling with Google searches and reading about balancing groups, I finally got things working, though I had to alter the placeholder pattern slightly to make it work, at least for me :)
Regex:  @([._\w]+)\[\[""((?:[^\[\]]*|\[[^\[]|[^\]]\]|(?<counter>\[\[)|(?<-counter>\]\]))+(?(counter)(?!)))""\]\]

@([._\w]+)\[\[""          #   start tag, group 1
  (                       #   start group 2
    (?:                   #   non-capturing group
      [^\[\]]*            #   any char but [ or ]
      |                   #   or
      \[[^\[]             #   a [ not followed by a [
      |                   #   or
      [^\]]\]             #   any char but ], followed by a ]
      |                   #   or
      (?<counter>\[\[)    #   counter start tag: push on [[
      |                   #   or
      (?<-counter>\]\])   #   counter stop tag: pop on ]]
    )+                    #   end non-capturing group
    (?(counter)(?!))      #   if counter is not empty, the match fails
  )                       #   end group 2
""\]\]                    #   end tag

Placeholders updated to the new pattern (@..[[...]]):
Linklist
@Link.AppText[["[startpage]", "startpage"]]
@Link.Text[["[startpage] loggedin", "The "@Text.AppText[["startpage"]]" for users"]]
@Link.Text[["@Link[["startpage"]]", "@Text.AppText[["startpage"]]"]]

This gives me exactly what I want:
match 1
   Gr.1   Link.AppText
   Gr.2   [startpage]", "startpage

match 2
   Gr.1   Link.Text
   Gr.2   [startpage] loggedin", "The "@Text.AppText[["startpage"]]" for users

match 3
   Gr.1   Link.Text
   Gr.2   @Link[["startpage"]]", "@Text.AppText[["startpage"]]
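Python's `re` module has no balancing groups, but the counting idea behind `(?<counter>...)` and `(?<-counter>...)` can be sketched as an explicit loop: count up on `[[`, count down on `]]`, and take the first `]]` seen at a count of zero as the end tag. This is the same idea, not a transliteration of the pattern above; `scan_placeholders` is a hypothetical name, and the start tag follows the new `@Name[["` syntax.

```python
import re

# Start tag of the new syntax: "@" + dot-delimited name + '[["'.
START_TAG = re.compile(r'@([._\w]+)\[\["')


def scan_placeholders(text):
    """Return (name, body) for each outermost @Name[["..."]] placeholder."""
    found = []
    pos = 0
    while True:
        m = START_TAG.search(text, pos)
        if m is None:
            return found
        depth = 0      # plays the role of the .NET "counter" capture stack
        end = -1
        i = m.end()
        while i < len(text):
            if text.startswith("[[", i):
                depth += 1             # like (?<counter>\[\[)
                i += 2
            elif text.startswith("]]", i):
                if depth == 0:         # unmatched "]]": candidate end tag
                    end = i
                    break
                depth -= 1             # like (?<-counter>\]\])
                i += 2
            else:
                i += 1
        # Accept only if a closing quote, separate from the opening one, precedes "]]".
        if end > m.end() and text[end - 1] == '"':
            found.append((m.group(1), text[m.end():end - 1]))
            pos = end + 2
        else:
            pos = m.start() + 1        # not a placeholder here; keep searching


SAMPLE = "\n".join([
    "Linklist",
    '@Link.AppText[["[startpage]", "startpage"]]',
    '@Link.Text[["[startpage] loggedin", "The "@Text.AppText[["startpage"]]" for users"]]',
    '@Link.Text[["@Link[["startpage"]]", "@Text.AppText[["startpage"]]"]]',
])

for name, body in scan_placeholders(SAMPLE):
    print(name, "|", body)
```

For the three sample lines it returns the same group 1 and group 2 values as listed above.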
 
v2
Comments
Maciej Los 28-Nov-13 15:16pm    
Well done!
LGSon2 29-Nov-13 0:17am    
Thanks

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900