Search and insert a string in a file using regex

Question

Hello all,

I want to verify that a file contains no nested braces. If a nested brace is present, it is assumed that the user forgot the closing brace (i.e. the file is not correct), and my code should insert a closing brace after the last occurrence of a semicolon. For example, if the file contains

abc{def;ghe;ijk{lmn}

Then this should be after correction: abc{def;ghe;}ijk{lmn}

Problem1: My code for verification of the file is like

C#

Regex regex = new Regex(@"\{(.*?[^{][^}])\{");
Match match = regex.Match(streamReader.ReadToEnd());
if (match.Success)
{
    // report error and correct the file
}

This works when the text is on a single line (i.e. "{def;ghe;ijk{" is on one line) or when the second brace is on the next line (i.e. "{def;ghe;ijk" is on one line and "{" is on the next). But if I put two line breaks between ghe;ijk and {, the pattern does not match.

Help me find the corrected regex.

Problem2: How can I insert a closing brace after the last occurrence of a semicolon in the matched pattern and write the result back to the same file? I want to do it using regex. I mean, the steps would be: get the content between two opening braces that contains no opening or closing braces, then find the last occurrence of a semicolon in that content and add a closing brace after it.

Please Help me out !!!

Posted 31-Jan-12 23:27pm

Sovan Kumar Das

Updated 1-Feb-12 9:45am

Orcun Iyigun

v2

Comments

BillWoodruff 1-Feb-12 11:59am

This looks like an "intractable" problem to me. How can you possibly tell whether the person who typed "abc{def;ghe;ijk{lmn}" did not intend to type "abc{def;ghe;ijk} {lmn}" or "abc{def; ghe; {ijk}} {lmn}"?

A major reason we use IDEs, like Visual Studio, is the visual help they give us in making sure our nested structures of whatever kind are correctly delineated.

Sovan Kumar Das 2-Feb-12 1:15am

My file looks like this. It doesn't contain any nested braces. So I want to verify and correct it by my code.
Thanks.

Andreas Gieriet 2-Feb-12 1:52am

Your Regex does look a bit odd. You might try the following to overcome problem 1: @"\{[^}{]*?\{".

Problem 2: since you say it's not a nested problem, you might be able to do it with regex. For that, you would have to list all possible legal and illegal blocks, concatenate and group them accordingly in a regex, and replace each illegal block with the block plus the closing }.
Quite a challenge... Do it with a parser (e.g. see solution 2).

Cheers

Andi
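
To illustrate the suggestion for problem 1, here is a minimal sketch that runs the pattern from the question and the pattern from the comment above against the same input. The class name `BraceRegexDemo` and the `Report` helper are made up for this example. The likely cause of the original failure is that `.` does not match `\n` unless `RegexOptions.Singleline` is set, while a negated character class such as `[^}{]` matches line breaks as well.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BraceRegexDemo
{
    // Pattern from the question: '.' does not match '\n' by default.
    public static readonly Regex Original = new Regex(@"\{(.*?[^{][^}])\{");

    // Pattern suggested in the comment: the negated class also matches line breaks.
    public static readonly Regex Suggested = new Regex(@"\{[^}{]*?\{");

    // Prints whether each pattern finds two opening braces with no closing brace between them.
    public static void Report(string text)
    {
        Console.WriteLine("original:  {0}", Original.IsMatch(text));
        Console.WriteLine("suggested: {0}", Suggested.IsMatch(text));
    }
}
```

Calling `BraceRegexDemo.Report("abc{def;ghe;ijk\r\n\r\n{lmn}")` (two Windows line breaks before the second brace) should report no match for the original pattern and a match for the suggested one.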

Sovan Kumar Das 2-Feb-12 4:57am

Thanks a lot Andi.

Andreas Gieriet 2-Feb-12 13:03pm

See Solution #3 - this should solve your problems.

Andi

3 solutions

Solution 2

Regex cannot handle nested problems (by definition).

You need to tokenize the stream and handle the nesting by parsing the tokens. The tokenizing can be done by Regex, though.

In the simplest situation, you have a simple stream of characters. Use the approach suggested in solution 1.

If the file is a programming language, you must tokenize according to the language in order to get reasonable results.

For your case of pair-wise elements, you could tokenize as follows:
1. comments
2. string literals
3. character literals
4. { and }
5. rest (individual irrelevant characters for the given problem)

Then you write a simple parser that consumes the tokens one after the other, incrementing a counter on each opening { and decrementing it on each closing }.

Postcondition: counter == 0 or error.

And here comes the code for C#:

C#

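using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

string file = @"your-full-path-to-thissource-file.cs";
string data = File.ReadAllText(file);

// Tokens that may contain braces which must not be counted.
string cmt = @"//.*?$|/\*[\s\S]*?\*/";                     // line and block comments
string str = @"@""(?:""""|[^""])*""|""(?:\\.|[^""\\])*"""; // verbatim and regular string literals
string chr = @"'(?:\\.|[^'\\])*'";                         // character literals, including escapes
// Group 1 captures only braces that lie outside comments and literals.
string rex = "(?:" + cmt + "|" + str + "|" + chr + @")|([{}])|[\s\S]";
Regex tokens = new Regex(rex, RegexOptions.Compiled | RegexOptions.Multiline);
var q = from m in tokens.Matches(data).Cast<Match>()
        where m.Groups[1].Success
        select m.Groups[1].Value == "{";
int level = 0;
foreach (bool plus in q)
{
    Console.WriteLine("{0}", plus); // True for '{', False for '}'
    level += plus ? 1 : -1;
}
Console.WriteLine("level = {0}", level); // nonzero means unmatched braces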

Have fun!
BTW: If it's a homework assignment, try to explain it to your teacher ;-)

Cheers

Andi
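
Since the question asks about nesting rather than only the final balance, the counter can also be checked after every brace. The following is a minimal sketch under the assumption that the file is plain generated text, so comments and string literals are not skipped here. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NestingCheck
{
    // Returns the index of the first brace that breaks the "no nesting" rule:
    // a '{' inside an already open block, or a '}' with no open block.
    // Returns -1 if no such brace exists.
    public static int FirstViolation(string data)
    {
        int level = 0;
        foreach (Match m in Regex.Matches(data, @"[{}]"))
        {
            level += m.Value == "{" ? 1 : -1;
            if (level > 1 || level < 0)
                return m.Index;
        }
        return -1;
    }
}
```

Note that this sketch does not report a block that is still open at the end of the input; checking the final counter value, as in the code above, covers that case.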

Posted 1-Feb-12 10:21am

Andreas Gieriet

Updated 1-Feb-12 10:46am

v2

Comments

Sovan Kumar Das 2-Feb-12 1:48am

It's a small part of my project. I did it manually, but later I wondered whether the problem could be solved through regex. I was hoping for comments and suggestions from you guys. That's why I posted here.
BTW Thanks a lot.
I just want the simplest but efficient solution for the problem. The file is generated by some code. It should not contain any nested braces; if it does, that is assumed to be a mistake made when the file was created. In that case the solution is to put a closing brace at the specified place, because the opening brace that contains the nested brace will never have a closing brace of its own.


This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

TRK3 · Accepted Answer · 2012-02-01T08:36:00

Solution 1

Regex is the wrong tool for the job.

The easiest way to do it is the simplest.

Scan through the file, reading one character at a time into a buffer.

Set a flag if you see an open-brace.

If you see a semi-colon write out the buffer to a temp file.

If you see a closed-brace write out the buffer to the temp file and unset your flag.

If you see an open-brace while the flag is set, write a close-brace to the temp file, then write the buffer to the temp file and set a flag to remember that you've inserted something in the file.

When you get to the end of the file, write out the buffer. If you have an unmatched open-brace, write out a close brace.

If you inserted anything into the file, delete the original file and rename the temp file to the original. If you didn't insert anything, delete the temp file.

Don't try to do it with regex -- it might be possible, but it's not pretty and it's not easy to debug or fix or add additional functionality to. (And it's not as efficient if you actually have to modify the file.)
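
The steps above could be sketched roughly as follows. This is one possible reading, not a definitive implementation: the names `BraceFixer`, `Fix` and `FixFile` and the ".tmp" suffix are made up for the example. It also assumes one small addition to the description, namely that the buffer is written out at an opening brace too, so that a block without any semicolon gets its closing brace right after the opening one rather than in front of earlier text.

```csharp
using System;
using System.IO;
using System.Text;

public static class BraceFixer
{
    // Copies input to output in a single pass, inserting '}' where an opening
    // brace appears before the previous block was closed.
    // Returns true if anything was inserted.
    public static bool Fix(TextReader input, TextWriter output)
    {
        var pending = new StringBuilder(); // text read since the last write
        bool open = false;                 // inside a block that is not closed yet
        bool inserted = false;
        int next;
        while ((next = input.Read()) != -1)
        {
            char c = (char)next;
            if (c == '{' && open)
            {
                // Nested opener: close the previous block at the last write
                // position (just after the last ';'), then emit the pending text.
                output.Write('}');
                Flush(pending, output);
                inserted = true;
            }
            pending.Append(c);
            if (c == '{') { open = true; Flush(pending, output); }
            else if (c == ';') { Flush(pending, output); }
            else if (c == '}') { open = false; Flush(pending, output); }
        }
        Flush(pending, output);
        if (open) { output.Write('}'); inserted = true; } // block still open at end of file
        return inserted;
    }

    private static void Flush(StringBuilder pending, TextWriter output)
    {
        output.Write(pending.ToString());
        pending.Clear();
    }

    // Rewrites the file through a temp file, as described in the last step.
    // The ".tmp" name is an assumption; default UTF-8 handling is used.
    public static void FixFile(string path)
    {
        string temp = path + ".tmp";
        bool inserted;
        using (var reader = new StreamReader(path))
        using (var writer = new StreamWriter(temp))
        {
            inserted = Fix(reader, writer);
        }
        if (inserted) { File.Delete(path); File.Move(temp, path); }
        else { File.Delete(temp); }
    }
}
```

The pending text lives in a `StringBuilder` rather than a `string`, which keeps the scan linear in the file size.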

Posted 1-Feb-12 8:36am

TRK3

Comments

Sovan Kumar Das 2-Feb-12 1:20am

Actually, I did it the way you describe. But for a very lengthy file it is time consuming, so I was thinking of solving it through regex. I was looking for something like
"{<contentbeforethelastsemicolon>(;)+<contentafterthelastsemicolon>{"
(This is a rough regex.) From the regex I can extract the two pieces of content, and then adding an extra brace is easy.
Or I just split the content (in the regex "{<content>{") by semicolon, then add one extra brace after the last one.
Or I can extract the content (between two opening braces, not containing any closing brace) via Match.Value and modify it.
So I wanted your suggestions, guys. Thank you for your reply.
Thanks a lot.
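
The rough pattern in this comment could be sketched with `Regex.Replace` as below. This is only an illustration of the idea under stated assumptions: the class name `SemicolonBraceFix` is made up, and the sketch only repairs blocks that contain at least one semicolon before the nested opening brace. A block with no semicolon, or one left open at the end of the file, is not touched.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SemicolonBraceFix
{
    // Group 1: an opening brace and its content up to the last ';'.
    // Group 2: the remaining text, free of braces and semicolons.
    // Lookahead: the next character must be another '{' (the nesting error).
    private static readonly Regex Unclosed =
        new Regex(@"(\{[^{}]*;)([^{};]*)(?=\{)");

    // Inserts the missing '}' directly after the last semicolon of each unclosed block.
    public static string Fix(string text)
    {
        return Unclosed.Replace(text, "$1}$2");
    }
}
```

Because the character classes are negated, the pattern also works across line breaks without `RegexOptions.Singleline`.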

TRK3 2-Feb-12 13:37pm

Regex won't actually speed it up. In fact, it is much more likely to make it worse on a long file if you have a * in your regex.

Fundamentally, any algorithm would have to read in the file and scan through it character by character. That's exactly what you are doing.

Regex would potentially be worse if it has to backtrack and try different possible partial matches.

If you want to speed it up on a long file then:

(a) make sure your algorithm is O(N). That is, you are only scanning through the file once and you aren't doing something silly like having to rescan from the beginning to find the next character.

Pay strict attention to what your intermediate data structures are [i.e. array vs. list vs. string] and how you are accessing them. In particular, beware of using something like a string for your temp buffer. A string potentially gets re-allocated and copied every time you append another character to it. Without your being aware of it your algorithm is suddenly O(N^2), which will definitely kill your performance on a long file.

(b) After that profile your code and find out where you are spending the most time.

The problem would be much simpler if you wrote this program in plain old C rather than C#. The problem doesn't require all the fancy advantages of a garbage collected object oriented language, and all those advantages add overhead that you normally don't think about, but that kills your performance in a case like this. In C, none of that overhead is hidden from you; you know exactly where and when it occurs because you have to do it explicitly.
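
To make the point about the temp buffer concrete, here is a small sketch that builds the same text once with `string` concatenation and once with a `StringBuilder`. The class `BufferDemo` and its methods are invented for this illustration, and the timings will vary by machine, so none are quoted here.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class BufferDemo
{
    // Each += allocates a new string and copies the old one: O(N^2) overall.
    public static string CopyWithString(string input)
    {
        string result = "";
        foreach (char c in input) result += c;
        return result;
    }

    // Appends into a preallocated buffer: O(N) overall.
    public static string CopyWithBuilder(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input) sb.Append(c);
        return sb.ToString();
    }

    // Returns the elapsed milliseconds for one copy of an input of the given length.
    public static long Measure(Func<string, string> copy, int length)
    {
        string input = new string('x', length);
        var watch = Stopwatch.StartNew();
        copy(input);
        return watch.ElapsedMilliseconds;
    }
}
```

Comparing `BufferDemo.Measure(BufferDemo.CopyWithString, n)` with `BufferDemo.Measure(BufferDemo.CopyWithBuilder, n)` for growing values of `n` is one way to see the difference before reaching for a profiler.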

Sovan Kumar Das 3-Feb-12 1:24am

Hi TRK3,
Thank you a trillion..
I wanted this kind of suggestions..
Like it.

Andreas Gieriet · Accepted Answer · 2012-02-02T06:47:00

Solution 3

Now, this should solve your problem #1 and #2:

C#

using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

static class Program
{
    static void Main(string[] args)
    {
        Verify("abc{def;ghe;ijk{lmn}");
        Verify(@"{def;ghe;ijk{
{def;ghe;ijk");
        Verify("");
    }

    public static string Verify(string data)
    {
        StringBuilder sb = new StringBuilder();
        // Groups: 1 = complete block, 2 = unclosed block up to its last ';',
        //         3 = unclosed block without ';', 4 = text outside braces, 5 = stray '}'
        //           1             2           3           4        5
        string p = @"(\{[^}{]*?\})|(\{[^}{]*;)|(\{[^}{;]*)|([^}{]+)|(\})";
        Regex rex = new Regex(p, RegexOptions.Compiled);
        foreach (Match m in rex.Matches(data).Cast<Match>())
        {
            if (m.Groups[2].Success
                || m.Groups[3].Success
                || m.Groups[5].Success)
                Console.WriteLine("fixing error");

            if (m.Groups[2].Success)
                sb.AppendFormat("{0}{1}", m.Groups[2].Value, "}"); // close after the last ';'
            else if (m.Groups[3].Success)
                sb.AppendFormat("{0}{1}", m.Groups[3].Value, "}"); // close at the end of the block
            else if (m.Groups[4].Success)
                sb.Append(m.Groups[4].Value);
            else if (m.Groups[5].Success)
                sb.Append("{}"); // give the stray '}' an opening brace
            else
                sb.Append(m.Groups[1].Value);
        }
        string s = sb.ToString();

        Console.WriteLine("from {0}", data);
        Console.WriteLine("to   {0}", s);

        return s;
    }
}

Cheers

Andi
