Hello all,

I want to verify that a file contains no nested braces. If a nested brace is present, it is assumed that the user forgot a closing brace (i.e. the file is not correct), and my code should insert a closing brace after the last semicolon before the nested opening brace. For example, if the file contains

abc{def;ghe;ijk{lmn}

then after correction it should become: abc{def;ghe;}ijk{lmn}

Problem 1: My code for verifying the file looks like this:

C#
Regex regex = new Regex(@"\{(.*?[^{][^}])\{");
Match match = regex.Match(streamReader.ReadToEnd());
if (match.Success)
{
    // report the error and correct the file
}
This works when the statement is on a single line (i.e. "{def;ghe;ijk{" is on one line) or when the second brace is on the next line (i.e. "{def;ghe;ijk" is on one line and "{" is on the next). But if I put two line breaks between ghe;ijk and {, the pattern no longer matches.

Please help me find a corrected regex.

Problem 2: How can I insert a closing brace after the last semicolon in the matched text and write the result back to the same file? I want to do it using regex. The steps would be: get the content between two opening braces that contains no opening or closing braces, find the last semicolon in that content, and add a closing brace after it.

Please help me out!
Updated 1-Feb-12 9:45am
Comments
BillWoodruff 1-Feb-12 11:59am    
This looks like an "intractable" problem to me. How can you possibly tell whether the person who typed "abc{def;ghe;ijk{lmn}" did not intend to type "abc{def;ghe;ijk} {lmn}" or "abc{def; ghe; {ijk}} {lmn}"?

A major reason we use IDEs like Visual Studio is the visual help they give us in making sure our nested structures of whatever kind are correctly delineated.
Sovan Kumar Das 2-Feb-12 1:15am    
My files look like this, and they are not supposed to contain any nested braces. So I want my code to verify and correct them.
Thanks.
Andreas Gieriet 2-Feb-12 1:52am    
Your Regex does look a bit odd. You might try the following to overcome problem 1: @"\{[^}{]*?\{".
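As a quick check of the suggested pattern, here is a minimal sketch (the class and method names are hypothetical) that compares it with the original pattern from the question. It assumes the file uses Windows \r\n line endings, which would explain the reported symptom: `.` stops at the first \n, and `[^{][^}]` can cover only two more characters, so a single \r\n fits before the second { but \r\n\r\n does not. The character class `[^}{]` also matches \r and \n, so any number of line breaks is allowed.

```csharp
using System.Text.RegularExpressions;

static class NestedBraceCheck
{
    // Suggested pattern: '{', then a lazy run of characters that are neither
    // '{' nor '}', then another '{'. Character classes match line breaks too.
    static readonly Regex Nested = new Regex(@"\{[^}{]*?\{");

    // Original pattern from the question, kept for comparison.
    static readonly Regex Original = new Regex(@"\{(.*?[^{][^}])\{");

    public static bool HasNested(string text) => Nested.IsMatch(text);

    public static bool OriginalHasNested(string text) => Original.IsMatch(text);
}
```

With Unix \n endings, the original pattern would tolerate exactly two line breaks instead of one.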

Problem 2: since you say it's not a nesting problem, you might be able to do it with a regex. For that, you would have to list all possible legal and illegal blocks, concatenate and group them accordingly in one regex, and replace each illegal block with the block plus the closing }.
Quite a challenge... Do it with a parser instead (e.g. see solution 2).

Cheers

Andi
Sovan Kumar Das 2-Feb-12 4:57am    
Thanks a lot Andi.
Andreas Gieriet 2-Feb-12 13:03pm    
See Solution #3 - this should solve your problems.

Andi

Regex is the wrong tool for the job.

The easiest way to do it is the simplest.

Scan through the file, reading one character at a time into a buffer.

Set a flag if you see an open-brace.

If you see a semi-colon write out the buffer to a temp file.

If you see a closed-brace write out the buffer to the temp file and unset your flag.

If you see an open-brace while the flag is set, write a close-brace to the temp file, then write the buffer to the temp file and set a flag to remember that you've inserted something in the file.

When you get to the end of the file, write out the buffer. If you have an unmatched open-brace, write out a close brace.

If you inserted anything into the file, delete the original file and rename the temp file to the original. If you didn't insert anything, delete the temp file.
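The steps above can be sketched in C# roughly as follows. This is a minimal, hypothetical implementation (the names BraceFixer, Fix and FixFile are made up for illustration) that follows the steps literally, including writing the final brace at the very end of the file when the last block is unclosed. It works on a TextReader/TextWriter so it can be tried on strings, and FixFile assumes plain text that StreamWriter's default UTF-8 encoding can round-trip.

```csharp
using System.IO;
using System.Text;

static class BraceFixer
{
    // Copies input to output. Text is buffered and flushed at every ';' and '}'.
    // If a '{' arrives while a block is still open, a '}' is written first,
    // i.e. right after the last ';' that was already flushed.
    // Returns true if at least one '}' was inserted.
    public static bool Fix(TextReader input, TextWriter output)
    {
        var buffer = new StringBuilder(); // avoids O(N^2) string copying
        bool inBlock = false;             // the "flag": an '{' is still open
        bool inserted = false;            // remembers whether output differs
        int next;
        while ((next = input.Read()) != -1)
        {
            char c = (char)next;
            switch (c)
            {
                case '{':
                    if (inBlock)
                    {
                        output.Write('}'); // close the unterminated block
                        inserted = true;
                    }
                    output.Write(buffer.ToString());
                    buffer.Clear();
                    output.Write(c);
                    inBlock = true;
                    break;
                case ';':
                case '}':
                    buffer.Append(c);
                    output.Write(buffer.ToString());
                    buffer.Clear();
                    if (c == '}') inBlock = false;
                    break;
                default:
                    buffer.Append(c);
                    break;
            }
        }
        output.Write(buffer.ToString());
        if (inBlock)
        {
            output.Write('}'); // unmatched '{' at end of input
            inserted = true;
        }
        return inserted;
    }

    // Hypothetical file wrapper: writes to a temp file and replaces the
    // original only if something was inserted; otherwise deletes the temp file.
    public static bool FixFile(string path)
    {
        string temp = Path.GetTempFileName();
        bool inserted;
        using (var reader = new StreamReader(path))
        using (var writer = new StreamWriter(temp))
        {
            inserted = Fix(reader, writer);
        }
        if (inserted)
        {
            File.Delete(path);
            File.Move(temp, path);
        }
        else
        {
            File.Delete(temp);
        }
        return inserted;
    }
}
```

One consequence of following the steps literally: a block with no ';' at all gets its '}' immediately after its '{', so "{abc{" becomes "{}abc{}".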

Don't try to do it with regex -- it might be possible, but it's not pretty and it's not easy to debug or fix or add additional functionality to. (And it's not as efficient if you actually have to modify the file.)
 
Comments
Sovan Kumar Das 2-Feb-12 1:20am    
Actually, I did it the way you describe, but for a very long file it is time-consuming. So I was thinking of solving it with a regex. I was looking for something like
"{\<contentbeforethelastsemicolon\>(;)+\<contentafterthelastsemicolon\>{"
(This is a rough regex.) From such a regex I could extract the two pieces of content, and adding an extra brace would then be easy.
Or I could split the content (the part matched by "{\<content\>{") on semicolons and add one extra brace after the last one.
Or I could extract the content between two opening braces (which contains no closing brace) via Match.Value and modify it.
So I wanted your suggestions, guys. Thank you for your reply.
Thanks a lot.
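The last idea above, extracting the text between two opening braces and modifying it, can be sketched with Regex.Replace and a MatchEvaluator. This is a hypothetical sketch, not code from the thread: the pattern matches a { followed by brace-free text up to (but not including) the next {, and the evaluator inserts } after the last ; in that text. Like the idea it illustrates, it only looks at blocks followed by another {, so an unclosed block at the very end of the file is left alone. FixFile reads the whole file into memory, which assumes the file fits comfortably in RAM.

```csharp
using System.IO;
using System.Text.RegularExpressions;

static class RegexBraceFixer
{
    // '{' followed by characters that are neither '{' nor '}', ending just
    // before another '{'. The lookahead does not consume that '{', so
    // consecutive unclosed blocks are each matched in turn.
    // [^{}] also matches line breaks.
    static readonly Regex UnclosedBlock = new Regex(@"\{[^{}]*(?=\{)");

    public static string Fix(string text)
    {
        return UnclosedBlock.Replace(text, m =>
        {
            string block = m.Value;
            int last = block.LastIndexOf(';');
            // Insert '}' after the last ';'; if there is none, close the block at its end.
            return last < 0 ? block + "}" : block.Insert(last + 1, "}");
        });
    }

    // Hypothetical helper: rewrites the file in place.
    public static void FixFile(string path)
    {
        File.WriteAllText(path, Fix(File.ReadAllText(path)));
    }
}
```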
TRK3 2-Feb-12 13:37pm    
Regex won't actually speed it up. In fact, it is much more likely to make it worse on a long file if you have a * in your regex.

Fundamentally, any algorithm would have to read in the file and scan through it character by character. That's exactly what you are doing.

Regex would potentially be worse if it has to backtrack and try different possible partial matches.

If you want to speed it up on a long file then:

(a) Make sure your algorithm is O(N). That is, you are only scanning through the file once and you aren't doing something silly like having to rescan from the beginning to find the next character.

Pay strict attention to what your intermediate data structures are (i.e. array vs. list vs. string) and how you are accessing them. In particular, beware of using a string for your temp buffer. In C#, strings are immutable, so appending a character creates a new string and copies the old contents every time. Without your being aware of it, your algorithm suddenly becomes O(N^2), which will definitely kill your performance on a long file.
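To illustrate the buffer point in C# (a minimal sketch; the class and method names are hypothetical): `buffer += c` builds a new string and copies everything accumulated so far on every append, while StringBuilder appends into a growable internal buffer, so the total cost stays roughly linear. Both methods return the same text; only the cost differs.

```csharp
using System.Text;

static class BufferDemo
{
    // Quadratic overall: each += allocates a new string and copies
    // all characters accumulated so far.
    public static string WithString(string input)
    {
        string buffer = "";
        foreach (char c in input)
            buffer += c;
        return buffer;
    }

    // Linear overall: StringBuilder appends into a growable internal buffer,
    // and the final string is built once by ToString().
    public static string WithStringBuilder(string input)
    {
        var buffer = new StringBuilder(input.Length);
        foreach (char c in input)
            buffer.Append(c);
        return buffer.ToString();
    }
}
```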

(b) After that profile your code and find out where you are spending the most time.

The problem would be much simpler if you wrote this program in plain old C rather than C#. The problem doesn't require all the fancy advantages of a garbage-collected, object-oriented language, and all those advantages add overhead that you normally don't think about but that kills your performance in a case like this. In C, none of that overhead is hidden from you; you know exactly where and when it occurs because you have to do it explicitly.


Sovan Kumar Das 3-Feb-12 1:24am    
Hi TRK3,
Thank you a trillion!
This is exactly the kind of suggestion I wanted.
I like it.
Now, this should solve your problems #1 and #2:

C#
using System;
using System.Text;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        Verify("abc{def;ghe;ijk{lmn}");
        Verify(@"{def;ghe;ijk{
{def;ghe;ijk");
        Verify("");
    }

    public static string Verify(string data)
    {
        StringBuilder sb = new StringBuilder();
        // Alternatives, tried left to right at each position:
        //  1: complete block "{...}" without inner braces -> copied as-is
        //  2: unclosed block up to its last ';'           -> '}' appended after the ';'
        //  3: unclosed block containing no ';'            -> '}' appended
        //  4: text outside any block                      -> copied as-is
        //  5: stray '}'                                   -> written as "{}"
        string p = @"(\{[^}{]*?\})|(\{[^}{]*;)|(\{[^}{;]*)|([^}{]+)|(\})";
        Regex rex = new Regex(p, RegexOptions.Compiled);
        foreach (Match m in rex.Matches(data))
        {
            if (m.Groups[2].Success || m.Groups[3].Success || m.Groups[5].Success)
                Console.WriteLine("fixing error");

            if (m.Groups[2].Success)
                sb.Append(m.Groups[2].Value).Append('}');
            else if (m.Groups[3].Success)
                sb.Append(m.Groups[3].Value).Append('}');
            else if (m.Groups[4].Success)
                sb.Append(m.Groups[4].Value);
            else if (m.Groups[5].Success)
                sb.Append("{}");
            else
                sb.Append(m.Groups[1].Value);
        }
        string s = sb.ToString();

        Console.WriteLine("from {0}", data);
        Console.WriteLine("to   {0}", s);

        return s;
    }
}


Cheers

Andi
 
Comments
Sovan Kumar Das 3-Feb-12 1:37am    
Hi Andi,
The result is what I want, though with some performance issues. I like your approach and am working on it to get it going.
Thank you a trillion!
Andreas Gieriet 4-Feb-12 17:05pm    
Hello Sovan Kumar Das,

you might consider rating one or more of the solutions and marking one or more as solving the problem at hand (see the header of each solution).

Cheers

Andi
A regular expression cannot handle arbitrarily nested structures (by definition).

You need to tokenize the stream and handle the nesting by parsing the tokens. The tokenizing can be done with a Regex, though.

In the simplest situation, you have a plain stream of characters. Use the approach suggested in solution 1.

If the file is a programming language, you must tokenize according to the language in order to get reasonable results.

For your case of pair-wise elements, you could tokenize as follows:
1. comments
2. string literals
3. character literals
4. { and }
5. rest (individual irrelevant characters for the given problem)

Then you write a simple parser that consumes the tokens one after the other, incrementing a counter on each opening { and decrementing it on each closing }.

Postcondition: counter == 0 or error.

And here comes the code for C#:

C#
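using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

string file = @"your-full-path-to-this-source-file.cs";
string data = File.ReadAllText(file);

// Tokens whose content must be skipped (braces inside them do not count):
string cmt = @"//.*?$|/\*[\s\S]*?\*/";                    // line and block comments
string str = @"@""(?:""""|[\s\S])*?""|""(?:\\.|.)*?""";   // verbatim and regular strings
string chr = @"'(?:\\.|[^'\\])*'";                        // char literals, incl. '\''
// Otherwise: capture a brace in group 1, or consume one irrelevant character.
string rex = "(?:" + cmt + "|" + str + "|" + chr + @")|([{}])|[\s\S]";
Regex tokens = new Regex(rex, RegexOptions.Compiled | RegexOptions.Multiline);
var q = from m in tokens.Matches(data).Cast<Match>()
        where m.Groups[1].Success
        select m.Groups[1].Value == "{";
int level = 0;
foreach (bool plus in q)
{
    Console.WriteLine("{0}", plus);
    level += plus ? 1 : -1;
}
Console.WriteLine("level = {0}", level);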


Have fun!
BTW: If it's a homework assignment, try to explain it to your teacher ;-)

Cheers

Andi
 
Comments
Sovan Kumar Das 2-Feb-12 1:48am    
It's a small part of my project. I did it manually, but later I wondered whether there is an approach that solves the problem with a regex. I was hoping for comments and suggestions from you guys; that's why I posted here.
BTW, thanks a lot.
I just want the simplest but still efficient solution to the problem. The file is generated by some code, and it should not contain any nested braces. If it does, that is assumed to be a mistake made while the file was created. In that case the fix is to put a closing brace at the specified place, because the opening brace that contains the nested brace will never get a closing brace of its own.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900