I have a complicated problem I am trying to solve. Here it goes:

This is my string:

string RangeNum1 = "R(1 - 9) AND B750KK OR R(4 - 12) AND S750FF OR R(1 - 10) AND Y750RR";

I want to loop through the ranges and, for each number, output the terms whose condition is true. The results should look like this:

OUTPUT

1 B750KK, Y750RR
2 B750KK, Y750RR
3 B750KK, Y750RR
4 B750KK, S750FF, Y750RR
5 B750KK, S750FF, Y750RR
6 B750KK, S750FF, Y750RR
7 B750KK, S750FF, Y750RR
8 B750KK, S750FF, Y750RR
9 B750KK, S750FF, Y750RR
10 S750FF, Y750RR
11 S750FF
12 S750FF

Thanks for your help in advance!

Matt T Heffron answer was great for my original question.

Is it possible to further process Matt T Heffron answer with the following extra requirements:

EXTRA REQUIREMENTS:

string RangeNum1 = "R(1 - 9) AND B750KK NOT IN StarX OR R(4 - 12) AND S750FF OR R(1 - 10) AND Y750RR NOT IN MoonX";

string mostExist = "B750KK, S750FF, T768RR, F453PP";

string StarX = "B750KK, S750FF, T768RR, F453PP";

string MoonX = "N750KK, D768DD, A453AA";

So basically, the output must meet these conditions before it is written to the final table:

1. It must exist in the MostExist string. If it doesn't exist in the MostExist string, it is not written to the table.

2. If it says NOT IN StarX, then it must exist in the MostExist string and must not exist in StarX. If it exists in StarX, it should not be written to the final table.

I don't know if my example is clear. I look forward to your help.
Posted
Updated 29-Sep-13 11:38am
v9
Comments
Richard C Bishop 25-Sep-13 16:56pm    
What have you tried?
Sergey Alexandrovich Kryukov 25-Sep-13 17:01pm    
And what's the problem? http://www.whathaveyoutried.com so far?
—SA
BillWoodruff 26-Sep-13 0:00am    
I'm curious about how you are doing with solving this interesting string manipulation problem. Are you satisfied with the answers so far which use RegEx (both of which I think are quite good), or are you looking for an example of how to achieve this without using RegEx ?
CodeBuks 28-Sep-13 1:04am    
Hi Bill, I have updated my question. Do you have an idea about how to solve this extra requirements?
Akinmade Bond 29-Sep-13 12:05pm    
On your "requirement", can you be more explicit about what the output should look like?

I'd do it like this:
C#
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace ConsoleApplication20
{
  class Program
  {
    static void Main(string[] args)
    {
      string rangeNum1 = "R(1 - 9) AND B750KK OR R(4 - 12) AND S750FF OR R(1 - 10) AND Y750RR";
      ParseAndReportRangeMap(rangeNum1);
    }

    private static readonly Regex AndClauses = new Regex(@"\bR\((?<low>\d+) *- *(?<high>\d+)\) +AND +(?<term>[A-Z0-9]+)", RegexOptions.Compiled | RegexOptions.IgnoreCase);
    private static void ParseAndReportRangeMap(string rangeConditions)
    {
      if (string.IsNullOrWhiteSpace(rangeConditions))
        return;

      SortedDictionary<int, List<string>> rangeMap = new SortedDictionary<int, List<string>>();
      var clauses = AndClauses.Matches(rangeConditions);

      foreach (Match clause in clauses)
      {
        var groups = clause.Groups;
        int low = int.Parse(groups["low"].Value);
        int high = int.Parse(groups["high"].Value);
        string term = groups["term"].Value;
        for (int i = low; i <= high; i++)
        {
          List<string> terms;
          if (!rangeMap.TryGetValue(i, out terms))
          {
            terms = new List<string>();
            rangeMap[i] = terms;
          }
          terms.Add(term);    // not checking for duplicates
        }
      }

      foreach (var item in rangeMap)
      {
        Console.WriteLine("{0} {1}", item.Key, string.Join(", ", item.Value));
      }
    }
  }
}
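One way the same approach might be extended to the extra requirements from the updated question is sketched below. It is only a sketch under stated assumptions: the class and method names (`RangeFilterSketch`, `ToSet`, `BuildRangeMap`) are made up for illustration, each clause is assumed to carry at most one `NOT IN <listName>` suffix, and a list name that is not supplied by the caller is treated as excluding nothing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RangeFilterSketch
{
    // Same clause pattern as above, plus an optional "NOT IN <listName>" suffix.
    private static readonly Regex Clause = new Regex(
        @"\bR\((?<low>\d+) *- *(?<high>\d+)\) +AND +(?<term>[A-Z0-9]+)(?: +NOT +IN +(?<list>\w+))?",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Turns "B750KK, S750FF" into a set of exact terms (no substring matching).
    public static HashSet<string> ToSet(string csv)
    {
        return new HashSet<string>(
            csv.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries).Select(s => s.Trim()),
            StringComparer.OrdinalIgnoreCase);
    }

    public static SortedDictionary<int, List<string>> BuildRangeMap(
        string rangeConditions,
        HashSet<string> mustExist,
        IDictionary<string, HashSet<string>> namedLists)
    {
        var rangeMap = new SortedDictionary<int, List<string>>();
        foreach (Match clause in Clause.Matches(rangeConditions))
        {
            string term = clause.Groups["term"].Value;

            // Rule 1: the term has to appear in the must-exist list.
            if (!mustExist.Contains(term))
                continue;

            // Rule 2: "NOT IN X" drops the term when list X contains it.
            // Assumption: an unknown list name excludes nothing.
            Group list = clause.Groups["list"];
            HashSet<string> excluded;
            if (list.Success
                && namedLists.TryGetValue(list.Value, out excluded)
                && excluded.Contains(term))
                continue;

            int low = int.Parse(clause.Groups["low"].Value);
            int high = int.Parse(clause.Groups["high"].Value);
            for (int i = low; i <= high; i++)
            {
                List<string> terms;
                if (!rangeMap.TryGetValue(i, out terms))
                    rangeMap[i] = terms = new List<string>();
                terms.Add(term);    // not checking for duplicates
            }
        }
        return rangeMap;
    }
}
```

Called with the revised `RangeNum1`, `ToSet(mostExist)` as the must-exist set, and a dictionary mapping "StarX" and "MoonX" to their sets, this yields rows 4 through 12, each containing only S750FF: B750KK is dropped because it is in StarX, and Y750RR is dropped because it is not in the must-exist list.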
 
Comments
BillWoodruff 26-Sep-13 0:01am    
+5 Excellent answer ! I'm going to learn a lot studying your code. What you are doing with a SortedDictionary is very interesting. thanks, Bill
Akinmade Bond 26-Sep-13 5:29am    
+5 Really good.
CodeBuks 27-Sep-13 21:55pm    
+5 Excellent! Thanks so much once again. Is it possible to process it further with the following extra requirements:

string RangeNum1 = "R(1 - 9) AND B750KK NOT IN StarX OR R(4 - 12) AND S750FF OR R(1 - 10) AND Y750RR NOT IN MoonX";

string mostExist = "B750KK, S750FF, T768RR, F453PP";

String StarX = "B750KK, S750FF, T768RR, F453PP";

String MoonX = "N750KK, S750FF, D768DD, A453AA";

So basically, the output must meet these conditions before it is written to the final table:

1. It must exist in the MostExist string. If it doesn't exist in the MostExist string, it is not written to the table.

2. If it says NOT IN StarX, then it must exist in the MostExist string and must not exist in StarX. If it exists in StarX, it should not be written to the final table.

I don't know if my example is clear. I look forward to your help.
Note: this has become too lengthy a reply to include as a comment, imho.

A direction I'd like to point you in, given the requirements of your parser handling both the input string shown in the original question, and the new, revised, input string:

~ 1. "meditating on the data:" looking for patterns

a. consider the two possible strings to be parsed:
C#
string RangeNum1 = "R(1 - 9) AND B750KK OR R(4 - 12) AND S750FF OR R(1 - 10) AND Y750RR";

string RangeNum2 = "R(1 - 9) AND B750KK NOT IN StarX OR R(4 - 12) AND S750FF OR R(1 - 10) AND Y750RR NOT IN MoonX";
b. I find it helpful to actually go ahead and re-format the input strings; it helps me visualize what can be done:
C#
string RangeNum1 =
    "R(1 - 9) AND B750KK " +
    "OR R(4 - 12) AND S750FF " +
    "OR R(1 - 10) AND Y750RR";

string RangeNum2 =
    "R(1 - 9) AND B750KK NOT IN StarX " +
    "OR R(4 - 12) AND S750FF " +
    "OR R(1 - 10) AND Y750RR NOT IN MoonX";
What "jumps out at me" from studying the examples is that: in both cases you can divide the string to be parsed based on "OR."

c. Then, I think it's helpful to visualize what needs to be stripped away by mentally "boiling down" the string to its essentials:
C#
/*string RangeNum1 =
    "1 9 B750KK
    4 12 S750FF
    1 10 Y750RR";

string RangeNum2 =
    "1 9 B750KK NOT IN StarX
    4 12 S750FF
    1 10 Y750RR NOT IN MoonX";*/
d. Then, move on to start actually coding.

~ 2. initial coding and testing

a. I conclude that if I split either string using "OR" as the delimiter, I will have a set of valid sub-strings to parse. So I can implement, and test, that:
C#
// in Form scope
string[] OrStrings;
string[] split1 = new string[] {"OR"};

// inside some method/function/EventHandler, etc.
// note we use split with an array of string here
OrStrings = RangeNum2.Split(split1, StringSplitOptions.RemoveEmptyEntries);
b. Then, having tested that, and further having "immersed myself in the data," I can think about what I need to get rid of, to trim away, to leave me with the minimal usable set of parameters to be used in actually implementing the parsing.

c. Since I know that each item rendered by the split operation needs to be processed, I can now make a first attempt to define the loop, and to remove extraneous information in the loop:
C#
foreach(string orstr in OrStrings)
{
    // trim white space front/back, just in case
    string finalString = orstr.Trim();

    // remove the characters "R("
    finalString = finalString.Remove(0,2);

    // remove ") AND";
    finalString = finalString.Replace(") AND", "");

    // remove "-"
    finalString = finalString.Replace("-", "");
}
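To check steps 2.a and 2.c end to end, the split and the clean-up can be wrapped in one small method. This is only a sketch: the class and method names are made up for illustration, and it splits on " OR " with surrounding spaces (an assumption, slightly stricter than the "OR" delimiter above) so that a term which happens to contain the letters OR is not cut in half.

```csharp
using System;
using System.Collections.Generic;

public static class ClauseNormalizerSketch
{
    // Splits the condition string into its OR clauses and strips the
    // "R(", ") AND" and "-" decorations from each one.
    public static List<string> Normalize(string rangeConditions)
    {
        var result = new List<string>();
        string[] clauses = rangeConditions.Split(
            new[] { " OR " }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string clause in clauses)
        {
            string s = clause.Trim();

            // Guard added for the sketch: every clause is expected to start with "R(".
            if (!s.StartsWith("R(", StringComparison.Ordinal))
                throw new FormatException("Clause does not start with R(: " + s);

            s = s.Remove(0, 2);            // drop the leading "R("
            s = s.Replace(") AND", "");    // drop ") AND"
            s = s.Replace("-", "");        // drop the range dash
            result.Add(s);
        }
        return result;
    }
}
```

Removing the dash leaves two spaces between the range bounds, which is harmless as long as the next step splits on spaces with `StringSplitOptions.RemoveEmptyEntries`.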
d. After running this test and carefully observing the results of parsing both the original and the revised input strings, I start thinking about what needs to be done next. At this point I will usually make notes for the future, like:

1. for final code: need to use StringBuilder !
2. for final code: can I somehow improve, or combine, the Replace and Remove operations ?

e. I observe that the first three items in 'finalString can be handled the same way for both the original and the revised input format. So, for this stage of the solution, I temporarily ignore the additional content in the revised example: my next task is to make the solution work using only the first three entries. That will tell me to what extent I can reuse the code for parsing the first input format when parsing the revised one.

f. So now I focus on how to split the substrings in 'finalString usefully, and it's obvious that I need to split using a space character only:
C#
// in Form scope
string[] rangeStrings;
// note we use split with an array of char here
char[] split2 = new char[] {' '};

// inside the main loop that parses the original input string: see 2.c. above
// get the start index, end index, and value
rangeStrings = finalString.Split(split2, StringSplitOptions.RemoveEmptyEntries);
// check to make sure I have a valid entry: three items (original format) or six (revised format)
if (rangeStrings.Length != 3 && rangeStrings.Length != 6) throw new FormatException();
So now I have an array of either three items (original string), or six items (revised string). I can now "sketch in" what my code is going to look like for processing these:
C#
// will definitely need to reuse the key terms, like 'S750FF, in each entry to be parsed.
string key = rangeStrings[2];

// in every case the key string must be in the 'mustInclude string
// ('mustInclude holds the list the question calls 'mostExist)
if (! mustInclude.Contains(key)) continue;

// do we need to consider the revised format
bool doProcess = true;

if (rangeStrings.Length == 6)
{
    // test the two cases in which we'll exclude the current entry
    // from being processed further

    // tbd: this kind of sucks: clean this up

    if (rangeStrings[3] + rangeStrings[4] == "NOTIN")
    {
        // the exclusion key is always in position #5 ... we hope
        string testString = rangeStrings[5];

        if (testString == "MoonX")
        {
            if (MoonX.Contains(key)) doProcess = false;
        }
        else if (testString == "StarX")
        {
            if (StarX.Contains(key)) doProcess = false;
        }
    }
}

// keep going ?
if(! doProcess) continue;

// now we're going to need the range in integers
int start = Convert.ToInt32(rangeStrings[0]);
int end = Convert.ToInt32(rangeStrings[1]);

// create the final data structure [1]
buildOutput(start, end, key);

// create a report ?
// tbd
Once again, at this point, I'd "step back," and make notes, or insert comments in the code, maybe go back and refactor the code into separate method calls for clarity. Test possible revisions of the code for improved efficiency, or memory conservation, etc.

Obviously I'll have to implement a method 'buildOutput that takes the integer values for 'start and 'end, and the string 'key as parameters.

[1] But, what kind of data structure do I want/need to build ? Or, do I even need to build a data structure ?

There are many possibilities for that, and, imho, this is the time to think about what the data structure, if any, should be, because: depending on what your overall goal is, and the extent to which the "distilled" data needs to be reused in your application, you might make very different choices.
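As one possibility among many, 'buildOutput could fill a sorted map from row number to the terms valid for that row, borrowing the SortedDictionary idea from the first answer in this thread. The sketch below is an assumption about what such a method might look like, not a finished design: the `RangeTable` class and its `Report` method are hypothetical names, and unlike the first answer it skips duplicate terms.

```csharp
using System;
using System.Collections.Generic;

public class RangeTable
{
    // Row number -> terms valid for that row; the dictionary keeps rows sorted.
    private readonly SortedDictionary<int, List<string>> rows =
        new SortedDictionary<int, List<string>>();

    // Hypothetical counterpart of the 'buildOutput call:
    // records 'key for every row from 'start to 'end inclusive.
    public void BuildOutput(int start, int end, string key)
    {
        for (int row = start; row <= end; row++)
        {
            List<string> terms;
            if (!rows.TryGetValue(row, out terms))
            {
                terms = new List<string>();
                rows[row] = terms;
            }
            if (!terms.Contains(key))
                terms.Add(key);
        }
    }

    // Produces one line per row, for example "4 S750FF, Y750RR".
    public IEnumerable<string> Report()
    {
        foreach (KeyValuePair<int, List<string>> row in rows)
            yield return row.Key + " " + string.Join(", ", row.Value);
    }
}
```

Because rows are created on demand and kept in key order, the order in which the entries are parsed does not matter with this choice: adding rows 4 to 12 first and rows 1 to 10 afterwards still reports rows 1 to 12 in sequence.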

~ 3. summary

The overall strategy I'm trying to demonstrate with all this (and, it's only one of many possible strategies ... one that reflects my temperament and biases) is what some would call "divide and conquer," or "incremental" or "step-wise" solution. I like to think of it as a strategy where you alternate coding with "meditating on your data (working result set)," and testing, breaking off "small chunks," and solving the problems inherent in those "chunks," and then moving on to "bigger picture" aspects of the solution.

At each stage of the solution process, I think it's valuable to "pull back," and reflect on what's been done, make notes for future improvements (perhaps as comments in your code) to be implemented when you have reached the proof-of-concept stage.

~ 4. questions for you

Have you tried running the first two solutions on this thread you've got now with the revised RangeNum string, and seeing what happens ?

Have you tried modifying Matt's code to handle the revised data ?

Depending on where you are in your software career, is this the right time for you to make a major investment to learn RegEx (imho, RegEx is a programming language in "its own right") ?

What are you doing now to solve parsing the revised problem ?

The revised problem has this challenge: the first entry that survives parsing is the one that puts 'S750FF in rows 4~12 of your output, at a point when the output is still "empty." Depending on the data structure you use to store your results, that may require you to create "empty place-holders" for rows 1~3, so that the first occurrence of 'S750FF lands at position #4, and so that the next parsing step, where you handle the need for 'Y750RR in positions 1~10, gives correct results.

That's an example of where a two-pass parsing process might become valuable: if you parsed the 'Y750RR entry before the 'S750FF entry, the first ten rows would already be defined ... although you'd still need to create new rows for 'S750FF at positions 11~12. Such optimizations become "worth the time/money" when your application is under "high load" and performance is critical.

In considering a strategy to handle both the original string format, and the revised string format, it's important to clarify how "robust" the solution has to be in terms of handling future variations in the data.

For example, is it the case that in the future, you may need to deal with something like:

string RangeNum3 = "R(1 - 9) AND B750KK NOT IN StarX NOT IN MOONX OR R(4 - 12) AND S750FF OR R(1 - 10) AND Y750RR IN STARX NOT IN MoonX";

That is, cases where several logic clauses together determine whether an entry is handled ?

Some clarification, please.
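If entries like the ones in RangeNum3 do turn up, one option is to let a single regular expression collect every IN / NOT IN test that follows a term, using the capture collections that .NET keeps for repeated groups. The sketch below rests on assumptions the thread does not confirm: all tests on an entry must pass (they are combined with a logical AND), an unknown list name is treated as an empty list, and the names `ParsedClause`, `MultiClauseSketch`, `Parse` and `Passes` are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public sealed class ParsedClause
{
    public int Low, High;
    public string Term;
    // Key = list name, Value = true for "NOT IN", false for "IN".
    public List<KeyValuePair<string, bool>> Tests = new List<KeyValuePair<string, bool>>();
}

public static class MultiClauseSketch
{
    // A range, a term, then zero or more "IN <list>" / "NOT IN <list>" tests.
    private static readonly Regex Clause = new Regex(
        @"\bR\((?<low>\d+) *- *(?<high>\d+)\) +AND +(?<term>[A-Z0-9]+)" +
        @"(?: +(?<op>(?:NOT +)?IN) +(?<list>\w+))*",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    public static List<ParsedClause> Parse(string rangeConditions)
    {
        var result = new List<ParsedClause>();
        foreach (Match m in Clause.Matches(rangeConditions))
        {
            var clause = new ParsedClause
            {
                Low = int.Parse(m.Groups["low"].Value),
                High = int.Parse(m.Groups["high"].Value),
                Term = m.Groups["term"].Value
            };

            // Each repetition of the test group adds one capture to "op" and one to "list".
            CaptureCollection ops = m.Groups["op"].Captures;
            CaptureCollection lists = m.Groups["list"].Captures;
            for (int i = 0; i < lists.Count; i++)
            {
                bool negated = ops[i].Value.StartsWith("NOT", StringComparison.OrdinalIgnoreCase);
                clause.Tests.Add(new KeyValuePair<string, bool>(lists[i].Value, negated));
            }
            result.Add(clause);
        }
        return result;
    }

    // Assumption: every test must pass; an unknown list counts as empty.
    public static bool Passes(ParsedClause clause, IDictionary<string, HashSet<string>> namedLists)
    {
        foreach (KeyValuePair<string, bool> test in clause.Tests)
        {
            HashSet<string> list;
            bool contained = namedLists.TryGetValue(test.Key, out list) && list.Contains(clause.Term);
            if (contained == test.Value)
                return false;   // "NOT IN" but present, or "IN" but absent
        }
        return true;
    }
}
```

Since RangeNum3 spells the list names both as MoonX and MOONX, the dictionary passed to `Passes` would need a case-insensitive comparer such as `StringComparer.OrdinalIgnoreCase`.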
 
Comments
Akinmade Bond 29-Sep-13 11:46am    
Very detailed. :thumbsup: +5
Abhinav Gauniyal 29-Sep-13 16:42pm    
Serious explanation :upped
CodeBuks 29-Sep-13 18:08pm    
+5 Great Explanation:

Thanks for breaking down the problem in such a simple way. Matt's answer gave me the solution I was looking for to my original question, and I was just seeing if it was possible to process it further with the extra requirements. I have no knowledge of Regex, hence I couldn't process it further. You are very right that I need to invest time in learning Regex. I am planning on devoting a lot of time to it, as I often need to manipulate strings.

I am definitely not a pro; I am still trying to sharpen my skills. I appreciate the time and effort you spent on this problem.

I have revised the question; based on the strings you worked on, there would be no output. The output based on the revised question will be this:

OUTPUT:

4 S750FF
5 S750FF
6 S750FF
7 S750FF
8 S750FF
9 S750FF
10 S750FF
11 S750FF
12 S750FF

Because only S750FF meets the condition.
BillWoodruff 29-Sep-13 21:02pm    
Hi, I'm glad you found some value in my response; I hope other people on CP may benefit from it. It was an interesting problem to think about.

I never really mastered RegEx, I regret, but an issue I have with it ... in commercial software development ... is that unless a complex RegEx expression is documented very completely, there is a risk in the future that when the code needs maintaining, or changing, it could be as time-consuming for someone new to the code to understand, and use it, as it was for the person who developed it, and tested it, to create.

In terms of the development process, however, if I wanted to use RegEx, I would apply the same approach I (hopefully) demonstrated in my response: step-by-step exploration, and testing, and making frequent notes in the code.

good luck, Bill
Matt T Heffron 30-Sep-13 12:41pm    
+5 for the detailed analysis of attacking the problem.

BTW. Regex is not that hard to get started with and usually a simple regex is sufficient for a problem. (Of course, there are those who have "lived in Regex-land" or "Perl" who can do amazing things with it, and that code will be nearly incomprehensible!)
If you need to capture the ranges in the expression better, you could use Regular Expressions:
C#
using System.Text.RegularExpressions;
using System.Windows.Forms;   // for MessageBox

Regex rx = new Regex(@"\bR\((?<range>\d+ ?- ?\d+)\)",
    RegexOptions.Compiled | RegexOptions.IgnoreCase);

string text = "R(1 - 9) AND B750KK OR R(4 - 12) AND S750FF OR R(1 - 10) AND Y750RR";

// Find matches.
MatchCollection matches = rx.Matches(text);

foreach (Match match in matches)
{
    GroupCollection groups = match.Groups;
    MessageBox.Show(groups["range"].Value);   // shows "1 - 9", "4 - 12", "1 - 10"
}
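To get from the captured range text to actual numbers, the two bounds can be captured in separate groups and expanded in a loop. This is a minimal sketch, assuming the same R(low - high) syntax as above; the class name `RangeExpanderSketch` and the method `ExpandRanges` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RangeExpanderSketch
{
    // Same idea as the pattern above, but with the two bounds in separate groups.
    private static readonly Regex RangePattern = new Regex(
        @"\bR\((?<low>\d+) ?- ?(?<high>\d+)\)",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Returns every number covered by each R(low - high) range, one list per range.
    public static List<List<int>> ExpandRanges(string text)
    {
        var result = new List<List<int>>();
        foreach (Match match in RangePattern.Matches(text))
        {
            int low = int.Parse(match.Groups["low"].Value);
            int high = int.Parse(match.Groups["high"].Value);

            var numbers = new List<int>();
            for (int i = low; i <= high; i++)
                numbers.Add(i);
            result.Add(numbers);
        }
        return result;
    }
}
```

This still leaves the terms (B750KK and so on) unattached to their ranges; pairing them up needs the term captured as well, as the first answer in this thread does.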
 
Comments
BillWoodruff 26-Sep-13 0:05am    
Nice ! Upvoted. This gets you the ranges, but the task of creating the final output/strings using the ranges is not addressed.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


