Note: this has become too lengthy a reply to include as a comment, imho.
Here is a direction I'd like to point you in, given that your parser must handle both the input string shown in the original question and the new, revised input string:
~ 1. "meditating on the data:" looking for patterns
a. consider the two possible strings to be parsed:
string RangeNum1 = "R(1 - 9) AND B750KK OR R(4 - 12) AND S750FF OR R(1 - 10) AND Y750RR";
string RangeNum2 = "R(1 - 9) AND B750KK NOT IN StarX OR R(4 - 12) AND S750FF OR R(1 - 10) AND Y750RR NOT IN MoonX";
b. I find it helpful to actually go ahead and re-format the input strings; it helps me visualize what can be done:
string RangeNum1 =
"R(1 - 9) AND B750KK
OR R(4 - 12) AND S750FF
OR R(1 - 10) AND Y750RR";
string RangeNum2 =
"R(1 - 9) AND B750KK NOT IN StarX
OR R(4 - 12) AND S750FF
OR R(1 - 10) AND Y750RR NOT IN MoonX";
What "jumps out at me" from studying the examples is that: in both cases you can divide the string to be parsed based on "OR."
c. Then, I think it's helpful to visualize what needs to be stripped away by mentally "boiling down" the string to its essentials: for example, "R(1 - 9) AND B750KK" boils down to the three values 1, 9, and B750KK.
d. Then, move on to start actually coding.
~ 2. initial coding and testing
a. I conclude that if I split either string using "OR" as the delimiter, I will have a set of valid sub-strings to parse. So I can implement, and test, that:
string[] OrStrings;
string[] split1 = new string[] {"OR"};
OrStrings = RangeNum2.Split(split1, StringSplitOptions.RemoveEmptyEntries);
b. Then, having tested that, and further having "immersed myself in the data," I can think about what I need to get rid of, to trim away, to leave me with the minimal usable set of parameters to be used in actually implementing the parsing.
c. Since I know that each item rendered by the split operation needs to be processed, I can now make a first attempt to define the loop, and to remove extraneous information in the loop:
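string finalString;
foreach (string orstr in OrStrings)
{
    finalString = orstr.Trim();                     // drop the spaces left by the split
    finalString = finalString.Remove(0, 2);         // strip the leading "R("
    finalString = finalString.Replace(") AND", ""); // strip ") AND"
    finalString = finalString.Replace("-", "");     // strip the range dash
}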
d. after running this test, and carefully observing the results of parsing both original, and revised, input strings, then I will start thinking about what needs to be done next. And, at this point I will usually make notes for the future, like:
1. for final code: need to use StringBuilder !
2. for final code: can I somehow improve, or combine, Replace, and Remove operation ?
e. Since I observe that the first three items in 'finalString are the same in both the original and the revised input formats, I can temporarily ignore the additional content in the revised example. My next task is to make the solution work using only the first three entries. That will tell me to what extent I can reuse the code for parsing the first input data format in parsing the revised input data format.
f. so now I focus on how to split the substrings in 'finalString usefully: and it's obvious that I need to split using a space character only.
string[] rangeStrings;
char[] split2 = new char[] {' '};
rangeStrings = finalString.Split(split2, StringSplitOptions.RemoveEmptyEntries);
if (rangeStrings.Length != 3 && rangeStrings.Length != 6) throw new FormatException("Expected 3 or 6 items in: " + finalString);
So now I have an array of either three items (original string), or six items (revised string). I can now "sketch in" what my code is going to look like for processing these:
string key = rangeStrings[2];
if (! mustInclude.Contains(key)) continue;
bool doProcess = true;
if (rangeStrings.Length == 6)
{
    if (rangeStrings[3] + rangeStrings[4] == "NOTIN")
    {
        string testString = rangeStrings[5];
        if (testString == "MoonX")
        {
            if (MoonX.Contains(key)) doProcess = false;
        }
        else if (testString == "StarX")
        {
            if (StarX.Contains(key)) doProcess = false;
        }
    }
}
if(! doProcess) continue;
int start = Convert.ToInt32(rangeStrings[0]);
int end = Convert.ToInt32(rangeStrings[1]);
buildOutput(start, end, key);
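To see the steps so far working together, here is a minimal sketch that puts them in one method. It is only my assumption of how the pieces fit: the class name 'RangeParserSketch, the method 'Parse, and the three set parameters (stand-ins for 'mustInclude, 'StarX, and 'MoonX) are hypothetical, and instead of calling 'buildOutput it just returns the (start, end, key) values. It also splits on " OR " with surrounding spaces, a small change from the snippet above, so that a key containing the letters "OR" would not be cut in half.

```csharp
using System;
using System.Collections.Generic;

public static class RangeParserSketch
{
    // Returns (start, end, key) for every entry that passes the filters.
    // The three sets are hypothetical stand-ins for the collections the
    // snippets above call mustInclude, StarX and MoonX.
    public static List<(int Start, int End, string Key)> Parse(
        string input,
        ISet<string> mustInclude,
        ISet<string> starX,
        ISet<string> moonX)
    {
        var results = new List<(int Start, int End, string Key)>();

        // Step 2a: split the input into one piece per entry.
        string[] orStrings = input.Split(
            new[] { " OR " }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string orStr in orStrings)
        {
            // Step 2c: strip "R(", ") AND" and "-" so only the values remain.
            // Assumes every entry starts with "R(" as in the sample strings.
            string cleaned = orStr.Trim()
                .Remove(0, 2)
                .Replace(") AND", "")
                .Replace("-", "");

            // Step 2f: split on spaces; expect 3 items, or 6 with "NOT IN <set>".
            string[] parts = cleaned.Split(
                new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length != 3 && parts.Length != 6)
                throw new FormatException("Unexpected entry: " + orStr);

            string key = parts[2];
            if (!mustInclude.Contains(key)) continue;

            if (parts.Length == 6 && parts[3] == "NOT" && parts[4] == "IN")
            {
                // Pick the exclusion set named by the last token, if known.
                ISet<string> excluded =
                    parts[5] == "StarX" ? starX :
                    parts[5] == "MoonX" ? moonX : null;
                if (excluded != null && excluded.Contains(key)) continue;
            }

            results.Add((int.Parse(parts[0]), int.Parse(parts[1]), key));
        }
        return results;
    }
}
```

For example, if all three keys are in the must-include set, 'B750KK is in the StarX set, and the MoonX set is empty, then parsing RangeNum2 gives two entries: (4, 12, S750FF) and (1, 10, Y750RR).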
Once again, at this point, I'd "step back," and make notes, or insert comments in the code, maybe go back and refactor the code into separate method calls for clarity. Test possible revisions of the code for improved efficiency, or memory conservation, etc.
Obviously I'll have to implement a method 'buildOutput that takes the integer values for 'start and 'end, and the string 'key as parameters.
[1] But, what kind of data structure do I want/need to build ? Or, do I even need to build a data structure ?
There are many possibilities for that, and, imho, this is the time to think about what the data structure, if any, should be, because: depending on what your overall goal is, and the extent to which the "distilled" data needs to be reused in your application, you might make very different choices.
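Just to make one of those possibilities concrete, here is a sketch of 'buildOutput that stores its results in a SortedDictionary keyed by row number. This is purely an assumption on my part, not something the question specifies: the class name 'RowOutput and the helpers 'KeysAt and 'RowCount are hypothetical, and a different goal could easily call for a different structure.

```csharp
using System.Collections.Generic;

public class RowOutput
{
    // Row number -> keys assigned to that row, kept sorted by row number.
    private readonly SortedDictionary<int, List<string>> rows =
        new SortedDictionary<int, List<string>>();

    // Adds 'key' to every row from 'start' to 'end', inclusive.
    public void BuildOutput(int start, int end, string key)
    {
        for (int row = start; row <= end; row++)
        {
            if (!rows.TryGetValue(row, out var keys))
            {
                keys = new List<string>();
                rows[row] = keys;
            }
            keys.Add(key);
        }
    }

    // Keys stored for one row; an empty list if the row was never touched.
    public IReadOnlyList<string> KeysAt(int row)
    {
        return rows.TryGetValue(row, out var keys) ? keys : new List<string>();
    }

    // Number of rows that hold at least one key.
    public int RowCount
    {
        get { return rows.Count; }
    }
}
```

Because rows are created only when a key lands on them, this particular choice does not care in which order the entries are processed; the price is that "row 4" is a dictionary lookup and not a list position.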
~ 3. summary
The overall strategy I'm trying to demonstrate with all this (and, it's only one of many possible strategies ... one that reflects my temperament and biases) is what some would call "divide and conquer," or "incremental" or "step-wise" solution. I like to think of it as a strategy where you alternate coding with "meditating on your data (working result set)," and testing, breaking off "small chunks," and solving the problems inherent in those "chunks," and then moving on to "bigger picture" aspects of the solution.
At each stage of the solution process, I think it's valuable to "pull back," and reflect on what's been done, make notes for future improvements (perhaps as comments in your code) to be implemented when you have reached the proof-of-concept stage.
~ 4. questions for you
Have you tried running the first two solutions on this thread you've got now with the revised RangeNum string, and seeing what happens ?
Have you tried modifying Matt's code to handle the revised data ?
Depending on where you are in your software career, is this the right time for you to make a major investment to learn RegEx (imho, RegEx is a programming language in "its own right") ?
What are you doing now to solve parsing the revised problem ?
The revised problem has this challenge: the first entry it will parse is the one where you want 'S750FF in rows 4~12 of your output, and at that point the output is still "empty." Depending on whatever data structure you created and use to store your "results" in, that requires you to create some virtual "empty place-holders," so that the first occurrence of 'S750FF is at position #4 in "whatever," and so that the next parsing step, where you handle the need for 'Y750RR in positions 1~10, gives correct results.
That's an example of where a two-pass process of parsing might become valuable: since, if you began parsing the 'Y750RR entry before the 'S750FF entry, the first ten rows would be already defined ... although you'd still need to create new rows for 'S750FF at positions 11~12. Such optimizations become "worth the time/money" to develop them when your application is under "high load," and performance is critical.
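A rough sketch of what I mean by two passes, under the assumption that the parsed entries are already available as (start, end, key) values and that rows are numbered from 1. The names 'TwoPassSketch and 'Fill are hypothetical. The first pass only finds the highest row needed, so every row (including the empty place-holders) is allocated once; the second pass fills in the keys.

```csharp
using System;
using System.Collections.Generic;

public static class TwoPassSketch
{
    // entries: (start, end, key) values; assumes 1 <= start <= end.
    // Returns one list of keys per row; row n lives at index n - 1.
    public static List<string>[] Fill(
        IReadOnlyList<(int Start, int End, string Key)> entries)
    {
        // Pass 1: find the highest row that any entry needs.
        int lastRow = 0;
        foreach (var e in entries)
            lastRow = Math.Max(lastRow, e.End);

        // Allocate every row once; rows that no entry touches simply
        // stay behind as empty place-holders.
        var rows = new List<string>[lastRow];
        for (int i = 0; i < rows.Length; i++)
            rows[i] = new List<string>();

        // Pass 2: put each key into its rows.
        foreach (var e in entries)
            for (int row = e.Start; row <= e.End; row++)
                rows[row - 1].Add(e.Key);

        return rows;
    }
}
```

With only the 'S750FF entry (4~12), this gives twelve rows where rows 1~3 are empty place-holders; adding the 'Y750RR entry (1~10) afterwards fills rows 1~10 without any new rows having to be created.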
In considering a strategy to handle both the original string format, and the revised string format, it's important to clarify how "robust" the solution has to be in terms of handling future variations in the data.
For example, is it the case that in the future, you may need to deal with something like:
string RangeNum3 = "R(1 - 9) AND B750KK NOT IN StarX NOT IN MOONX OR R(4 - 12) AND S750FF OR R(1 - 10) AND Y750RR IN STARX NOT IN MoonX";
That is, a string where several logic clauses together determine whether an entry is handled ?
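If the answer is "yes," one way to cope is to read the items after the key as a sequence of "IN x" / "NOT IN x" clauses, in place of the fixed six-item check. The sketch below is only that, a sketch, and it bakes in assumptions you would have to confirm: that every clause must hold for the entry to be processed, and that set names should match regardless of case ("MOONX" versus "MoonX"). The names 'ClauseSketch, 'ShouldProcess, and 'namedSets are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ClauseSketch
{
    // tokens: one space-split entry, e.g. "1 9 B750KK NOT IN StarX NOT IN MOONX".
    // namedSets: hypothetical lookup from set name to members; build it with a
    // case-insensitive comparer so "MOONX" and "MoonX" find the same set.
    // Assumption: ALL clauses must hold for the entry to be processed.
    public static bool ShouldProcess(
        string[] tokens, IDictionary<string, ISet<string>> namedSets)
    {
        string key = tokens[2];
        int i = 3;
        while (i < tokens.Length)
        {
            // An optional "NOT" followed by "IN" and a set name.
            bool negated = tokens[i] == "NOT";
            if (negated) i++;
            if (i + 1 >= tokens.Length || tokens[i] != "IN")
                throw new FormatException("Expected IN or NOT IN at item " + i);

            string setName = tokens[i + 1];
            if (!namedSets.TryGetValue(setName, out var members))
                throw new FormatException("Unknown set: " + setName);

            // "IN" fails when the key is absent; "NOT IN" fails when present.
            bool isMember = members.Contains(key);
            if (isMember == negated) return false;

            i += 2;
        }
        return true; // no clauses, or every clause held
    }
}
```

Whether "all clauses must hold" is the right rule is exactly the kind of thing I can't decide for you.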
Some clarification, please.