Is there a cleaner way? Parsing and Splitting Lists

Question

0.00/5 (No votes)

See more:

>I am working on a project for a major tire manufacturer, and they want to put drop downs on their website to help the user pick from among the tire sizes they offer. The drop downs should be like those on www.tirerack.com[^] and then when we click "Tires by Size."

Anyway, I know how to code the ASP etc etc. But my code is all

C#

var slashedItems = page.Options.FindAll(x => x.Contains('/') && !x.StartsWith("LT"));
            var itemsContainingX = page.Options.FindAll(x => x[2] == 'X');
            var slashedItemsWithLTInFront = page.Options.FindAll(x => x.Contains('/')
                && x.StartsWith("LT"));

            List<string> scratch1 = new List<string>(), scratch2 = new List<string>(), scratch3 = new List<string>();
            foreach (string item in slashedItems)
            {
                scratch1.Add(item.Split(new char[] { '/', 'R' })[0]);
                scratch2.Add(item.Split(new char[] { '/', 'R' })[1]);
                scratch3.Add(item.Split(new char[] { '/', 'R' })[2]);
            }

            // now remove dupes from the lists
            scratch1 = scratch1.Distinct().ToList();
            scratch2 = scratch2.Distinct().ToList();
            scratch3 = scratch3.Distinct().ToList();

            scratch1.Sort();
            scratch2.Sort();
            scratch3.Sort();

            _dropdown1List.AddRange(scratch1);
            _dropdown2List.AddRange(scratch2);
            _dropdown3List.AddRange(scratch3);

And I have to do this because as you know, according to this guide, tire sizes come in a few slighltly varying formats.

Anyway, there's got to be a more tight way to code this parsing algorithm, but nothing occurs to me up front. Page.Options is a List<string> and it contains a number of tire size entries, e.g.,

VB

155/80R13
165/65R13
165/65R14
165/80R13
175/55R15
175/65R14
175/65R15
33X12.50R15LT
345/35ZR19
8R19.5

That's why I have the lines

C#

var slashedItems = page.Options.FindAll(x => x.Contains('/') && !x.StartsWith("LT"));
            var itemsContainingX = page.Options.FindAll(x => x[2] == 'X');
            var slashedItemsWithLTInFront = page.Options.FindAll(x => x.Contains('/')
                && x.StartsWith("LT"));

My question is, is there a cleaner and more tight way of doing it?

Brian

Posted 7-Sep-11 11:52am

Brian C Hart86.6K

Updated 7-Sep-11 11:54am

v3

Add a Solution

2 solutions

Solution 1

Of course there is a cleaner way. It's good that you're trying to find one.
To me, it looks pretty obvious.

First, parsing is always bad. You need to do composing instead of parsing.

Your problem is that your tire property and naming nomenclature are non-uniform. You can generalize it. You need to create a data model which describe all possible combinations of tire characteristics mentioned in the model/part name/number. Some combinations of those characteristics exist in real life, some are not. On the top of your model's object graph you should have the list of records which references the choice of one of possible formats, model name, size, etc. Let's say you have 5 different name formats, 4 different sized, 7 manufacturer's companies, 12 model names. Your data model has a separate container of each of these categories. Then, you create one more container of the class which defines a combination of each item of each category. It's not a problem if some fields are null.

Sorry I cannot quickly draw you a UML diagram of this simple thing.

The object graph in memory won't be a tree — it will be a graph with loops. No matter. There is a way to persist it all automatically. Use Data Contract. It can automatically store your object graphs in XML file and restore back. See http://msdn.microsoft.com/en-us/library/ms733127.aspx[^], http://msdn.microsoft.com/en-us/library/system.runtime.serialization.datacontractserializer.aspx[^].

Now, when you populate a list with all possible really existing kinds of tires, you will use the container of instances of the class with those references. The text of the list item will be build based of format (one of the references) and available characteristics.

When you click on a list item, you should look at the container of those instances (you can find the selected instance by index in list and in the container, for example) and look at all the references in the class instance. Each reference will point you to the separate characteristic such as size, model number, etc.

Now, the problem is reduced to the problem of writing XML file manually. Do the following: for a bootstrap, program an incomplete sample of data graph which builds all the containers in memory and save it using the DataContractSerializer. Keep this temporary code for future use, but not deploy it. It will produce you a sample of XML file. Data Contract files are very readable; you will immediately see how to support it. If you want, you can easily create an updater — your support tool.

No parsing, for goodness sake — you cannot make such code reliable and supportable.

—SA

Posted 7-Sep-11 12:32pm

Sergey Alexandrovich Kryukov

Comments

Brian C Hart 7-Sep-11 18:36pm

OK, I was not looking for a Ph.D. in Math doctoral dissertation though :) Also, the size formats are all specified by internationally-published standards and I cannot change the formats since the company is too large for me to foist them on the company. so I have to make do with the data as it comes. It comes out of a SQL server database table. The list of sizes is not able to be specified by me and I always know the entire contents of the list before I sit down and split it up.

And you are absolutely right in that the DB design is terrible. The entire size string is in just one DB column, it really should be broken up into three DB columns to begin with.

Sergey Alexandrovich Kryukov 7-Sep-11 19:41pm

Ph.D? Joking? Of what sciences? "A science of doing trivial well-known things"? Be serious. The matter is really trivial.
I'm sorry if I did not explain it clearly; this is simple but graphical thing, it's awkward to explain in words only...

Well, it it comes out of DB, even easier: if you can change DB structure, use separate tables for every characteristic and one for formats. Now, the table with really available kinds of tires will only reference all other table via their keys.
I was no sure if you use database for this tire classification, which is quite natural though.

If you cannot modify database, every thinkable way is not clean enough, already. You can re-map database data to the objects in memory I mentioned, blah-blah. Anyway, it could be better. I explained why parsing is bad in my comment to the comment to TRK3.

--SA

Mehdi Gholam 7-Sep-11 18:45pm

Couldn't have said it better, 5!

Sergey Alexandrovich Kryukov 7-Sep-11 19:33pm

Thank you, Mehdi.
Unfortunately, OP does not get it. Frankly, I had trouble to explain it: the thing is very simple, but it needs either code sample of graphics (UML).
--SA

TRK3 7-Sep-11 19:28pm

What's the big objection to parsing?

Parsing is a well understood subject. Parsing done correctly is neither unreliable or unsupportable.

The problem is that people tend to write some obscure regex string splitter and say they have now "parsed" the data when in fact they have written some hack that isn't a parser at all and only sort of works.

The expectation is that there is some "tight" simple call you can make to something in a library or function that will do it for you. There isn't. To write a parser you have to understand and handle all the formats you expect to encouter and handle the exceptions. There is a clean way to do that, but it's not "tight".

Sergey Alexandrovich Kryukov 7-Sep-11 19:31pm

Why parsing at all? It is not reliable (what if non-parsing item appears later during support? no support from compiler). More importantly, it is not needed.
--SA

TRK3 7-Sep-11 20:39pm

I see your point, but it depends on the exact requirements that OP has.

He's stuck with the industry "standard" description of tire sizes that is already in a database.

If he wants to only display the possible options based on what's in the database it seems he has to parse it.

If he wants to display all possible options -- then yeah, he doesn't need to parse anything.

But even then, once he's got the options the user selected he's going to need to go back and find if it matches something in the database.

Since the database is set up already, he's going to have to do some parsing somewhere. Maybe he does it offline and converts that ugly list into something he can use easily at runtime, but he still has to write the parser and run it at least once.

Sergey Alexandrovich Kryukov 9-Sep-11 0:22am

I'm still not sure DB cannot be updated. I understand what you say. You pointed out the possible problems correctly, but following this way is pointless -- the way you describe is already implemented; OP is rightfully looking for a cleaner way.
--SA

TRK3 9-Sep-11 15:05pm

The way I described isn't what is implemented. What's implemented is a non-rigorous attempt at "parsing".

I agree that if you can compose instead of parse, it's a lot cleaner, safer and provably correct.

But sometimes you do have to parse.

If you have to parse, then parse in a rigorous fashion with code that correctly parses only those patterns you expect and rejects those that you don't. And have a log of the things you reject so that you can improve your code later to handle them if they are actually valid.

The problem is that when most people "parse" they do some hack that handles the pattern they expect but doesn't reject all kinds of other patterns. So when they get input that isn't in the format they expect their code does wildly bad things instead of cleaning handling the exception.

I added my solution more as attempt to show how you should parse if you have to -- not necessarily as the best solution to the particular problem OP is trying to solve. (Maybe I should write my first article?)

Add a Solution

Add your solution here

Treat my content as plain text, not as HTML

Preview 0

…

Existing Members

Sign in to your account

...or Join us

Download, Vote, Comment, Publish.

Your Email
Password
Forgot your password?

Your Email
This email is in use. Do you need your password?
Optional Password

I have read and agree to the Terms of Service and Privacy Policy
Please subscribe me to the CodeProject newsletters

When answering a question please:

Read the question carefully.
Understand that English isn't everyone's first language so be lenient of bad spelling and grammar.
If a question is poorly phrased then either ask for clarification, ignore it, or edit the question and fix the problem. Insults are not welcome.
Don't tell someone to read the manual. Chances are they have and don't get it. Provide an answer or move on to the next question.

Let's work to help developers, not make them feel stupid.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

TRK3 · Accepted Answer · 2011-09-07T14:32:00

"Tight" isn't necessarily a good criteria for correct or maintainable code.

In fact, I think it's better to write clear code that's less tight.

First thing is to describe the valid formats for your data.

I don't fully understand the tire size nomenclature, but there seems to be:

# / # R #
# / # ZR #
# X # R #
# R #

with "LT" potentially appearing at the beginning or end.

To be very clear about what you are doing, I'd use the RegEx class to create a regular expression that describes each of your formats and then test against each RegEx and use the thing found:

C#

Regex rxslash = new Regex(@"^\s*(LT)?\s*(?<n1>[0-9]+)\s*/\s*(?<n2>[[0-9]+)Z?R(?<n3>[0-9]*)\s*(LT)?\s*$");

Regex rxX = ...

foreach (string item in page.Options) {

   if (rxslash.IsMatch(item)) {

          MatchCollection matches = rxslash.Matches(item);
          GroupCollection groups = matches.item(0).Groups;
          scratch1.Add(groups["n1"]);
          scratch2.Add(groups["n2"]);
          scratch2.Add(groups["n3"]);

   } else if (rxX.IsMath(item)) {

      ...

   } else {

       // I don't recognize this format, throw an error or log it
       //   to fix later...
   }
}

This is definitely not "tighter" than your code, but it explicitly handles each item in page.Options. It either parses the item because it matches one of the regular expressions you defined or it logs an error because the item didn't match. (And you can use that log to find out what you missed and fix it. Your current code will do something unexpected if an item isn't in one of the fomats you expect. It'll either drop it entirely or it'll end up putting garbage in one of the drop downs.)

Also it gives you complete control over what you do with the different pieces that you matched in the regular expression.

It's pretty clear that you want to split the # / # R # pattern into 3 numbers to display, but I'm not sure what you want to do with the # X # R # or the # R # pattern or the LT or Z -- maybe you want to keep #X# together, maybe you want to put LT in a separate drop down ... Whatever you decide to do, you have each individual component that you matched and you can do exactly what you want with them.