Algorithm for searching records by multiple input variables when the fields hold ranges and lists of values of different data types

Question


I have a lot of entities, each containing thousands of records with diverse values: some fields hold ranges using BETWEEN with numbers, some use IN with multiple values, and some use LIKE with multiple patterns. The values are of different data types such as integer, string and decimal. For example, a typical record looks like this:

A = New | Code = IN 101,102,103 | Values = LIKE A*,B*,C* | Quality = BETWEEN 1,9 |
Test = IN A

If I get an input for the fields A, Code, Values and Quality as above, I need to return the corresponding Test field for all the matching records. The input can look like this:

A = New , Code = 102, Values = Alpha, Quality = 8.2

Given the above, I was thinking about using pattern matching algorithms to identify the matching records. Please suggest which algorithm would fit the bill to correctly identify the matching record(s) based on the set of values provided.

What I have tried:

I was thinking about using pattern matching algorithms to identify the matching records, and also about hashing the inputs so they can be used as a lookup when the inputs are searched for.

Posted 13-Feb-17 6:39am

SanalSundar

Updated 13-Feb-17 11:03am


This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Patrice T · Answer 1 · 2017-02-13T11:03:00

Quote:
Please do suggest which algorithm will fit the bill to correctly identify a given record(s) based on the set of values provided.

You are in the worst possible situation: flat text data implies the use of brute force. Repeating the field names in every record makes it even worse than the CSV format.
The only optimization you can expect is to check first the field that filters the records best, but it will not change the processing time much.
So:
1) read all lines: no optimization possible
2) for each line, split on comma: no optimization possible
3) for each field, split on the equals sign: modest optimization possible, depending on the filter.
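
The steps above can be sketched in Python roughly as follows. This is only an illustration under some assumptions, not a definitive implementation: the sample record in the question separates fields with "|" and uses commas inside the value lists, so the sketch splits fields on "|" rather than on commas. The function names (parse_record, field_matches, find_tests) are made up for this example, a field without an operator (such as "A = New") is treated as an exact match, and LIKE patterns are treated as shell-style wildcards via the standard fnmatch module.

```python
from fnmatch import fnmatchcase

OPERATORS = ("IN", "LIKE", "BETWEEN")


def parse_record(line):
    """Turn 'A = New | Code = IN 101,102 | ...' into {field: (op, [args])}."""
    rules = {}
    for part in line.split("|"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece left by a trailing "|"
        name, _, spec = part.partition("=")
        tokens = spec.strip().split(None, 1)
        if len(tokens) == 2 and tokens[0] in OPERATORS:
            op, raw = tokens
        else:
            op, raw = "EQ", spec.strip()  # assumption: no operator means exact match
        rules[name.strip()] = (op, [v.strip() for v in raw.split(",")])
    return rules


def field_matches(rule, value):
    """Check one input value against one parsed field rule."""
    op, args = rule
    text = str(value)
    if op == "BETWEEN":
        try:
            return float(args[0]) <= float(text) <= float(args[1])
        except ValueError:
            return False  # non-numeric input cannot fall inside a numeric range
    if op == "LIKE":
        # fnmatchcase also treats '?' and '[...]' as wildcards, not only '*'
        return any(fnmatchcase(text, pattern) for pattern in args)
    return text in args  # IN and EQ: compare as strings


def find_tests(lines, query, result_field="Test"):
    """Brute-force scan: return the result_field values of every matching record."""
    results = []
    for line in lines:
        rules = parse_record(line)
        # all() stops at the first failing field, in the order the query lists them
        if all(name in rules and field_matches(rules[name], value)
               for name, value in query.items()):
            results.extend(rules.get(result_field, ("EQ", []))[1])
    return results


records = [
    "A = New | Code = IN 101,102,103 | Values = LIKE A*,B*,C* | Quality = BETWEEN 1,9 | Test = IN A",
    "A = Old | Code = IN 102 | Values = LIKE A* | Quality = BETWEEN 1,9 | Test = IN B",
]
query = {"A": "New", "Code": 102, "Values": "Alpha", "Quality": 8.2}
print(find_tests(records, query))
```

Because all() short-circuits, listing the most selective field first in the query is the "check the best filter first" optimization mentioned above. Every line is still read and parsed in full, so the gain stays modest.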