Generate unique no. from char*

Question

0.00/5 (No votes)

See more:

Hi,
I want to generate unique no. or key from char[400] which is unique.
So please tell me any technique or algoritm to generate unique no. or key from char[400]
Thanks in advance :)

Posted 10-Oct-11 2:00am

LaxmikantYadav

Add a Solution

2 solutions

Solution 2

If you need a general purpose string hash function then check out the djb2 hashing on this site: http://www.cse.yorku.ca/~oz/hash.html[^]. If I remember right this general purpose string hash function is used in java as well. But in the comments of the previous solution the guys already pointed out: char[400] is already a 400*8 bit number, and you can not generate a unique number represented in less bits for each member of the previous much bigger set. This is where you have to make sure that the hash function you are using doesn't have a lot of collisions on the set of input data you are usually using. If you don't exactly know what is the content of your input strings, then the general purpose djb2 hash will be nice for you. But for example if you know that your strings will contain the textual representation of numbers between 0 and 2^32, then a function that converts the string to integer is a perfect hash function (without collisions), but to achieve this we need to know something about the input data (like with compression algorithms). Hashing and compression have a lot in common as a previous poster pointed out. Hashing can improve the performance of some simple data structures like set and map where some operations include searching for a key. Searching is usually done by performing binary halving on a list of sorted items, but with hashing you can achieve much better performance if your hash function doesn't have a lot of collisions (in english, you don't have a lot of items with the same hash).

Posted 10-Oct-11 5:41am

pasztorpisti

Updated 10-Oct-11 5:44am

v2

Comments

LaxmikantYadav 11-Oct-11 4:56am

Actully, This char[400] is a finger print data of user which is stored in the flat file and if i consider this as unique key it will take huge time while finger print matching for no. of records. If i able to generate key using this char[400] with help of that key i can calculate offset of it. And can perform 1:1 finger print matching rather than 1:N, which will save my time while finger print verification.

pasztorpisti 11-Oct-11 11:10am

You can not generate a smaller unique key, but with hashing you can make the process much more faster. Basically what you might want to do is store the char[400] blobs in a hashmap in which the keys are the generated hash numbers, and the values are lists that contain all char[400] blobs for the specified hash number. After this if you want to search for a specific char[400] then all you have to do is generate the hash from it, search for the hash in the map, and then you have to compare your char[400] only with those char[400] blobs that are in the list of the specified hash number. This will drastically reduce the number of comparisons between char[400] blobs if the hash function doesn't have lot of collisions.

I guess you know what do I mean on hashmap, or generally a map (associative array, a general purpose data structure), but I link this nicely written wiki page: http://en.wikipedia.org/wiki/Hash_table#Separate_chaining
I think the "Separate chaining" hashmap is one of the easiest to implement.

pasztorpisti 11-Oct-11 11:19am

In C++ ppl often implement strings that automatically precalculate their own hashes when they are modified. This way you can easily implement very fast != (not equal) and == (equal) operators for the string class. If you want to perform comparision between two strings, then you just compare their hashes. If the hash numbers differ, then its 100% that the strings are different. BUT if the hash numbers are the same, it isn't guaranteed that the strings are the same, so you must perform a real string comparison. Still this can improve performance a lot if you use strings that aren't modified a lot of times (because modification involves recalculation of hash numbers). Basically your char[400] should work the same way as the string I described above. You have to perform real comparisons between char[400] blobs only if their hash is the same (you have collision).

Add a Solution

Add your solution here

Treat my content as plain text, not as HTML

Preview 0

…

Existing Members

Sign in to your account

...or Join us

Download, Vote, Comment, Publish.

Your Email
Password
Forgot your password?

Your Email
This email is in use. Do you need your password?
Optional Password

I have read and agree to the Terms of Service and Privacy Policy
Please subscribe me to the CodeProject newsletters

When answering a question please:

Read the question carefully.
Understand that English isn't everyone's first language so be lenient of bad spelling and grammar.
If a question is poorly phrased then either ask for clarification, ignore it, or edit the question and fix the problem. Insults are not welcome.
Don't tell someone to read the manual. Chances are they have and don't get it. Provide an answer or move on to the next question.

Let's work to help developers, not make them feel stupid.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

CPallini · Accepted Answer · 2011-10-10T02:22:00

Solution 1

Genrally speaking there is no way to do that, I mean the char[400] itself is the simpler theoretically possible unique key if you don't put constraints on your requirements.

Posted 10-Oct-11 2:22am

CPallini

Comments

Stefan_Lang 10-Oct-11 8:50am

Exactly. There are 256^400 different values that could be stored in a char[400], and if you want unique keys for each value, then you need at least 256^400 different keys. Therefore the key length will need to be at least 400 chars long, or else it cannot possibly be unique.

If shorter keys are needed, then this basically requires a compression algorithm. Compression algorithms don't create constant length 'keys' though, only ones that are shorter on average.

It is actually possible to compress data even when there are no known constraints as long as there are some - but in that case a representative sample of the data is required to take statistics from. There is no guaranteee though that a compression will indeed generate shorter keys, as there is always an overhead for the compression method, and this overhead can turn out to be larger than the average amount of space you saved.

So, in short: the data itself is the only unique key to data records that is guaranteed to be no longer than the data.

LaxmikantYadav 11-Oct-11 4:49am

Actully, This char[400] is a finger print data of user which is stored in the flat file and if i consider this as unique key it will take huge time while finger print matching for no. of records. If i able to generate key using this char[400] with help of that key i can calculate offset of it. And can perform 1:1 finger print matching rather than 1:N, which will save my time while finger print verification.

Stefan_Lang 11-Oct-11 6:07am

Now, this is a wholly different requirement, and it shows that your initial requirement was no requirement at all, just one potential solution to the actual problem (and an unsuitable one at that). What you *really* need is a method to efficiently find records that are associated with a 400 char 'finger print'.

What I don't understand is why you think that an unknown algorithm that generates some 'key' will for some reason determine an 'offset' into your data record table?

Please edit your posting to reflect your real requirments, including details about what these 'finger prints are', how these records are organized, and in what way they are associated to the 'finger prints'.

It's really hard to conjure a solution when you don't give us the most basic details about the objects you need to deal with!