|
Thanks Ennis,
My scenario is like this.
1. Using .NET 2.0 code, generate a hash code for the name "abc" with the GetHashCode function;
2. We then use the hash value as the unique identifier for "abc", e.g. checking whether that hash value exists is treated the same as checking whether "abc" exists;
3. Upgrade to .NET 3.5 and calculate a new hash value for "abc". Since the new value does not match the old hash value, the system will believe "abc" does not exist and that some other value exists.
Do you think this is an issue?
regards,
George
|
Like I said, you can't use GetHashCode for persistence. In fact, there is no guarantee that within the same .NET version it will generate the same hash.
Your scenario looks like a rewrite of some common database functionality which would be best stored in a database.
There is one other option, load the names into a hash table at start-up using the names. Then you can check for existence using a hash look-up without needing the hash code.
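The start-up hash table idea can be sketched roughly as follows. This is a minimal illustration in Python rather than .NET, and `load_names` is a hypothetical stand-in for however the names are actually read from storage. The hash values are computed in memory on every run and never persisted, so it does not matter whether the hash function changes between runtime versions.

```python
def load_names():
    """Hypothetical loader; a real system would read the names from the database."""
    return ["abc", "def", "ghi"]


def build_name_index(names):
    # A set hashes each name internally; those hash values never leave memory.
    return set(names)


# Built once at start-up, then reused for every existence check.
name_index = build_name_index(load_names())

# Membership is decided by the string value itself, not by a stored hash code.
print("abc" in name_index)
print("xyz" in name_index)
```

The cost is one pass over all names at start-up plus the memory to hold them, which is the trade-off to weigh when the number of strings is large.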
Need a C# Consultant? I'm available.
Happiness in intelligent people is the rarest thing I know. -- Ernest Hemingway
|
Yes, Ennis!
My goal is to replace the database VARCHAR with something small (an INT32) to save memory, since the number of strings is large. It is good to see you understand my needs without looking at my code. Magic.
Right now I create an INT32 column and a VARCHAR column, and use the INT32 column to hold the hash code.
"load the names into a hash table at start-up using the names" -- since the number of strings is large, I think the load time would be long.
regards,
George
|
George_George wrote: 2. We then use the hash value as the unique identifier for "abc", e.g. checking whether that hash value exists is treated the same as checking whether "abc" exists;
No, that is not correct. Checking the hash code is not the same as checking the actual value.
It's only possible to get a unique 32-bit hash code if you exclusively have very short strings, i.e. not more than four characters. If character codes outside the regular ASCII character set are used, you can't have longer strings than two characters. If you have longer strings than that, it's not possible to guarantee a unique hash code. (Four 8-bit characters, or two 16-bit characters, already fill all 32 bits.)
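To illustrate why a matching hash code does not prove a matching string, here is a small Python sketch. `toy_hash` is a deliberately weak, made-up hash (it is not what GetHashCode does), chosen only so that a collision is easy to see. `NameStore` is likewise a hypothetical helper: it uses the hash to narrow the search and then compares the actual strings.

```python
from collections import defaultdict


def toy_hash(s):
    """Deliberately weak illustrative hash: sum of code points modulo 256."""
    return sum(ord(ch) for ch in s) % 256


class NameStore:
    """Hypothetical store that keeps names in buckets keyed by their hash."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def add(self, name):
        bucket = self._buckets[toy_hash(name)]
        if name not in bucket:
            bucket.append(name)

    def hash_exists(self, name):
        # Unsafe: only checks that *some* stored name has the same hash.
        return toy_hash(name) in self._buckets

    def exists(self, name):
        # Safe: the hash narrows the search, the string comparison decides.
        return name in self._buckets.get(toy_hash(name), [])


store = NameStore()
store.add("ab")

# "ba" was never stored, but it has the same toy hash as "ab".
print(store.hash_exists("ba"), store.exists("ba"))
```

With a real 32-bit hash the collisions are rarer, but the same reasoning applies: the hash can speed up the lookup, while the stored string has to confirm it.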
George_George wrote: 3. Upgrade to .NET 3.5 and calculate a new hash value for "abc". Since the new value does not match the old hash value, the system will believe "abc" does not exist and that some other value exists.
That's definitely an issue (although not with .NET 3.5 as it still uses framework 2.0). The hash code provided by the GetHashCode method is not intended for persistent storage.
Despite everything, the person most likely to be fooling you next is yourself.
|
Thanks Guffa,
Two more comments,
1. " i.e. not more than four characters. If character codes outside the regular ASCII character set are used, you can't have longer strings than two characters. If you have longer strings than that, it's not possible to guarantee a unique hash code." -- I am confused. Could you show me a sample please?
2. "That's definitely an issue (although not with .NET 3.5 as it still uses framework 2.0). The hash code provided by the GetHashCode method is not intended for persistent storage." -- great to learn this from you! Do you have any MSDN or other official documentation that supports this point? I tried to find a link to send to interested friends, but failed.
regards,
George
|
George_George wrote: " i.e. not more than four characters. If character codes outside the regular ASCII character set are used, you can't have longer strings than two characters. If you have longer strings than that, it's not possible to guarantee a unique hash code." -- I am confused. Could you show me a sample please?
To make a simple example, let's say that we have a number of strings that only contain printable ASCII characters, i.e. each character can only have 96 different values. For a ten-character string, there are 96^10 = 66,483,263,599,150,104,576 possible combinations of characters.
As an Int32 can only have 2^32 = 4,294,967,296 different values, there are far from enough values to give each possible string a unique hash code. On average, 15,479,341,056 different ten-character strings get the exact same hash code.
(As strings are Unicode, in reality each character can have a lot more than 96 different values, which of course greatly increases the number of possible combinations.)
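The arithmetic above can be checked with a few lines of Python, whose integers are arbitrary precision. The alphabet size of 96 is simply the figure used in this post, not a property of any particular hash function.

```python
ALPHABET_SIZE = 96      # character values per position, as assumed in the post
STRING_LENGTH = 10      # ten-character strings
HASH_VALUES = 2 ** 32   # distinct values a 32-bit hash code can take

# Number of different ten-character strings over that alphabet.
possible_strings = ALPHABET_SIZE ** STRING_LENGTH

# Average number of strings that must share each hash value (pigeonhole).
strings_per_hash = possible_strings // HASH_VALUES

print(possible_strings)
print(strings_per_hash)
```

Whatever hash function is used, this average cannot be lowered, because it depends only on how many inputs and how many outputs exist.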
George_George wrote: "That's definitely an issue (although not with .NET 3.5 as it still uses framework 2.0). The hash code provided by the GetHashCode method is not intended for persistent storage." -- great to learn this from you! Do you have any MSDN or other official documentation that supports this point? I tried to find a link to send to interested friends, but failed.
MSDN Library: String.GetHashCode method
"The behavior of GetHashCode is dependent on its implementation, which might change from one version of the common language runtime to another. A reason why this might happen is to improve the performance of GetHashCode."
|
Great analysis, Guffa!
"(As strings are Unicode, in reality each character can have a lot more than 96 different values, which of course greatly increases the number of possible combinations.)" -- do you mean Unicode strings are more prone to getting the same hash value (i.e. colliding) for different inputs?
regards,
George
|
George_George wrote: do you mean Unicode strings are more prone to getting the same hash value (i.e. colliding) for different inputs?
In .NET all strings are Unicode.
No, I mean that if you consider the full Unicode character set, there are a lot more possible combinations of characters in a ten-character string. That means that there are more combinations that share the same hash code, but that doesn't mean that any two given strings have a higher chance of sharing the same hash code.
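As a rough illustration of that distinction: for an ideal, uniformly distributed 32-bit hash (an assumption, and not a claim about GetHashCode itself), the chance that two given strings collide is about 1 in 2^32 regardless of the alphabet. What grows quickly is the chance of at least one collision somewhere as more strings are stored. The Python sketch below uses the standard birthday approximation; `collision_probability` is a name made up for this example.

```python
import math


def collision_probability(n, hash_bits=32):
    """Approximate chance that at least two of n distinct strings share a hash.

    Assumes an ideal hash that spreads inputs uniformly over 2**hash_bits
    values, and uses the birthday approximation 1 - exp(-pairs / 2**hash_bits).
    """
    pairs = n * (n - 1) / 2
    return -math.expm1(-pairs / 2 ** hash_bits)


for count in (1_000, 100_000, 1_000_000):
    print(count, collision_probability(count))
```

Under these assumptions a collision is already more likely than not at around a hundred thousand stored strings, so the number of names matters far more than the character set.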
|
Thanks Guffa!
I agree. Cool!
regards,
George
|
If the distribution is good, you can just use any 32 bits from the 128 bits.
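A minimal sketch of taking 32 bits out of a 128-bit digest, written in Python. It assumes the "128 bits" refers to an MD5 digest, as the follow-up question suggests. The function name `stable_hash32`, the UTF-8 encoding, and the choice of the first four bytes in big-endian order are all arbitrary choices made for this example.

```python
import hashlib


def stable_hash32(name):
    """Return an unsigned 32-bit value derived from the MD5 digest of name.

    The result depends only on the bytes of the name, so it stays the same
    across processes and runtime versions, unlike an in-memory hash code.
    """
    digest = hashlib.md5(name.encode("utf-8")).digest()  # 16 bytes = 128 bits
    return int.from_bytes(digest[:4], "big")             # keep the first 32 bits


print(hex(stable_hash32("abc")))
```

The value is unsigned, so storing it in a signed INT32 column would need a conversion. Collisions remain possible with any 32-bit value, so the stored string still has to be compared to confirm a match.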
|
Thanks Guffa,
Two more questions,
1. "distribution is good" -- distribution of string to hash or distribution of MD5 result?
2. "distribution is good" -- how do you define "good" in general?
regards,
George
|
Is there something wrong with the GetHashCode method in the string class?
|
It's not enterprisey.
|
Ennis Ray Lynch, Jr. wrote: It's not enterprisey.
Oh. I thought that it perhaps was too NIH.
|
What is NIH short for?
regards,
George
|
Not Invented Here
|
Ok, Guffa!
Any existing technology is fine. Is there any solution you could suggest for my original question?
regards,
George
|
Could you put that in other words? My English is not good, sorry. What do you mean by "enterprisey"?
regards,
George
|
Generally that means overly complicated.
As there is a built-in method in the string class that does what you ask for, you should obviously look at that before you look for anything "better".
As it turns out, the built-in method doesn't fulfill all your requirements, but it sure fulfills all the requirements that you mentioned in your original post.
|
Thanks Guffa,
The "built-in method" -- you mean GetHashCode?
regards,
George
|
Yes.
|
Cool, Guffa!
regards,
George
|
Probably not, but George likes to know his limits.
|
Yeah, it's kinda crap.
Well... it's very, very fast, and uses some sort of parity method. You get collisions all the time though in fairly general usage.
|
Hi Mark,
Sorry, I lost the context. Do you mean the GetHashCode method?
regards,
George
|