Click here to Skip to main content
15,885,216 members
Please Sign up or sign in to vote.
1.00/5 (1 vote)
See more:
How do I write a hash function in C++ that accepts virtually all data ( intergers, strings, objects etc) as Key?

What I have tried:

I have leant how to write simple hash function such as hash(k) = k%buckets that accepts integer.But that doesn't meet my need.
Posted
Updated 22-Oct-21 8:14am
Comments
k5054 21-Oct-21 21:05pm    
Have you considered std::any as your key value? That might work, but you might then have to box/unbox the values. If you do need to box/unbox, it might be better to create an ADT with a hash() member function - or perhaps even a template class.
Gbenbam 22-Oct-21 4:23am    
I need the hash function for a hash table.
Greg Utas 22-Oct-21 8:59am    
What are you going to use the hash table for? It's one thing to generate a hash from any type of data, but how are you going to determine the data's type once you find it in the hash table?

You only need a void pointer and its size as input.
C++
HASH hash(void* data, int byteSize);

//usage
int number = 42;
hash(&number, sizeof(number) );

Consider a useful HASH to avoid collisions. Like a long long.
 
Share this answer
 
Comments
Gbenbam 22-Oct-21 4:28am    
Can you kindly give some details as to how the function internals can be? What do you mean by useful HASH? What I really need is some guide on how to hash interger, how to hash double, how to hash strings, how to hash objects, etc. Using the approach HASH(k) = k% buckets is too trivial. It will not work for strings, neither for doubles, neither for objects etc
Greg Utas 22-Oct-21 9:04am    
I don't have a lot of experience with this, but he suggested a long long (probably 64 bits). That would reduce the number of collisions given that you're simply hashing on the contents of memory (effectively treating the input object as an array of bytes).

As far as the internals go, you might take bit 0 of the first byte, bit 1 of the second byte, bit 2 of the third byte, and so on. If the size of what you're hashing is less than 64 bytes, you could then wrap around and also take bit 1 of the first byte, bit 2 of the second byte, and so on until you have 64 bits.
To expand just a little on what KarstenK wrote, let's start with the prototype he showed :
C++
HASH hash(void* data, int byteSize);
Here is a simple hash function that could be implemented using that prototype.
C++
using HASH = long long int;
using UCHAR = unsigned char;

HASH hash( void * data, int byteSize )
{
    HASH hv = 0x0030507;    // an initial value, could be anything
    UCHAR * pd = (UCHAR *) data;
    for( int n = 0; n < byteSize; ++n )
    {
        hv << 4;
        hv += pd[ n ];
    }
    return hv;
}
I won't claim that is a good hash function but it is a simple one that illustrates how they are sometimes implemented. A test of a hash function would be to throw a whole bunch of possible input values at it and then see how many collisions result and how the bucket distribution looks. Ideally there are few collisions and the buckets have roughly the same occupancy.
 
Share this answer
 
Comments
Gbenbam 23-Oct-21 16:58pm    
Thanks. I wish I can get an answer from an expert on hash function writing.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900