
Persistent Hash Table

27 Oct 2011 · CPOL · 3 min read
Key value store

Introduction 

I got inspired by NoSQL and wanted to try writing one myself. After looking at a few very good open-source C++ NoSQL projects such as MongoDB and LevelDB, I decided to write a key-value store of my own, starting with a minimal set of goals that could be completed in a weekend. With those parameters in mind, I am posting this article. I hope it will be helpful to anyone starting out who wants to develop a key-value store.

Background

There are many ways to create a table and access it, each with its own advantages and disadvantages. Here, I am using a hash function to locate a record in a file. Other methods include maintaining index tables, sequential access, B+ trees, persistent pointers, etc.

Using the Code

I have used simple logic to construct the persistent hash table. If we define the number of buckets, their size and the record size, it is easy to store and retrieve records. When we want to write or read a record, the key is passed to the hash function; from the hash value, the bucket offset is derived. We then jump to that offset in the file and write or read the record. Before writing, a check is made whether a record is already present at that location; if one is, we jump to the next record slot and write there instead.
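For example (the numbers here are purely illustrative): with 128 buckets of 4,096 bytes each and a record size of 256 bytes, a key hashing to bucket 7 maps to file offset 7 × 4096 = 28,672. If that slot is occupied, the next candidate slot is 28,672 + 256 = 28,928, and so on, so one bucket can hold at most 4096 / 256 = 16 records.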

  • CHash(long nBuck,long bucksize,long recsize): one of the overloaded constructors, where we pass the number of buckets, the bucket size and the record size (table size). With this constructor, the key must be of type long.
  • CHash(long nBuck,long bucksize,long recsize,unsigned long (*uh)(void *)): another constructor where a user-defined hash function pointer can be passed, so the key can be a user-defined type (a sample hash function is sketched just after this list).
  • createdb: creates the hdb file with the given name, after which writes and reads can be made.
  • writedb: opens the file for writing.
  • readdb: opens the file for reading.
  • closedb: closes the file handle.
  • writerec: has two overloads, one for a long key and another for a user-defined key type; it writes the user-defined data into the bucket. If a record is already present (i.e., a collision occurred), it jumps to the next record slot until it finds a vacant place to write.
  • readrec: has two overloads, one for a long key and another for a user-defined key type; it reads the record at the bucket offset. If the value found is not the one searched for, readnextrec is used to fetch the next record, using the offset returned by the previous readrec call.
  • readnextrec: as mentioned above, this is used to read consecutive records in the same bucket by passing in the offset from readrec.
  • writeoffset, readoffset and GetbucketOffset are helper functions that move around the file for writing and reading.
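As a rough sketch of what such a user-defined hash function might look like (this is not from the posted source; the C-string key and the bucket count of 128 are assumptions for illustration):
C++
unsigned long StringHash(void *key)//maps a C-string key to a bucket index
{
    const char *s = static_cast<const char *>(key);
    unsigned long h = 5381;                       // djb2-style accumulator
    while (*s)
        h = h * 33 + static_cast<unsigned char>(*s++);
    return h % 128;                               // keep the result below nBuck (128 here)
}

A pointer to such a function would be passed as the fourth constructor argument. Note that, since GetbucketOffset simply multiplies the hash value by the bucket size, the hash function itself has to return a value smaller than the number of buckets.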
C++
int CHash::GetbucketOffset(long buckno)//returns the bucket offset
{
    return buckno * m_bucketsize;
}
C++
int CHash::writerec(void* key,void *data)
{
    long buckno = GetbucketOffset(hash(key));
    void *temp = malloc(m_recsize);
    //probe forward while a record already occupies the slot (collision)
    while(0 != readoffset(buckno,temp,m_recsize))
    {
        buckno += m_recsize;//go to the next record slot
    }
    free(temp);
    writeoffset(buckno,data,m_recsize);
    return 0;
}
C++
int CHash::readrec(long key,void *data,long &offset)
{
	long buckno = GetbucketOffset(hash(key));
	readoffset(buckno,data,m_recsize);
	offset = buckno;
	return 0;
}
C++
int CHash::readrec(void* key,void *data,long &offset)
{
	long buckno = GetbucketOffset(hash(key));
	readoffset(buckno,data,m_recsize);
	offset = buckno;
	return 0;
}
C++
int CHash::readnextrec(long offset,void *data)
{
	offset += m_recsize;//advance to the record after the one at 'offset'
	readoffset(offset,data,m_recsize);
	return 0;
}

Testhash.cpp gives a demo of how to use this class; a condensed sketch of the same flow is shown below.
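This minimal usage sketch assumes the constructor and method signatures listed above; the header name, file name, record layout and sizes are made up for illustration, the exact parameter lists of createdb, writedb, readdb and closedb may differ from the download, and error checking is omitted.
C++
#include <cstring>
#include "chash.h"        // assumed header name for the CHash class

struct Record             // hypothetical fixed-size record layout
{
    long key;
    char value[56];
};

int main()
{
    CHash table(128, 4096, sizeof(Record));   // 128 buckets of 4096 bytes each

    table.createdb("test.hdb");               // create the backing .hdb file

    Record r = { 42, "" };
    std::strcpy(r.value, "hello");
    table.writedb();                          // open for writing
    table.writerec(&r.key, &r);               // hash the key, probe, write
    table.closedb();

    Record out = { 0, "" };
    long offset = 0;
    table.readdb();                           // open for reading
    table.readrec(42L, &out, offset);         // first record at the bucket offset
    while (out.key != 42)                     // collision: walk the bucket
    {
        table.readnextrec(offset, &out);      // read the record after 'offset'
        offset += sizeof(Record);             // track the slot just read
    }
    table.closedb();
    return 0;
}

In a real caller, the lookup loop would also have to stop at the end of the bucket (bucksize / recsize slots) so that a missing key does not probe forever.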

Limitations

Maybe in my next article, I can improve on the pre-defined sizes for buckets and tables; instead of fixing them up front, I will build these tables with some other class or use some other technique. Another limitation I found is that if sufficient space is not allocated to a bucket, data corruption can happen. All of this is due to the fixed sizes. A sketch of one possible guard is shown below.
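Until then, one possible guard against that corruption, sketched here as a modified writerec (this is not part of the posted source; it reuses the m_bucketsize and m_recsize members and the helpers shown above), would be to stop probing at the end of the bucket:
C++
int CHash::writerec(void* key,void *data)
{
    long bucketStart = GetbucketOffset(hash(key));
    long bucketEnd   = bucketStart + m_bucketsize; // first byte past this bucket
    long buckno      = bucketStart;
    void *temp = malloc(m_recsize);
    while(0 != readoffset(buckno,temp,m_recsize))
    {
        buckno += m_recsize;                       // go to the next record slot
        if(buckno + m_recsize > bucketEnd)         // would spill into the next bucket
        {
            free(temp);
            return -1;                             // bucket full, report failure
        }
    }
    free(temp);
    writeoffset(buckno,data,m_recsize);
    return 0;
}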

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior) EMC
India India
I have worked in ESM, Storage and Embedded domains. Mainly taken part in design, coding and maintaining in c/c++ on windows.
As a hobby tinkering with embedded/electronic devices (new found interestSmile | :) )...

Comments and Discussions

 
General: See the next improved version
Dharmateja Challa, 7-Nov-11 1:04

General: Re: See the next improved version
salamat2000, 26-Nov-12 5:11

Question: persistent, not persistant
Charvak Karpe, 1-Nov-11 4:02

Answer: Re: persistent, not persistant
Dharmateja Challa, 7-Nov-11 1:00
Yes, you are right. I was searching for one and did not find it, hence I am posting these articles as a series. The first one is not even a starter; I am planning to post many more in this series, so please also see the next ones.

Suggestion: Re: persistent, not persistant
KapsD, 18-May-12 0:19

Question: More information
Mehdi Gholam, 27-Oct-11 4:31