Tip/Trick

4-bit Encoder/Decoder

12 Feb 2019 (CPOL)
Code for a 4-bit encoder to store 15 different symbols with higher efficiency

Introduction

Converts a string of 8-bit characters into a string of 4-bit codes (at most 15 different characters are allowed).

In other words: every two 8-bit characters are packed into a single byte.

Through this conversion, strings can be stored in about half the size of a usual string (one extra 4-bit code marks the end of the string). This can be useful for large amounts of data that use at most 15 different characters, such as phone numbers.
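As a quick size check: each character becomes one 4-bit index, one extra index marks the end, and an odd count is padded to fill the last byte. A minimal sketch of that arithmetic, using a hypothetical helper `encoded_size`:

```python
def encoded_size(num_chars: int) -> int:
    """Bytes produced by the 4-bit encoding for num_chars characters.

    Hypothetical helper: one 4-bit index per character, plus one END
    index, rounded up to whole bytes (two indices per byte).
    """
    indices = num_chars + 1      # characters + END marker
    return (indices + 1) // 2    # ceil(indices / 2)


if __name__ == "__main__":
    phone = "067-845-512"
    print(len(phone), "characters ->", encoded_size(len(phone)), "bytes")
```

So an 11-character phone number needs 6 bytes instead of 11; for longer strings the ratio approaches 1/2.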

Background

I was thinking that storing telephone numbers in a database as strings wastes memory. Storing them as integers is not possible either, because leading zeros and separators such as '-' would be lost. My solution was to use an encoded string.
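A short illustration of why an integer column does not work for phone numbers: once the separators are stripped and the digits are converted, the leading zero is gone and cannot be restored from the value alone.

```python
phone = "067-845-512"

# Remove the dashes and store the digits as a number.
as_int = int(phone.replace("-", ""))

print(as_int)        # 67845512  (the leading zero has disappeared)
print(str(as_int))   # "67845512" is not "067845512", and the dashes are lost too
```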

Using the Code

Below is the implementation of the class. The test() function at the bottom shows how to use it.

To customize the symbols that can be encoded, change Encode4Bits._mappingTable. Never use more than 15 custom values: a 4-bit index has only 16 possible values, and index 0 is reserved as the end marker.
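Before editing the table, a quick sanity check can catch the constraints just described. This is a minimal sketch with a hypothetical helper `check_mapping_table`, assuming the table layout used by Encode4Bits (END marker '\0' at index 0, '' for unused slots). The phone-punctuation table is only an example.

```python
def check_mapping_table(table):
    """Hypothetical helper: raise ValueError if table breaks the rules
    Encode4Bits relies on (index 0 = END marker '\\0', at most 16 entries
    for 4-bit indices 0..15, single-character symbols, '' = unused slot)."""
    if len(table) > 16:
        raise ValueError("a 4-bit index can address at most 16 entries")
    if not table or table[0] != '\0':
        raise ValueError("index 0 must be the END marker '\\0'")
    for symbol in table[1:]:
        if symbol == '':
            continue  # unused slot
        if len(symbol) != 1:
            raise ValueError("each symbol must be exactly one character")
        if symbol == '\0':
            raise ValueError("'\\0' is reserved for the END marker")


# Example: digits plus phone-number punctuation fills all 15 symbol slots.
phone_table = ['\0'] + list("0123456789") + ['-', '+', '(', ')', ' ']
check_mapping_table(phone_table)
print(len(phone_table))  # 16
```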

Python
class Encode4Bits:
    def __init__(self):
        # Index 0 is always the "END" marker. A 4-bit index has 16 values,
        # so at most 15 real symbols fit; '' marks an unused slot.
        self._mappingTable = ['\0',
                              '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
                              '-', '', '', '', '']

    def _encodeCharacter(self, char):
        """@return index of element or None, if not exists"""
        # Start at 1: the END marker must never be produced from input data.
        for p in range(1, len(self._mappingTable)):
            if char == self._mappingTable[p]:
                return p
        return None

    def encode(self, string):
        """Packs two 4-bit indices into each byte. @return bytes"""
        # ===== 1. map all chars to an index in our table =====
        mappingIndices = []
        for char in string:
            index = self._encodeCharacter(char)
            if index is None:
                raise ValueError("Could not encode '" + char + "'.")
            mappingIndices.append(index)
        mappingIndices.append(0)  # END marker

        # ===== 2. Make num values even =====
        # 4 bit => 2 chars in one byte. Therefore: need even num values
        if len(mappingIndices) % 2 != 0:
            mappingIndices.append(0)

        # ===== 3. create byte string =====
        # First index goes into the high nibble, second into the low nibble.
        # bytes (not str) guarantees one byte per pair: a str built with
        # chr(128..255) takes two bytes per character once UTF-8 encoded.
        ret = bytearray()
        for i in range(0, len(mappingIndices), 2):
            ret.append((mappingIndices[i] << 4) | mappingIndices[i + 1])
        return bytes(ret)

    def decode(self, data):
        """Reverses encode(). @return the original string"""
        ret = ""
        for byte in data:  # iterating over bytes yields ints 0..255
            for index in ((byte & 0xF0) >> 4, byte & 0x0F):
                if index == 0:
                    # END marker: stop, so neither the terminator nor the
                    # padding ends up in the result.
                    return ret
                ret += self._mappingTable[index]
        return ret

def test():
    numberCompressor = Encode4Bits()
    original = "067-845-512"
    encoded = numberCompressor.encode(original)
    decoded = numberCompressor.decode(encoded)
    assert decoded == original
    print(len(decoded))  # 11 characters
    print(len(encoded))  # 6 bytes


if __name__ == "__main__":
    test()
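To make the byte layout concrete: with the default table, '0' has index 1 and '6' has index 7, so the pair "06" becomes the single byte 0x17 (high nibble 1, low nibble 7). A standalone sketch of just that packing step, with hypothetical function names `pack_pair` and `unpack_pair`:

```python
def pack_pair(high: int, low: int) -> int:
    """Combine two 4-bit indices (0..15) into one byte, as encode() does."""
    if not (0 <= high <= 15 and 0 <= low <= 15):
        raise ValueError("indices must fit in 4 bits")
    return (high << 4) | low


def unpack_pair(byte: int) -> tuple:
    """Split one byte into its (high, low) 4-bit indices, as decode() does."""
    return (byte & 0xF0) >> 4, byte & 0x0F


b = pack_pair(1, 7)       # '0' -> index 1, '6' -> index 7
print(hex(b))             # 0x17
print(unpack_pair(b))     # (1, 7)
```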

History

  • 8th February, 2019: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Engineer Telefonica Germany
Germany

Comments and Discussions

 
Bug: Error Decoding (Gammed, 11-Feb-19 1:04)
Praise: Thank you! (D4rkTrick, 12-Feb-19 15:22)
Question: Not Integers (SDSpivey, 7-Feb-19 19:02)
Answer: Re: Not Integers (Nelek, 7-Feb-19 19:09)
Answer: Number of leading zeros (D4rkTrick, 7-Feb-19 21:35)
Answer: Re: Not Integers (YvesDaoust, 11-Feb-19 2:25)
Question: BCD (YvesDaoust, 7-Feb-19 4:06)
It seems that you reinvented the Binary-Coded Decimal (BCD) representation. It is (or was) used in electronic calculators, among others; see "Binary-coded decimal" on Wikipedia.

As BCD uses only 10 of the 16 possible values of a nibble, its efficiency is about 83%. This can be beaten by bit streams in which the numbers are encoded in pure binary, with a variable length, but that is reserved for sophisticated applications.
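The 83% figure is the information content of one decimal digit, log2(10) ≈ 3.32 bits, divided by the 4 bits BCD spends on it. The sketch below checks that and compares BCD with the minimum number of bits for an n-digit number in pure binary (assuming the digit count is stored separately, so leading zeros can be restored). `binary_bits` is a hypothetical helper.

```python
import math

# Fraction of a 4-bit nibble that one decimal digit actually uses.
bcd_efficiency = math.log2(10) / 4
print(f"{bcd_efficiency:.1%}")   # 83.0%


def binary_bits(num_digits: int) -> int:
    """Minimum bits to hold any num_digits-digit decimal value in binary."""
    return (10 ** num_digits - 1).bit_length()


for n in (2, 11):
    print(n, "digits:", 4 * n, "bits as BCD vs", binary_bits(n), "bits in binary")
```

For an 11-digit number this gives 44 bits as BCD against 37 bits in pure binary.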

You may be interested to know that 2D barcodes (Data Matrix, QR, ...) and some advanced 1D barcodes use many different and exotic encoding systems to represent numerical data and some alphanumeric sets in packed form. For instance, Data Matrix uses the C40 scheme to pack three characters (digits or upper/lowercase letters) into a 16-bit number, as well as two digits into a single byte, using the values 00 to 99.
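The "two digits in a single byte" idea treats each digit pair as one value from 0 to 99. A minimal sketch for an even-length digit string follows; the helpers are hypothetical and show only the plain idea, not the actual Data Matrix codeword values or its handling of odd lengths.

```python
DIGITS = "0123456789"


def pack_digit_pairs(digits: str) -> bytes:
    """Store each pair of decimal digits as one byte holding 0..99."""
    if len(digits) % 2 != 0 or any(c not in DIGITS for c in digits):
        raise ValueError("expects an even number of decimal digits")
    return bytes(int(digits[i:i + 2]) for i in range(0, len(digits), 2))


def unpack_digit_pairs(data: bytes) -> str:
    """Inverse of pack_digit_pairs: each byte becomes two zero-padded digits."""
    return "".join(f"{b:02d}" for b in data)


packed = pack_digit_pairs("0678")
print(list(packed))                # [6, 78]
print(unpack_digit_pairs(packed))  # 0678
```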
Answer: Re: BCD (Nelek, 7-Feb-19 19:04)
General: Re: BCD (YvesDaoust, 7-Feb-19 23:05)
General: Re: BCD (Nelek, 8-Feb-19 0:15)
Answer: Thanks for sharing (D4rkTrick, 8-Feb-19 0:17)
Suggestion: Phone symbols (Nick Gisburne, 7-Feb-19 2:36)
