Introduction

This is a new algorithm... or, at least, I think it is, since I didn't find any reference to something like this one.
It's an encryption algorithm.
When I say 'encryption', I mean that with this algorithm alone, you may hide something, for example a message;
I do not mean that this encryption is strong.
Quite the opposite: if used as a standalone algorithm, this is quite a weak cipher algorithm,
so weak that it could only be successfully used to hide a message from your 8 year's old son (unless you are in the Gauss family);
but do NOT use it to hide something from your boss.

So ... Why?
As I said before, this algorithm is weak if used alone.
It should instead be used as a part of a succession of multiple algorithms, like it is done for example in the AES / Rijdael algorithm.
AES encryption is conceptually quite simple: it is made of four easy algorithms that are applied many times (rounds) on a block of data.
In particular, every round consists of roughly four steps:

Substitution bytes (Sub-Bytes)
Shift Rows
Mix Columns
Xor with a key

and these steps are repeated a given amount of times.

Every one of these steps, taken alone, is quite easy to break, but when they are applied together many times, they become really strong.
If you are interested in AES and how it works, please have a look here.

Now, the purpose of the current article is to describe a new algorithm.
I will not explain how to use it in combination with other algorithms: that will be the purpose of another article.

The provided DLL is a Notepad++ plugin that allows you to test the current algorithm.

Background

To fully understand this algorithm, you should know the basics of cryptography;
it is also advisable to have a look to the 'Sub-Bytes' step of the AES algorithm.
It would also help to know the basics on byte operations and to know what a hashing algorithm is.
Furthermore, to test this algorithm, I used a Notepad++ plugin that I published here on CodeProject.

Algorithm Description

This algorithm performs a substitution, that is: it substitutes every byte in the clear message with another byte.

The simplest form of substitution encryption was performed by the Caesar cipher. Caesar cipher simply replaces letters with a letter some fixed number of positions down the alphabet,
so, for decryption, you only need to know the said number of positions.
It was used more than 2000 years ago (Guess by who ?), and it was already obsolete and easy to crack by the middle ages.

AES's Sub-Bytes, on the other side, uses a 'map' (the S-box) to perform the replacement (this is quite more complex than the Caesar's method),
but the map is fixed; it is part of the algorithm description and it never changes):
this was probably made because there may be some maps that are weaker than others, so the fixed one was chosen between the 'good' ones.

The currently presented algorithm creates a 'map' (S-box, like in AES) that varies depending on a key (like in Caesar cipher).

How is the S-box Created?

The S-Box is a simple byte array, with 256 positions (one for every byte value);
every position in this byte array is filled with a different number from 0 to 255.
So every position in the byte array will contain a unique value (the replacement value).
Roughly: the password is hashed,
then in a loop (from 0 to 255) it is subsequently hashed once for each position of the s-box (256 positions).
The first byte of the hash in every step of the loop will pinpoint the position in the resulting S-box,
while the index of the loop will be the value contained in that position.
The detailed description will be given hereinbelow, in the 'Using the Code' section.

How is then the S-Box Used?

Exactly in the same way as in the AES Sub-Bytes step:
every byte of the input will be regarded as a position in the S-Box
and will be replaced by the content of the S-Box in said position.
For the decryption, the box will be inverted and then used in the same way.

Using the Code

The solution contains two projects:
the first one 'Encoders' is exactly the same explained (as mentioned before) in my article on Notepad++ plugins; the second one 'HashGenSboxEncoder' contains:

the high level class that implements the 'IEncoder' interface
the Hash class that contains this algorithm
the low level classes that do the dirty work

Let's speak first about the most interesting part: the creation of the S-box, that is detailed in the CreateSboxFromHash method (in the Hash class); the core of the algorithm is here:

List<byte> sbox = new List<byte>();
List<int> startValues = new List<int>(Enumerable.Range(0, 256));
...
while (sbox.Count <= 0xff) 
    {
    hval = HashUntilFind(hval, left);
    int pos;
    int i = 0;
    do
        {
        pos = hval[countStartZeroes + i] % startValues.Count;
        } while ((countStartZeroes + i++) < hval.Length && 
                  startValues[pos] == sbox.Count);

    if (startValues[pos] == sbox.Count)
        {
        pos = (pos + 1) % startValues.Count;
        }

    sbox.Add((byte)startValues[pos]);
    startValues.RemoveAt(pos);
    }

At the beginning, there is a first empty list 'sbox' that, at the end of the procedure, will contain the resulting S-Box.
A while loop is used to fill, one by one, one for every loop, the 256 positions of the resulting S-box.

Inside the while loop:

a hashing is made
the first byte of that hashing is added to the resulting 'sbox' list.

This sounds easy, but by proceeding in this way, it is highly possible that there will be some duplicate values.

So we need a way to prevent this: a 'startValues' list serves this purpose.
The list 'startValues' is filled with all the values from 0 to 255 (Enumerable.Range(0, 256)).
Values are removed, once for every loop, from this 'startValues' list and then added to the resulting 'sbox' list.

Syllogism:
Since all the values in the 'startValues' list are different,
and since we'll use these values to fill the resulting 'sbox' list,
there won't be duplicates in the resulting 'sbox' list,

And that solves the problem of duplicate values.

So, inside the while loop:

a hashing is made,
the first byte of that hashing gives the position 'pos' in the 'startValues',
the byte in the position 'pos' in the 'startValues' is added to the resulting 'sbox' list,
the position 'pos' is removed from the 'startValues' list.

So at every loop, the resulting 'sbox' list will be one position bigger, and the 'startValues' list will be one position smaller.

But if the 'startValues' list grows smaller, it will then be smaller than 256 elements long.
Then... looking at the second point of the loop explanation hereinabove, what happens if 'the first byte of that hashing' is bigger than the 'startValues' list length?
It will rise an exception. Not good.

This is solved simply by reducing 'the first byte of that hashing' modulo the 'startValues' list length (modulo is performed here as the remainder of the integer division, the '%' operator).
And this solves the problem of picking the positions in the 'startValues' list, when the list grows smaller.

Then sometimes, there is the risk of storing values that are equal to the position in 'sbox' list.
For example: sbox { 32, 187, 41, 19, 4, ...
In this case, in position 4 (array starts from 0), we have the value 4, and this is bad.
Because when performing the substitution, we'll substitute 4 with 4. Not good.
We have to prevent this as much as possible.

So we are going to check if the value we are going to add to 'sbox' is the same as the value position,
and this is taken care by the second check of the do{} while loop:

startValues[pos] == sbox.Count

So, when we'll find some of the mentioned unlucky values, we are going to get the next value in the hashing and repeat the check.

And if we are extremely unlucky (which means that all the bytes in our hashing have the same value AND that value it is equal to our position in the sbox array), then get the next value in the startValues (which will be for sure different from the previous value).

Now, if at this point, we consider the 'HashUntilFind' function as a simple Hash256 function, the algorithm already works fine.

But, at this point, I wanted to make this algorithm more difficult to calculate, so that any potential brute force attack may require a big calculation power.
The Bitcoin mining already solved this problem, by making successive hashes until a hash that starts with a given number of zeroes is found. So I tried the same.
The 'HashUntilFind' function searches for a hash whose initial bytes are exactly the same as a given byte array, by repeating the hash in a while loop, until the right value is found. And this may be very power consuming.

So, during then creation of the S-Box, a difficulty parameter has to be chosen; that difficulty is used to create a simple array (called 'left' in the code) containing only zeroes, that is long as the 'difficulty'.
value, so, by making, for example, difficulty = 3 -> we'll obtain a 'left' array equal to { 0, 0, 0 };
this means that the HashUntilFind will continue its hashing until a hash that starts with three zeroes is found.

This also means that, if the difficulty is set to zero, then the 'HashUntilFind' function will perform as a standard Sha256 hash.

I've tried using difficulty = 1, and the algorithm works fine.
But, starting from difficulty = 2, the calculation of the algorithm already starts slowing down very much.

We are now moving on the high level part: the EncoderHashGenSbox class.

This class inherits from IEncoder, defined in the 'Encoders' project, explained in this article.

The implementation of this class shows a way to use the algorithm, in particular using strings as input.

Encode and Decode methods must be provided, and they simply need an input text and a password.

Let's have a look at the code inside Encoding method:

1)  byte[] bclear = ByteLogic.Conversions.ToByteArray(cleartext);
2)  byte[] sbox = GetSbox(pass, difficulty);
3)  byte[] bencoded = ApplySbox(bclear, sbox);
4)  string encodedtext = Convert.ToBase64String(bencoded);

The first row simply converts a string to a byte array,
the second row calculates the S-Box,
the third applies the Sbox to the input,
and the fourth row converts the resulting byte array to Base64 encoded text.

Applying the Sbox is quite simple: for every byte in the input, search that byte as the position in the S-Box and get the correspondant value. This value will be written in the output.

Example: input byte = { 14, 234, 02, 06, ...
So, in position ZERO in input we have 14.
Go search in the S-box in the 14 position and read the byte contained therein (for example 65).
In the output, in position ZERO, there will be 65.

Points 2 and 3 are the same in the decode, but between them, there is an added step, in which the S-Box is inverted. Inversion means that a new array is created, where the values contained on the old one are now the positions, and the old positions are now the values in the array.

The S-Box and the inverted S-Box are both 256 positions long; the following example shows two S-Boxes that are only 16 long (these will never be used with the current algorithm: they are shown only to provide a simple explanation):

S-BOX

09	12	00	07
14	15	01	03
04	10	02	05
13	06	08	11

Inverted S-BOX

02	06	10	07
08	11	13	03
14	00	09	15
01	12	04	05

In position 00 in the S-Box, we have 09; in the Inverted S-Box, in position 09 we have 00. Same goes with all the other values.

In the low level functionalities implemented in the ByteLogic.cs file, the only one worth mentioning is the function that performs the equality test between two byte arrays, where the comparison byte by byte is made with the help of two pointers (one for each byte array compared) that move in sync through the whole array lengths.

History

Version 1.0

Generate Variable Substitution-Boxes Starting from a Key