Introduction

What follows is the product of my study of several encryption algorithms. I decided to write this library out of a desire to learn more about them, and cryptography in general. I have in the past adapted classes from popular libraries like Mono and Bouncy Castle, but this time I wanted to write my own implementations, ones that were optimized for the C# language, and possibly faster, and more flexible than these popular C# versions. As I was writing the base classes, I also began thinking about various attack vectors, and how they might be mitigated, and also how the existing primitives might be improved upon from a security perspective.

It is important to note, that using the base ciphers with their original key sizes, output from those classes will be exactly the same as any other valid implementation of that cipher; RHX (Rijndael) with a 256 bit key is Rijndael, as THX (Twofish) with a standard key size is Twofish, and SHX (Serpent) is a valid Serpent implementation. This is proven. The Tests section contains the most complete and authoritative test suites available for each of these ciphers. So if you choose to deploy with standard key lengths, you can use configurations that have been thoroughly cryptanalyzed.

One has to consider that these ciphers were designed almost 20 years ago; at the time, Windows 95 was the predominant operating system, and computer hardware was quite primitive by today's standards. So, concessions had to be made in cipher design in regards to speed and memory footprint. We are not so constrained with the hardware of today, so adding rounds to a cipher, or using a larger key size is less a consideration now, and will have even less impact in the future as hardware continues to evolve.

Speed remains an important design criterion with this project. The CTR mode and the decryption function of the CBC and CFB modes have been parallelized. If a block size of ParallelBlockSize (64000 by default, but configurable) bytes is passed to the mode, and the hardware utilizes multiple processor cores, the processing is automatically parallelized. On an i7-6700 processor, I have reached speeds of over 18 gigabytes per minute with this library using Rijndael, making this by far the fastest implementation of these ciphers in the C# language that I have found (the C++ version using AES-NI has been clocked at 5882 MB per second on the same computer!).

I definitely have some strong reservations about publishing this code, not the least of which is that it is likely to spawn a number of so called 'AES 512' copies by people who may not understand enough about the algorithms to evaluate, produce, or maintain a secure encryption software. It's kind of a quandary though, if I leave it on Github, no one will ever see it, if I publish it here, it might be used irresponsibly.. I would urge anyone considering using one of the extended key lengths to study the work, thoroughly evaluate the implementations, and make an informed choice.

As for my part, I wrote these implementations based on well-known versions, and made as few changes to the ciphers as possible to extend the key size. I have confidence in the library itself, because I took care to test it every step of the way, and feel I am developing a good understanding of the cryptographic primitives used in its construction. This should however, be considered as what it was intended to be; experiments..

I welcome input from cryptographers and programmers. If you have a comment or concern, I'd be glad to hear from you. My goals include moving what I feel are the best and strongest implementations to a C++ library; CEX++.

Version 1.5 is complete, and will be one of the last versions of the .Net version of the library, as the code begins transition to the C++ language. This version includes post quantum secure asymmetric ciphers, a key exchange protocol, and many more of what I consider essential and relevant cryptographic entities. This version encompasses most of the features I originally envisioned for the project, and though it may still evolve as I mirror improvements to the code made in the C++ version, this should be considered final, as no more major additions are planned for the project.

The license is MIT for all sources except the NTRU cipher and the DTM-KEX implementations which are GPLv3, the rest of the library including ciphers, digests, Drbgs, Prngs etc. is still MIT. If you plan to use the NTRU implementation in a commercial project you need to contact the patent holders at Security Innovation.

Documentation has been added as an optional download with this project distribution, though the website link on the sample forms Help menu, or directly at: CEX Help.

This project contains strong cryptography; before downloading the source files, it is your responsibility to check if the extended symmetric cipher key lengths (512 bit and higher) are legal in your country. If you use this code, please do so responsibly and in accordance to law in your region.

CEX is also available on GitHub as the CEX Project. The library API documentation can be accessed from the CEX Homepage, along with the latest releases of the library and example code.

What's New in 1.5?

..and then there were three. In the original distributions there were two variations of each of the three root symmetric ciphers; Rijndael, Twofish, and Serpent, one version that used a standard key schedule (extended to a 512 bit key), and the other version used HKDF to generate the internal working key. These versions have been combined; now each cipher, (RHX, SHX, and THX) are equipped with both key expansion fuctions. The ciphers have also been optimized for speed, per the C++ distribution.

The 'merged' ciphers Fusion, RSM, and TSM have been removed. It isn't that I thought they were insecure, just unnecessary.. a lot of extra processing involved in combining the rounds functions from two ciphers, when I believe the existing ciphers, (with larger keys and increased rounds), will remain secure for some time to come.

The seed generators, ISAAC and XorShift+ have been added, along with a class that gathers entropy from the system.

A number of the classes like CipherStream, and the cipher modes have been rewritten, as I have made a line by line inspection and optimization of all of the classes that have made their way into the C++ distribution thus far.

VolumeFactory, KeyFactory, and PackageFactory have been reworked along with their dependant types and a range of tests have been added to the Test project.

February 23, 2016 1.5.2

Documentation moved to Doxygen

March 16, 2016 1.5.3

Added Seed generators CTRRsp and SP20Rsp
SHA-2 and Blake digests have been unrolled (optimized)
Changes to RegistryUtils and the EntropyPool classes
Expanded pool and changes to ISAAC

April 08, 2016:

Unrolled Rijndael key schedule StandardExpand() routine, removed all branches and loops from the key expansion function.
Added new constructors to CMAC and HMAC, they can now be initialized with the engines enumeration member.
Added MacDescription structure, similar useage to CipherDescription.
Added new constructor to MacStream, can now initialize any supported Mac generator with a MacDescription structure.
Expanded DigestFromName helper class with GetDigestSize() and GetBlockSize() functions.
Added new helper classes CipherFromDescription, and MacFromDescription.
VolumeCipher rewritten, changes to MacStream and DigestStream stream classes

April 13, 2016 -v1.5.5:

Note: There are some breaking changes in this version.

When I first wrote out the symmetric ciphers using the HKDF Expand powered key schedule, I was going for a 'good until the end of the world' sort of implementation, with large cipher input keys, which was ok, because the ciphers were intended only for a local file encryption application.

As the C++ version is evolving, I have reconsidered this, and lowered the minimum key size requirements to more reasonable sizes, as that version of the library will be used for communications applications that I am writing.

I have also changed the Twofish and Serpent implementations so that when they use a 512 bit key, (with the 'standard' key schedule), the rounds are set to a higher count automatically to reflect the increased security guarantee.

Changed minimum key size for HKDF extended key schedules in the Symmetric ciphers to the digests output size
Serpent in standard mode sets rounds to 40 with a 512 bit key
Twofish sets rounds to 20 with a 512 bit key
Removed unnecessary loop processers from Salsa and ChaCha
Updated documentation throughout the library

May 28, 2016 -v1.5.6

Note: There are some breaking changes in this version.

While writing out the Rijndael key schedule for the AES-NI version in CEX++, I realized that the key schedule's 512bit key expansion routine could be made more secure. It's not like I had any sources to draw from when I wrote the extended key size routines, but revisiting that code revealed changes that needed to be made to increase security. It is strongly recommended that if you are using Rijndael with the 512 key, that you update to this more secure version (this includes CEX++). My previous recommendation still stands though, if you require a strong security guarantee, use the HKDF expansion methods rather than the native key schedules.

Changes to the Rijndael 512 bit key expansion method (see the Rijndael section for details)
Changes to the KeyGenerator class; counters are now rotated on 32bit boundaries (see Rotate function)
MessageHeader renamed MessageDescription, changed to allow in-place decryption (example project)
Various improvements to speed and security made throughout the library

Library Components

The library contains the following components, as it evolves, some will be added, some removed, and when possible, changes will be made to improve upon performance, security and documentation.

Asymmetric Encryption Engines

NTRU Encrypt

NTRU Encrypt is a public key cryptosystem developed around 1996 by three mathematicians (Jeffrey Hoffstein, Jill Pipher, and Joseph H. Silverman). It is a lattice based cipher that has continued to evolve and is used in many different libraries and real world applications. This version of NTRU is a translation of the Java version by Tim Buktu, located on Github. There are some notable differences between the two implementations; this C# version can use any of the libraries Digests or Prngs to power internal random generation, uses optional parallel processing to generate keys and transform data, has the Dispose interface implemented throughout, and like all of the asymmetric ciphers in the library, follows a standard pattern that is very easy to use (most operations require only a couple lines of code to implement).

McEliece

The McEliece Public Key Cryptosytem was first developed in 1978 by Robert McEliece based on the hardness of decoding a general linear code. Since then, it has undergone a number of changes to increase security, including the CCA2 secure variants used in this implementation. The C# version is based on the Bouncy Castle Java version 1.51 McEliece implementation, and like the other asymmetric ciphers in this project uses parallelism, and operates with any of the Digest or Prng implementations in the library.

Ring-LWE

The concept of Learning with errors was first introduced by Oded Regev in 2005. Ring-LWE employs learning with errors operating within a Ring of Polynomials over a Finite Field. The Ring-LWE cipher has been shown to be both efficient and as a 'quantum safe' alternative to the widely used Diffie-Hellman and Elliptic Curve Diffie-Hellman key exchanges. This version is based on the paper 'Efficient Software Implementation of Ring-LWE Encryption' by Ruan de Clercq, Sujoy Sinha Roy, Frederik Vercauteren, and Ingrid Verbauwhede, and the accompanying source code on Github.

Usage

Both the Aymmetric ciphers and signers follow the same design pattern, and have been made as easy to use as possible..

Creating a key pair:

// use a pre-defined parameter set
RLWEParameters ps = RLWEParamSets.RLWEN512Q12289;
// initialze the key generator
RLWEKeyGenerator gen = new RLWEKeyGenerator(ps);
// create the key-pair
IAsymmetricKeyPair kp = gen.GenerateKeyPair();

Serializing and Deserializing a key:

// convert a key to a byte array
byte[] keyArray = kp.PublicKey.ToBytes();
// deserialize a key
NTRUPublicKey npubk = new NTRUPublicKey(keyArray);
// convert a key to a stream
MemoryStream keyStream = kp.PrivateKey.ToStream();
// deserialize key
NTRUPrivateKey nprik = new RLWEPrivateKey(keyStream);

Encrypt and Decrypt:

RLWEParameters encParams = new RLWEParameters(512, 12289, 12.18, new byte[] { 2, 5, 1 }))
RLWEKeyGenerator keyGen = new RLWEKeyGenerator(encParams);
IAsymmetricKeyPair keyPair = keyGen.GenerateKeyPair();
 
byte[] data = new byte[64];
byte[] enc, dec;

// encrypt an array
using (RLWEEncrypt cipher = new RLWEEncrypt(encParams))
{
    cipher.Initialize(keyPair.PublicKey);
    enc = cipher.Encrypt(data);
}

// decrypt the cipher text
using (RLWEEncrypt cipher = new RLWEEncrypt(encParams))
{
    cipher.Initialize(keyPair.PrivateKey);
    dec = cipher.Decrypt(enc);
}

Pre-Defined Parameter Set Access:

// from the oid
NTRUParameters params1 = NTRUParamSets.FromId(new byte[] { 2,2,2,105 });
// from enumeration
NTRUParameters params2 = NTRUParamSets.FromName(NTRUParamNames.E1171EP1);
// clone static member
NTRUParameters params3 = NTRUParamSets.APR2011743.DeepCopy();

Asymmetric Signing Engines

Rainbow

The Rainbow Multivariate Equation Signature Scheme is a member of a class of multivariate quadratic equation cryptosystems called "Unbalanced Oil and Vinegar Cryptosystems" (UOV Cryptosystems). First presented in In 1988 by T. Matsumoto and H. Imai presented their scheme "Matsumoto-Imai-Scheme". This implementation is a translation of the Bouncy Castle Java implementation.

GMSS

The Generalized Merkle Signature scheme is based on hash trees and one-time signatures such as the Lamport signature scheme. This is also based on the Bouncy Castle Jave implementation.

Symmetric Encryption Engines

Base Algorithms

The three base ciphers; Rijndael, Serpent, and Twofish have all underwent thorough testing to ensure that they align with valid implementations that use a smaller maximum key size. The same algorithms are used to transform data at any key size; only the key schedule itself has been (optionally) extended, (a key schedule takes a small user key, and expands it into a larger working array, used in the rounds function to create a unique output). These changes to the key schedule, and a flexible rounds assignment, increase the potential security of the cipher, make it more difficult to cryptanalyze, and more resistant to brute force attacks.

RHX: (Rijndael) An implementation of the Rijndael algorithm
SHX: (Serpent) An implementation of the Serpent encryption algorithm
THX: (Twofish) An implementation of the Twofish encryption algorithm

The library also includes parallelized versions of the stream ciphers ChaCha and Salsa20.

HX Ciphers: HMAC based Key Schedules

The HX Series Ciphers use the identical encryption and decryption algorithms (transforms), of the standard ciphers, the difference is that the key schedule has been extended to optionally use a HMAC based Key Derivation Function (HKDF) to expand the internal working key array.

HKDF is powered by a cryptographic hash function, and is one of the most cryptographically strong methods available to generate pseudo-random output. The HKDF based key schedule takes a minimum of the digest hash size as keying input, so for a 256 bit digest that would be 32 bytes, a 512 bit digest would be 64 bytes.

HKDF uses that input keying material to generate a cryptographically strong working key array. The digest used is definable through the HX ciphers constructor KdfEngineType parameter; so any of the implemented digests can be used to power the key schedule KDF: Blake, Keccak, SHA-2, or Skein.

There are several advantages to using a HMAC based Key Derivation Function; the stronger working keys are less susceptible to attack vectors that leverage weak or related keys. The larger user key size, also makes brute force attacks practically impossible; 2^(256-1) compared to a minimum of 2^(512-1) iterations. Another advantage of the HX ciphers is that the number of diffusion rounds, (transformation cycles within the rounds function), are configurable independent of the initial key size.

RHX: 10 to 38 rounds of diffusion
SHX: 32 (default) to 64 rounds of diffusion
THX: 16 (default) to 32 rounds of diffusion

Cipher Modes

CBC: Cipher Block Chaining, decryption is optionally parallelized
CFB: Cipher Feed Back, decryption is optionally parallelized
CTR: Segmented Integer Counter, encryption and decryption is optionally parallelized
OFB: Output Feed Back mode
ECB: Electronic Code Book mode

Deterministic Random Byte Generators

CTRDrbg: An Encryption Counter based Deterministic Random Byte Generator
DGCDrbg: A Digest Counter based Deterministic Random Byte Generator
HKDF: A HMAC based Key Derivation Function
PBKDF2: An implementation of an Hash based Key Derivation Function
KDF2Drbg: An implementation of an Hash based Key Derivation Function
SP20Drbg: A parallelized Salsa20 deterministic random byte generator implementation

Message Authentication Code

CMAC: A Cipher based Message Authentication Code: CMAC
HMAC: A Hash based Message Authentication Code
VMAC: A Variably Modified Permutation Composition based Message Authentication Code

Message Digests

SHA256: An implementation of SHA-2 with a 256 bit hash output
SHA512: An implementation of SHA-2 with a 512 bit hash output
Keccak256: The Keccak with digest with a 256 bit return size
Keccak512: The Keccak with digest with a 512bit return size
Blake256: The Blake digest with a 256 bit return size
Blake512: The Blake digest with a 512 bit return size
Skein256: The Skein digest with a 256 bit digest return size
Skein512: The Skein digest with a 512 bit digest return size
Skein1024: The Skein digest with a 1024 bit digest return size

Helper Classes

KeyGenerator: A helper class for generating cryptographically strong keying material
KeyHeader: A helper class that manages an encryption key header structure
MessageHeader: A helper class that manages a message header structure
BlockCipherFromName: Create a symmetric block cipher instance using it's enumeration name
CipherFromDescription: Create a symmetric cipher and mode using it's CipherDescription
CipherModeFromName: Create a symmetric cipher mode instance using it's enumeration name
DigestFromName: Create a cryptographic hash instance using it's enumeration name
MacFromDescription: Create a Mac generator instance using it's MacDescription
PaddingFromName: Create a symmetric cipher padding instance using it's enumeration name
PrngFromName: Create a pseudo random number generator instance using it's enumeration name
SeedGeneratorFromName: Create a seed generator instance using it's enumeration name
StreamCipherFromName: Create a stream cipher instance using it's enumeration name

Numeric

BigInteger: Provides BigInteger operations for modular arithmetic, GCD calculation, primality testing, prime generation, bit manipulation, and other miscellaneous operations.
BigMath: Class of number-theory related functions for use with integers represented as int's or BigInteger objects.
BigDecimal: This class represents immutable arbitrary precision decimal numbers.

Networking

PacketBuffer: A class that contains a searchable list of packet streams.
TcpSocket: A disposable socket class that wraps asychronous and sychronous network operations

Padding

ISO7816: ISO 7816 Padding.
PKCS7: PKCS7 Padding.
TBC: Trailing Bit Compliment Padding.
X923: X923 Padding.
Zeroes: All Zeroes Padding.

Key Factories

KeyFactory: A helper class used to create or extract a CipherKey file
PackageFactory: A helper class used to create or extract a KeyPackage file
VolumeFactory: A helper class used to create and extract a VolumeKey file

Key Types

KeyParams: A container class for encryption Key, Vector, and Ikm arrays
CipherKey: Used in conjunction with the CipherStream class; this structure is used as the header for a single use key and vector set
PackageKey: Contains the KeyAuthority structure with identity and origin, attached policies, a description of the sub-key sets, and the CipherDescription structure containing the description of the cipher
VolumeKey: This structure is used for the encryption of a series of files, each with unique key/iv pairings; like a directory or online volume

Processing

CompressionCipher: Extends the CipherStream class for encrypting a compressed directory of files.
CipherStream : Cipher helper class; performs all operations necessary to encrypt a file.
StreamDigest: Digest helper class; wraps the creation of a hash digest from a Stream
StreamMac: MAC helper class; wraps the creation of a Message Authentication Code from a Stream
VolumeCipher: A helper class used to encrypt or decrypt a series of files on a directory or volume

Pseudo Random Number GeneratorsCipherStream

BBSG: The Blum-Blum-Shub random number generator
CCG: A Cubic Congruential Generator II random number generator
CSPRng: A Cryptographically Secure PRNG using RNGCryptoServiceProvider
CTRPrng: An implementation of a Encryption Counter based Deterministic Random Number Generator
DGCPrng: An implementation of a Digest Counter based Random Number Generator
MODEXPG: A Modular Exponentiation Generator random number generator
PBPRng: An implementation of a passphrase based PKCS#5 random number generator
QCG1: A Quadratic Congruential Generator I random number generator
QCG2: A Quadratic Congruential Generator II random number generator
SP20Prng: An implementation of a Salsa20 Counter based Deterministic Random Number Generator
SecureRandom: An implementation of a Cryptographically Secure Psuedo Random Number Generator

Seed Generators

EntropyPool: Uses system entropy sources to provide a generators initial state
CSPRsg: A Cryptographically Secure seed generator using the RNGCryptoServiceProvider class
ISCRsg: An implementation of the ISAAC pseudo random generator
XSPRsg: An implementation of the XorShift+ pseudo random generator
CTRRsp: An implementation of the Symmetric cipher counter based pseudo random generator
SP20Rsp: An implementation of the Salsa20 counter based pseudo random generator

Queuing

JitterQueue: Adds a small amount of random delay time to a queuing operation
WaitQueue: demonstrates using a queue to create a constant time implementation

Security

SecureDelete: a 5 stage secure file deletion class

Utilities

Compare: Compare arrays for equality
Compression: a fully implemented compression and folder archiving class
DirectoryTools: Directory methods wrapper class
DriveTools: Drive methods wrapper class
FileTools: File methods wrapper class
NetworkTools: A networking tools class
RegistryUtils: A systems registry utility class

DTM-KEX

DTM stands for 'Deferred Trust Model' which is how it works in regards to mechanisms like authentication and forward secrecy. These tasks are handled at another layer of software, with the KEX being only responsible for the transmission of keying material for a session. This is a layered encryption protocol, that is, there are two independent asymmetric and symmetric ciphers at work during the key exchange.

The protocol steps can be logically divided into two sections or 'phases'; the first is the authentication phase, in which the first asymmetric and symmetric keys are exchanged, and the hosts authenticate.

The primary phase transfers the final session keys, is encrypted beginning to end with the auth-phase symmetric key, including asymmetric parameters, public key, and the (double) encrypted primary session key.

The design concept stems from a need for a key exchange that offers both the maximum flexibility, and a high degree of security. I guess you could look at the library example as both a protocol, and the implementation of that protocol; so it might help to break the description down into two parts..

The DTM Protocol

There are nine stages to an exchange; Connect, Init, PreAuth, AuthEx, Auth, Sync, PrimeEx, Primary, and Established.

A session begins with the Connect stage; where a client, Alice, requests an exchange from a server Bob. This is also where a parameters negotiation is transacted; the asymmetric and symmetric ciphers used in the exchange are determined by the server (the node accepting a call), by enforcing a minimum security context for the exchange.

The connection request contains a partial DtmIdentity structure: which for the Connect Stage of the exchange, contains only a Public identity field which uniquely identifies the host on the network. This identity field is an unbounded byte array. It can contain a serialized structure with more information like connection prerequisites, a signed id field, security parameters, whatever your organizational or application requirements demand, but it should contain an id field that can be used in the initial identification of the host. If the server accepts the connection, it sends back its own public id in response, at which point the exchange moves to the Init Stage. This is one of four identity checks that happen during the exchange, the identity packet data is passed up to the application via an event, this event can also cancel an exchange, which sends back a termination packet with the Refuse option flag set. The operation can be cancelled both by a refusal of the identity data, or the asymmetric cipher parameters.

The Init stage mirrors the Connect, except that once the initial transfer is accepted, the DtmIdentity structure is fully populated. It contains the Auth-Phase symmetric cipher parameters (cipher type, key length, mode etc), and the asymmetric parameters for the first stage public key exchange.

The PreAuth stage is where the two nodes exchange the Auth-Phase public keys. These public keys are used to encrypt the first symmetric cipher keys. Note that the exchange is bi-directional; data that Bob sends to Alice is encrypted with the key that Bob generated, as data Alice sends is encrypted with her key. In this way each actor is responsible for the security of the channel they transmit data on.

After receiving the public asymmetric keys, each client generates a symmetric key, encrypts that key with the other clients public key, and sends that encrypted symmetric key. This is the Auth-Phase symmetric key, and from this point on, everything is encrypted with that key; primary asymmetric parameters, public key, and the primary session key.

This is the final stage of the Auth-Phase of the key exchange. The DtmIdentity structure is encrypted and exchanged. Just as with the Connect and Init stages, the identity field is an unbounded byte array, which can contain a serialized structure with security information, operational flags, a signed data array, or simply a persistent secondary authentication string unique to that host.

This is the first stage of the Primary-Phase of the key exchange. Just as with the Init-Stage, a full DtmIdentity structure is exchanged, only this time it is encrypted with the Auth-Phase symmetric key. This identity field contains the primary symmetric cipher parameters, and the Asymmetric parameters, which can be either an OId field identifying a pre-defined parameter set, or the serialized asymmetric parameters.

The public keys for the primary asymmetric cipher are encrypted with the Auth-Phase symmetric key, and exchanged between hosts.

The primary symmetric keys are generated by each host, encrypted with the primary asymmetric public key, then encrypted again with the Auth-Phase symmetric key. There are a number of safeguards for the Primary-Phase exchange; the symmetric and asymmetric keys, and parameters can be prepended and appended with random bytes before encryption and transmission, to mask their size and type, and a ranged random delay can be introduced before the transmission of the primary asymmetric key.

This is the last stage of the exchange, where hosts confirm that the VPN is in the Established state. The VPN is now ready for data transmission.

DTM Implementation

When I first started researching key exchange protocols, I looked at TLS and PGP, and it quickly occurred to me that not only were these exceedingly complex for simple peer-to-peer exchanges, but that there were a lot of features I didn't need, and they were missing features I wanted to add, so.. I started thinking about designing a key exchange, one that has a high degree of flexibility, could be applied in a wide range of uses, was built with post-quantum secure ciphers, and that offered a strong but flexible security model. It seemed to me that the best way to keep the exchange flexible was to offload or 'defer' authentication to a higher layer of software, whereby the software, and in turn, the users actions and application settings, can at least in part determine the level of security, authentication, repudiation, and how an exchange is transacted.

Most key exchanges work the same way; two hosts exchange public asymmetric keys; they use each others public keys to encrypt a symmetric cipher key, (symmetric ciphers are used because they are faster, and asymmetric ciphers have a small maximum message size). The symmetric keys are exchanged and decrypted with the asymmetric ciphers private keys, and then used to encrypt/decrypt a data stream.

This exchange uses a two-stage key exchange mechanism, the first asymmetric cipher, Ring-LWE, encrypts the symmetric keys transferred between hosts, these symmetric keys are then used for authentication and to encrypt the second half of the exchange, which uses a second asymmetric cipher like NTRU, to encrypt the primary symmetric session keys. This is done for several reasons; the parameters sets for some asymmetric ciphers should not be transmitted over an open channel, they contain information that could aide an attacker (like the Db field, which is the number of random bytes pre-pended to a message in NTRU). So by encrypting that second ciphers parameters, public key, and double encrypting the primary session key, you are creating a transfer mechanism that would require breaking both asymmetric ciphers to derive the primary symmetric session key, (which can be a key for any cipher in this library, including the extended hybrid ciphers like RHX, which are themselves practically unbreakable). Unlike key exchanges using RSA or ECDH, the post-quantum secure asymmetric ciphers in this library are not vulnerable to attacks by quantum computers, this is a serious concern for long term data security, because once large scale quantum computers are built, (probably within the next ten years), RSA and ECDH will be broken.

One of the central design goals of CEX is to offer advanced encryption in a turn-key solution, whereby a developer could encrypt a file or a stream with only a few lines of code, because let's face it, many of the encryption libraries out there are hard to use, with few working examples, and sparse documentation, a developer has to make a significant investment in research to get encryption working properly in their application. So here was another problem, how to wrap the entire key exchange, and the network stack so that it can be used simply and reliably? This turned out to be quite a challenge, but what I am presenting here allows for just that, the key exchange and an encrypted TCP channel with as little as five lines of code..

The CEX solution contains two test projects, DtmServer and DtmClient. To run the example set the DtmServer as the startup project in the solution explorer, start it, and then right-click on the DtmClient project and choose: Debug->Start New Instance. The entire exchange will execute using the computers loopback address, and you are left with two windows configured as simple chat clients.

An Example

// dtm server exchange parameters X11RNS1R2
DtmParameters srvDtmParams = DtmParamSets.FromName(DtmParamSets.DtmParamNames.X42RNS1R1);       // preset contains all the settings required for the exchange

// dtm server id
DtmClient srvDmtId = new DtmClient(
    new byte[] { 3, 3, 3, 3 },      // the clients public id, (should be at least 32 bytes, can be used as a contact lookup and initial auth)
    new byte[] { 4, 4, 4, 4 });     // the clients secret id, (secret id can be anything.. a serialized structure, signed data, hash, etc)

// create the server
_dtmServer = new DtmKex(srvDtmParams, srvDmtId);
_dtmServer.IdentityReceived += new DtmKex.IdentityReceivedDelegate(OnIdentityReceived);         // returns the client public and secret id fields, used to authenticate a host
_dtmServer.PacketReceived += new DtmKex.PacketReceivedDelegate(OnPacketReceived);               // notify that a packet has been received (optional)
_dtmServer.SessionEstablished += new DtmKex.SessionEstablishedDelegate(OnSessionEstablished);   // notify when the vpn state is up
_dtmServer.PacketSent += new DtmKex.PacketReceivedDelegate(OnPacketSent);                       // notify when a packet has been sent to the remote host (optional)
_dtmServer.DataReceived += new DtmKex.DataTransferredDelegate(OnDataReceived);                  // returns the decrypted message data
_dtmServer.FileReceived += new DtmKex.FileTransferredDelegate(OnFileReceived);                  // notify that a file transfer has completed
_dtmServer.FileRequest += new DtmKex.FileRequestDelegate(OnFileRequest);                        // notify that the remote host wants to send a file, can cancel or provide a path for the new file
_dtmServer.SessionError += new DtmKex.SessionErrorDelegate(OnSessionError);                     // notify of any error conditions; includes the exception, and a severity code contained in the option flag
           

// server starts listening
_dtmServer.Listen(IPAddress.Any, Port);

The first line in the above code is the declaration of the DTM parameter set, this includes which asymmetric ciphers are used, their parameters, the symmetric cipher descriptions, as well as some security controls; like how much random to append/prepend to the primary public key before encryption and transmission.

The next line of code is a DtmClient structure; this contains the public and secret message arrays. These arrays are what is transmitted in the DtmIdentity structure during the Connect, Init and Auth stages of the key exchange. These arrays are unbounded and can be serialized structures containing a heirarchy of flags and data used by the application layer to authenticate the caller.

The next line initializes the DtmKex class, passing in the DtmParameters and DtmClient data which is used to direct the working operations of the exchange.

Next we wire up the events:

The IdentityReceived event is fired when an Identity packet is received from the remote host. It contains the identity message data used to authenticate a node through the application layer.
The PacketReceived and PacketSent are fired each time a packet either during or after the exchange is sent or received.
The SessionEstablished fires when the exchange has been established, and returns the initialized forward and return primary symmetric cipher instances, as well as the active socket instance. If this event is used, the DtmServer class can be disposed, and the remainder of the session handled externally.
The DataReceived event is used if the the DtmKex instance serves as not just the vehicle of the exchange, but also as the primary encrypted channel. All data received from the remote host is processed and decrypted before being sent through this event.
The FileSent and FileReceived are used to signal a file transfer has completed. File transfers are seperate instances, each with their own keys and network socket.

The last line; Listen(address, port) begins listening for a connection after passing it the ip address and port number.

Setting up the DtmClient is the exact same code, only instead of the Listen method, use Connect(address, port).

DtmParameters and DtmParamSets

The DtmParameters class, just as with the asymmetric cipher parameters, defines the internal settings used in the key exchange and encrypted channel:

OId: The DtmParameters Identifier field; should be 16 bytes describing the parameter set
AuthPkeId: The Auth-Phase Asymmetric parameters OId; can be the Asymmetric cipher parameters OId, or a serialized Asymmetric Parameters class
PrimaryPkeId: The Primary-Phase Asymmetric parameters OId; can be the Asymmetric cipher parameters OId, or a serialized Asymmetric Parameters class
AuthSession: The Auth-Phase Symmetric sessions cipher parameters; contains a complete description of the Symmetric cipher
PrimarySession: The Primary-Phase Symmetric sessions cipher parameters; contains a complete description of the Symmetric cipher
RandomEngine: The Prng type used to pad messages
MaxAsmKeyAppend: (Optional) The maximum number of pseudo-random bytes to append to the Primary-Phase Asymmetric Public key before encryption
MaxAsmKeyPrePend: (Optional) The maximum number of pseudo-random bytes to prepend to the Primary-Phase Asymmetric Public key before encryption
MaxAsmParamsAppend: (Optional) The maximum number of pseudo-random bytes to append to the Primary-Phase Client Identity before encryption
MaxAsmParamsPrePend: (Optional) The maximum number of pseudo-random bytes to prepend to the Primary-Phase Asymmetric Client Identity before encryption
MaxSymKeyAppend: (Optional) The maximum number of pseudo-random bytes to append to the Primary-Phase Symmetric key before encryption
MaxSymKeyPrePend: (Optional) The maximum number of pseudo-random bytes to prepend to the Primary-Phase Symmetric key before encryption
MaxMessageAppend: (Optional) The maximum number of pseudo-random bytes to append to a Post-Exchange message before encryption
MaxMessagePrePend: (Optional) The maximum number of pseudo-random bytes to prepend to a Post-Exchange message before encryption
MaxAsmKeyDelayMS: (Optional) The maximum delay time before sending the Primary-Phase Asymmetric key; the minimum time is 1 half max, a value of 0 has no delay
MaxSymKeyDelayMS: (Optional) The maximum delay time before sending the Primary-Phase Symmetric key; the minimum time is 1 half max, a value of 0 has no delay
MaxMessageDelayMS: (Optional) The maximum delay time before sending Post-Exchange message traffic; the minimum time is 0, a value of 0 has no delay

Most of these parameters aren't required, and it can be a bit tricky matching up the sometimes small maximum message size of an asymmeric cipher with the symmetric key and vector sizes. This is why there is a DtmParamSets class that contains a number of pre-sets ordered by a security classification. The parameter sets are serializable, cloneable, and can be retrieved by OId, enumeration, or the deep copy of a static instance. The SecurityContexts enumeration orders the pre-sets X1 through X4, with X1 being the maximum security, and X4 tuned for speed.

Inside DtmKex

There's a lot going on under the hood in this class; hello timers, auto crypto-stream resync, auto-reconnect, IPv6 ready, parallelized file transfers.. I could write an article just on this class. In lieu of that there are several thousand pages of documentation on the project, and I went out of my way to comment the class thoroughly. So if you're interested, debug, read, debug some more.. here's some of the highlights:

Properties

AutoReconnect: Get/Set attempt to reconnect to a host if the connection is dropped through an error or timeout
ConnectionTimeOut: Get/Set the number of contiguous missed keepalives (at one second intervals), before a connection is considered dropped
IsConnected: Get the connection state; true if connected to a remote host
IsEstablished: Get the VPN is Established
FileBufferSize: Get/Set: the size of the file Tcp and buffer queue element
MaxResend: Get/Set the maximum number of times a packet can be resent; default is 1024
MessageBufferSize: Get/Set the size of the message Tcp and buffer queue elements
ResendThreshold: Get/Set the number of queued message packets before a resend is triggered
Socket: Get the TcpSocket class instance

Methods

Connect(address, port) Connects either with the host name or ip address; sends a connection request, starting the key exchange.
ForwardKeyRequest Allows for on the fly key ratcheting on the TCP stream
Listen(address, port) Initialize the server and listen for incoming connections
Disconnect() Disconnect from the remote host and teardown the connection
SendReceive(MemoryStream, int) Blocking transceiver; sends a packet and waits for a response
Send(Stream) Used Post-Exchange to encrypt data before it is sent to the client
SendFile(string) Used to initialize the file transfer sequence. Files are each sent on their TCP socket, using unique keys. This was done to increase reliability (a lost packet would mean resyncing the primary channel), and to implement a buffered parallelized transfer mechanism for large data.
Dispose() Clean up resources and dispose of the class

Events

DataReceived: Fires each time data has been received through the post-exchange encrypted channel, contains decrypted received data
Disconnected: Fires when the connection is disposed
FileReceived: Fires when the file receive operation has completed
FileSent: Fires when the file send operation has completed
FileRequest: Fires when the host receives notification of a pending file transfer
IdentityReceived: Fires when a packet containing identity data is received
PacketReceived: Fires each time a valid packet has been received
PacketSent: Fires each time a valid packet has been sent
ProgressPercent: Returns received file bytes processed as an integer percentage
SessionError: Fires when an error has occured
SessionEstablished: Fires when the vpn has been established

Symmetric Processors

CompressionCipher

The Compression cipher wraps the CipherStream class by first compressing a target directory, then encrypting the compressed file. Decryption inflates the directory to a target path.

public static void CompressionCipherTest(string InputDirectory, string OutputDirectory, string CompressedFilePath)
{
    KeyParams kp = new KeyGenerator().GetKeyParams(32, 16);
    // Create an archive //
    // create the cipher
    using (ICipherMode cipher = new CTR(new RHX()))
    {
        // initialize the cipher for encryption
        cipher.Initialize(true, kp);

        // create the archive file
        using (FileStream fs = new FileStream(CompressedFilePath, FileMode.Create))
        {
            // compress and encrypt directory
            using (CompressionCipher cc = new CompressionCipher(true, cipher))
            {
                // set the input folder path and archive output stream
                cc.Initialize(InputDirectory, fs);
                // write the compressed and encrypted archive to file
                cc.Write();
            }
        }
    }

    // Inflate an archive //
    // create the cipher
    using (ICipherMode cipher = new CTR(new RHX()))
    {
        // initialize the cipher for decryption
        cipher.Initialize(false, kp);

        // open the archive
        using (FileStream decmp = new FileStream(CompressedFilePath, FileMode.Open))
        {
            // decrypt and inflate to output directory
            using (CompressionCipher cc = new CompressionCipher(false, cipher))
            {
                // set the output folder path and archive path
                cc.Initialize(OutputDirectory, decmp);
                // decrypt and inflate the directory
                cc.Write();
            }
        }
    }
}

VolumeCipher

This is a kind of file system level encryption. An entire directory, including subtrees can be encrypted, each file encrypted with its own unique key and vector. A single file or the entire directory can be decrypted using a VolumeKey; a specialized key created and extracted with the VolumeFactory class.

Encrypting a Volume:

private void VolumeEncrypt(string[] VolumeFiles, string KeyPath)
{
    // key will be written to this stream
    MemoryStream keyStream = new MemoryStream();

    // encrypt the files in the directory
    using (VolumeCipher vc = new VolumeCipher())
    {
        keyStream = vc.CreateKey(CipherDescription.AES256CTR, VolumeFiles.Length);
        vc.ProgressPercent += OnVolumeProgressPercent;
        vc.Initialize(keyStream);
        vc.Encrypt(VolumeFiles);
    }

    // write the key
    keyStream.Seek(0, SeekOrigin.Begin);
    using (FileStream outStream = new FileStream(KeyPath, FileMode.Create, FileAccess.ReadWrite))
        keyStream.CopyTo(outStream);

    // clean up
    keyStream.Dispose();
}

Decrypting a file(s) in the volume:

private void VolumeDecryptFile(string FilePath, string KeyPath, DecVolume)
{
    string[] paths = new string[1];

    if (DecVolume)
        paths = FileUtilities.DirectoryGetFiles(Path.GetDirectoryName(FilePath));
    else
        paths[0] = FilePath;

    using (FileStream ks = new FileStream(KeyPath, FileMode.Open, FileAccess.ReadWrite))
    {
        using (VolumeCipher vc = new VolumeCipher())
        {
            vc.Initialize(ks);
            vc.Decrypt(paths);
        }
    }
}

Key Packages

The Example project utilizes package keys; these are compound keys or 'sub-key sets', containing the keying material necessary for a single encryption cycle. Package keys address a number of security concerns in the distribution of cryptographic keys. For instance, you give a key to a friend, but how do you know that they will not distribute the key to someone else, or re-use that key for another encryption? Package keys contain not only the keying material but combinable flags that control how the key is to be processed; who can use that key, how many times the key can be used for decryption, limits a sub-key to a single encryption, and can even securely delete a subkey from the set after a decryption. A package key file can contain thousands of subkeys, has a layered security access scheme, and can itself be encrypted via a passphrase driven encryption cipher. Key packages are created with the PackageFactory class:

// populate a KeyAuthority structure
KeyAuthority authority =  new KeyAuthority(domainId, originId, packageId, packageTag, keyPolicy);
// create a key file
new PackageFactory(KeyPath, authority).Create(PackageKey);

// populate a KeyAuthority structure
KeyAuthority authority =  new KeyAuthority(domainId, originId, packageId, packageTag, keyPolicy);
KeyParams keyparam;
CipherDescription description;
byte[] extKey;
byte[] keyId;

// extract a key for decryption
using (PackageFactory factory = new PackageFactory(KeyPath, authority))
    factory.Extract(keyId, out description, out keyparam, out extKey);

// populate a KeyAuthority structure
KeyAuthority authority =  new KeyAuthority(domainId, originId, packageId, packageTag, keyPolicy);
KeyParams keyparam;
CipherDescription description;
byte[] extKey;

// get the next available encryption subkey
using (PackageFactory factory = new PackageFactory(KeyPath, authority))
    keyId = factory.NextKey(out description, out keyparam, out extKey)

CipherStream

The CipherStream class does it all for you.. it calculates the ideal parallel block size (based on a configurable BlockProfile setting), provides a progress event, and writes out data processed by a cipher from one stream to another. It is as simple and easy to use as I can make it. Take a look..

KeyParams kp;

using (KeyGenerator kg = new KeyGenerator())
    kp = kg.GetKeyParams(64, 16);

using (ICipherMode cipher = new CTR(new RHX()))
{
    using (CipherStream cstrm = new CipherStream(cipher))
    {
        cstrm.ProgressPercent += new CipherStream.ProgressDelegate(TestProgressPercent);
        cstrm.Initialize(true, kp);
        cstrm.Write(new FileStream(FileMode.Open, FileAccess.Read), new FileStream(outFile, FileMode.Create, FileAccess.Write));
    }
}

In the example we create a KeyParams structure populated by the KeyGenerator class. Then create a cipher instance wrapped in the CTR mode. In this case it is the RHX cipher, (but it works on all of the cipher implementations). We use that cipher to initialize the CipherStream class. Add a progress event, then initialize the class with an encryption flag and the KeyParams class. The two streams; input and output are passed to the Write() method which engages the stream processing. The input stream is then encrypted or decrypted for you. Massive encryption, with as little as five lines of code..

Key and Vector Creation

The KeyGenerator class uses a two stage pseudo random generation method. It uses a seed generator, (that can be initialized through different entropy sources) to create seed bytes along with an incrementing variable length counter; both are combined and processed by a cryptographic hash function (HMAC is also used with SHA-2) to produce the pseudo random output: Digest(Rsg() + ctr).

The KeyGenerator constructor contains two optional parameteters; SeedEngine and DigestEngine. The seed material can be generated with any of the implemented Rsg's. The digest engine can be any one of the digests. The counter uses the machine local RngCrypto provider, but the seed generator (when using XSPRsg and ISCRsg) can use the EntropyPool mechanism to initialize. So, you can create key material using a combination of ISAAC and 1024 bit Skein, or RngCrypto and Keccak.. or any other combination of seed/digest you choose.. The KeyGenerator can return a byte array, or a populated KeyParams class.

byte[] rand;
using (KeyGenerator gen = new KeyGenerator())
    // generate pseudo random bytes
    rand = gen.Generate(Size);

Headers

CipherKey

The CipherKey structure contains the cipher description; all of the parameters needed to recreate the cipher instance. It also contains a unique id field, and 16 bytes of random used to encrypt a file extension. Within the CipherDescription, two fields are used to determine HMAC settings; the MacSize integer, which stores the size of the digest output, and MacEngine, which stores which digest was used to create the MAC code. If the MacSize is set to zero, that indicates there is no HMAC processing of a message file. If the MacSize is non-zero, this is considered a signing key.

Message Description

The message description contains a unique id of the key used to encrypt that message, and the length of an optional MAC code. The message may contain a MAC code; used with the MAC key in the KeyHeaderStruct to verify the integrity of the message. The message description itself does not contain any description of the MAC, or whether a MAC was used at all. The MAC code is derived from a keyed digest that gets its code by processing the message cipher text. The MAC itself is a type of pseudo random permutation, so an attacker would have no way of distinguishing it from the cipher text itself. Without the key, an attacker would have no way of knowing where the message ends, because without the key file, they do not know the length of the MAC field, it could be 32, 64 or 128 bytes, or there may be no MAC at all.. the point is, we give an adversary as little information to work with as possible.

KeyFactory

The KeyFactory class is a helper class used to create or extract a Key file. It makes it very easy to create a key file, or extract the file to a CipherKey header containing the cipher description, and a KeyParams class that contains the keying material. If a KeyParams instance is not specified during creation, the random keying material is generated automatically using the KeyGenerator class.

Create a key file

// populate a keyheader
CipherKey keyHeader = new CipherKey(
    Engines.RHX,        // cipher engine
    64,                 // key size in bytes
    IVSizes.V128,       // cipher iv size enum
    CipherModes.CTR,    // cipher mode enum
    PaddingModes.X923,  // cipher padding mode enum
    BlockSizes.B128,    // block size enum
    RoundCounts.R18,    // diffusion rounds enum
    Digests.Skein512,   // cipher kdf engine
    64,                 // mac size
    Digests.Keccak);    // mac digest

// create the key file
new KeyFactory(KeyPath).Create(keyHeader);

Extract a key file

// local vars
KeyParams keyparam;
CipherKey header;

new KeyFactory(KeyPath).Extract(out keyparam, out header);

Message Authentication

Message authentication can be handled with the StreamMac class. Just as with the CipherStream class, all the setup is done for you. The recommended way to create a MAC is post encryption. In the example, the mac stream is initialized with the MAC key, the file stream to process is defined in the Initialize method, and the ComputeMac returns the MAC code.

using (MacStream mstrm = new MacStream(new SHA512HMAC(keyParam.IKM)))
{
    // initialize mac stream
    mstrm.Initialize(outStream);
    // get the hash
    byte[] hash = mstrm.ComputeMac();
}

To verify a MAC, you simply run mac stream again on the file and compare the output with the mac code contained in the message. The Example project contains a working example.

EntropyPool

The EntropyPool class collects state from various counters, timers, and handles on the system, distributes them across 8 internal pools and then compresses this state using Keccak 512. Collected state includes Clsid strings stored in the registry, process and thread handles and statistics, system timers, and bytes from the RngCrypto random provider. This state is distributed byte by byte in a round robin fashion across the eight queues, then compressed from around 7200 bytes to 512 bytes distributed across the 16 pool members. The pool can be used to seed a Drbg or seed generator, which in turn can be used to generate keying material.

Symmetric Cipher Overview

Before we start looking at some of the ciphers and getting into implementation details, I think it helps to present a general idea of what has been done, and clarify some of the concepts and terminology used in the article.

First off, the key schedule: A key schedule is a function that takes a small amount of user supplied data (the input cipher key), and expands it, usually into a larger integer array. For example; Rijndael takes a 32 byte key (256 bit), and expands that into 60 integers, or 240 bytes worth of keying material. That array of integers is sometimes called an array of 'round' keys, 'subkeys' or 'working' keys, I'll use the term working key, because it makes it clear that it is a derived key. Some key schedules have a simple algebraic expression; Rijndael for example, derives most of the working keys with a simple exclusive OR of two previous keys. Serpent uses a much more elaborate key schedule, one that was designed to resist some forms of cryptanalysis. These working keys, created by the key schedule are used to create a unique cipher text, and a good cipher design is one in which a change of just a single bit in the cipher key, results in a completely different output, this is known as the 'avalanche' property. The working keys are usually added to the state (processed data at some stage of transformation), with a simple addition or XOR.

Larger keys play an important role in a ciphers security. Many of the techniques used to 'break' a cipher involve the reduction in the number of times a unique key is tested against the ciphers decryption output, in other words, they reduce the number of brute force attempts required to decrypt the output.

Each time you add a bit to a keys length, you double the size of the previous potential sum. So, a 256 bit key represents an integer with a maximum rounded as 1.15 times 10 to the power of 77. A simply monstrous number.. and one might think that computers will never be fast enough to run a decryption cycle that many times, and that is most certainly true, but some cryptanalytic attacks aim to reduce that number, sometimes quite substantially. A larger key puts this further out of reach; given that the cipher key and the working key produced is done in a cryptographically strong way, much larger keys are feasible, and by using those longer keys, data can be kept beyond the capabilities of technology for a longer period of time.

There is also evidence that the key schedule plays a part in providing strength against linear and differential cryptanalysis, and there have been some serious attacks on Rijndael that leverage the weak key schedule. So it follows that a cryptographically strong key schedule, can help create a stronger cipher.

In this article, a transform is a function that performs the actual encryption of data, just as the inverse transform performs the decryption in a reversible iterative block cipher. In a rounds based cipher, (sometimes called a product cipher), a round can be thought of as one complete sequence of transformation, whereas the transform function, (or rounds function), may loop through a number of rounds and use whitening stages.

In the three block ciphers presented here (Rijndael, Serpent, and Twofish), The input bytes are first copied into four integers. These integers are (in the case of Rijndael and Twofish), XORd with members of the working key, (key whitening). These state integers are then processed in a series of rounds, which change the state via a series of substitutions, permutations, and modular arithmetic, (with the key added in stages), finally the processed state is whitened with the remaining key members and copied into the output byte array.

Let's look at a round of Serpent:

R0 ^= _exKey[keyCtr++];
R1 ^= _exKey[keyCtr++];
R2 ^= _exKey[keyCtr++];
R3 ^= _exKey[keyCtr++];
Sb0(ref R0, ref R1, ref R2, ref R3);
LinearTransform(ref R0, ref R1, ref R2, ref R3);

R0 through R3 are state integers. Before each round the state is XORd with a member of the working key. The state is then processed through one of eight bit slicing S-Boxes before undergoing a linear transform. This clearly illustrates the role of the working key during a round cycle; the working keys are used to mix with the state in a way that will produce an output that is unique to that key, this is their purpose, and in these ciphers, they do not interact with the transformation functions in any other way.

One often hears of the term rounds in the context of the number of rounds that can be broken using an attack on the cipher; Rijndael has been shown to be vulnerable to a known-key distinguishing attack against a reduced 8-round version of AES-128. These attacks are often aimed at reduced versions, where a smaller number of rounds can be broken, as both a means of providing proof with limited computing power, and positing the method by which a full transformation might be reversed. This is because with most ciphers, adding rounds increases the security of the cipher by making differential or linear cryptanalysis more difficult. There have been a number of noted cryptographers who have stated that the number of rounds used in Rijndael should be increased, that its simple algebraic description makes it vulnerable with the current round counts, (10, 12, and 14), and it should be increased to 22 or more rounds to ensure its continued integrity..

The HX Ciphers

One of the central goals of this project has been to create the strongest ciphers possible, using existing and proven cryptographic primitives. Another important goal was to try to better understand various attack vectors, and create something that was more resistant to these attacks.

The three base ciphers all have something in common; they all use the working key in a similar way; to change or 'whiten' the state values to create a unique output, other than that, they do not interact with the actual computational processes used to transform the state. What that means is that how that cipher key is expanded, (so long as it is done in a secure way), does not directly impact the data transformation. Creating that expanded key using a more secure means, like a hash function, can increase the overall security of the cipher itself.

The HX ciphers; RHX, SHX, and THX can all use HKDF Expand, that's a HMAC based Key Derivation Function, a kind of key stretching function, or pseudo-random generator. By default, HKDF is powered by an HMAC, a keyed hash function. This is one of the most cryptographically strong methods of creating a pseudo-random output; even a strong key schedule like the one used in Serpent, is not as secure as using this method to generate the working keys. Aside from the increased security, there are two additional advantages to replacing the key schedule with a HMAC based KDF; it is more resistant to weak key and sliding attacks, and the longer cipher key size, makes brute force attacks impossible.

Timing attacks use discrete differences in the length of time it takes to perform a task with a given set of parameters. In the case of an attack on a key schedule, it measures the distance in timing of things like branching and table lookups to make predictions about the key; like the difference between branches in a function, or the computational time averaged to compute an output given the value of a specific table member. SHA-2 is less vulnerable to timing attacks, because the amount of time required to run is typically more constant than say.. the Rijndael key schedule.

The other advantage is the key size; the minimum key size for an HX cipher in HKDF extended mode is the size of the hash function output. So the key for these ciphers is a minimum of 32 bytes with a 256 bit message digest, but expandable up to any size in multiples of the hash functions output size. Even when quantum computers eventually halve the symmetric cipher key space, a large safety margin can still be maintained. If new attacks are devised that threaten the integity of the cipher, it is now flexible enough to respond to those threats, both through extending the key size and increasing the number of transformation rounds.

Benchmarks

Speed tests on an HP all-in-one with a I7-6700T processor, 12GB DDR3 RAM, compiled Release/Any CPU.
Test is a transform of a byte array in a Monte Carlo method.
Sizes are in MBs (1,000,000 bytes). Time format sec.ms, key sizes in bits. Rate is MB per minute.
Highest rate so far is RHX with a 128 bit key: 18.18 GB per minute, and the parallelized Salsa with 46GB!

I put a bench from CEX++ in here as a reference. It demonstrates just what can be done with a combination of AES-NI, intrinsics, fast-lean code, and the more efficient language..

Cipher	Mode	Func	Time	Rate
RHX	CTR	ENC	0.42	14285
THX	CTR	ENC	0.30	20000
SHX	CTR	ENC	1.07	5607
SALSA	CTR	ENC	0.13	46153
CHACHA	CTR	ENC	0.14	42857

The parallelized decryption functions of the CFB and CBC modes are comparable, less about 10-20%.

My i7 cpu is using 8 threads, other processors with more or less cores, different memory speeds/configurations etc., will have varying results.

The C++ version using AES-NI/CTR outputs at about 7GB per second now on the same machine, what a difference C++ makes..

RHX (Rijndael)

The Key Schedule

The key schedule in RHX is the defining difference between this and other versions of the cipher; there are two modes of operation, standard: which uses the original key expansion routines, and secure, which uses an internal HKDF Expand function to derive the working key. Which method is used is determined through the classes constructor. If the KdfEngineType enumeration parameter is set to a message digest function, (the default is None), the cipher is initialized in HKDF extended mode.

The standard expansion routine will produce the expected output when using keys up to 256 bits (there are no standards tests for a 512 bit key). This is proven with the NIST AESAVS known answer tests in the Test project.

The secure key expansion method uses a HMAC based pseudo-random generator to create the working key. HKDF Expand is a key derivation function that is using a cryptographic hash function as its diffusion engine. This is one of the strongest methods available for generating pseudo-random keying material, and far superior in entropy dispersion to Rijndael, or even Serpents native key schedule.

HKDF uses up to three inputs; a nonce value called an information string (the DistributionCode property), an Ikm (the Key), and a Salt value. The HMAC Rfc 2104, recommends a key size equal to the digest output, in this case 64 bytes with SHA512. The Salt is derived as anything greater than the Key size, so a 128 byte key will call HKDF Extract with a 64 byte key and a 64 byte salt.

First we have to look at the constructor of an HX cipher, this where the number of rounds, the block size , and the HKDF digest engine can all be specified.

public RHX(int BlockSize = BLOCK16, int Rounds = ROUNDS22, Digests KdfEngineType = Digests.None)
{
    if (BlockSize != BLOCK16 && BlockSize != BLOCK32)
        throw new CryptoSymmetricException("RHX:CTor", "Invalid block size! Supported block sizes are 16 and 32 bytes.", new ArgumentException());

    _kdfEngineType = KdfEngineType;
    // add standard key lengths
    _legalKeySizes[0] = 16;
    _legalKeySizes[1] = 24;
    _legalKeySizes[2] = 32;
    _legalKeySizes[3] = 64;
    _blockSize = BlockSize;

    if (KdfEngineType != Digests.None)
    {
        if (Rounds < MIN_ROUNDS || Rounds > MAX_ROUNDS || Rounds % 2 > 0)
            throw new CryptoSymmetricException("RHX:CTor", "Invalid rounds size! Rounds assignment option only available in HKDF extended mode.", new ArgumentException());

        _legalRounds = new int[] { 10, 12, 14, 22, 24, 26, 28, 30, 32, 34, 36, 38 };
        // set the hmac key size
        _ikmSize =  GetIkmSize(KdfEngineType);

        // hkdf extended key sizes
        for (int i = 4; i < _legalKeySizes.Length; ++i)
            _legalKeySizes[i] = (_legalKeySizes[3] + _ikmSize * (i - 3));

        _dfnRounds = Rounds;
    }
    else
    {
        _legalRounds = new int[] { 10, 12, 14, 22 };
        Array.Resize(ref _legalKeySizes, 4);
    }
}

In the above code the block size specified, the number of rounds (determined automatically in standard mode), as well as the HKDF digest type enumerator (instantiated when the Initialize method is called).

When the user input cipher key is added (through the Initialize(bool, KeyParams) method), the key expansion method ExpandKey() is tested, which directs the expansion to the appropriate method.

private void ExpandKey(byte[] Key, bool Encryption)
{
    if (_kdfEngineType != Digests.None)
    {
        // hkdf key expansion
        _expKey = SecureExpand(Key);
    }
    else
    {
        // standard rijndael key expansion + k512
        _expKey = StandardExpand(Key);
    }

    // inverse cipher
    if (!Encryption)
    {
        int blkWords = _blockSize / 4;

        // reverse key
        for (int i = 0, k = _expKey.Length - blkWords; i < k; i += blkWords, k -= blkWords)
        {
            for (int j = 0; j < blkWords; j++)
            {
                uint temp = _expKey[i + j];
                _expKey[i + j] = _expKey[k + j];
                _expKey[k + j] = temp;
            }
        }
        // sbox inversion
        for (int i = blkWords; i < _expKey.Length - blkWords; i++)
        {
            _expKey[i] = IT0[SBox[(_expKey[i] >> 24)]] ^
                IT1[SBox[(byte)(_expKey[i] >> 16)]] ^
                IT2[SBox[(byte)(_expKey[i] >> 8)]] ^
                IT3[SBox[(byte)_expKey[i]]];
        }
    }
}

You can see the switch between secure and standard expansion functions, and if the cipher is initialized for decryption, the key reversal loop required by the inverse cipher.

Here is the portion of the new key schedule that expands a 512bit input key into the required number of round keys:

private uint[] StandardExpand(byte[] Key)
{
    // block in 32 bit words
    int blkWords = _blockSize / 4;
    // key in 32 bit words
    int keyWords = Key.Length / 4;
    // rounds calculation, 512 gets 22 rounds
    _dfnRounds = (blkWords == 8 && keyWords != 16) ? 14 : keyWords + 6;
    // setup expanded key
    uint[] expKey = new uint[blkWords * (_dfnRounds + 1)];


    if (keyWords == 16)
    {
        expKey[0] = IntUtils.BytesToBe32(Key, 0);
        expKey[1] = IntUtils.BytesToBe32(Key, 4);
        expKey[2] = IntUtils.BytesToBe32(Key, 8);
        expKey[3] = IntUtils.BytesToBe32(Key, 12);
        expKey[4] = IntUtils.BytesToBe32(Key, 16);
        expKey[5] = IntUtils.BytesToBe32(Key, 20);
        expKey[6] = IntUtils.BytesToBe32(Key, 24);
        expKey[7] = IntUtils.BytesToBe32(Key, 28);
        expKey[8] = IntUtils.BytesToBe32(Key, 32);
        expKey[9] = IntUtils.BytesToBe32(Key, 36);
        expKey[10] = IntUtils.BytesToBe32(Key, 40);
        expKey[11] = IntUtils.BytesToBe32(Key, 44);
        expKey[12] = IntUtils.BytesToBe32(Key, 48);
        expKey[13] = IntUtils.BytesToBe32(Key, 52);
        expKey[14] = IntUtils.BytesToBe32(Key, 56);
        expKey[15] = IntUtils.BytesToBe32(Key, 60);


        // k512 R: 16,24,32,40,48,56,64,72,80,88, S: 20,28,36,44,52,60,68,76,84
        ExpandRotBlock(expKey, 16, 16, 1);
        ExpandSubBlock(expKey, 20, 16);
        ExpandRotBlock(expKey, 24, 16, 2);
        ExpandSubBlock(expKey, 28, 16);
        ExpandRotBlock(expKey, 32, 16, 3);
        ExpandSubBlock(expKey, 36, 16);
        ExpandRotBlock(expKey, 40, 16, 4);
        ExpandSubBlock(expKey, 44, 16);
        ExpandRotBlock(expKey, 48, 16, 5);
        ExpandSubBlock(expKey, 52, 16);
        ExpandRotBlock(expKey, 56, 16, 6);
        ExpandSubBlock(expKey, 60, 16);
        ExpandRotBlock(expKey, 64, 16, 7);
        ExpandSubBlock(expKey, 68, 16);
        ExpandRotBlock(expKey, 72, 16, 8);
        ExpandSubBlock(expKey, 76, 16);
        ExpandRotBlock(expKey, 80, 16, 9);
        ExpandSubBlock(expKey, 84, 16);
        ExpandRotBlock(expKey, 88, 16, 10);


        if (blkWords == 8)
        {
            ExpandSubBlock(expKey, 92, 16);
            ExpandRotBlock(expKey, 96, 16, 11);
            ExpandSubBlock(expKey, 100, 16);
            ExpandRotBlock(expKey, 104, 16, 12);
            ExpandSubBlock(expKey, 108, 16);
            ExpandRotBlock(expKey, 112, 16, 13);
            ExpandSubBlock(expKey, 116, 16);
            ExpandRotBlock(expKey, 120, 16, 14);
            ExpandSubBlock(expKey, 124, 16);
            ExpandRotBlock(expKey, 128, 16, 15);
            ExpandSubBlock(expKey, 132, 16);
            ExpandRotBlock(expKey, 136, 16, 16);
            ExpandSubBlock(expKey, 140, 16);
            ExpandRotBlock(expKey, 144, 16, 17);
            ExpandSubBlock(expKey, 148, 16);
            ExpandRotBlock(expKey, 152, 16, 18);
            ExpandSubBlock(expKey, 156, 16);
            ExpandRotBlock(expKey, 160, 16, 19);
            ExpandSubBlock(expKey, 164, 16);
            ExpandRotBlock(expKey, 168, 16, 20);
            ExpandSubBlock(expKey, 172, 16);
            ExpandRotBlock(expKey, 176, 16, 21);
            ExpandSubBlock(expKey, 180, 16);
        }
    }
    else if(keyWords == 8)
    {

...

First, just as with every other key size, the input cipher key is copied to the beginning of the expanded key array. The ExpandRotBlock() and ExpandSubBlock() methods alternate, just as with a 256bit key, with each Rot cycle receiving a unique Rcon constant. This mirrors the 256bit key expansion routine exactly, only dispersing a longer input key into an increased number of round keys required by the 22 rounds (as compared to 14 rounds with a 256 key).

Here are the two expansion routines:

private void ExpandRotBlock(uint[] Key, int KeyIndex, int KeyOffset, int RconIndex)
{
    int sub = KeyIndex - KeyOffset;


    Key[KeyIndex] = Key[sub] ^ SubByte((Key[KeyIndex - 1] << 8) | ((Key[KeyIndex - 1] >> 24) & 0xFF)) ^ Rcon[RconIndex];
    // Note: you can insert noise before each subsequent mix to further equalize timing; ex. add between each mix line:
    // (uint) tmp = SubByte((Key[Index - 1] << 8) | ((Key[Index - 1] >> 24) & 0xFF))];
    Key[++KeyIndex] = Key[++sub] ^ Key[KeyIndex - 1];
    Key[++KeyIndex] = Key[++sub] ^ Key[KeyIndex - 1];
    Key[++KeyIndex] = Key[++sub] ^ Key[KeyIndex - 1];
}

private void ExpandSubBlock(uint[] Key, int KeyIndex, int KeyOffset)
{
    int sub = KeyIndex - KeyOffset;


    Key[KeyIndex] = SubByte(Key[KeyIndex - 1]) ^ Key[sub];
    // can equalize timing here as well..
    Key[++KeyIndex] = Key[++sub] ^ Key[KeyIndex - 1];
    Key[++KeyIndex] = Key[++sub] ^ Key[KeyIndex - 1];
    Key[++KeyIndex] = Key[++sub] ^ Key[KeyIndex - 1];
}

By unrolling the key schedule, and using minimal variable creation, these routines are both faster, and more difficult to time. A similar function is used for both the AES-NI (AHX) cipher and RHX in the CEX++ library.

The secure expansion routine is powered by HKDF, using the defined digest, it can produce varying sized expanded working key arrays, to accomodate flexible rounds assignments.

private uint[] SecureExpand(byte[] Key)
{
    // block in 32 bit words
    int blkWords = _blockSize / 4;
    // expanded key size
    int expSize = blkWords * (_dfnRounds + 1);
    // kdf return array
    int keyBytes = expSize * 4;
    byte[] rawKey = new byte[keyBytes];
    int saltSize = Key.Length - _ikmSize;
    // hkdf input
    byte[] hkdfKey = new byte[_ikmSize];
    byte[] hkdfSalt = new byte[0];

    // copy hkdf key and salt from user key
    Buffer.BlockCopy(Key, 0, hkdfKey, 0, _ikmSize);
    if (saltSize > 0)
    {
        hkdfSalt = new byte[saltSize];
        Buffer.BlockCopy(Key, _ikmSize, hkdfSalt, 0, saltSize);
    }
    // HKDF generator expands array
    using (HKDF gen = new HKDF(_kdfEngine, false))
    {
        gen.Initialize(hkdfSalt, hkdfKey, _hkdfInfo);
        gen.Generate(rawKey);
    }

    // initialize working key
    uint[] expKey = new uint[expSize];
    // copy bytes to working key
    Buffer.BlockCopy(rawKey, 0, expKey, 0, keyBytes);

    return expKey;
}

As you can see, the kdf generates the number of bytes necessary to run the set number of rounds, and copies the bytes to the working key integer array.

Lastly, here is one of the rounds functions, used to encrypt 16 bytes. This method is flexible, in that it can run any even number of rounds through the main loop. This method is optimized for speed with the byte oriented approach, which uses pre-multiplied lookup tables.

private void Encrypt16(byte[] Input, int InOffset, byte[] Output, int OutOffset)
{
    int LRD = _expKey.Length - 5;
    int keyCtr = 0;

    // round 0
    uint X0 = IntUtils.BytesToBe32(Input, InOffset) ^ _expKey[keyCtr];
    uint X1 = IntUtils.BytesToBe32(Input, InOffset + 4) ^ _expKey[++keyCtr];
    uint X2 = IntUtils.BytesToBe32(Input, InOffset + 8) ^ _expKey[++keyCtr];
    uint X3 = IntUtils.BytesToBe32(Input, InOffset + 12) ^ _expKey[++keyCtr];

    // round 1
    uint Y0 = T0[X0 >> 24] ^ T1[(byte)(X1 >> 16)] ^ T2[(byte)(X2 >> 8)] ^ T3[(byte)X3] ^ _expKey[++keyCtr];
    uint Y1 = T0[X1 >> 24] ^ T1[(byte)(X2 >> 16)] ^ T2[(byte)(X3 >> 8)] ^ T3[(byte)X0] ^ _expKey[++keyCtr];
    uint Y2 = T0[X2 >> 24] ^ T1[(byte)(X3 >> 16)] ^ T2[(byte)(X0 >> 8)] ^ T3[(byte)X1] ^ _expKey[++keyCtr];
    uint Y3 = T0[X3 >> 24] ^ T1[(byte)(X0 >> 16)] ^ T2[(byte)(X1 >> 8)] ^ T3[(byte)X2] ^ _expKey[++keyCtr];

    while (keyCtr != LRD)
    {
        X0 = T0[Y0 >> 24] ^ T1[(byte)(Y1 >> 16)] ^ T2[(byte)(Y2 >> 8)] ^ T3[(byte)Y3] ^ _expKey[++keyCtr];
        X1 = T0[Y1 >> 24] ^ T1[(byte)(Y2 >> 16)] ^ T2[(byte)(Y3 >> 8)] ^ T3[(byte)Y0] ^ _expKey[++keyCtr];
        X2 = T0[Y2 >> 24] ^ T1[(byte)(Y3 >> 16)] ^ T2[(byte)(Y0 >> 8)] ^ T3[(byte)Y1] ^ _expKey[++keyCtr];
        X3 = T0[Y3 >> 24] ^ T1[(byte)(Y0 >> 16)] ^ T2[(byte)(Y1 >> 8)] ^ T3[(byte)Y2] ^ _expKey[++keyCtr];
        Y0 = T0[X0 >> 24] ^ T1[(byte)(X1 >> 16)] ^ T2[(byte)(X2 >> 8)] ^ T3[(byte)X3] ^ _expKey[++keyCtr];
        Y1 = T0[X1 >> 24] ^ T1[(byte)(X2 >> 16)] ^ T2[(byte)(X3 >> 8)] ^ T3[(byte)X0] ^ _expKey[++keyCtr];
        Y2 = T0[X2 >> 24] ^ T1[(byte)(X3 >> 16)] ^ T2[(byte)(X0 >> 8)] ^ T3[(byte)X1] ^ _expKey[++keyCtr];
        Y3 = T0[X3 >> 24] ^ T1[(byte)(X0 >> 16)] ^ T2[(byte)(X1 >> 8)] ^ T3[(byte)X2] ^ _expKey[++keyCtr];
    }

    // final round
    Output[OutOffset] = (byte)(SBox[Y0 >> 24] ^ (byte)(_expKey[++keyCtr] >> 24));
    Output[++OutOffset] = (byte)(SBox[(byte)(Y1 >> 16)] ^ (byte)(_expKey[keyCtr] >> 16));
    Output[++OutOffset] = (byte)(SBox[(byte)(Y2 >> 8)] ^ (byte)(_expKey[keyCtr] >> 8));
    Output[++OutOffset] = (byte)(SBox[(byte)Y3] ^ (byte)_expKey[keyCtr]);

    Output[++OutOffset] = (byte)(SBox[Y1 >> 24] ^ (byte)(_expKey[++keyCtr] >> 24));
    Output[++OutOffset] = (byte)(SBox[(byte)(Y2 >> 16)] ^ (byte)(_expKey[keyCtr] >> 16));
    Output[++OutOffset] = (byte)(SBox[(byte)(Y3 >> 8)] ^ (byte)(_expKey[keyCtr] >> 8));
    Output[++OutOffset] = (byte)(SBox[(byte)Y0] ^ (byte)_expKey[keyCtr]);

    Output[++OutOffset] = (byte)(SBox[Y2 >> 24] ^ (byte)(_expKey[++keyCtr] >> 24));
    Output[++OutOffset] = (byte)(SBox[(byte)(Y3 >> 16)] ^ (byte)(_expKey[keyCtr] >> 16));
    Output[++OutOffset] = (byte)(SBox[(byte)(Y0 >> 8)] ^ (byte)(_expKey[keyCtr] >> 8));
    Output[++OutOffset] = (byte)(SBox[(byte)Y1] ^ (byte)_expKey[keyCtr]);

    Output[++OutOffset] = (byte)(SBox[Y3 >> 24] ^ (byte)(_expKey[++keyCtr] >> 24));
    Output[++OutOffset] = (byte)(SBox[(byte)(Y0 >> 16)] ^ (byte)(_expKey[keyCtr] >> 16));
    Output[++OutOffset] = (byte)(SBox[(byte)(Y1 >> 8)] ^ (byte)(_expKey[keyCtr] >> 8));
    Output[++OutOffset] = (byte)(SBox[(byte)Y2] ^ (byte)_expKey[keyCtr]);
}

SHX (Serpent)

SHX, just like RHX (optionally) uses an HKDF generator to expand the user supplied key into a working key integer array. It can also takes a user defined number of rounds between 32 (the normal number of rounds), all the way up to 64 (twice the normal) rounds in 8 round sets. A round count of 40 or 48 is more than sufficient, as theoretical attacks to date are only able to break up to 12 rounds and would require an enormous amount of memory and processing power.

The transform in SHX is identical to the Serpent implementation SPX, it process rounds by first moving the byte input array into 4 integers, then processing the rounds in a while loop. Each round consists of an Xor of each state word (R_n) with a key, an S-Box transformation of those words, and then a linear transformation. Each of the 8 S-Boxes are used in succession within a loop cycle. The final round Xors the last 4 keys with the state and shifts them back into the output byte array.

The Serpent encryption transform:

private void Encrypt16(byte[] Input, int InOffset, byte[] Output, int OutOffset)
{
    int LRD = _expKey.Length - 5;
    int keyCtr = -1;

    // input round
    uint R0 = IntUtils.BytesToBe32(Input, InOffset + 12);
    uint R1 = IntUtils.BytesToBe32(Input, InOffset + 8);
    uint R2 = IntUtils.BytesToBe32(Input, InOffset + 4);
    uint R3 = IntUtils.BytesToBe32(Input, InOffset);

    // process 8 round blocks
    do
    {
        R0 ^= _expKey[++keyCtr];
        R1 ^= _expKey[++keyCtr];
        R2 ^= _expKey[++keyCtr];
        R3 ^= _expKey[++keyCtr];
        Sb0(ref R0, ref R1, ref R2, ref R3);
        LinearTransform(ref R0, ref R1, ref R2, ref R3);

        R0 ^= _expKey[++keyCtr];
        R1 ^= _expKey[++keyCtr];
        R2 ^= _expKey[++keyCtr];
        R3 ^= _expKey[++keyCtr];
        Sb1(ref R0, ref R1, ref R2, ref R3);
        LinearTransform(ref R0, ref R1, ref R2, ref R3);

        R0 ^= _expKey[++keyCtr];
        R1 ^= _expKey[++keyCtr];
        R2 ^= _expKey[++keyCtr];
        R3 ^= _expKey[++keyCtr];
        Sb2(ref R0, ref R1, ref R2, ref R3);
        LinearTransform(ref R0, ref R1, ref R2, ref R3); ;

        R0 ^= _expKey[++keyCtr];
        R1 ^= _expKey[++keyCtr];
        R2 ^= _expKey[++keyCtr];
        R3 ^= _expKey[++keyCtr];
        Sb3(ref R0, ref R1, ref R2, ref R3);
        LinearTransform(ref R0, ref R1, ref R2, ref R3);

        R0 ^= _expKey[++keyCtr];
        R1 ^= _expKey[++keyCtr];
        R2 ^= _expKey[++keyCtr];
        R3 ^= _expKey[++keyCtr];
        Sb4(ref R0, ref R1, ref R2, ref R3);
        LinearTransform(ref R0, ref R1, ref R2, ref R3);

        R0 ^= _expKey[++keyCtr];
        R1 ^= _expKey[++keyCtr];
        R2 ^= _expKey[++keyCtr];
        R3 ^= _expKey[++keyCtr];
        Sb5(ref R0, ref R1, ref R2, ref R3);
        LinearTransform(ref R0, ref R1, ref R2, ref R3);

        R0 ^= _expKey[++keyCtr];
        R1 ^= _expKey[++keyCtr];
        R2 ^= _expKey[++keyCtr];
        R3 ^= _expKey[++keyCtr];
        Sb6(ref R0, ref R1, ref R2, ref R3);
        LinearTransform(ref R0, ref R1, ref R2, ref R3);

        R0 ^= _expKey[++keyCtr];
        R1 ^= _expKey[++keyCtr];
        R2 ^= _expKey[++keyCtr];
        R3 ^= _expKey[++keyCtr];
        Sb7(ref R0, ref R1, ref R2, ref R3);

        // skip on last block
        if (keyCtr != LRD)
            LinearTransform(ref R0, ref R1, ref R2, ref R3);
    }
    while (keyCtr != LRD);

    // last round
    IntUtils.Be32ToBytes(_expKey[++keyCtr] ^ R0, Output, OutOffset + 12);
    IntUtils.Be32ToBytes(_expKey[++keyCtr] ^ R1, Output, OutOffset + 8);
    IntUtils.Be32ToBytes(_expKey[++keyCtr] ^ R2, Output, OutOffset + 4);
    IntUtils.Be32ToBytes(_expKey[++keyCtr] ^ R3, Output, OutOffset);
}

THX (Twofish)

This is an interesting cipher, with methods like a keyed S-Box and a complex algebraic description, quite a bit different from Sepent or Rijndael. As such, it required more consideration in how an extended key size could be implemented. What I did was similar to the other ciphers; to use patterns from the existing function, and extend those patterns in a way that best leverages the larger cipher key while maintaining the consistency of the original design.

The Twofish encryption transform:

private void Encrypt16(byte[] Input, int InOffset, byte[] Output, int OutOffset)
{
    int LRD = _expKey.Length - 1;
    int keyCtr = 0;
    UInt32 X0 = IntUtils.BytesToLe32(Input, InOffset) ^ _expKey[keyCtr];
    UInt32 X1 = IntUtils.BytesToLe32(Input, InOffset + 4) ^ _expKey[++keyCtr];
    UInt32 X2 = IntUtils.BytesToLe32(Input, InOffset + 8) ^ _expKey[++keyCtr];
    UInt32 X3 = IntUtils.BytesToLe32(Input, InOffset + 12) ^ _expKey[++keyCtr];
    UInt32 T0, T1;

    keyCtr = 7;
    do
    {
        T0 = Fe0(X0);
        T1 = Fe3(X1);
        X2 ^= T0 + T1 + _expKey[++keyCtr];
        X2 = (X2 >> 1) | X2 << 31;
        X3 = (X3 << 1 | (X3 >> 31)) ^ (T0 + 2 * T1 + _expKey[++keyCtr]);

        T0 = Fe0(X2);
        T1 = Fe3(X3);
        X0 ^= T0 + T1 + _expKey[++keyCtr];
        X0 = (X0 >> 1) | X0 << 31;
        X1 = (X1 << 1 | (X1 >> 31)) ^ (T0 + 2 * T1 + _expKey[++keyCtr]);

    } while (keyCtr != LRD);

    keyCtr = 4;
    IntUtils.Le32ToBytes(X2 ^ _expKey[keyCtr], Output, OutOffset);
    IntUtils.Le32ToBytes(X3 ^ _expKey[++keyCtr], Output, OutOffset + 4);
    IntUtils.Le32ToBytes(X0 ^ _expKey[++keyCtr], Output, OutOffset + 8);
    IntUtils.Le32ToBytes(X1 ^ _expKey[++keyCtr], Output, OutOffset + 12);
}

Recommendations

The first recommendation would be to use CEX++, rather than this library. The C++ version of the library is much faster (20x or more in many cases), better writting, more secure, more options..

That version of the library began when this ended; I now consider this library as a part of the learning curve, and the C++ version as a serious effort towards delivering a flexible and performant post-quantum secure cryptography library. Eventually, time permitting, this library will be re-written based on CEX++, but for now, all my efforts are going into that project.

Use RHX with the extended key sizes (HKDF secure mode), and increase the number of rounds.. I can understand the reluctance of some people when it comes to something new in this context, but in my opinion, RHX is many orders of magnitude more powerful than standard AES. Changing the key schedule from a weak PRF to a cryptographically strong DRBG, is a natural progression, and should ensure the integrity of the cipher for a long time to come. The same can be said for the other HX ciphers, the longer user supplied key, combined with increased rounds assignments, creates a much stronger cipher. The digest used in the KDF is assignable, so one can use a hashing algorithm that is more or less immune to timing attacks, and also eliminates vulnerabilities of weak and related key attacks.

If you are encrypting files, use the CipherStream class. The example demonstrates using a KeyHeaderStruct to construct a cipher instance via that class, I believe that framework is the best use of the library. Some people may be tempted to pull encryption classes from the library and use them as stand alone instances in their projects, this creates a weaker system. The MessageHeader provides only two pieces of information to an attacker; a random Key id, and an encrypted file extension. The attacker does not know what encryption algorithm was used, how many rounds, the KDF engine, padding, mode, mac size.. etc. Essentially, an attacker knows nothing useful with this system, and given that encryption may have been performed with so many unknown variables, this makes it far more difficult to cryptanalyze a message file.

With the HX ciphers in extended mode, use key sizes that align with the HKDF digest engines hash return sizes. The recommended size is (the digest functions input block size). This ensures an adequate amount of entropy is used to seed HKDF.

There is no need to use a 512 bit digest with HKDF in this context, a 256 bit digest like Skein or Blake is more than adequate.

Recommended minimum key sizes for HX ciphers in Extended mode:

Digest	Minimum	Recommended
Blake-256	32 Bytes	64 Bytes
Blake-512	64 Bytes	128 Bytes
Keccak 512	64 Bytes	128 Bytes
SHA-256	32 Bytes	64 Bytes
SHA-512	64 Bytes	128 Bytes
Skein-256	32 Bytes	64 Bytes
Skein-512	64 Bytes	128 Bytes

Recommended Round Counts for HX ciphers:

Cipher	Rounds
RHX	22
SHX	40
THX	20

I would also recommend using the KeyGenerator class to generate keys and vectors in an implementation. If the user is given the options of which seed generator and Digest engine to use when creating a key, it adds more unknowns to any cryptanalysis involving key prediction, while creating cryptographically strong keying material with a two-stage PRNG/Digest engine.

Tests

There are a number of different tests included in the project used to verify the validity of the RDX, SPX, TFX implementations in standard configurations (128, 192, and 256 bit keys). There are also I/O, mode equality, padding, HMAC, HKDF, SHA KAT tests, and performance comparisons;

AESAVS Known answer tests; the output from a transformation is known, given set parameters of key, iv, or plaintext. The full plaintext and key vectors from the AESAVS set, used for certifying an AES implementation are used, 960 vector tests in total.
AES monte carlo tests; defined in the AES specification Fips 197, and the vectors created by Brian Gladman.
Blake Vector KATs from the Blake SHA-3 submission package; tests 256/512 digests.
ChaCha and Salsa20; KAT tests from the bouncy castle jdk 1.51 implementation for both ChaCha and Salsa20.
CMAC; Test vectors from RFC 4493.
HKDF; the set of known answer tests from the HKDF RFC 5869 are used to test the HKDF implementation.
HMAC; Known answer tests from RFC 4231 are used to test the HMAC implementation.
Keccak; Vectors from the Bouncy Castle jdk 1.51 SHA-3 tests including Nist vectors.
Cipher Modes; The full set of vectors for ECB, CBC, CFB, OFB and CTR modes from Nist SP800-38A are tested.
PSC Equality; The output from parallel modes is compared with output from a standard mode for equality.
I/O; Tests cipher initialization and processing methods.
Rijndael Vector; known answer testing 256 bit block size using test vectors derived from Bouncy Castle RijndaelTest.cs and the Nessie unverified vectors.
SecureRandom; Tests the SecureRandom access methods and return ranges.
Serpent Vector; The complete set of Nessie verified vectors including 100 and 1000 round Monte Carlo tests, 2865 vectors.
SHA-2; Vector: KAT tests used in the NIST SHA test vectors document supplement.
Skein; Tests the 256, 512, and 1024 bit versions of Skein against known test vectors.
Twofish; Official Twofish Vector KAT tests, including 60 thousand rounds of Monte Carlo test
VMPCMAC; Vector test used by Bouncy Castle.

Guiding Publications

SHA-3 proposal BLAKE: Jean-Philippe Aumasson, Luca Henzen, Willi Meier, and Raphael C.-W. Phan
Cryptographic Secure Pseudo-Random Bits Generation: Pascal Junod
ChaCha, a variant of Salsa20: Daniel J. Bernstein
Fips 140-2 Security Requirements for Cryptographic Modules: NIST
Fips 180-4: Secure Hash Standard (SHS): NIST
Fips 197: Advanced Encryption Standard (AES): NIST
Fips 198-1: The Keyed-Hash Message Authentication Code (HMAC): NIST
Handbook of Applied Cryptography: Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone
The Keccak SHA-3 submission: Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche
IR7896: Third-Round Report of the SHA-3 Cryptographic Hash Algorithm Competition: NIST
RFC 2104: Keyed-Hashing for Message Authentication (HMAC): H. Krawczyk, M. Bellare, and R. Canetti
RFC 4086: Randomness Requirements for Security: D. Eastlake, J. Schiller, and S. Crocker
RFC 4493: The AES-CMAC Algorithm: JH. Song, R. Poovendran, J. Lee, and T. Iwata
RFC 5652: Cryptographic Message Syntax (CMS): R. Housley
RFC 5869: HMAC-based Extract-and-Expand Key Derivation Function (HKDF): H. Krawczyk, and P. Eronen
AES Proposal: Rijndael: Joan Daemen, Vincent Rijmen
Salsa20 Design: Daniel J. Bernstein
Salsa20 Security: Daniel J. Bernstein
Security Bounds for the NIST Codebook-based Deterministic Random Bit Generator: Matthew J. Campagna
Serpent: A Proposal for the Advanced Encryption Standard: Ross Anderson, Eli Biham, and Lars Knudsen
Keccak and the SHA-3 Standardization: NIST
The Skein Hash Function Family: Niels Ferguson, Stefan Lucks, Bruce Schneier, Doug Whiting, Mihir Bellare, Tadayoshi Kohno, Jon Callas, and Jesse Walker
SP800-22A: A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications: NIST
SP800-38A: Recommendation for Block Cipher Modes of Operation: NIST
SP800-38B: Recommendation for Block Cipher Modes of Operation: The CMAC Mode for Authentication: NIST
SP800-88R1: Guidelines for Media Sanitization: NIST
SP800-90A: Recommendation for the Entropy Sources Used for Random Bit Generation: NIST
SP800-90B: Recommendation for the Entropy Sources Used for Random Bit Generation: NIST
Twofish: A 128-Bit Block Cipher: Bruce Schneier, John Kelsey, Doug Whiting, David Wagner, Chris Hall, and Niels Ferguson
VMPC One-Way Function and Stream Cipher: Bartosz Zoltak
VMPC-MAC: A Stream Cipher Based Authenticated Encryption Scheme: Bartosz Zoltak
Merkle Signatures with Virtually Unlimited Signature Capacity
The McEliece Cryptosystem Resists Quantum Fourier Sampling Attacks
Attacking and defending the McEliece cryptosystem
A CCA2 Secure Variant of the McEliece Cryptosystem
New Approach for CCA2-Secure Post-Quantum Cryptosystem Using Knapsack Problem
McBits: fast constant-time code-based cryptography
Selecting Parameters for Secure McEliece-based Cryptosystems
Optimizations for NTRU
Implementation Aspects of NTRUEncrypt and NTRUSign
Practical lattice-based cryptography: NTRUEncrypt and NTRUSign
Choosing NTRUEncrypt Parameters in Light of Combined Lattice Reduction and MITM Approaches
The NTRU Cryptosystem: Implementation and Comparative Analysis
NTRU Cryptosystems Technical Report
Selecting Parameters for the Rainbow Signature Scheme
Efficient Software Implementation of Ring-LWE Encryption
Compact Ring-LWE Cryptoprocessor
A Simple Provably Secure Key Exchange Scheme Based on the Learning with Errors Problem
The Knuth-Yao Quadrangle-Inequality Speedup is a Consequence of Total-Monotonicity
A Toolkit for Ring-LWE Cryptography

License

The license for the NTRU implementation and the DTM-KEX is GPLv3, the rest of the library including the cipher implementations is MIT.

Conclusion

In the spring of 1946 work on the ENIAC computer was completed. Newspapers around the globe heralded it as the “Super Brain” and the “Einstein Machine”. It was the most powerful computer ever built; and had more than 17,000 vacuum tubes, 7,200 crystal diodes, 1,500 relays, 70,000 resistors, 10,000 capacitors and around 5 million hand-soldered joints. It weighed 30 tons, took up 1800 square feet, and consumed 150 kW of power. It was capable of calculating an astounding 385 multiplication operations per second.

Imagine that you were one of the designers, out for a few pints with fellow engineers and scientists, and you proposed that in just 25 years, anyone could walk into a Sears store and buy a 10 dollar portable calculator with thousands of times the computational power and speed. I think you would have been greeted with much skepticism; ‘impossible’, ‘infeasible’, ‘transistors and circuit pathways cannot be made that small’.. and you would have been subjected to a barrage of scientific theories positing that such a thing could never happen.. at least, not for a hundred years or so..

In a recent article on wired, John Martinis, one of the foremost experts on quantum computers, states that one of the objectives of the new Google Quantum AI lab, is to double the number of qubits each year. Recently another breakthrough in how quantum states are measured could prove to be 350 times faster than current methods.. breakthroughs of this kind are happening ever more frequently as our understanding of quantum processes continues to grow. So, at this ever accelerating rate, how long will it be before computers exist that will be capable of the enormous processing power required to break current encryption technology? It is impossible to say with any certainty, but a major breakthrough could put this in reach, possibly much sooner than expected. So when you hear of the improbability of breaking AES, remember the ENIAC, and consider how far we have advanced technology in the last 100 years.

There is the argument against stronger encryption, often linked to the idea that state agencies are developing a mass surveillance apparatus for our own protection, that facilities like this one in Utah, will be used only to target criminals and terrorists, and that strong encryption hampers their efforts. I think most people understand that this is not strictly the case, that the technology could eventually be used in some system of 'people control', and that these agencies intentions are at best unclear. Should a mass surveillance apparatus of this scale ever fall into the hands of tyrants, it would become a devastating tool of oppression.

Besides which, the people these agencies propose to target like criminals and terrorists; the worst of which don't use electronic communications at all, or they have developed effective evasion strategies, and have access to unbreakable and undetectable methods like One Time Pads and steganography.

I think that we have advanced so quickly over the past 100 years because we are living in an age of unparalleled individual rights and freedoms, freedom to express our ideas and communicate them without fear of interference or reprisal. This has been a chief driver in the forward progression of our societies, and these freedoms need to be preserved if we are to maintain that forward momentum, and hopefully, create a better world for future generations. Encryption technologies play a pivotal role in that future, and I believe we should be striving towards technologies that protect information for the full measure of a human lifetime, that all forms of electronic communication should incorporate strong encryption technology as a matter of standard, and that these technologies should constantly be compared to, and evolved against the projected rate of technological change.

So.. hope you enjoyed the article, leave a comment, or if it's technical, you can email me through this, or my site.

Cheers,

John