Input Normalization as Separate Layer in CNTK with C#

Bahrudin Hrnjica

5.00/5 (2 votes)

Jul 13, 2018

CPOL

3 min read

11027

How to implement data normalization as regular neural network layer, which can simply training process and data preparation

In the previous post, we have seen how to calculate some of the basic parameters of descriptive statistics, as well as how to normalize data by calculating mean and standard deviation. In this blog post, we are going to implement data normalization as regular neural network layer, which can simplify the training process and data preparation.

What is Data Normalization?

Simply said, data normalization is a set of tasks which transform values of any feature in a data set into predefined number range. Usually, this range is [-1,1] , [0,1] or some other specific ranges. Data normalization plays a very important role in ML, since it can dramatically improve the training process, and simplify settings of network parameters.

There are two main types of data normalization:

MinMax normalization – which transforms all values into range of [0,1],
Gauss Normalization or Z score normalization, which transforms the value in such a way that the average value is zero, and std is 1.

Beside those types, there are plenty of other methods which can be used. Usually, those two are used when the size of the data set is known, otherwise we should use some of the other methods, like log scaling, dividing every value with some constant, etc. But why data needs to be normalized? This is the essential question in ML, and the simplest answer is to provide the equal influence to all features to change the output label. More about data normalization and scaling can be found at this link.

In this blog post, we are going to implement CNTK neural network which contains a “Normalization layer” between input and first hidden layer. The schematic picture of the network looks like the following image:

As can be observed, the Normalization layer is placed between input and first hidden layer. Also, the Normalization layer contains the same neurons as input layer and produced the output with the same dimension as the input layer.

In order to implement Normalization layer, the following requirements must be met:

calculate average $\mu$ and standard deviation $\sigma$ in training data set as well as find maximum and minimum value of each feature.
this must be done prior to neural network model creation, since we need those values in the normalization layer.
within network model creation, the normalization layer should be defined after input layer is defined.

Calculation of Mean and Standard Deviation for Training Data Set

Before network creation, we should prepare mean and standard deviation parameters which will be used in the Normalization layer as constants. Hopefully, the CNTK has the static method in the Minibatch source class for this purpose “MinibatchSource.ComputeInputPerDimMeansAndInvStdDevs”. The method takes the whole training data set defined in the minibatch and calculates the parameters.

//calculate mean and std for the minibatchsource
// prepare the training data
var d = new DictionaryNDArrayView, NDArrayView>>();
using (var mbs = MinibatchSource.TextFormatMinibatchSource(
trainingDataPath , streamConfig, MinibatchSource.FullDataSweep,false))
{
d.Add(mbs.StreamInfo("feature"), new Tuple(null, null));
//compute mean and standard deviation of the population for inputs variables
MinibatchSource.ComputeInputPerDimMeansAndInvStdDevs(mbs, d, device);
}

Now that we have average and std values for each feature, we can create network with normalization layer. In this example, we define simple feed forward NN with 1 input, 1 normalization, 1 hidden and 1 output layer.

private static Function createFFModelWithNormalizationLayer
(Variable feature, int hiddenDim,int outputDim, Tuple<NDArrayView, NDArrayView> avgStdConstants, 
DeviceDescriptor device)
{
    //First the parameters initialization must be performed
    var glorotInit = CNTKLib.GlorotUniformInitializer(
    CNTKLib.DefaultParamInitScale,
    CNTKLib.SentinelValueForInferParamInitRank,
    CNTKLib.SentinelValueForInferParamInitRank, 1);

    //*******Input layer is indicated as feature
    var inputLayer = feature;

    //*******Normalization layer
    var mean = new Constant(avgStdConstants.Item1, "mean");
    var std = new Constant(avgStdConstants.Item2, "std");
    var normalizedLayer = CNTKLib.PerDimMeanVarianceNormalize(inputLayer, mean, std);

    //*****hidden layer creation
    //shape of one hidden layer should be inputDim x neuronCount
    var shape = new int[] { hiddenDim, 4 };
    var weightParam = new Parameter(shape, DataType.Float, glorotInit, device, "wh");
    var biasParam = new Parameter(new NDShape(1, hiddenDim), 0, device, "bh");
    var hidLay = CNTKLib.Times(weightParam, normalizedLayer) + biasParam;
    var hidLayerAct = CNTKLib.ReLU(hidLay);

    //******Output layer creation
    //the last action is creation of the output layer
    var shapeOut = new int[] { 3, hiddenDim };
    var wParamOut = new Parameter(shapeOut, DataType.Float, glorotInit, device, "wo");
    var bParamOut = new Parameter(new NDShape(1, 3), 0, device, "bo");
    var outLay = CNTKLib.Times(wParamOut, hidLayerAct) + bParamOut;
    
    return outLay;
}

Complete Source Code Example

The whole source code about this example is listed below. The example shows how to normalize input feature for Iris famous data set. Notice that when using such way of data normalization, we don’t need to handle normalization for validation or testing data sets, because data normalization is part of the network model.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using CNTK;
namespace NormalizationLayerDemo
{
    class Program
    {
        static string trainingDataPath = "./data/iris_training.txt";
        static string validationDataPath = "./data/iris_validation.txt";
        static void Main(string[] args)
        {
            DeviceDescriptor device = DeviceDescriptor.UseDefaultDevice();

            //stream configuration to distinct features and labels in the file
            var streamConfig = new StreamConfiguration[]
               {
                   new StreamConfiguration("feature", 4),
                   new StreamConfiguration("flower", 3)
               };

            // build a NN model
            //define input and output variable and connecting to the stream configuration
            var feature = Variable.InputVariable(new NDShape(1, 4), DataType.Float, "feature");
            var label = Variable.InputVariable(new NDShape(1, 3), DataType.Float, "flower");

            //calculate mean and std for the minibatchsource
            // prepare the training data
            var d = new Dictionary<StreamInformation, Tuple>();
            using (var mbs = MinibatchSource.TextFormatMinibatchSource(
               trainingDataPath , streamConfig, MinibatchSource.FullDataSweep,false))
            {
                d.Add(mbs.StreamInfo("feature"), new Tuple(null, null));
                //compute mean and standard deviation of the population for inputs variables
                MinibatchSource.ComputeInputPerDimMeansAndInvStdDevs(mbs, d, device);

            }

            //Build simple Feed Froward Neural Network with normalization layer
            var ffnn_model = createFFModelWithNormalizationLayer
                                    (feature,5,3,d.ElementAt(0).Value, device);

            //Loss and error functions definition
            var trainingLoss = CNTKLib.CrossEntropyWithSoftmax
            (new Variable(ffnn_model), label, "lossFunction");
            var classError = CNTKLib.ClassificationError
            (new Variable(ffnn_model), label, "classificationError");

            // set learning rate for the network
            var learningRatePerSample = new TrainingParameterScheduleDouble(0.01, 1);

            //define learners for the NN model
            var ll = Learner.SGDLearner(ffnn_model.Parameters(), learningRatePerSample);

            //define trainer based on model, loss and error functions , and SGD learner
            var trainer = Trainer.CreateTrainer
                          (ffnn_model, trainingLoss, classError, new Learner[] { ll });

            //Preparation for the iterative learning process

            // create minibatch for training
            var mbsTraining = MinibatchSource.TextFormatMinibatchSource
            (trainingDataPath, streamConfig, MinibatchSource.InfinitelyRepeat, true);

            int epoch = 1;
            while (epoch < 20)
            {
                var minibatchData = mbsTraining.GetNextMinibatch(65, device);
                //pass to the trainer the current batch separated by the features and label.
                var arguments = new Dictionary
                {
                    { feature, minibatchData[mbsTraining.StreamInfo("feature")] },
                    { label, minibatchData[mbsTraining.StreamInfo("flower")] }
                };

                trainer.TrainMinibatch(arguments, device);

                //for each epoch report the training process
                if (minibatchData.Values.Any(a => a.sweepEnd))
                {
                    reportTrainingProgress(feature, label, streamConfig, trainer, epoch, device);
                    epoch++;
                }
            }
            Console.Read();
        }

        private static void reportTrainingProgress(Variable feature, Variable label, 
        StreamConfiguration[] streamConfig,  Trainer trainer, int epoch, DeviceDescriptor device)
        {
            // create minibatch for training
            var mbsTrain = MinibatchSource.TextFormatMinibatchSource
            (trainingDataPath, streamConfig, MinibatchSource.FullDataSweep, false);
            var trainD = mbsTrain.GetNextMinibatch(int.MaxValue, device);
            //
            var a1 = new UnorderedMapVariableMinibatchData();
            a1.Add(feature, trainD[mbsTrain.StreamInfo("feature")]);
            a1.Add(label, trainD[mbsTrain.StreamInfo("flower")]);
            var trainEvaluation = trainer.TestMinibatch(a1);

            // create minibatch for validation
            var mbsVal = MinibatchSource.TextFormatMinibatchSource
            (validationDataPath, streamConfig, MinibatchSource.FullDataSweep, false);
            var valD = mbsVal.GetNextMinibatch(int.MaxValue, device);

            //
            var a2 = new UnorderedMapVariableMinibatchData();
            a2.Add(feature, valD[mbsVal.StreamInfo("feature")]);
            a2.Add(label, valD[mbsVal.StreamInfo("flower")]);
            var valEvaluation = trainer.TestMinibatch(a2);

            Console.WriteLine($"Epoch={epoch}, 
            Train Error={trainEvaluation}, Validation Error={valEvaluation}");
        }

        private static Function createFFModelWithNormalizationLayer
        (Variable feature, int hiddenDim,int outputDim, Tuple avgStdConstants, DeviceDescriptor device)
        {
            //First the parameters initialization must be performed
            var glorotInit = CNTKLib.GlorotUniformInitializer(
                    CNTKLib.DefaultParamInitScale,
                    CNTKLib.SentinelValueForInferParamInitRank,
                    CNTKLib.SentinelValueForInferParamInitRank, 1);

            //*******Input layer is indicated as feature
            var inputLayer = feature;

            //*******Normalization layer
            var mean = new Constant(avgStdConstants.Item1, "mean");
            var std = new Constant(avgStdConstants.Item2, "std");
            var normalizedLayer = CNTKLib.PerDimMeanVarianceNormalize(inputLayer, mean, std);

            //*****hidden layer creation
            //shape of one hidden layer should be inputDim x neuronCount
            var shape = new int[] { hiddenDim, 4 };
            var weightParam = new Parameter(shape, DataType.Float, glorotInit, device, "wh");
            var biasParam = new Parameter(new NDShape(1, hiddenDim), 0, device, "bh");
            var hidLay = CNTKLib.Times(weightParam, normalizedLayer) + biasParam;
            var hidLayerAct = CNTKLib.ReLU(hidLay);

            //******Output layer creation
            //the last action is creation of the output layer
            var shapeOut = new int[] { 3, hiddenDim };
            var wParamOut = new Parameter(shapeOut, DataType.Float, glorotInit, device, "wo");
            var bParamOut = new Parameter(new NDShape(1, 3), 0, device, "bo");
            var outLay = CNTKLib.Times(wParamOut, hidLayerAct) + bParamOut;
            return outLay;
        }
    }
}

The output window should look like:

2018-07-13_18-20-58

The data set files used in the example can be downloaded from here.