
In this series of articles, we’ll show you how to use a Deep Neural Network (DNN) to estimate a person’s age from an image.

Having designed and built the CNN model for age estimation, in this article – the fifth of the series – we are going to train that model to classify people in the images into the appropriate age groups.

We need to implement the following functionality:

Preprocess images to satisfy the network’s input criteria
Load images from the files into memory
Convert the data to the format acceptable for the model optimization
Launch the training process

Prepare Images to Serve as Input

Our CNN was designed to accept gray (one-channel, 8-bit) images of 128 x 128 pixels, so we need conversion code that turns the original color images into that input format. The following Python code defines two classes that implement the conversion:

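import cv2

class ResizeConverter:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def convert(self, image):
        # The third positional argument of cv2.resize is dst, so the
        # interpolation type has to be passed by keyword.
        return cv2.resize(image, (self.width, self.height),
                          interpolation=cv2.INTER_AREA)

class GrayConverter:
    def convert(self, image):
        # cv2.imread returns color images in BGR channel order
        return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)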

The ResizeConverter class resizes images to the specified width and height. Note that we first import the OpenCV package, cv2, which provides the functions we need to work with image data. The convert method of this class calls cv2.resize with the stored width and height and the cv2.INTER_AREA interpolation type, which is the recommended algorithm for shrinking images.

The GrayConverter class has only one method; it converts a color image to the 8-bit one-channel gray format, which is specified by the cv2.COLOR_BGR2GRAY value.
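To make the color conversion more concrete, here is a minimal NumPy sketch of the weighted sum that OpenCV documents for BGR-to-gray conversion (Y = 0.299 R + 0.587 G + 0.114 B). The function name bgr_to_gray is hypothetical, and its rounding may differ from cv2.cvtColor by one intensity level. It is an illustration only, not a replacement for GrayConverter.

```python
import numpy as np

def bgr_to_gray(image):
    """Approximate cv2.COLOR_BGR2GRAY for an (H, W, 3) uint8 image in BGR order."""
    b = image[..., 0].astype(np.float64)
    g = image[..., 1].astype(np.float64)
    r = image[..., 2].astype(np.float64)
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    # Round to the nearest integer and return an 8-bit, one-channel image
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

# A tiny synthetic image filled with pure red (BGR = 0, 0, 255)
color = np.zeros((4, 4, 3), dtype=np.uint8)
color[..., 2] = 255
gray = bgr_to_gray(color)
print(gray.shape, gray.dtype, gray[0, 0])
```

The three-channel input collapses to a two-dimensional array, which is the one-channel layout the network expects after resizing.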

Define Image Loading Process

Having implemented the converter classes, we can now implement the dataset class for loading images into memory:

import os
import numpy as np
import cv2

class ImageDataset:
    def __init__(self, converters):
        self.converters = converters

    def get_files(self, folder):
        # Yield the full path of every entry in the folder
        filenames = os.listdir(folder)
        for filename in filenames:
            filepath = os.path.join(folder, filename)
            yield filepath

    def load(self, folder):
        self.images = []
        self.labels = []
        files = list(self.get_files(folder))
        for path in files:
            image = cv2.imread(path)
            if image is None:
                # cv2.imread returns None for files it cannot decode
                continue
            # The age label is the part of the file name before the first '_'
            fname = os.path.basename(path)
            label = fname.split('_')[0]
            if self.converters is not None:
                for c in self.converters:
                    image = c.convert(image)
            self.images.append(image)
            self.labels.append(int(label))

    def get_data(self):
        return (np.array(self.images), np.array(self.labels))

The class constructor receives one parameter: a list of converters to apply, in order, for image preprocessing. The main method, load, requires one parameter: the full path to the folder with image files. This method finds all the files inside the directory, reads the image from each file using the cv2.imread function, and then applies all the converters to the image. It also parses the age labels from the file names and stores them as integer values. See the second article of the series for a description of the file name syntax.
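The label parsing step can be tried in isolation. The sketch below assumes only what the load method relies on, namely that the age is the part of the file name before the first underscore. The helper name parse_age and the sample file name are hypothetical and are not taken from the dataset.

```python
import os

def parse_age(path):
    """Return the integer age encoded before the first underscore of the file name."""
    fname = os.path.basename(path)
    return int(fname.split('_')[0])

# Hypothetical path, used for illustration only
sample = os.path.join("Faces", "Training", "25_0_1_example.jpg")
print(parse_age(sample))
```

A file whose name does not start with a number makes int raise a ValueError, so stray files in the image folder are worth removing before loading.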

Convert Data to Optimizable Format

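The last step before CNN model training is to convert the loaded data to the format the optimizer works with: normalized floating-point arrays for the images and binarized class labels for the ages. This is achieved with the convert method of the AgeClassConverter class: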

import numpy as np
from keras.preprocessing.image import img_to_array
from sklearn.preprocessing import LabelBinarizer

class AgeClassConverter:
    @staticmethod
    def convert(imdataset, ageranges):
        (images, labels) = imdataset.get_data()

        # Convert every image to a Keras array with a trailing channel axis
        arrays = []
        for image in images:
            arr = img_to_array(image, data_format="channels_last")
            arrays.append(arr)
        # Scale pixel values from [0, 255] to [0, 1]
        arrays = np.array(arrays).astype("float") / 255.0

        # Replace every age with the 1-based index of its age group;
        # ages outside all ranges are left unchanged
        k = len(ageranges)
        for (i, label) in enumerate(labels):
            for j in range(k - 1):
                if ageranges[j] <= label < ageranges[j + 1]:
                    labels[i] = j + 1
                    break

        # Binarize the group indices 1..k-1
        lb = LabelBinarizer()
        lb.fit(list(range(1, k)))
        binlabels = np.array(lb.transform(labels))

        return (arrays, binlabels)

The first parameter of the convert method is an instance of the ImageDataset class. The second parameter is the list of age values that form the boundaries of the age groups. The first loop in the method iterates over the images in the dataset and converts each one to a Keras array using the img_to_array function. Note that we specify the data format as channels_last, which means that the channel axis follows the spatial dimensions, so each array has the shape (height, width, channels). After the loop, we normalize the data to the [0, 1] range by dividing the values by 255.0.
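The effect of the first loop and the normalization on the array layout can be reproduced with NumPy alone. This is a rough sketch on random data, not the Keras code path; the random images stand in for loaded gray images and are an assumption made for illustration.

```python
import numpy as np

# Three fake 128 x 128 gray images stand in for the loaded dataset
images = np.random.randint(0, 256, size=(3, 128, 128), dtype=np.uint8)

# Append a trailing channel axis (channels_last) and scale to [0, 1]
arrays = images[..., np.newaxis].astype("float") / 255.0

print(arrays.shape, arrays.dtype)
```

The result has one more axis than the input, of size 1, because a gray image has a single channel.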

The second loop of the method converts the integer age values found in the labels to the age groups. For example, suppose we call the method with the following age range values [1, 6, 11, 16, 19, 22, 31, 45, 61, 81, 101]. There are eleven values, which provide ten age intervals: 1-5, 6-10, 11-15, …, 81-100. If the dataset contains five labels with the age values [2, 6, 8, 15, 21], the loop will transform these values to the group indicators [1, 2, 2, 3, 5].
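The same mapping from ages to group indicators can be sketched with the standard bisect module, which may be easier to check by hand than the nested loop. The function name age_to_group is hypothetical, and returning None for ages outside all ranges is a choice made for this sketch; the convert method leaves such labels unchanged instead.

```python
from bisect import bisect_right

ageranges = [1, 6, 11, 16, 19, 22, 31, 45, 61, 81, 101]

def age_to_group(age, boundaries):
    """Return the 1-based group index, or None if the age is outside all ranges."""
    if age < boundaries[0] or age >= boundaries[-1]:
        return None
    # Number of boundaries that are <= age equals the 1-based group index
    return bisect_right(boundaries, age)

groups = [age_to_group(a, ageranges) for a in [2, 6, 8, 15, 21]]
print(groups)  # [1, 2, 2, 3, 5]
```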

After the loop, we use the LabelBinarizer class, imported from the sklearn.preprocessing package, to convert the label values to the binary (one-hot) format used for classification problems. Instead of a single value for a label (age group), this format provides one value per possible class, which can be read as the target probability of that class. For example, conversion of the five label values from our example above would result in the following binarized data:

[ [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 1, 0, 0, 0, 0, 0] ]

As you can see, the binarized data contains an array of probabilities for all age groups. As we have ten age groups, every probability array has ten values: zero value if the age is not in the range, and unit value if the age label belongs to this group.
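For readers who want to verify the binarized rows without scikit-learn, the following NumPy sketch builds the same kind of one-hot matrix by indexing an identity matrix. It is an equivalent illustration for this particular input, not the LabelBinarizer implementation.

```python
import numpy as np

groups = np.array([1, 2, 2, 3, 5])   # group indicators from the example
num_classes = 10                     # ten age groups

# Row i of the identity matrix is the one-hot vector for group i + 1
onehot = np.eye(num_classes, dtype=int)[groups - 1]
print(onehot)
```

Each row contains a single 1, located in the column of the corresponding age group.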

Load Images

Now we have the code for all classes to load the dataset into memory:

ageranges = [1, 6, 11, 16, 19, 22, 31, 45, 61, 81, 101]
classes = len(ageranges)-1
imgsize = 128
rc = ResizeConverter(imgsize, imgsize)
gc = GrayConverter()
 
trainSet = ImageDataset([rc, gc])
trainSet.load(r"C:\Faces\Training")
(trainData, trainLabels) = AgeClassConverter.convert(trainSet, ageranges)
 
testSet = ImageDataset([rc, gc])
testSet.load(r"C:\Faces\Testing")
(testData, testLabels) = AgeClassConverter.convert(testSet, ageranges)

In the above code, we assign values to the age range list and image size. Then we instantiate the converters for image resizing and color conversion. The converter list is passed as the parameter when constructing the training and testing datasets. Each dataset is loaded from disk by the load method, which receives the path to the directory with the image files. Finally, we call the static AgeClassConverter.convert method to convert each dataset to the format acceptable for the Keras optimization algorithms.
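Before training, it can help to confirm that the converted arrays have the expected layout. The helper below is a hypothetical sanity check, shown here on synthetic arrays; it assumes the model takes 128 x 128 one-channel inputs and ten classes, as described in this article.

```python
import numpy as np

def check_dataset(data, labels, imgsize, classes):
    """Raise AssertionError if the arrays do not match the expected layout."""
    assert data.shape[1:] == (imgsize, imgsize, 1), data.shape
    assert labels.shape == (data.shape[0], classes), labels.shape
    assert 0.0 <= data.min() and data.max() <= 1.0
    return True

# Synthetic stand-ins for the converted images and binarized labels
fake_data = np.zeros((4, 128, 128, 1))
fake_labels = np.eye(10, dtype=int)[[0, 1, 1, 2]]
print(check_dataset(fake_data, fake_labels, 128, 10))
```

With the real data, the call would be check_dataset(trainData, trainLabels, imgsize, classes), and likewise for the testing arrays.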

Train the Model

We are now ready to launch the training process:

frep = net.fit(trainData, trainLabels, validation_data=(testData, testLabels), batch_size=128, epochs=20, verbose=1)
netname = r"C:\Faces\age_class_net_"+str(kernels)+"_"+str(hidden)+".cnn"
net.save(netname)

Here net is our CNN model, which we instantiated earlier in the series. We call its fit method to launch the optimization process. The method parameters are:

trainData and trainLabels are the training data and binarized labels, respectively
validation_data is the tuple of the testing data and labels
batch_size is the number of samples in each batch processed by the selected SGD optimization method
epochs is the number of epochs (iterations over the full dataset) for the training process
verbose is the level of the information shown during the process
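To get a feel for how batch_size and epochs translate into work, the arithmetic below counts the weight updates for the values used here. The training set size of 21318 samples is taken from the sample log shown in this article; your own dataset split may differ.

```python
import math

samples = 21318      # training samples, as reported in the sample log
batch_size = 128
epochs = 20

# Every epoch processes the whole training set in batches
batches_per_epoch = math.ceil(samples / batch_size)
# The last batch of an epoch holds whatever is left over
last_batch = samples - (batches_per_epoch - 1) * batch_size
total_updates = batches_per_epoch * epochs

print(batches_per_epoch, last_batch, total_updates)  # 167 70 3340
```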

Executing the code will launch the model training process. Note that the process can take several hours to finish on an average CPU. During execution, information about the process iterations is shown in the output. It looks like this:

Train on 21318 samples, validate on 2369 samples
Epoch 1/20
21318/21318 [==============================] - 1146s 54ms/step - loss: 1.7091 - accuracy: 0.4166 - val_loss: 2.2536 - val_accuracy: 0.0912
Epoch 2/20
21318/21318 [==============================] - 1124s 53ms/step - loss: 1.3156 - accuracy: 0.5058 - val_loss: 1.6474 - val_accuracy: 0.4116
Epoch 3/20
21318/21318 [==============================] - 1118s 52ms/step - loss: 1.2010 - accuracy: 0.5439 - val_loss: 1.2562 - val_accuracy: 0.5230

There are two values you need to pay attention to: accuracy and val_accuracy. The former is the classification accuracy on the training dataset, and the latter is the accuracy of the age group prediction on the testing dataset. As you can see in the sample output above, the last val_accuracy value is 0.5230. This means that after the third epoch our CNN correctly predicts the age group for about 52% of the images in the testing dataset. We should keep track of these values to be sure that the optimization process converges. The ideal case is when both values monotonically increase toward 1.0.
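Tracking these values can also be done in code. In Keras, fit returns a History object whose history attribute is a dictionary of per-epoch metric lists, so with a real run you would pass frep.history to a helper like the hypothetical one below. The sketch uses the three epochs from the sample output; note that older Keras versions name the keys acc and val_acc.

```python
def summarize(history, key="val_accuracy"):
    """Return (last value, 1-based best epoch, whether the metric rose every epoch)."""
    values = history[key]
    increasing = all(b > a for a, b in zip(values, values[1:]))
    best_epoch = max(range(len(values)), key=values.__getitem__) + 1
    return values[-1], best_epoch, increasing

# Metrics copied from the first three epochs of the sample output
sample = {
    "accuracy": [0.4166, 0.5058, 0.5439],
    "val_accuracy": [0.0912, 0.4116, 0.5230],
}
print(summarize(sample))
print(summarize(sample, key="accuracy"))
```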

After the specified number of epochs, the process will stop, and the CNN model will be saved to disk. The final testing accuracy we reached in our example is about 56%. The prediction accuracy can be increased with various methods, such as using bigger datasets, a deeper network architecture, regularization, data augmentation, and so on.
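As one small illustration of data augmentation, the sketch below doubles a dataset by appending a horizontally mirrored copy of every image with the same label. The helper name add_mirrored is hypothetical, and whether mirroring actually improves this particular model has not been measured here.

```python
import numpy as np

def add_mirrored(data, labels):
    """Append a left-right mirrored copy of every image, reusing its label."""
    mirrored = data[:, :, ::-1, :]   # flip the width axis of (N, H, W, C)
    return (np.concatenate([data, mirrored]),
            np.concatenate([labels, labels]))

# Two tiny synthetic "images" of shape (2, 3, 1) with one-hot labels
data = np.arange(12, dtype=float).reshape(2, 2, 3, 1)
labels = np.array([[1, 0], [0, 1]])
(augData, augLabels) = add_mirrored(data, labels)
print(augData.shape, augLabels.shape)
```

With the real arrays, the call would be add_mirrored(trainData, trainLabels); only the training data should be augmented, so that the testing accuracy stays comparable.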

Next Step

We now have the trained CNN saved to disk. The next step is to use it to estimate a person's age from an image.

Age Estimation With Deep Learning: Training CNN