
Introduction

If you’ve seen the Minority Report movie, you probably remember the scene where Tom Cruise walks into a Gap store. A retinal scanner reads his eyes, and plays a customized ad for him. Well, this is 2020. We don’t need retinal scanners, because we have Artificial Intelligence (AI) and Machine Learning (ML)!

In this series, we’ll show you how to use Deep Learning to perform facial recognition, and then – based on the face that was recognized – use a Neural Network Text-to-Speech (TTS) engine to play a customized ad. You are welcome to browse the code here on CodeProject or download the .zip file to browse the code on your own machine.

We assume that you are familiar with the basic concepts of AI/ML, and that you can find your way around Python.

The series is built of five articles:

Get a Dataset

In the previous article, we described the process of detecting faces in an image. Now that we know how to obtain a cropped face image from a larger picture or a video, let’s assume that we’ve gone through this exercise and ended up with a dataset (face set) to train our CNN on. Before training, however, we need to process this dataset to categorize and normalize the data. In this article, we’ll create a dataset parser/processor and run it on the Yale Face dataset, which contains 165 grayscale images of 15 different people. This dataset is small but sufficient for our purpose – learning.

Prepare a Parser

The dataset parser will reside in two classes: an abstract, general-purpose parent class and a subclass that handles the specifics of the selected dataset. Let’s look at the constructor of the parent class.

class FaceDataSet(metaclass=abc.ABCMeta):

    def __init__(self, path,  extension_list, n_classes):
        self.path = path
        self.ext_list = extension_list
        self.n_classes = n_classes
        self.objects = []
        self.labels = []
        self.obj_validation = []
        self.labels_validation = []
        self.number_labels = 0

The constructor parameters are:

path: the path to the folder containing dataset samples (images)
extension_list: extensions of files to look for in the path-defined folder (one or more)
n_classes: the number of classes to categorize the dataset into; for the Yale dataset, this will be 15 because this is the number of people in the dataset

The constructor also initializes the following instance attributes:

objects: the images to use for CNN training
labels: the labels (subject numbers) that classify the images (objects)
obj_validation: a subset of the images used to validate the CNN after training
labels_validation: classifiers (labels) for the obj_validation list
number_labels: the total number of labels in the dataset
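The parent/child split described above can be sketched with Python's abc module. This is a minimal illustration, not the article's code: BaseDataSet, DigitPrefixDataSet, the '3_photo.png' naming scheme, and can_instantiate() are hypothetical names, and the sketch assumes the abstract method is declared with @abc.abstractmethod.

```python
import abc


class BaseDataSet(metaclass=abc.ABCMeta):
    """Hypothetical, trimmed-down stand-in for FaceDataSet."""

    def __init__(self, path, extension_list, n_classes):
        self.path = path
        self.ext_list = extension_list
        self.n_classes = n_classes
        self.objects = []
        self.labels = []

    @abc.abstractmethod
    def process_label(self, img_path):
        """Return the integer class label for one file name."""


class DigitPrefixDataSet(BaseDataSet):
    """Hypothetical subclass: file names look like '3_photo.png'."""

    def process_label(self, img_path):
        # The label is the number before the first underscore.
        return int(img_path.split("_")[0])


def can_instantiate(cls):
    """Return True if cls can be constructed, False if abc blocks it."""
    try:
        cls("faces", ["png"], 10)
        return True
    except TypeError:
        return False


print(can_instantiate(BaseDataSet))         # False: abstract method is missing
print(can_instantiate(DigitPrefixDataSet))  # True: process_label is implemented
```

Because the base class cannot be instantiated on its own, each dataset needs only a small subclass that knows how to read labels from its own file names.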

The get_data() method is the one we’ll call after instantiating a FaceDataSet subclass.

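# Assumption: the excerpt does not show where vgg_img_processing is defined,
# so it is exposed here as an optional parameter to keep the method runnable.
def get_data(self, vgg_img_processing=False):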
    img_path_list = os.listdir(self.path)
    self.objects, self.labels = self.fetch_img_path(img_path_list, self.path, vgg_img_processing)
    self.process_data(vgg_img_processing)
    self.print_dataSet()

The method is composed of two main calls: fetching the images from the defined path and processing them. To fetch the images, we loop through the files in the path-defined folder and keep those with a matching extension. We then use scikit-image (io.imread() with as_gray=True) to load each file as a grayscale image. This call returns a NumPy array of the image's pixel values.

def fetch_img_path(self, img_path_list, path, vgg_img_processing):
    images = []
    labels = []
    for img_path in img_path_list:
        if self.__check_ext(img_path):
            img_abs_path = os.path.abspath(os.path.join(path, img_path))
            image = io.imread(img_abs_path, as_gray=True)
            label = self.process_label(img_path)
            images.append(image)
            labels.append(label)
    return images, labels

def __check_ext(self, file_path):
    for ext in self.ext_list:
        if file_path.endswith(ext):
            return True
    return False
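As a side note, the extension check can be written more compactly, because str.endswith() accepts a tuple of suffixes. The sketch below is a standalone illustration; has_allowed_extension() is a hypothetical helper name, not part of the article's source.

```python
def has_allowed_extension(file_name, extensions):
    """Return True if file_name ends with any of the given suffixes."""
    # str.endswith accepts a tuple of suffixes, so no explicit loop is needed.
    return file_name.endswith(tuple(extensions))


allowed = ["gif", "happy", "sad"]
print(has_allowed_extension("subject01.happy", allowed))  # True
print(has_allowed_extension("Readme.txt", allowed))       # False
```

Note that this is plain suffix matching, exactly like the loop version: a name such as "unhappy" would also match "happy". Prefixing each suffix with a dot (".happy") would be stricter.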

process_label() is an abstract method in the FaceDataSet class; its implementation lives in the YaleFaceDataSet class, where we parse the name of the image file from the dataset. The file names are in the "subjectXX.*" format. The method extracts the "XX" number from the file name, subtracts 1 to get a zero-based label, and assigns that label to the image.

class YaleFaceDataSet(FaceDataSet):

    def __init__(self, path, ext_list, n_classes):
        super().__init__(path, ext_list, n_classes)

    def process_label(self, img_path):
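        # "subject02.happy" -> "subject02" -> 2 -> zero-based label 1
        val = int(os.path.split(img_path)[1].split(".")[0].replace("subject", "")) - 1
        # Note: self.labels is only assigned after fetch_img_path() returns,
        # so during fetching this check is always true and the counter
        # increments once per image rather than once per subject.
        if val not in self.labels:
            self.number_labels += 1
        return val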
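To make the file-name parsing concrete, here is a standalone sketch of the same steps applied to a few sample names. parse_subject_label() and count_distinct_labels() are hypothetical helpers, and the sample file names are made up for illustration; counting distinct subjects with a set is one possible approach, not the article's.

```python
import os


def parse_subject_label(img_path):
    """Map a 'subjectXX.<suffix>' file name to a zero-based class label."""
    file_name = os.path.split(img_path)[1]        # drop any directory part
    stem = file_name.split(".")[0]                # e.g. 'subject02'
    return int(stem.replace("subject", "")) - 1   # e.g. 2 -> 1


def count_distinct_labels(file_names):
    """Count how many different subjects appear in a list of file names."""
    return len({parse_subject_label(name) for name in file_names})


names = ["subject01.happy", "subject01.sad",
         "subject02.glasses", "data/subject15.wink"]

print([parse_subject_label(n) for n in names])  # [0, 0, 1, 14]
print(count_distinct_labels(names))             # 3
```

Zero-based labels matter later on: with 15 classes, the valid class indices run from 0 to 14, so subject15 must map to 14.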

Finally, the process_data() method, together with its split_training_set() helper, looks like this:

def split_training_set(self):
    return train_test_split(self.objects, self.labels, test_size=0.3,
                            random_state=random.randint(0, 100))

def process_data(self, vgg_img_processing):
    self.objects, self.img_obj_validation, self.labels, self.img_labels_validation = \
        self.split_training_set()
    self.labels = np_utils.to_categorical(self.labels, self.n_classes)
    self.labels_validation = np_utils.to_categorical(self.img_labels_validation, self.n_classes)

    self.objects = Common.reshape_transform_data(self.objects)
    self.obj_validation = Common.reshape_transform_data(self.img_obj_validation)

In this method, we split the dataset into two parts: roughly 70% of the images for training and 30% (test_size=0.3) for validating the training results. We use the train_test_split() function from Scikit-Learn for the split, and then transform the labels into categorical (one-hot) vectors. For example, an image of subject02 gets the label 1 (process_label() subtracts 1 from the subject number), so its categorical vector is [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0]: a 15-dimensional vector (one component per class) with 1 in the second component.
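The categorical encoding can be illustrated without any dependencies. The sketch below is a pure-Python stand-in for what to_categorical() produces for a single label, not the Keras implementation; one_hot() is a hypothetical helper name.

```python
def one_hot(label, n_classes):
    """Return a list of length n_classes with 1.0 at index `label`."""
    if not 0 <= label < n_classes:
        raise ValueError(f"label {label} is outside 0..{n_classes - 1}")
    vector = [0.0] * n_classes
    vector[label] = 1.0
    return vector


# Label 1 is the zero-based label that process_label() yields for subject02.
encoded = one_hot(1, 15)
print(encoded.index(1.0))  # 1 (the second component)
print(len(encoded))        # 15
```

The range check also shows why the labels have to be zero-based: a label of 15 would not fit into a 15-component vector.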

class Common:

    @staticmethod
    def reshape_transform_data(data):
        data = numpy.array(data)
        result = Common.reshape_data(data)
        return Common.to_float(result)

    @staticmethod
    def reshape_data(data):
        return data.reshape(data.shape[0], constant.IMG_WIDTH, constant.IMG_HEIGHT, 1)

    @staticmethod
    def to_float(value):
        return value.astype('float32')/255

The reshape_transform_data() method reshapes the data to fit the grayscale mode. In image processing, color images are treated as 3-channel grids; in other words, each pixel is split into three color components (RGB). Grayscale images have only one channel. Because we loaded the images as grayscale, each one is a 2D array, and the reshape appends an explicit channel dimension of 1, so that the whole set becomes a 4D array of shape (number of images, width, height, 1).
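The reshape and scaling steps in the Common class can be tried on synthetic data. This is a minimal sketch under stated assumptions: the 4x4 image size and the random pixel values are made up for illustration, and the real sizes come from the constant module, which is not shown here.

```python
import numpy as np

# Hypothetical tiny image size; the article reads the real one from `constant`.
IMG_HEIGHT, IMG_WIDTH = 4, 4

# Three fake 8-bit grayscale "images" stacked into one array.
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(3, IMG_HEIGHT, IMG_WIDTH), dtype=np.uint8)

# Append an explicit single-channel axis: (n, h, w) -> (n, h, w, 1).
with_channel = batch.reshape(batch.shape[0], IMG_HEIGHT, IMG_WIDTH, 1)

# Convert to float32 and scale 0..255 down to 0..1.
scaled = with_channel.astype("float32") / 255

print(with_channel.shape)  # (3, 4, 4, 1)
print(scaled.dtype)        # float32
```

NumPy indexes image arrays rows first, so the height axis comes before the width axis. For square images the order makes no difference, but for non-square images the reshape dimensions need to match the array's actual row and column counts.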

The to_float() method normalizes the data by converting it to 32-bit floats and dividing each pixel value by 255 (pixel values are between 0 and 255). This maps the entire pixel matrix to the 0-1 range, which gives the network better-conditioned numerical input and typically speeds up convergence.

Now we can set up our dataset in the main.py file, which will serve as the entry point of our application.

# Suffixes used by the Yale Face dataset file names (e.g. subject01.leftlight)
ext_list = ['gif', 'centerlight', 'glasses', 'happy', 'sad', 'leftlight',
            'wink', 'noglasses', 'normal', 'sleepy', 'surprised', 'rightlight']
n_classes = 15
# Set up dataSet
dataSet = YaleFaceDataSet(constant.FACE_DATA_PATH, ext_list, n_classes)

Categorize the Dataset

...

Next Step?

Now we have a processed, categorized dataset ready to be used for CNN training. In the next article, we’ll put together our CNN and train it for face recognition. Stay tuned!