Introduction

Face recognition is one area of Artificial Intelligence (AI) where deep learning (DL) has had great success over the past decade. The best face recognition systems can recognize people in images and video with the same precision as humans, or even better. The two main tasks of face recognition are person verification and identification.

In the first (current) half of this article series, we will:

Discuss the existing AI face detection methods and develop a program to run a pretrained DNN model
Consider face alignment and implement some alignment algorithms using face landmarks
Run the face detection DNN on a Raspberry Pi device, explore its performance, and consider possible ways to run it faster, as well as to detect faces in real time
Create a simple face database and fill it with faces extracted from images or videos

We assume that you are familiar with DNNs, Python, Keras, and TensorFlow. You are welcome to download this project code ...

In the previous article, we discussed the principles of face detection and facial recognition. In this one, we’ll have a look at specific face detection methods and implement one of them.

Face Detection Methods

Face detection is the first phase of any face recognition process. It is a critical step that influences all subsequent steps. It requires a robust approach to minimize the detection error. There are many methods of face detection; we’ll concentrate on AI-based approaches.

We’d like to mention the following modern methods of face detection: Max-Margin Object Detection (MMOD), Single-Shot Detector (SSD), Multi-task Cascaded Convolutional Networks (MTCNN), and You Only Look Once (YOLO).

MMOD models require too many resources to run on an edge device. The fastest DNN is YOLO; it provides rather good precision when detecting faces in real-scene video. The most precise of the above methods is SSD, and it is fast enough to be used on low-powered devices.

The main drawback of the YOLO and SSD methods is that they do not provide information on facial landmarks. As we’ll see later, this information is important for face alignment.
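To illustrate why landmarks matter, here is a minimal sketch of one common use: estimating how much a face is tilted (rolled) from its two eye landmarks. It assumes keypoints are (x, y) pixel tuples, as in the MTCNN output; the function name eye_roll_angle and the sample coordinates are hypothetical, and this is not necessarily the alignment method used later in the series.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def eye_roll_angle(left_eye: Point, right_eye: Point) -> float:
    """Return the angle (in degrees) of the line from left_eye to right_eye.

    Image coordinates are assumed: x grows to the right and y grows downward,
    so a positive angle means the right_eye point sits lower in the image.
    A value of 0.0 means the eyes are level.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))


# Hypothetical landmark positions in pixels
print(eye_roll_angle((30, 40), (70, 40)))  # level eyes
print(eye_roll_angle((30, 40), (70, 80)))  # tilted face
```

Rotating the face image so that this angle becomes zero is the basic idea behind landmark-based alignment.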

MTCNN provides good precision and finds facial landmarks. It is lightweight enough to run on resource-constrained edge devices.

MTCNN Detector

In this series, we’ll use a free Keras implementation of the MTCNN detector. You can install this library in a Python environment with the standard pip command (pip install mtcnn). It requires OpenCV 4.1 and TensorFlow 2.0 (or later versions).

You can check that the MTCNN library was installed successfully by running this simple Python code:

import mtcnn

print(mtcnn.__version__)

The output should show the version of the installed library: 0.1.0.

After the library has been installed, we can write MTCNN-based code for a simple face detector:

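import os
import time
import copy
import numpy as np
import cv2
from mtcnn import MTCNN

class MTCNN_Detector:
    def __init__(self, min_size, min_confidence):
        # min_size: minimal face size (in pixels) passed to the MTCNN network
        self.min_size = min_size
        self.f_detector = MTCNN(min_face_size=min_size)
        # min_confidence: detections with a lower score are discarded
        self.min_confidence = min_confidence
    
    def detect(self, frame):
        # OpenCV stores frames in BGR order, while MTCNN expects RGB images
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        faces = self.f_detector.detect_faces(rgb)
        
        # Keep only the faces detected with sufficient confidence
        detected = []
        for face in faces:
            if face['confidence'] >= self.min_confidence:
                detected.append(face)
        
        return detected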
    
    def extract(self, frame, face):
        (x1, y1, w, h) =  face['box']
        (l_eye, r_eye, nose, mouth_l, mouth_r) = Utils.get_keypoints(face)
        
        f_cropped = copy.deepcopy(face)
        move = (-x1, -y1)
        l_eye = Utils.move_point(l_eye, move)
        r_eye = Utils.move_point(r_eye, move)
        nose = Utils.move_point(nose, move)
        mouth_l = Utils.move_point(mouth_l, move)
        mouth_r = Utils.move_point(mouth_r, move)
            
        f_cropped['box'] = (0, 0, w, h)
        f_img = frame[y1:y1+h, x1:x1+w].copy()
            
        f_cropped = Utils.set_keypoints(f_cropped, (l_eye, r_eye, nose, mouth_l, mouth_r))
        
        return (f_cropped, f_img)

The detector class constructor takes two parameters: min_size, the minimal size of a face in pixels, and min_confidence, the minimal confidence required to accept a detected object as a face. The detect method of the class uses the internal MTCNN detector to find the faces in a frame, then keeps only the detected objects whose confidence is at least the minimal value. The last method, extract, crops a face image from the frame and shifts the face landmarks into the coordinates of the cropped image.
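The coordinate shift performed by extract can be shown in isolation. The sketch below uses only the standard library and a plain dictionary in the MTCNN output format; the function name to_crop_coords and the sample values are hypothetical. It returns a new face dictionary whose box starts at (0, 0) and whose landmarks are translated by (-x1, -y1), mirroring what extract does before returning the cropped image.

```python
import copy


def to_crop_coords(face):
    """Return a copy of an MTCNN-style face dict expressed in crop coordinates.

    The box becomes (0, 0, w, h) and every landmark is shifted by (-x1, -y1),
    so the points line up with frame[y1:y1+h, x1:x1+w].
    """
    x1, y1, w, h = face['box']
    cropped = copy.deepcopy(face)  # leave the original face data untouched
    cropped['box'] = (0, 0, w, h)
    cropped['keypoints'] = {
        name: (x - x1, y - y1) for name, (x, y) in face['keypoints'].items()
    }
    return cropped


# Hypothetical detection result
face = {
    'box': [100, 50, 80, 100],
    'confidence': 0.99,
    'keypoints': {'left_eye': (125, 85), 'right_eye': (155, 85),
                  'nose': (140, 105), 'mouth_left': (128, 125),
                  'mouth_right': (152, 125)},
}
print(to_crop_coords(face)['keypoints']['nose'])  # (40, 55)
```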

We’ll also need the following Utils class:

class Utils:    
    @staticmethod
    def draw_face(face, color, frame, draw_points=True, draw_rect=True, n_data=None):
        (x1, y1, w, h) =  face['box']
        confidence = face['confidence']
        x2 = x1+w
        y2 = y1+h
        if draw_rect:
            cv2.rectangle(frame, (x1, y1), (x2, y2), color, 1)
        y3 = y1-12
        if not (n_data is None):
            (name, conf) = n_data
            text = name+ (" %.3f" % conf)
        else:
            text = "%.3f" % confidence
        
        cv2.putText(frame, text, (x1, y3), cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 1, cv2.LINE_AA)
        if draw_points:
            (l_eye, r_eye, nose, mouth_l, mouth_r) = Utils.get_keypoints(face)
            Utils.draw_point(l_eye, color, frame)
            Utils.draw_point(r_eye, color, frame)
            Utils.draw_point(nose, color, frame)
            Utils.draw_point(mouth_l, color, frame)
            Utils.draw_point(mouth_r, color, frame)
        
    @staticmethod
    def get_keypoints(face):
        keypoints = face['keypoints']
        l_eye = keypoints['left_eye']
        r_eye = keypoints['right_eye']
        nose = keypoints['nose']
        mouth_l = keypoints['mouth_left']
        mouth_r = keypoints['mouth_right']
        return (l_eye, r_eye, nose, mouth_l, mouth_r)
    
    @staticmethod
    def set_keypoints(face, points):
        (l_eye, r_eye, nose, mouth_l, mouth_r) = points
        keypoints = face['keypoints']
        keypoints['left_eye'] = l_eye
        keypoints['right_eye'] = r_eye
        keypoints['nose'] = nose
        keypoints['mouth_left'] = mouth_l
        keypoints['mouth_right'] = mouth_r
        
        return face
        
    @staticmethod
    def move_point(point, move):
        (x, y) = point
        (dx, dy) = move
        res = (x+dx, y+dy)
        return res
        
    @staticmethod
    def draw_point(point, color, frame):
        (x, y) =  point
        x1 = x-1
        y1 = y-1
        x2 = x+1
        y2 = y+1
        cv2.rectangle(frame, (x1, y1), (x2, y2), color, 1)
        
    @staticmethod
    def draw_faces(faces, color, frame, draw_points=True, draw_rect=True, names=None):
        for (i, face) in enumerate(faces):
            n_data = None
            if not (names is None):
                n_data = names[i]
            Utils.draw_face(face, color, frame, draw_points, draw_rect, n_data)

In the output of the MTCNN detector, each face object is a dictionary with the following keys: box, confidence, and keypoints. The keypoints item is a dictionary that contains data for face landmarks: left_eye, right_eye, nose, mouth_left, and mouth_right. The Utils class provides simple access to the face data and implements several functions to manipulate the data and draw bounding boxes around faces in images.
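Because every face is a plain dictionary, helper functions are easy to write and test without running the network. As an illustration, this sketch (the name box_to_corners and the sample values are hypothetical) converts the box entry, stored as (x, y, width, height), into corner coordinates, the same x2 = x1+w and y2 = y1+h computation used in draw_face, and optionally clips the corners to the frame size as a safeguard before slicing the image.

```python
from typing import Optional, Sequence, Tuple


def box_to_corners(box: Sequence[int],
                   frame_size: Optional[Tuple[int, int]] = None
                   ) -> Tuple[int, int, int, int]:
    """Convert an MTCNN-style (x, y, w, h) box into (x1, y1, x2, y2) corners.

    If frame_size = (width, height) is given, the corners are clipped to the
    frame so that frame[y1:y2, x1:x2] never runs past the image borders.
    """
    x1, y1, w, h = box
    x2, y2 = x1 + w, y1 + h
    if frame_size is not None:
        fw, fh = frame_size
        x1, x2 = max(0, min(x1, fw)), max(0, min(x2, fw))
        y1, y2 = max(0, min(y1, fh)), max(0, min(y2, fh))
    return x1, y1, x2, y2


print(box_to_corners([100, 50, 80, 100]))               # (100, 50, 180, 150)
print(box_to_corners([600, 420, 80, 100], (640, 480)))  # (600, 420, 640, 480)
```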

Face Detection in Images

Now we can write Python code that will detect faces in images:

d = MTCNN_Detector(30, 0.5)
print("Detector loaded.")

f_file = r"C:\PI_FR\frames\frame_5_02.png"
fimg = cv2.imread(f_file)

faces = d.detect(fimg)

for face in faces:
    print(face)

# Draw on a copy so that the cropped faces below stay free of annotations
det_img = fimg.copy()
Utils.draw_faces(faces, (0, 0, 255), det_img, True, True)

res_path = r"C:\PI_FR\detect"
f_base = os.path.basename(f_file)
r_file = os.path.join(res_path, f_base+"_detected.png")
cv2.imwrite(r_file, det_img)

# Save each detected face as a separate image with its landmarks marked
for (i, face) in enumerate(faces):
    (f_cropped, f_img) = d.extract(fimg, face)
    Utils.draw_faces([f_cropped], (255, 0, 0), f_img, True, False)
    dfname = os.path.join(res_path, f_base + ("_%06d" % i) + ".png")
    cv2.imwrite(dfname, f_img)

A run of the above code produces this image in the detect folder.

As you can see, the detector has found all three faces with good confidence – about 99%. We also get cropped faces in the same directory.

By running the same code on different frames, we can test detection in various cases. Here are the results for two frames.

The results demonstrate that the detector is able to find faces with glasses and also successfully detects the face of a baby.
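Running the detector on different frames can be automated. The following is a hypothetical sketch (detect_in_folder is not part of the original code) that walks over the image files in a folder and counts the faces found in each one. The image loader and the detector are passed in as callables, so with the classes above you could call it as detect_in_folder(r"C:\PI_FR\frames", cv2.imread, d.detect), and the loop itself can be tested without MTCNN installed.

```python
from pathlib import Path
from typing import Callable, Dict

# File extensions treated as images (an assumption; extend as needed)
IMAGE_SUFFIXES = {'.png', '.jpg', '.jpeg'}


def detect_in_folder(folder: str, load: Callable, detect: Callable) -> Dict[str, int]:
    """Run a face detector on every image file in a folder.

    load: reads a file path and returns an image, or None on failure
    detect: takes an image and returns a list of detected faces
    Returns a mapping from file name to the number of faces found.
    """
    counts = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip files that are not images
        img = load(str(path))
        if img is None:
            continue  # unreadable file (cv2.imread returns None in this case)
        counts[path.name] = len(detect(img))
    return counts
```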

Face Detection in Video

Having tested the detector on separate images, let’s now write code for detecting faces in video:

class VideoFD:    
    def __init__(self, detector):
        self.detector = detector
    
    def detect(self, video, save_path = None, align = False, draw_points = False):
        # The align flag is not used yet; face alignment is the topic of the next article
        detection_num = 0
        capture = cv2.VideoCapture(video)

        dname = 'AI face detection'
        cv2.namedWindow(dname, cv2.WINDOW_NORMAL)
        cv2.resizeWindow(dname, 960, 720)
        
        frame_count = 0
        dt = 0
        face_num = 0
        # Capture all frames
        while(True):    
            (ret, frame) = capture.read()
            if frame is None:
                break
            frame_count = frame_count+1
            
            t1 = time.time()
            faces = self.detector.detect(frame)
            t2 = time.time()
            p_count = len(faces)
            detection_num += p_count
            dt = dt + (t2-t1)
            
            if (not (save_path is None)) and (len(faces)>0) :
                f_base = os.path.basename(video)
                for (i, face) in enumerate(faces):
                    (f_cropped, f_img) = self.detector.extract(frame, face)
                    if (not (f_img is None)) and (not f_img.size==0):
                        if draw_points:
                            Utils.draw_faces([f_cropped], (255, 0, 0), f_img, draw_points, False)
                        face_num = face_num+1
                        dfname = os.path.join(save_path, f_base + ("_%06d" % face_num) + ".png") 
                        cv2.imwrite(dfname, f_img)
            
            if len(faces)>0:
                Utils.draw_faces(faces, (0, 0, 255), frame)
            
            # Display the resulting frame
            cv2.imshow(dname,frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
            
        capture.release()
        cv2.destroyAllWindows()    
        
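        # FPS is based only on the time spent inside the detector; avoid dividing by zero
        fps = frame_count/dt if dt > 0 else 0.0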
        
        return (detection_num, fps)

The VideoFD class simply wraps our implementation of the MTCNN detector and feeds it the frames read from a video file. It uses the VideoCapture class from the OpenCV library.
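The timing logic of VideoFD can be isolated into a small, library-free function. In this hypothetical sketch (measure_detection_fps is not part of OpenCV or MTCNN), only the detection call is timed, just as in VideoFD, so decoding and drawing are excluded from the FPS value. It uses time.perf_counter, which is intended for measuring short intervals, and returns 0.0 FPS instead of dividing by zero when no frames were processed.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_detection_fps(frames: Iterable, detect: Callable) -> Tuple[int, int, float]:
    """Run detect on every frame and time only the detection calls.

    Returns (frame_count, detection_count, fps), where fps is the number of
    frames divided by the total time spent inside detect.
    """
    frame_count = 0
    detection_count = 0
    elapsed = 0.0
    for frame in frames:
        t1 = time.perf_counter()
        faces = detect(frame)
        elapsed += time.perf_counter() - t1  # accumulate detection time only
        frame_count += 1
        detection_count += len(faces)
    fps = frame_count / elapsed if elapsed > 0 else 0.0
    return frame_count, detection_count, fps
```

Keeping the detector behind a plain callable makes it easy to compare detectors, or the same detector on different hardware, using the same measurement code.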

We can launch the video detector with the following code:

d = MTCNN_Detector(50, 0.95)
vd = VideoFD(d)
v_file = r"C:\PI_FR\video\5_3.mp4"

save_path = r"C:\PI_FR\detect"
(f_count, fps) = vd.detect(v_file, save_path, False, False)

print("Face detections: "+str(f_count))
print("FPS: "+str(fps))

Here is the resulting video captured from the screen:

The test shows good results: faces were detected in most frames of the video file. The processing speed is about 20 FPS on a Core i7 CPU, which is impressive for a task as difficult as face detection.
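A quick way to judge whether a detection rate is enough for real time is to compare per-frame times. At 20 FPS the detector needs 1000 / 20 = 50 ms per frame, while a video recorded at an assumed 30 FPS delivers a new frame every 33.3 ms, so such a detector would need to skip frames to keep up. The hypothetical helpers below (not from the original code) compute the per-frame time and the smallest stride n for which running detection on every n-th frame fits within the video's frame rate.

```python
import math


def frame_time_ms(fps: float) -> float:
    """Time available (or needed) per frame, in milliseconds."""
    return 1000.0 / fps


def min_frame_stride(detector_fps: float, video_fps: float) -> int:
    """Smallest n such that detecting on every n-th frame keeps up with the video.

    Detection takes 1000/detector_fps ms, and n video frames span
    n*1000/video_fps ms, so n must be at least video_fps/detector_fps.
    """
    if detector_fps <= 0 or video_fps <= 0:
        raise ValueError("frame rates must be positive")
    return max(1, math.ceil(video_fps / detector_fps))


print(frame_time_ms(20))         # 50.0
print(min_frame_stride(20, 30))  # 2
print(min_frame_stride(20, 15))  # 1
```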

Next Steps

It looks like our implementation of the MTCNN detector can be used for real-time video detection. Our final goal is to run the detector on a low-power edge device. Before starting experiments with edge devices, we must implement another part of the face recognition pipeline: face alignment. In the next article, we’ll explain how to perform the alignment based on the face landmarks the detector has found. Stay tuned!