How To Use Kinect Face Tracking SDK

nsmoly7

May 31, 2012

CPOL

This article describes how to use Kinect's Face Tracking SDK in your Windows application

Introduction

This article demonstrates how to use the Face Tracking SDK in Kinect for Windows to track human faces. It provides code samples as well as useful tips on how to call its APIs to get the most out of the face tracking engine.

Background

The Face Tracking SDK is part of the Kinect for Windows Developer Toolkit and can be installed from this site. It performs markerless tracking of human faces with a Kinect camera attached to a PC. The face tracking engine computes 3D positions of semantic facial feature points as well as the 3D head pose. The Face Tracking SDK can be used to drive virtual avatars, recognize facial expressions, build natural user interfaces, and perform other face-related computer vision tasks.

The full API reference is available on the MSDN site as part of the Kinect for Windows SDK help. I worked on the development of the face tracking engine and its API, so I can provide a good overview of its usage. A general overview and some usage tips can also be found in this post on my site.

Using the Face Tracking SDK in your code

You can use the Face Tracking SDK in your program once you install the Kinect for Windows Developer Toolkit 1.5. After installing it, go to the provided samples and build and run the “Face Tracking Visualization” C++ sample or the "Face Tracking Basics-WPF" C# sample. You need a Kinect camera attached to your PC. The face tracking engine takes 4-8 ms per frame, depending on your PC resources. It runs its computations on the CPU only (it does not use the GPU).

This picture demonstrates the results of face tracking. The yellow mask is the 3D mask fit to the face projected to the RGB frame.

This video demonstrates the face tracking capabilities, the supported range of motion, and a few limitations.

In order to use the face tracking engine, include the following headers in your code:

// Include the main Kinect SDK .h file
#include "NuiAPI.h"

// Include the Face Tracking SDK .h file
#include "FaceTrackLib.h"

You also need to link with the provided FaceTrackLib.lib library, which loads FaceTrackLib.dll and FaceTrackData.dll at runtime. These DLLs must be located in the working directory of your executable or in a globally searchable path.

The face tracking engine API is similar to the DirectX API and its COM interfaces. To create a face tracker instance, do this:

// Create an instance of a face tracker
IFTFaceTracker* pFT = FTCreateFaceTracker();
if(!pFT)
{
    // Handle errors
}

The FTCreateFaceTracker() function creates an instance of the face tracker and returns its IFTFaceTracker COM interface.

Next, you need to initialize the created face tracker with the Kinect camera configuration parameters. Kinect has two cameras (video and depth/IR), so both camera configurations need to be passed to the face tracker's Initialize() method. The face tracker uses both cameras to increase face tracking accuracy. Camera configuration parameters must be precise, so it is better to use the constants defined in the NuiAPI.h Kinect header. You can also combine the Kinect camera with an external HD camera to increase precision and range. In this case, you need to pass the correct focal length and resolution parameters in the camera configuration structure. Also, if you use an external camera, you need to provide a depth frame to video frame mapping function, since the default Kinect function works only for Kinect cameras.

To initialize the face tracker instance, do the following:

// Video camera config with width, height, focal length in pixels
// NUI_CAMERA_COLOR_NOMINAL_FOCAL_LENGTH_IN_PIXELS focal length is computed for 640x480 resolution
// If you use different resolutions, multiply this focal length by the scaling factor
FT_CAMERA_CONFIG videoCameraConfig = {640, 480, NUI_CAMERA_COLOR_NOMINAL_FOCAL_LENGTH_IN_PIXELS};

// Depth camera config with width, height, focal length in pixels
// NUI_CAMERA_DEPTH_NOMINAL_FOCAL_LENGTH_IN_PIXELS focal length is computed for 320x240 resolution
// If you use different resolutions, multiply this focal length by the scaling factor
FT_CAMERA_CONFIG depthCameraConfig = {320, 240, NUI_CAMERA_DEPTH_NOMINAL_FOCAL_LENGTH_IN_PIXELS};

// Initialize the face tracker
HRESULT hr = pFT->Initialize(&videoCameraConfig, &depthCameraConfig, NULL, NULL);
if( FAILED(hr) )
{
    // Handle errors
}
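The comments above say to multiply the nominal focal length by a scaling factor when you use a different resolution. A minimal sketch of that arithmetic follows. CameraConfigSketch is a hypothetical stand-in for FT_CAMERA_CONFIG so the snippet compiles without the SDK, and the two numeric constants are my assumption of what the NuiAPI.h constants evaluate to; in real code, use the SDK constants directly. The sketch also assumes the new resolution keeps the same aspect ratio and field of view.

```cpp
#include <cassert>

// Hypothetical stand-in for FT_CAMERA_CONFIG (width, height, focal length in pixels).
struct CameraConfigSketch
{
    unsigned int width;
    unsigned int height;
    float focalLength;
};

// Assumed values of the NuiAPI.h nominal focal length constants.
// Prefer the SDK constants themselves in production code.
const float kColorFocalLengthAt640x480 = 531.15f;
const float kDepthFocalLengthAt320x240 = 285.63f;

// Scales a nominal focal length, defined for baseWidth, to a new frame width.
// Assumes the same aspect ratio and field of view at the new resolution.
inline CameraConfigSketch MakeScaledConfig(unsigned int width, unsigned int height,
                                           float baseFocalLength, unsigned int baseWidth)
{
    assert(width > 0 && baseWidth > 0);
    const float scale = static_cast<float>(width) / static_cast<float>(baseWidth);
    CameraConfigSketch config = { width, height, baseFocalLength * scale };
    return config;
}
```

For example, MakeScaledConfig(1280, 960, kColorFocalLengthAt640x480, 640) doubles the color focal length, because the frame is twice as wide as the resolution the constant was computed for.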

You also need to create a face tracking result object, which receives the 3D tracking results:

// Create a face tracking result interface
IFTResult* pFTResult = NULL;
hr = pFT->CreateFTResult(&pFTResult);
if(FAILED(hr))
{
    // Handle errors
}

The face tracking SDK provides an IFTImage interface that can either wrap an existing RGB/depth data buffer or manage its own memory. If you wrap your own buffers, you need to fill them with RGB and depth data from the Kinect camera. If you let IFTImage own its memory, you need to copy the video and depth frame data from the corresponding Kinect cameras into that memory. The face tracking SDK requires both video and depth frames to track faces. Both frames must be synchronized, i.e., read from the Kinect API at roughly the same time.

Here is how you can "attach" IFTImage interfaces to existing video and depth buffers and then pass them in the FT_SENSOR_DATA structure that is later used as an input parameter for the tracking functions:

// Prepare image interfaces that hold RGB and depth data
IFTImage* pColorFrame = FTCreateImage();
IFTImage* pDepthFrame = FTCreateImage();
if(!pColorFrame || !pDepthFrame)
{
    // Handle errors
}

// Attach created interfaces to the RGB and depth buffers that are filled with
// corresponding RGB and depth frame data from Kinect cameras
pColorFrame->Attach(640, 480, colorCameraFrameBuffer, FTIMAGEFORMAT_UINT8_R8G8B8, 640*3);
pDepthFrame->Attach(320, 240, depthCameraFrameBuffer, FTIMAGEFORMAT_UINT16_D13P3, 320*2);
// You can also use Allocate() method in which case IFTImage interfaces own their memory.
// In this case use CopyTo() method to copy buffers

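FT_SENSOR_DATA sensorData;
sensorData.pVideoFrame = pColorFrame;   // Already an IFTImage* pointer, no address-of needed
sensorData.pDepthFrame = pDepthFrame;
sensorData.ZoomFactor = 1.0f;           // Not used, must be 1.0
POINT viewOffset = {0, 0};              // POINT is a plain struct without a constructor
sensorData.ViewOffset = viewOffset;     // Not used, must be (0,0)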

Note that you need to pass the correct image formats for video and depth data, as well as the frame resolutions, in Attach(). The current version of Kinect for Windows supports several resolutions, but the best choice for the face tracking SDK is 640x480 video and 320x240 depth, since it provides the best combination of frame rate and quality for this task.
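To make the last argument of Attach() and the depth format name less mysterious, here is a small sketch of the arithmetic involved. It assumes tightly packed rows (stride = width x bytes per pixel) and assumes that the D13P3 layout means 13 bits of depth in millimeters in the upper bits and a 3-bit player index in the lower bits, which is how I read the format name. All function names here are hypothetical helpers, not SDK calls.

```cpp
#include <cassert>
#include <cstdint>

// Assumed D13P3 layout: [13 bits depth in mm][3 bits player index].
const unsigned int kPlayerIndexBits = 3;
const unsigned int kPlayerIndexMask = (1u << kPlayerIndexBits) - 1u;
const unsigned int kMaxDepthValue = (1u << 13) - 1u;

// Row stride in bytes for a tightly packed image (no padding between rows).
inline unsigned int TightStride(unsigned int width, unsigned int bytesPerPixel)
{
    return width * bytesPerPixel;
}

// Packs a depth value and player index into one 16-bit pixel.
inline std::uint16_t PackDepthPixel(unsigned int depthMm, unsigned int playerIndex)
{
    assert(depthMm <= kMaxDepthValue && playerIndex <= kPlayerIndexMask);
    return static_cast<std::uint16_t>((depthMm << kPlayerIndexBits) | playerIndex);
}

// Extracts the depth in millimeters from a packed pixel.
inline unsigned int DepthMillimeters(std::uint16_t packed)
{
    return static_cast<unsigned int>(packed) >> kPlayerIndexBits;
}

// Extracts the player index from a packed pixel.
inline unsigned int PlayerIndex(std::uint16_t packed)
{
    return static_cast<unsigned int>(packed) & kPlayerIndexMask;
}
```

With these helpers, TightStride(640, 3) and TightStride(320, 2) give the 640*3 and 320*2 values used in the Attach() calls above.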

Next, you can call the StartTracking() or ContinueTracking() methods to do the actual face tracking on a given pair of video and depth frames. StartTracking() initiates a tracking session and must be called first. It is an expensive function, since it searches the passed RGB frame for a face. If the call to StartTracking() succeeds, you can call ContinueTracking() to keep tracking the recognized face on consecutive frames. If there is a big time gap between input frames, ContinueTracking() may fail because the face has moved too far from its previous location. In this case, you can call StartTracking() on every frame instead, since it does not use previous face locations. This could be helpful when the camera frame rate is low.

Here is an example of how you can organize the main tracking loop:

bool isFaceTracked = false;

// Track a face
while ( true )
{
    // Call Kinect API to fill
    // colorCameraFrameBuffer and depthCameraFrameBuffer
    // with RGB and depth data
    ProcessKinectIO();

    // Check if we are already tracking a face
    if(!isFaceTracked)
    {
        // Initiate face tracking.
        // This call is more expensive and searches over the input RGB frame for a face.
        hr = pFT->StartTracking(&sensorData, NULL, NULL, pFTResult);
        if(SUCCEEDED(hr) && SUCCEEDED(pFTResult->GetStatus()))
        {
            isFaceTracked = true;
        }
        else
        {
            // No faces found
            isFaceTracked = false;
        }
    }
    else
    {
        // Continue tracking. It uses a previously known face position.
        // This call is less expensive than StartTracking()
        hr = pFT->ContinueTracking(&sensorData, NULL, pFTResult);
        if(FAILED(hr) || FAILED(pFTResult->GetStatus()))
        {
            // Lost the face
            isFaceTracked = false;
        }
    }

    // Do something with pFTResult like visualize the mask, drive your 3D avatar,
    // recognize facial expressions
}

This loop consists of:

reading input frames from Kinect camera API
passing input frames to the face tracker engine
checking for results

All face tracking APIs are synchronous, so you need to schedule your work accordingly.
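The decision between StartTracking() and ContinueTracking() can be isolated into a tiny state machine, which also makes it easy to fall back to StartTracking() after a long gap between frames. The sketch below is one way to do this and is not part of the SDK: the class name, the TrackerCall enum, and the maxGapMs threshold are all hypothetical, and you would need to pick a gap value by experiment.

```cpp
#include <cassert>

// Which tracker method to call for the next frame (hypothetical enum).
enum class TrackerCall { Start, Continue };

// Hypothetical helper that remembers whether a face is currently tracked
// and when the last frame was processed.
class TrackingStateSketch
{
public:
    explicit TrackingStateSketch(unsigned int maxGapMs)
        : maxGapMs_(maxGapMs), isFaceTracked_(false), lastFrameTimeMs_(0) {}

    // Chooses the call for a frame captured at frameTimeMs.
    // Falls back to Start if no face is tracked or the gap is too large.
    TrackerCall NextCall(unsigned int frameTimeMs) const
    {
        assert(frameTimeMs >= lastFrameTimeMs_);
        const bool gapTooLarge = (frameTimeMs - lastFrameTimeMs_) > maxGapMs_;
        return (isFaceTracked_ && !gapTooLarge) ? TrackerCall::Continue
                                                : TrackerCall::Start;
    }

    // Records whether the chosen call succeeded for that frame.
    void RecordResult(unsigned int frameTimeMs, bool succeeded)
    {
        isFaceTracked_ = succeeded;
        lastFrameTimeMs_ = frameTimeMs;
    }

    bool IsFaceTracked() const { return isFaceTracked_; }

private:
    unsigned int maxGapMs_;
    bool isFaceTracked_;
    unsigned int lastFrameTimeMs_;
};
```

In the loop above, you would call NextCall() before tracking, invoke the matching tracker method, and then pass the combined success of the HRESULT and the result status to RecordResult().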

At the end, you need to release all face tracking interfaces. This frees all of their allocated memory:

// Clean up
pFTResult->Release();
pColorFrame->Release();
pDepthFrame->Release();
pFT->Release();
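If you prefer not to call Release() by hand on every exit path, one option is to wrap the interface pointers in std::unique_ptr with a custom deleter. This is a general C++ pattern rather than something the Face Tracking SDK provides; ComReleaser, ComPtrSketch, and FakeComObject are hypothetical names, and FakeComObject exists only so the sketch can be compiled and tested without the SDK.

```cpp
#include <cassert>
#include <memory>

// Deleter that calls Release() instead of delete, as COM-style objects expect.
struct ComReleaser
{
    template <typename T>
    void operator()(T* p) const
    {
        if (p)
        {
            p->Release();
        }
    }
};

// Owning smart pointer for any type that exposes Release().
template <typename T>
using ComPtrSketch = std::unique_ptr<T, ComReleaser>;

// Hypothetical stand-in for an SDK interface; counts how often it is released.
class FakeComObject
{
public:
    explicit FakeComObject(int* releaseCounter) : releaseCounter_(releaseCounter) {}

    // Mimics a COM Release() for an object with a single reference.
    unsigned long Release()
    {
        assert(releaseCounter_ != nullptr);
        ++(*releaseCounter_);
        delete this;
        return 0;
    }

private:
    ~FakeComObject() {}  // Destroyed only through Release()
    int* releaseCounter_;
};
```

With the real SDK, the same idea would look like ComPtrSketch<IFTFaceTracker> tracker(FTCreateFaceTracker()); the pointer is then released automatically when it goes out of scope.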

A few important points about the face tracking API:

There are two modes in which the face tracker operates – with skeleton-based information and without. In the 1st mode, you pass an array with two head points to the StartTracking/ContinueTracking methods. These head points are the ends of the head bone contained in the NUI_SKELETON_DATA structure returned by the Kinect API. This head bone is indexed by the NUI_SKELETON_POSITION_HEAD member of the NUI_SKELETON_POSITION_INDEX enumeration. The 1st head point is the neck position and the 2nd head point is the head position. These points allow the face tracker to find a face faster and more easily, so this mode is cheaper in terms of computer resources (and sometimes more reliable at big head rotations). The 2nd mode only requires the color frame and depth frame to be passed, with an optional region of interest parameter that tells the face tracker where on the RGB frame to search for a user's face. If the region of interest is not passed (passed as NULL), the face tracker will try to find a face in the full RGB frame, which is the slowest mode of operation of the StartTracking() method. ContinueTracking() uses the previously found face and so is much faster.

Camera configuration structure - it is very important to pass correct parameters in it like frame width, height and the corresponding camera focal length in pixels. The API does not read these automatically from Kinect camera API to give more advanced users more flexibility. If you don’t initialize them to correct values (Kinect SDK headers provide default values for Kinect camera), the tracking accuracy will suffer or the tracking will fail entirely.

Frame of reference for 3D results - the face tracking SDK uses both depth and color data, and the frame of reference for its 3D results is the video camera space (due to some advantages). It is a right-handed system with the Z axis pointing towards the tracked person and the Y axis pointing up. The measurement units are meters. Kinect's skeleton frame of reference follows the same conventions, but its origin and axis orientation are aligned with the depth camera space. The online documentation has a sample that describes how to convert from color camera space to depth camera space.
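For the skeleton-based mode described above, the two head points need to go into the array in the right order: neck first, head second. The following sketch shows that ordering only. Vec3Sketch is a hypothetical stand-in for the SDK vector type, and the joint indices are my assumption of the NUI_SKELETON_POSITION_INDEX values (with the shoulder-center joint serving as the neck point); in real code, use the enumeration members themselves.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 3D point in meters.
struct Vec3Sketch
{
    float x;
    float y;
    float z;
};

// Assumed joint indices; use NUI_SKELETON_POSITION_INDEX members in real code.
const std::size_t kNeckJointIndex = 2;  // assumed shoulder-center joint
const std::size_t kHeadJointIndex = 3;  // assumed NUI_SKELETON_POSITION_HEAD

// Builds the two-element hint array: [0] = neck position, [1] = head position.
inline std::array<Vec3Sketch, 2> MakeHeadHints(const Vec3Sketch* joints,
                                               std::size_t jointCount)
{
    assert(joints != nullptr && jointCount > kHeadJointIndex);
    std::array<Vec3Sketch, 2> hints = {{ joints[kNeckJointIndex],
                                         joints[kHeadJointIndex] }};
    return hints;
}
```

The resulting array corresponds to the head points argument that is passed as NULL in the StartTracking() and ContinueTracking() calls in the loop above.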

Several things that may affect tracking accuracy:

Light – a face should be well lit without too many harsh shadows on it. Bright backlight or sidelight may make tracking worse.
Distance to the Kinect camera – the closer you are to the camera, the better it will track. The tracking quality is best when you are closer than 1.5 meters (4.9 feet) to the camera. At closer range, Kinect’s depth data is more precise, so the face tracking engine can compute 3D face points more accurately.
Occlusions – if you have thick glasses or a Lincoln-like beard, you may have issues with the face tracking. This is still an open area for improvement. Face color is NOT an issue, as can be seen in this video.

Have fun with the face tracking SDK!

History

[May 2012]: Initial version