Capturing motion from video using the Emgu CV library.

The article demonstrates how to use commands of the Emgu CV library to perform face detection, frame subtraction, and dense optical flow.

[Image 1]

Introduction

OpenCV (Open Source Computer Vision Library) is a freely available software library containing ready-to-use routines to process visual input such as images or videos. The functions and tools the library offers can be accessed via C/C++, Python or the .NET programming languages. In this article, I focus on the OpenCV wrapper Emgu CV, whose methods can be embedded in a C# program. Here I demonstrate how to load and play a movie, how to detect faces in an image (by using a pre-trained Haar classifier), how to apply the Farneback algorithm for dense optical flow (to capture motion) and how to use frame subtraction (i.e., subtracting pixel information of successive frames to capture motion).

The contents of this article can be read as an introductory guide to some of the commands the Emgu CV library offers, but they are also meant to show that the presented techniques can help answer questions arising in the behavioral sciences. For this reason, the article not only describes code but also includes a small application, a video, and some brief comments on how the described software routines might be used to record data for behavioral analyses.

As I access the Emgu CV wrapper via C#, a solid knowledge of this programming language is required to understand the code samples properly. By contrast, using the included application does not require any programming skills. The application was compiled in Visual Studio 2015 and is based on the .NET Framework 4.5 and Emgu CV 3.0. To make the Emgu CV functions and classes available in the Visual Studio development environment, several steps have to be taken. First, after choosing an application type (in my case a standard Windows Forms application), go to the Solution Explorer window, right-click on "References" and choose "Add Reference". In the window that appears, select "Browse" and navigate to the folder where Emgu CV was stored during its installation. The DLL files contained in the "bin" folder then need to be included. Entering the C# directive "using" together with the names of the Emgu CV libraries (e.g., using Emgu.Util;) at the top of the main program makes the necessary components accessible (see more on this last step in the Form1 file that can be downloaded via "Download source"). A detailed description of all these steps can be found online.
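
To give an idea of what this looks like in code, the directives below cover the namespaces the samples in this article rely on (a sketch; the exact set depends on which Emgu CV 3.0 DLLs you have referenced):

C#
// namespaces assumed by the code samples in this article (Emgu CV 3.0)
using System;                 // basic types such as Convert and Math
using System.Drawing;         // Point, Rectangle, Size
using Emgu.CV;                // Capture, CvInvoke, Mat, Image<,>
using Emgu.CV.Structure;      // Bgr, Gray, MCvScalar
using Emgu.CV.CvEnum;         // CapProp and other enumerations
using Emgu.Util;              // Emgu CV base classes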

I do not describe the mathematical background of the procedures presented here; I only describe how to use the tools the library offers. Readers interested in the underlying math should consult other articles.

Absolute Difference between Pixel Information of Successive Frames of a Movie

[Image 2: example of a difference image produced by frame subtraction]

In the following, I present a simple method to extract the quantity of motion that occurs between two successive frames of a video. The pixels of a video frame can be converted into greyscale values ranging from 0 to 255 (an 8-bit image ranging from black to white). When no motion (i.e., no position shift of an object) has occurred, the absolute difference between the greyscale values of two frames (i.e., the difference between pixels at the same spot in both images) results in a black image, because the pixel values cancel each other out (the operation yields 0 for all pixels). When there is a position shift, however, different shades of grey appear in some spots of the "difference image" (see picture above). The pixel values above a certain threshold (e.g., a greyscale value of 100) can then be counted in order to estimate the quantity of motion. There are, of course, limitations and problems with this method (e.g., changing lighting conditions also alter pixel values), but overall it gives a simple and fairly robust estimate of how much change is going on.
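
As a brief illustration of this counting step, the sketch below thresholds a difference image and counts the remaining pixels (a minimal sketch; img_abs_diff is assumed to hold the result of the frame subtraction routine shown further below, and 100 is the example threshold):

C#
// keep only pixels whose greyscale difference exceeds the threshold;
// pixels above the threshold become 255 (white), all others 0 (black)
Image<Gray, Byte> motion_mask = img_abs_diff.ThresholdBinary(new Gray(100), new Gray(255));

// the number of white pixels serves as a simple per-frame estimate of the quantity of motion
int motion_pixels = CvInvoke.CountNonZero(motion_mask);

motion_mask.Dispose();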

Using the code

The piece of code presented below is only a skeleton version of the method used in the application (see download sample). It focuses on the most important Emgu CV commands needed to determine the absolute difference between successive frames. The code builds on steps that need to be done beforehand, such as initializing several variables, among them an instance of the Capture class. To capture a movie, a line such as Capture capture_movie = new Capture(movie_name) is needed; a minimal setup sketch is given below, and an inspection of the source code (see mnu_LoadMovie_Click) provides more insight. In summary, the code grabs a frame, turns it into a greyscale image, subtracts its pixel values from those of the previous frame, and shows the result of this procedure in a window (please note that I mostly work with the Image<> class instead of Mat, because it has some features the Mat class does not offer).
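
The following setup sketch shows the state the skeleton assumes (my own illustration; movie_name is assumed to hold the path the user selected, and frame, prev_frame and capture_movie are global variables in the application):

C#
// load the movie; movie_name is assumed to come from, e.g., an OpenFileDialog
Capture capture_movie = new Capture(movie_name);

// basic properties of the movie, as shown on the user interface
double total_frames = capture_movie.GetCaptureProperty(CapProp.FrameCount);
double fps = capture_movie.GetCaptureProperty(CapProp.Fps);

// grab the first frame so that prev_frame has something to store on the first call
Mat frame = capture_movie.QueryFrame();
Mat prev_frame = frame;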

C#
//
// Code skeleton for frame subtraction
//
// .... find full version of the code in Form1 file

 public void Abs_Diff_And_Areas_of_Activity()
        {
          

        // needed to clean up; all values stored in the image are set to 0 (= black)
        // could also be done by reinitializing the Image array below each time  
        img_abs_diff.SetZero(); // is a global Image<> variable defined elsewhere
                

        // the current frame becomes the previous frame (is stored in prev_frame) 
        // the Mat frame variable is defined in the section global variables of the source code
        prev_frame = frame;

        // ....omitted code 
                
               
        // drives movie to the frame to be decoded/captured next; 
        // in this case to the frame given in the variable frame_nr
        // capture_movie variable has to be initialized before =>
        // Capture capture_movie = new Capture(movie_name) 
        capture_movie.SetCaptureProperty(CapProp.PosFrames, frame_nr);

        // capture the frame of loaded movie at position of frame_nr (see previous line)
        // QueryFrame pushes "pointer of Capture class" forward; calling it again grabs the next frame
        frame = capture_movie.QueryFrame();
              

        // used for changing original frame size; 
        // resizing factor is given in textfield on user interface
        // making images smaller accelerates processing 
        Size n_size = new Size(frame.Width / Convert.ToInt32(txt_resize_factor.Text),
              frame.Height / Convert.ToInt32(txt_resize_factor.Text));
                
        // resize frame and previous frame, CvInvoke is an Emgu CV Class
        // the destination and the source frame for resizing are the same 
        CvInvoke.Resize(frame, frame, n_size );
        CvInvoke.Resize(prev_frame, prev_frame, n_size);

        // show resized frame in window 
        CvInvoke.Imshow("Movie resized", frame);
               
        // greyscale images to store information of the frame subtraction procedure
        Image<Gray, Byte> prev_grey_img, curr_grey_img;

        // initialize images used for frame subtraction using size of resized frame above  
        prev_grey_img = new Image<Gray, byte>(frame.Width, frame.Height);
        curr_grey_img = new Image<Gray, byte>(frame.Width, frame.Height);
        
        // assign frame and previous frame to greyscale images (turns them into greyscales) 
        curr_grey_img = frame.ToImage<Gray, byte>(); // turns Mat frame variable into Image<> variable
        prev_grey_img = prev_frame.ToImage<Gray, Byte>();

        // subtract pixel values of successive frames (greyscale) from each other to get 
        // areas where changes in pixel color occurred 
        // only provides positive values -> absolute difference
        CvInvoke.AbsDiff(prev_grey_img, curr_grey_img, img_abs_diff);

        // ALTERNATIVE: CvInvoke.Subtract(prev_grey_img, curr_grey_img, img_abs_diff); 
        // computes a signed difference; for 8-bit images negative results are clipped to 0,
        // so only changes in one direction are kept

        //....omitted code // in source code file: code that transfers greyscale values above a 
        // certain threshold (areas remaining after subtraction) into an array
       

        // show results of CvInvoke.AbsDiff function 
        CvInvoke.Imshow("Frame Subtraction", img_abs_diff);


         // Release memory
         curr_grey_img.Dispose();
         prev_grey_img.Dispose();

                
       }
//
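
In the application, the routine above is called repeatedly while the movie is stepped forward. A hypothetical driver loop (my own sketch, assuming the global variables frame_nr and total_frames introduced earlier) could look like this:

C#
// step through the movie and apply frame subtraction to each frame
for (frame_nr = 1; frame_nr < (int)total_frames; frame_nr++)
{
    Abs_Diff_And_Areas_of_Activity();

    // give the Imshow windows a chance to refresh; pressing any key aborts the loop
    if (CvInvoke.WaitKey(1) >= 0) break;
}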

Detecting Faces in Images by using a Haar Classifier

[Image 3: faces detected with the Haar classifier (rectangles drawn onto a frame)]

Everybody who uses a modern digital camera or the camera on their smartphone has come in contact with the automatic face detection feature of these devices. Such object detection can also be done using tools provided by OpenCV and Emgu CV. These tools are based on machine learning algorithms for object identification, or more precisely on so-called Haar classifiers. Haar classifiers are trained with a large number of positive examples (e.g., faces) and negative examples (e.g., images of the same size that do not contain faces). Such a classifier can then be applied to unclassified images (e.g., images with faces) in order to identify the objects for which it was trained. OpenCV ships with ready-to-use XML files containing data to detect different kinds of objects (e.g., faces, eyes, etc.). The code presented below makes use of such a pre-trained classifier. However, it is also possible to create one's own XML file for object classification.

As in the code example given above, a movie has to be loaded first in order to apply the commands of the subsequent code piece (i.e., Capture capture_movie = new Capture(movie_name)). The code contains the basic principles of using a Haar classifier in Emgu CV. Using a different classifier (e.g., one for eyes) would give different results, of course, but the general principle would be the same.

Using the code

C#
// 
// Code skeleton for face detection
// 
// .... find full version of code in Form1 file

private void Face_Detect()
             {

                double rect_size = 0;

                // rectangle structure to store the largest rectangle (largest face found);
                // see the foreach loop below
                Rectangle largest_rect = new Rectangle();

                //.... omitted code

                // using Haar classifier to find faces in images 
                // data of the trained classifier is stored in xml file 
                // has to be in the same folder as the .exe file of the application
                CascadeClassifier haar = new CascadeClassifier("haarcascade_frontalface_default.xml");

                // drive movie to given frame number (stored in frame_nr variable)
                capture_movie.SetCaptureProperty(CapProp.PosFrames, frame_nr);

                // grab frame at the given position (given by frame_nr variable) 
                frame = capture_movie.QueryFrame();

                // convert frame stored as Mat variable to Image<Bgr, Byte> variable 
                // grabbed_image is global variable (see source code)
                grabbed_image = frame.ToImage<Bgr, Byte>();

                // used for changing original frame size 
                // resizing factor is given in textfield on user interface
                Size n_size = new Size(grabbed_image.Width / Convert.ToInt32(txt_resize_factor.Text),
                grabbed_image.Height / Convert.ToInt32(txt_resize_factor.Text));

                // resize grabbed frame 
                // for demonstration purposes I use the resize function here; 
                // this is different from other procedures in this article (e.g., frame subtraction)
                CvInvoke.Resize(grabbed_image, grabbed_image, n_size);
                

                // define greyscale image; has the same size as grabbed_image
                Image<Gray, Byte> grey_img = new Image<Gray, byte>(grabbed_image.Width, 
                grabbed_image.Height);
                // convert grabbed image to greyscale image and store the result in greyscale image 
                grey_img = grabbed_image.Convert<Gray, byte>();
                 
                // define rectangle structure array for storing the position of all faces found 
                Rectangle[] rect;

                // use the Haar classifier XML file to detect faces in the greyscale image and 
                // store the results in the rect structure array
                // second parameter is factor by which the search window is scaled 
                // between subsequent scans
                // (for example, 1.1 means increasing window by 10%)
                // third parameter is minimum number (minus 1) of neighbor rectangles 
                // that make up an object
                rect = haar.DetectMultiScale(grey_img, 1.1, 3);

                // loop through rectangle array and draw each of them onto the image
                // find largest rectangle and draw it in a different color 
                foreach (var ele in rect)
                {

                    // check if found rectangle is largest rectangle and store this information
                    if ((ele.Width * ele.Height) > rect_size)
                    {
                        rect_size = ele.Width * ele.Height;
                        largest_rect = ele;
                    }

                    // draw found rectangles onto grabbed (original) frame; use blue color 
                    grabbed_image.Draw(ele, new Bgr(255, 0, 0), 3);

                }

                // draw largest rectangle onto grabbed image (in green)
                grabbed_image.Draw(largest_rect, new Bgr(0, 225, 0), 3);

                // show results of these procedures   
                CvInvoke.Imshow("Original Video", grabbed_image);

                // release memory 
                grey_img.Dispose();
                haar.Dispose();

               //.... omitted code

            }
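
If the detected face region itself is needed for further processing (e.g., to save it or to restrict later analyses to the face), the ROI property of the Image<> class can be used. A minimal sketch, assuming at least one face was found so that largest_rect is not empty:

C#
// cut out the largest detected face (assumption: largest_rect holds a valid detection)
if (largest_rect.Width > 0 && largest_rect.Height > 0)
{
    grabbed_image.ROI = largest_rect;                  // restrict operations to the face region
    Image<Bgr, Byte> face_img = grabbed_image.Copy();  // copies only the ROI
    grabbed_image.ROI = Rectangle.Empty;               // reset the ROI to the full image

    CvInvoke.Imshow("Largest face", face_img);
    face_img.Dispose();
}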

Applying Dense Optical Flow to Capture Pixel Position Shifts Occurring between Successive Frames of a Movie

[Image 4: flow field vectors drawn onto a frame]

When the term optical flow was coined (Gibson, 1940) it was mainly reserved for describing movement patterns caused by the relative motion between an observer and a scene. More precisely, it described the apparent (i.e., in principle non-existent) motion of objects, surfaces, and edges the eye has to process when people or animals move around in their environments. A modern – maybe hard to digest – definition says that the optical flow is the distribution of the apparent velocities of movement of brightness patterns in an image.

Similar to the frame subtraction method presented above, optical flow algorithms process changes in pixel color to detect motion. Overall, there are two main categories of algorithms, namely sparse and dense optical flow. The former tracks a small set of salient features to detect motion, whereas the latter processes all the pixel information that is there. Dense optical flow is more accurate but also needs more resources. In the example below I present code for dense optical flow based on the Gunnar Farneback algorithm because, for the work I do, accuracy is more important than processing speed. The sample code is split into two functions. The first function covers the optical flow procedure itself; the second shows how to access the results of the procedure and how to draw them onto the screen. More information on the parameters of the Farneback algorithm can be found on the Emgu CV and OpenCV webpages. I do not (and cannot) give information about the internal structure of the algorithm.
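
For each pixel, the two result images of the Farneback call (flow_x and flow_y in the code below) together form a displacement vector. Its length and direction can be computed in the usual way; the helper below is my own sketch, not part of the original code:

C#
// magnitude and direction (in degrees) of the displacement stored for pixel (row i, column j);
// flow_x and flow_y are the two Image<Gray, float> results of CalcOpticalFlowFarneback
static void FlowVector(Image<Gray, float> flow_x, Image<Gray, float> flow_y,
                       int i, int j, out double magnitude, out double angle_deg)
{
    double dx = flow_x.Data[i, j, 0];
    double dy = flow_y.Data[i, j, 0];
    magnitude = Math.Sqrt(dx * dx + dy * dy);
    angle_deg = Math.Atan2(dy, dx) * 180.0 / Math.PI;
}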

Again, as in the other code examples, a movie has to be loaded first before the commands of the subsequent code piece can be applied (i.e., Capture capture_movie = new Capture(movie_name)). The code skeleton for the Draw_Farneback_flow_map() function focuses only on the lines needed to access the pixel shift information and to make these shifts visible. The source code file contains a great deal of extra code (e.g., the sum of all vectors for the left and right side separately, information about changes in direction, etc.).

Using the code

C#
// 
// Code skeleton for dense optical flow 
// 
// .... find full version of code in Form1 file 

public void Dense_Optical_Flow()
        {

           //.... omitted code

           // frame becomes previous frame (i.e., prev_frame stores information before movie 
           // is pushed forward to next frame by QueryFrame() function)
           prev_frame = frame;

           // .... omitted code

           // set "pointer" to position where frame capturing will start
           capture_movie.SetCaptureProperty(CapProp.PosFrames, frame_nr);

           // capture frame
           frame = capture_movie.QueryFrame();

           // .... omitted code

           // used for changing original frame size
           // resizing factor is given in textfield on user interface
           Size n_size = new Size(frame.Width / Convert.ToInt32(txt_resize_factor.Text),
                  frame.Height / Convert.ToInt32(txt_resize_factor.Text));

           // resize frame and previous frame (make them smaller to reduce processing load)
           CvInvoke.Resize(frame, frame, n_size);
           CvInvoke.Resize(prev_frame, prev_frame, n_size);

           // images that are compared during the flow operation (see below) 
           Image<Gray, Byte> prev_grey_img, curr_grey_img;

           prev_grey_img = new Image<Gray, byte>(frame.Width, frame.Height);
           curr_grey_img = new Image<Gray, byte>(frame.Width, frame.Height);

           // image arrays to store information of flow vectors => results of Farneback algorithm
           // one image array for each direction, which is x and y
           Image<Gray, float> flow_x;
           Image<Gray, float> flow_y;

           flow_x = new Image<Gray, float>(frame.Width, frame.Height);
           flow_y = new Image<Gray, float>(frame.Width, frame.Height);

           // assign information stored in frame and previous frame to greyscale images
           curr_grey_img = frame.ToImage<Gray, byte>();
           prev_grey_img = prev_frame.ToImage<Gray, Byte>();

           // apply Farneback dense optical flow  
           // parameters are the two greyscale images (these are compared) 
           // and two image arrays storing the results of algorithm  
           // the rest of the parameters are (for more details consult the OpenCV documentation):
           // pyrScale: image scale used to build the pyramid: 
           //           0.5 means that each next layer is twice smaller than the former
           // levels: number of pyramid levels: 1 means no extra layers
           // winSize: the averaging window size; larger values = more robust to noise but more blur
           // iterations: number of iterations at each pyramid level
           // polyN: size of the pixel neighbourhood: higher = more precision but more blur
           // polySigma: standard deviation of the Gaussian used to smooth the derivatives
           // flags: operation flags (0 = default behaviour)
           CvInvoke.CalcOpticalFlowFarneback(prev_grey_img, curr_grey_img, flow_x, flow_y, 
                         0.5, 3, 15, 3, 6, 1.3, 0);

           // call function that shows results of Farneback algorithm (see next section)  
           Draw_Farneback_flow_map(frame.ToImage<Bgr, Byte>(), flow_x, flow_y, overall_step);
           

           // Release memory 
           prev_grey_img.Dispose();
           curr_grey_img.Dispose();
           flow_x.Dispose();
           flow_y.Dispose();

           //.... omitted code    
          
        }
C#
private void Draw_Farneback_flow_map(Image<Bgr, Byte> img_curr, 
        Image<Gray, float> flow_x, Image<Gray, float> flow_y, int step, int shift_that_counts = 0)
        {

         // NOTE: the flow images (flow_x and flow_y) are organized like this:
         // at the index (i.e., the position of a pixel before the optical flow operation)
         // the shift of this specific pixel after the flow operation is stored;
         // if no shift has occurred the value stored at the index is zero
         // (i.e., pixel[index] = 0)
         
         // Point variable where line between pixel positions before and after flow starts
         Point from_dot_xy = new Point(); 
         // Point variable, which will be the endpoint of line between pixels before and after flow
         Point to_dot_xy = new Point(); 
            
         MCvScalar col; // variable to store color values of lines representing flow vectors
         col.V0 = 100;
         col.V1 = 255;
         col.V2 = 0;
         col.V3 = 0;

         //.... omitted code

        
         // loops over the image matrix, gets the positions of pixels before and after the optical 
         // flow operation and draws vectors between the old and new positions
         // only a subset of pixels is processed (see step parameter)
            for (int i = 0; i < flow_x.Rows; i += step) // flow_y has the same size and row numbers
               for (int j = 0; j < flow_x.Cols; j += step) // flow_y has the same col numbers
                {

                  // pixel shift measured by optical flow is transferred to Point variables 
                  // stores starting point of motion (from_dot..) and its end points (to_dot...)
                  // accesses single pixels of flow matrix, where x-coords and y-coords of pixel after 
                  // flow procedure are stored; only gives the shift
                  to_dot_xy.X = (int)flow_x.Data[i, j, 0]; 
                  to_dot_xy.Y = (int)flow_y.Data[i, j, 0]; 

                  from_dot_xy.X = j; // index of loop is position on image (x-coord); X is cols
                  from_dot_xy.Y = i; // index of loop is position on image (y-coord); Y is rows

                  // new x-coord position of pixel 
                  // is "original" position plus shift stored in this position  
                  to_dot_xy.X = from_dot_xy.X + to_dot_xy.X;  
                  to_dot_xy.Y = from_dot_xy.Y + to_dot_xy.Y; 

                  //.... omitted code

                  // draw line between coords to display pixel shift stored in flow field 
                  CvInvoke.Line(img_curr, from_dot_xy, to_dot_xy, col, 2); 

                } 

            // show the image with the flow depicted as lines (updated once per frame)
            CvInvoke.Imshow("Flow field vectors", img_curr); 

      

           //.... omitted code

         
        }


//

Points of Interest

The article comes with a small application that can perform all the analyses described above, and more. Since I do research in the field of non-verbal communication, my main interest is in extracting nonverbal cues from human behavior. For this reason, the application contains some extras that are not mentioned in the code samples above. The frame subtraction section contains an additional function that stores all values above a certain threshold and produces an image of the areas where changes in pixel color have occurred. The optical flow functions contain code that calculates a summed direction vector for the right and the left side of the window (see the sketch below). They also provide information about the changes in the directions of the summed vectors (in an additional window). Moreover, there are code passages that store the information extracted by the routines described here. All of this is intended to be used for automated analyses of human motion behavior.
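
The left/right split mentioned above can be reproduced with a few lines. The sketch below is my own illustration (not the exact code from the Form1 file); it sums the flow vectors of the left and the right half of the flow images returned by CalcOpticalFlowFarneback:

C#
// sum the flow vectors for the left and the right half of the frame;
// flow_x and flow_y are the Image<Gray, float> results of the Farneback call
double left_x = 0, left_y = 0, right_x = 0, right_y = 0;

for (int i = 0; i < flow_x.Rows; i++)
    for (int j = 0; j < flow_x.Cols; j++)
    {
        if (j < flow_x.Cols / 2) { left_x += flow_x.Data[i, j, 0]; left_y += flow_y.Data[i, j, 0]; }
        else                     { right_x += flow_x.Data[i, j, 0]; right_y += flow_y.Data[i, j, 0]; }
    }

// left_x/left_y and right_x/right_y now describe one summed direction vector per side,
// which can be logged per frame for later behavioral analyses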

The user interface of the application shows the total number of frames of a video, the frame rate, and the current frame number. It also shows the threshold of the greyscale values that are accepted for the frame subtraction procedure (the default of 100 means that only values above this threshold are used for the frame subtraction routine). "Steps" gives the number of frames a video will be pushed forward (or backward) after, for instance, using the "Forward" or the "Apply Stepwise" button. The "Divide Size by" text field specifies to what extent the original video will be reduced in size (2 means that the width and the height of the video will be halved). Making the video smaller speeds up the processing of image data. The option buttons on the interface, the "Play" button and the options in the "File" menu are self-explanatory, I think. "Apply Stepwise" applies one of the image processing routines (option buttons) to a video in a stepwise manner (it applies the routine to the current frame and to the current frame plus the number given in "Steps"). Data captured with the software can be saved in text files (see menu).

To have access to the Emgu CV routines, a reference to the DLL files of the library is needed (by adding the Emgu CV folder to the environment variables of Windows). It is also possible to copy all necessary DLLs to the folder where the .exe file of the program is located (this may not be very elegant, but it is relatively simple). If you are not interested in the code but want to use the software and have trouble with it, please contact me.

There is no guarantee that the samples presented here are free of bugs. Also, the code could certainly be organized in a more straightforward and parsimonious way.

Acknowledgements

This work was supported by the Netherlands Institute for Advanced Study in the Humanities and Social Sciences (NIAS/KNAW, www.nias.knaw.nl), by the EURIAS Fellowship programme, and by the European Commission (Marie Skłodowska-Curie Actions - COFUND Programme - FP7).

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

