
Introduction

This article presents the advantages of developing embedded digital video surveillance systems to run on the 4th generation Intel® Core™ processor with Intel® HD Graphics, in combination with the Intel® System Studio software development suite. Intel® HD Graphics is useful for developing many types of computer vision functionality in video management software, while Intel® System Studio is an embedded application development suite that is useful for developing robust digital video surveillance applications.

Digital Security & Surveillance Overview

Video surveillance systems capture images or video, then compress, store, and transmit the captured image or video information over communication networks.

Figure 1 – An Overview of Video Surveillance System

Video management software (VMS) helps with efficient monitoring, transmission, and storage of surveillance video. Video surveillance solutions are security tools that help reduce crime and help protect the public and property. Digital Security and Surveillance (DSS) may be used in the following kinds of applications:

Real-time vehicle location tracking
Analysis of customer shopping behavior
Theft prevention
Site surveillance for public premises
Remote education with live class video streaming
Patient monitoring in hospitals and care facilities

The 4th Generation Intel® Core™ Processor Family Overview

The 4th generation Intel® Core™ processor (code-named Haswell) is based on 22nm process technology and builds on the 3rd generation Intel® Core™ processor graphics. The 4th generation Intel Core processor is a complete SoC (System-on-Chip), integrating all the major building blocks of a system onto a single chip. With CPU, graphics, memory, and connectivity in one package, this modular design provides the flexibility to package a compelling processor graphics solution for multiple form factors.

Figure 2 – Advantages of 4th generation Intel® Core™ Processor Architecture

This architecture introduces support for many new instructions that are specifically designed to provide better performance for a broad range of applications such as media, gaming, data processing, hashing, and cryptography.

The new features are summarized below:

Intel® Advanced Vector Extensions 2 (Intel® AVX2) - Integer data types are expanded to 256-bit SIMD (single instruction multiple data). Intel® AVX2's integer support is particularly useful for processing visual data commonly encountered in consumer imaging and video processing workloads. With the 4th generation Intel Core processor, you have both Intel® Advanced Vector Extensions (Intel® AVX) for floating point data types as well as Intel® AVX2 for integer data types.
Bit Manipulation New Instructions (BMI) - Bit manipulation instructions are useful for compressed databases, hashes, large number arithmetic, and a variety of general purpose codes.
Floating-point Multiply Accumulate (FMA) - Intel’s FMA significantly increases peak flops and provides improved precision to further improve transcendental mathematics. FMA is broadly usable in high performance computing, professional quality imaging, and face detection. FMA operates on scalars, 128-bit and 256-bit packed single and double-precision data types. [See the initial Intel® AVX specification for a description of these instructions].
Gather Instructions - The gather instructions are useful for vectorized code that accesses non-adjacent data elements. The 4th generation Intel Core processor’s gather operations are mask-based for safety (like the conditional loads and stores introduced in Intel® AVX). Gather operations are well suited to clipping values, clamping boundaries, or performing conditional operations.
Any-to-Any Permutes - These permutes are useful shuffle operations. The 4th generation Intel Core processor adds support for DWORD and QWORD granularity and allows permutes across an entire 256-bit register.
Vector-Vector Shifts - Vector-vector shift operations are added to shift vectors where the amount of shift is controlled by a vector. These are critical in vectorized loops with variable shifts.

The details of these instructions can be found in the Intel® 64 and IA-32 Architectures Software Developer Manuals and the Intel® Advanced Vector Extensions Programming Reference manual.
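To make the gather and vector-vector shift descriptions above concrete, the following is a minimal scalar sketch of their per-lane behavior. It is a reference model in plain C, not the intrinsics themselves; the function names and the byte-per-lane mask layout are assumptions chosen for readability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* a 256-bit register holds eight 32-bit elements */

/* Scalar model of a masked 32-bit gather: each enabled lane loads
 * base[index[lane]]; a disabled lane keeps the value already in dst. */
void gather_masked_ref(int32_t dst[LANES], const int32_t *base,
                       const int32_t index[LANES], const uint8_t mask[LANES])
{
    assert(dst && base && index && mask);
    for (size_t lane = 0; lane < LANES; ++lane) {
        if (mask[lane]) {
            dst[lane] = base[index[lane]];
        }
    }
}

/* Scalar model of a vector-vector logical left shift: every lane is
 * shifted by its own count, and a count of 32 or more yields zero. */
void shift_left_variable_ref(uint32_t dst[LANES], const uint32_t src[LANES],
                             const uint32_t count[LANES])
{
    assert(dst && src && count);
    for (size_t lane = 0; lane < LANES; ++lane) {
        dst[lane] = (count[lane] < 32u) ? (src[lane] << count[lane]) : 0u;
    }
}
```

Because masked-off lanes are never dereferenced, an index that would fall outside the table is harmless as long as its mask entry is zero, which is the safety property the mask provides.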

Intel® System Studio Overview

Intel® System Studio is a comprehensive and integrated tool suite that provides developers with advanced system tools and technologies to help accelerate the delivery of the next generation of power-efficient, high-performance, and reliable embedded and mobile devices.

Figure 4 – Overview of Intel® System Studio

Intel System Studio includes the following components:

Intel® VTune Amplifier for Systems: Advanced CPU and System-on-Chip (SoC) analysis for power and performance profiling and tuning.
Intel® Inspector for Systems: Dynamic and static analyzer for identifying hard-to-find memory and threading errors.
Intel® C++ Compiler: Industry-leading C/C++ compiler including the Intel® Cilk Plus parallel model for optimized performance.
Intel® Integrated Performance Primitives (Intel® IPP): Extensive library of high-performance software building blocks for signal, data and multimedia processing.
Intel® Math Kernel Library (Intel® MKL): Library of highly optimized linear algebra, Fast Fourier Transform (FFT), vector math and statistics functions.
GDB Debugger: Application debugger for fast application-level defect analysis to increase system stability, with application-level instruction tracing and data race detection.
Intel® JTAG Debugger: System debugger for SoCs for low-overhead event tracing, logging, and source-level debug of UEFI firmware, bootloader, OS kernel, and drivers.

Using Intel® Core™ Processor with Intel® System Studio for Video Surveillance Applications

The advantages of using the latest Intel® Architecture processor and Intel® System Studio for a video surveillance application are as follows:

Portability: Video surveillance applications developed on Intel® Architecture using Intel® System Studio can be easily ported to different hardware platforms, from big core (Intel® Xeon®) to small core (Intel® Atom™).
Optimization: Performance libraries in Intel® System Studio have been optimized for a variety of SIMD instruction sets. Automatic ‘dispatching’ detects the SIMD instruction set that is available on the running processor and selects the optimal SIMD instructions for that processor.
Scalability: Analysis tools improve the performance on multicore systems and increase scalability on systems with more cores.
Reliability: Intel® System Studio’s dynamic and static analysis tools strengthen the reliability of surveillance applications.
Security: The 4th generation Intel Core processor improves the performance of Advanced Encryption Standard (AES) operations. Both Cipher Block Chaining (CBC) and Galois/Counter Mode (GCM) perform better on the latest architecture. The Intel® IPP library also has functions that support the Intel® AES New Instructions (Intel® AES-NI).

Video/Audio Transcoder Using Intel® IPP

The basic blocks of video surveillance systems are:

Camera to capture Video/Audio data
Video/Audio Recorder/Data storage device with Encoder
Network transmission via Internet
Video/Audio Decoder with analytics
Video Management software (VMS)

Figure 5 – Basic blocks of a Video Surveillance Software

Intel® Integrated Performance Primitives (Intel® IPP) software building blocks are highly optimized using advanced instruction sets such as Intel® Advanced Vector Extensions 2 (Intel® AVX2) on the 4th generation Intel® Core™ processor platform. Intel® AVX2 optimization in the Intel® IPP library consists of ‘hand-optimized’ and ‘compiler-tuned’ functions, that is, code that has been directly optimized for the Intel® AVX2 instruction set. For the complete list of hand-optimized functions for the 4th generation Intel Core processor, refer to the article Intel IPP support for Intel® AVX2.

For example, ippInit() initializes the instruction set ‘dispatcher’ that is built into the library, which is arguably one of the most valuable features of the Intel® IPP library. The dispatcher automatically executes the optimal version of each Intel® IPP function at run time to match your specific processor type and instruction set. You must call the ippInit() function to automatically initialize the static or dynamic library with the code that is most appropriate for the currently running processor.
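The dispatching idea described above can be sketched in plain C with a function pointer that is selected once at start-up. This is only an illustration of the pattern, not Intel® IPP's implementation: the feature enum, the function names, and the "tuned" variant (an unrolled loop standing in for a SIMD version) are all hypothetical, and a real dispatcher would query the processor rather than take the feature level as an argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature levels; a real dispatcher would detect these. */
typedef enum {
    FEATURE_BASELINE = 0,
    FEATURE_SSE42 = 1,
    FEATURE_AVX2 = 2
} cpu_feature_t;

typedef uint32_t (*sum_fn)(const uint8_t *data, size_t n);

/* Portable baseline implementation. */
uint32_t sum_baseline(const uint8_t *data, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

/* Stand-in for a processor-specific variant: four independent
 * accumulators, still plain C so the sketch stays portable. */
uint32_t sum_unrolled(const uint8_t *data, size_t n)
{
    uint32_t a = 0, b = 0, c = 0, d = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a += data[i];
        b += data[i + 1];
        c += data[i + 2];
        d += data[i + 3];
    }
    for (; i < n; ++i) {
        a += data[i];
    }
    return a + b + c + d;
}

/* Implementation chosen by dispatch_init(); defaults to the baseline. */
static sum_fn active_sum = sum_baseline;

/* Selects the implementation once, based on the detected feature level. */
void dispatch_init(cpu_feature_t detected)
{
    active_sum = (detected >= FEATURE_AVX2) ? sum_unrolled : sum_baseline;
}

/* Public entry point: callers never name a specific variant. */
uint32_t sum_pixels(const uint8_t *data, size_t n)
{
    assert(data != NULL || n == 0);
    return active_sum(data, n);
}
```

The caller only ever invokes sum_pixels(); which variant runs is decided once, so every variant must return the same result for the same input.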

Transcoder

The transcoder is a subset of Intel® IPP functions designed for digital media applications. It is a library of functions for encoding and decoding of video data according to MPEG-1, MPEG-2, MPEG-4, DV and H.261, H.263, H.264, AVS, and VC-1 standards. These functions have a convenient interface and present an appropriate solution for both encoding and decoding pipeline and, as with all other parts of Intel® IPP, are intended for the development of high-performance cross-platform code.

In the video surveillance industry, H.264 is the most popular codec and is widely used in applications where there are demands for high frame rates and high resolution at low bandwidths.

The following figure illustrates the basic blocks of the H.264 Decoder/Encoder.

Figure 6 – H.264 transcoder basic blocks

Intel® IPP implements many H.264 decoder functions.

Intel® IPP Function	Description
ippiQuantLuma8x8_H264_16s_C1	Performs quantization for 8x8 luma block coefficients including 8x8 transform normalization.
ippiQuantLuma8x8Inv_H264_16s_C1I	Performs inverse quantization for 8x8 luma block coefficients including normalization of the subsequent inverse 8x8 transform.
ippiDecodeCAVLCCoeffs_H264_1u16s	Decodes any non-chroma DC coefficients CAVLC coded.
ippiDecodeCAVLCCoeffsIdxs_H264_1u16s	Decodes CAVLC coded coefficients.
ippiTransformDequantLumaDC_H264_16s_C1I	Performs integer inverse transformation and dequantization for 4x4 luma DC coefficients.
ippiTransformDequantChromaDC_H264_16s_C1I	Performs integer inverse transformation and dequantization for 2x2 chroma DC coefficients.
ippiPredictIntra_4x4_H264_8u_C1IR	Performs intra prediction for a 4x4 luma component.
ippiPredictIntra_16x16_H264_8u_C1IR	Performs intra prediction for a 16x16 luma component.
IppiMBReconstructHigh_32s16u	Macroblock Reconstruction.
IppiFilterDeblock_16u	Deblocking filters.

The H.264 Encoder uses the H.264 Decoder functions to calculate inter- and intra-predicted blocks, macro-block reconstruction, and de-blocking filtering. Forward transform and quantization, and CAVLC coding are performed by Encoder functions.
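As an illustration of the forward transform step mentioned above, here is a minimal sketch of the 4x4 integer core transform defined by the H.264 standard, Y = Cf · X · Cfᵀ, applied to a residual block. It is a straightforward reference version written for clarity and is not the Intel® IPP implementation; the function name is hypothetical, and the scaling that normally accompanies this transform is assumed to be handled in the quantization step.

```c
#include <assert.h>
#include <stdint.h>

/* Forward 4x4 integer core transform, Y = Cf * X * Cf^T.
 * x and y are row-major 4x4 blocks (16 elements each). */
void h264_forward_4x4(const int16_t x[16], int32_t y[16])
{
    static const int cf[4][4] = {
        { 1,  1,  1,  1 },
        { 2,  1, -1, -2 },
        { 1, -1, -1,  1 },
        { 1, -2,  2, -1 },
    };
    int32_t tmp[16];

    assert(x && y);

    /* tmp = Cf * X */
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            int32_t acc = 0;
            for (int k = 0; k < 4; ++k) {
                acc += cf[i][k] * x[k * 4 + j];
            }
            tmp[i * 4 + j] = acc;
        }
    }

    /* y = tmp * Cf^T */
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            int32_t acc = 0;
            for (int k = 0; k < 4; ++k) {
                acc += tmp[i * 4 + k] * cf[j][k];
            }
            y[i * 4 + j] = acc;
        }
    }
}
```

A flat residual block concentrates all of its energy in the DC coefficient y[0], which is why smooth regions compress well after quantization.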

Refer to the Intel® IPP Architecture Reference Manual Volume-2 for more information on these functions. For the list of Intel® IPP functions optimized for the 4th generation Intel® Core™ processor (code name ‘Haswell’), visit: http://software.intel.com/en-us/articles/haswell-support-in-intel-ipp.

Data Transmission

Multiple-Input Multiple-Output (MIMO) is a smart antenna technology used in transceiver equipment for wireless radio communication. MIMO uses multiple antennas to send multiple parallel signals. Intel® System Studio supports the MIMO algorithm as it is used for Long Term Evolution (LTE) wireless transmissions. The algorithm takes a received (RX) signal and returns an estimate of the transmitted (TX) signal that minimizes the mean square error (MMSE). Increased data rates, better efficiency, and support for a larger number of input cameras are some of the advantages of using LTE MIMO in video surveillance applications.
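The MMSE estimate described above can be sketched for a small case. The example below assumes a 2x2 real-valued channel matrix H, unit transmit power, and a known noise variance, and computes x̂ = (HᵀH + σ²I)⁻¹Hᵀy directly. Real LTE receivers work with complex-valued channels and larger dimensions, so this is a simplified illustration with a hypothetical function name, not the algorithm shipped with Intel® System Studio.

```c
#include <assert.h>

/* Linear MMSE estimate for a 2x2 real-valued channel.
 * h is row-major: h[0]=h00, h[1]=h01, h[2]=h10, h[3]=h11.
 * Returns 0 on success, -1 if the regularized matrix is singular. */
int mmse_detect_2x2(const double h[4], const double y[2],
                    double noise_var, double x_hat[2])
{
    assert(h && y && x_hat);
    assert(noise_var >= 0.0);

    /* A = H^T * H + noise_var * I (symmetric 2x2) */
    double a00 = h[0] * h[0] + h[2] * h[2] + noise_var;
    double a01 = h[0] * h[1] + h[2] * h[3];
    double a11 = h[1] * h[1] + h[3] * h[3] + noise_var;

    /* b = H^T * y */
    double b0 = h[0] * y[0] + h[2] * y[1];
    double b1 = h[1] * y[0] + h[3] * y[1];

    double det = a00 * a11 - a01 * a01;
    if (det == 0.0) {
        return -1;
    }

    /* x_hat = A^-1 * b, using the closed-form 2x2 inverse */
    x_hat[0] = (a11 * b0 - a01 * b1) / det;
    x_hat[1] = (a00 * b1 - a01 * b0) / det;
    return 0;
}
```

With zero noise variance the estimate reduces to inverting the channel; as the noise variance grows, the estimate is pulled toward zero rather than amplifying noise.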

Video Management Software

Combining the 4th generation Intel Core processor and Intel® System Studio with advanced video management software (VMS) works to your advantage. Processor features like multithreading and multiprocessing can be combined with the advanced video compression offered by sophisticated video management software to reduce processing time, make better use of processor resources, and reduce storage space.

Figure 7 – Functional blocks of Video Management Software (VMS)

The block diagram shows general functionalities of the video management software. You can use Intel® IPP functions to enhance any of these functionalities.

Intel® IPP’s segmentation functions allow extracting parts of the image that can be matched with real objects. The watershed and gradient segmentation functions are region-based methods to split an image into distinctive areas. Background/foreground segmentation functions allow for distinguishing between moving objects and stable areas of the background. The following segmentation functions can be used:

ippiLabelMarkers - Labels markers in image with different values.
ippiSegmentWatershed - Performs watershed image segmentation using markers
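The background/foreground idea described above can be illustrated with the simplest possible approach, frame differencing against a reference background. This is a minimal sketch assuming 8-bit grayscale frames stored as flat arrays; the function name and threshold parameter are hypothetical, and it is a much simpler technique than the segmentation functions Intel® IPP provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Marks a pixel as foreground (255) when it differs from the background
 * model by more than `threshold`, otherwise as background (0).
 * Returns the number of foreground pixels. */
size_t segment_foreground(const uint8_t *frame, const uint8_t *background,
                          uint8_t *mask, size_t n, uint8_t threshold)
{
    size_t count = 0;

    assert(frame && background && mask);

    for (size_t i = 0; i < n; ++i) {
        int diff = abs((int)frame[i] - (int)background[i]);
        if (diff > (int)threshold) {
            mask[i] = 255;
            ++count;
        } else {
            mask[i] = 0;
        }
    }
    return count;
}
```

The returned count can serve as a cheap trigger: a frame whose foreground count stays below some application-chosen level can be skipped by later, more expensive analysis stages.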

Intel® IPP implements the following image processing functions that perform morphological operations on images:

MorphologyInit - Initializes morphology state structures for the erosion or dilation operation.
MorphologyGetSize - Computes the size of the morphology state structures for the erosion or dilation operation.
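For readers unfamiliar with the operation itself, erosion with a 3x3 structuring element can be sketched as taking the minimum over each pixel's neighborhood. The version below is a plain reference loop with a hypothetical function name; it assumes an 8-bit single-channel image, ignores neighbors that fall outside the image, and requires that src and dst do not overlap. It does not use or imitate the Intel® IPP state structures.

```c
#include <assert.h>
#include <stdint.h>

/* 3x3 erosion: each output pixel is the minimum of the input pixels in
 * its 3x3 neighborhood, clipped to the image bounds. */
void erode3x3(const uint8_t *src, uint8_t *dst, int width, int height)
{
    assert(src && dst && width > 0 && height > 0);

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            uint8_t lowest = 255;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    int ny = y + dy;
                    int nx = x + dx;
                    if (ny < 0 || ny >= height || nx < 0 || nx >= width) {
                        continue; /* neighbor lies outside the image */
                    }
                    uint8_t v = src[ny * width + nx];
                    if (v < lowest) {
                        lowest = v;
                    }
                }
            }
            dst[y * width + x] = lowest;
        }
    }
}
```

Dilation follows the same structure with the maximum in place of the minimum. Applied to a foreground mask, erosion removes isolated noise pixels and shrinks object outlines by one pixel.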

The object detector is based on Haar classifiers. Each classifier uses Haar features to decide if the region of the image looks like the predefined image or not. The following Intel® IPP functions can be used for implementing Haar features in the object detector:

HaarClassifierInitAlloc - Allocates memory and initializes the structure for standard Haar classifiers.
ApplyHaarClassifier - Applies a Haar classifier to an image.
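Haar features are usually evaluated with an integral image, which makes the sum over any rectangle a four-lookup operation. The sketch below shows that building block and one two-rectangle feature. The function names, the (width+1) x (height+1) integral image layout, and the left-minus-right feature definition are assumptions made for this illustration; they are not the Intel® IPP interfaces listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Builds an integral image with (width+1) x (height+1) entries, where
 * ii[y][x] is the sum of all pixels above and to the left of (x, y). */
void integral_image(const uint8_t *img, int width, int height, uint32_t *ii)
{
    int stride = width + 1;

    assert(img && ii && width > 0 && height > 0);

    for (int x = 0; x <= width; ++x) {
        ii[x] = 0;
    }
    for (int y = 0; y < height; ++y) {
        uint32_t row_sum = 0;
        ii[(y + 1) * stride] = 0;
        for (int x = 0; x < width; ++x) {
            row_sum += img[y * width + x];
            ii[(y + 1) * stride + (x + 1)] = ii[y * stride + (x + 1)] + row_sum;
        }
    }
}

/* Sum of the w x h rectangle whose top-left pixel is (x, y). */
uint32_t rect_sum(const uint32_t *ii, int width, int x, int y, int w, int h)
{
    int stride = width + 1;
    return ii[(y + h) * stride + (x + w)] - ii[y * stride + (x + w)]
         - ii[(y + h) * stride + x] + ii[y * stride + x];
}

/* Two-rectangle Haar feature: left half minus right half (w must be even). */
int32_t haar_two_rect_horizontal(const uint32_t *ii, int width,
                                 int x, int y, int w, int h)
{
    assert(w % 2 == 0);
    uint32_t left = rect_sum(ii, width, x, y, w / 2, h);
    uint32_t right = rect_sum(ii, width, x + w / 2, y, w / 2, h);
    return (int32_t)left - (int32_t)right;
}
```

A classifier stage would compare such feature values against trained thresholds; a large positive value here indicates a bright-to-dark vertical edge inside the window.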

Motion detection plays a fundamental role in any video surveillance application and Intel® IPP’s motion estimation implementation can be used to calculate the following:

residual block (the difference between the source block and the predicted block)
some characteristics of the residual block
some characteristics of the blocks. These characteristics can be used for comparison of the blocks.
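As one concrete example of such a block characteristic, the sum of absolute differences (SAD) is a common measure for comparing a source block with a candidate prediction. The sketch below computes the residual block and its SAD in one pass; the function name and parameters are hypothetical, and SAD is offered here as a typical choice rather than as the specific measure Intel® IPP uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Computes residual = source - prediction for a square block and returns
 * the sum of absolute differences (SAD). The residual is written in
 * row-major order with block_size elements per row. */
uint32_t block_residual_sad(const uint8_t *src, int src_stride,
                            const uint8_t *pred, int pred_stride,
                            int block_size, int16_t *residual)
{
    uint32_t sad = 0;

    assert(src && pred && residual && block_size > 0);

    for (int y = 0; y < block_size; ++y) {
        for (int x = 0; x < block_size; ++x) {
            int diff = (int)src[y * src_stride + x]
                     - (int)pred[y * pred_stride + x];
            residual[y * block_size + x] = (int16_t)diff;
            sad += (uint32_t)abs(diff);
        }
    }
    return sad;
}
```

A motion search would call this for each candidate position in the reference frame and keep the candidate with the lowest SAD; a SAD of zero means the prediction matches the source block exactly.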

Using Intel® C++ Compiler to Optimize Transcoder and Video Management Software

The Intel® C++ Compiler, which is part of Intel® System Studio suite, is a highly optimizing compiler for Intel® architecture and compatible processor technologies. It can generate code for processors that support certain features. The three main types of processor-specific optimization options that you can use while compiling code using the Intel Compiler are the march, x, and ax options. The table below illustrates these options for generating code for processors supporting Intel® AVX2 instructions:

Option	Description
-march=core-avx2 (On Linux*)	Using this option causes the compiler to generate code for processors that support Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® AVX, SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions. The resulting executable can be run on the specified or later Intel® and other compatible non-Intel® processors that support the instruction set.
-xCORE-AVX2 (On Linux*)	Executables generated from this processor-specific option can only be run on the specified or later Intel® processors because they incorporate optimizations specific to those processors and use a specific version of the Streaming SIMD Extensions (SSE) instruction set and/or the Intel® Advanced Vector Extensions (AVX) instruction set. Using this switch enables some optimizations not enabled with the previous option (-march=core-avx2).
-axCORE-AVX2 (On Linux*)	Using this option while compiling generates feature-specific auto-dispatch code paths for Intel® processors if there is a performance benefit. Processor dispatch technology performs a check at execution time to determine which processor the application is running on and uses the best available code path for that processor.
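To show where these options come into play, here is a small pixel loop of the kind a vectorizing compiler can typically map to SIMD instructions. The function and the file name in the comments are hypothetical, the icc command lines simply apply the options from the table above, and whether the loop is actually vectorized depends on the compiler version and optimization level.

```c
/* Possible build commands for a file named brighten.c (hypothetical):
 *   icc -O2 -march=core-avx2 -c brighten.c   (Intel and compatible CPUs)
 *   icc -O2 -xCORE-AVX2 -c brighten.c        (Intel CPUs with AVX2 only)
 *   icc -O2 -axCORE-AVX2 -c brighten.c       (baseline plus AVX2 path)
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds `delta` to every pixel, saturating at 255. The restrict
 * qualifiers state that src and dst do not overlap, which removes an
 * aliasing obstacle to vectorization. */
void brighten(const uint8_t *restrict src, uint8_t *restrict dst,
              size_t n, uint8_t delta)
{
    assert(src && dst);

    for (size_t i = 0; i < n; ++i) {
        unsigned v = (unsigned)src[i] + delta;
        dst[i] = (uint8_t)(v > 255u ? 255u : v);
    }
}
```

The source code stays the same for all three builds; only the generated instructions and the set of processors the binary can run on change.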

Using Intel® VTune™ Amplifier to Analyze Performance

Intel® VTune™ Amplifier, which is part of the Intel System Studio suite, is well suited to identifying performance bottlenecks in your code. It can help detect algorithmic or architectural bottlenecks. Intel® System Studio is supported on many Intel architectures, including the 4th generation Intel® Core™ processor architecture.

To learn more about identifying performance issues on software running on the 4th Generation Intel® Core™ Processor family, refer to this article - Using Intel® VTune™ Amplifier XE to Tune Software on the 4th Generation Intel® Core™ Processor Family

Figure 8 – Analysis of an application using Intel® VTune™ Amplifier

References

Intel® Embedded Design Centre: Digital Security Surveillance - www.intel.com/info/dss
Building Digital Security & Surveillance (DSS) Systems Based on Intel Technology (End to end guide) - http://www.intel.in/content/dam/www/public/us/en/documents/presentation/dss-systems-intel-technology-guide.pdf
Haswell New Instruction Descriptions Now Available - http://software.intel.com/en-us/blogs/2011/06/13/haswell-new-instruction-descriptions-now-available
Intel IPP support for Intel® AVX2 - http://software.intel.com/en-us/articles/haswell-support-in-intel-ipp
Intel® AVX2 optimization in Intel® MKL - http://software.intel.com/en-us/articles/intel-mkl-support-for-intel-avx2
Haswell Cryptographic Performance (White paper) - http://www.intel.in/content/www/in/en/communications/haswell-cryptographic-performance-paper.html?wapkw=haswell

Optimization Notice revision #20110804

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

© 2014, Intel Corporation. All rights reserved. Intel, the Intel logo, VTune, Cilk, Atom, Core and Xeon are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others.