This whitepaper describes how developers can integrate Intel® RealSense™ SDK background segmentation (BGS) middleware to create new immersive collaboration applications. It outlines expected behavior and performance under a variety of scenarios and presents limitations that developers need to be aware of before shipping products to consumers. The primary audience is development teams implementing BGS, and OEMs.
Background and Scope
Background segmentation (also known as "BGS technology") is the key product differentiator for the Immersive Collaboration and Content Creation category for the Intel® RealSense™ camera. The ability for users to segment-out their backgrounds in real-time without the need for specialized equipment or post-processing is a compelling value-add for existing teleconferencing applications.
There is ample potential for users to augment their existing uses or invent new ones based on BGS technology. For example, consumers can watch shared YouTube* content together with friends through screen-sharing software during a video chat session. Co-workers can see each other overlaid onto a shared workspace during a virtual meeting. Developers can integrate the BGS middleware to create new uses, such as changing the background image or adding video to the background, while camera-based or sharing-based applications are running. Figures 1 and 2 illustrate applications with immersive uses built on the Intel RealSense camera. Developers can also consider uses such as taking a selfie and changing its background, sharing and editing with multiple parties through collaboration tools such as a browser or office applications, or creating a karaoke video with a different background.
Creating a BGS Sample Application
In this paper we explain how developers can replace the background with video or images in a sample application. We also provide a snippet for blending the output of the image from middleware with any background image and what to expect in the way of performance.
The current implementation of the background segmentation middleware supports the YUY2 and RGB formats. Color resolution ranges from 360p to 720p; the depth image is 480p.
Figure 3 shows the high-level pipeline for BGS. The depth and color frames are captured by the Intel RealSense camera and passed to the core SDK (that is, the Intel RealSense SDK runtime). Based on the request from the application, the frames are delivered to the User Extraction block, which produces the segmented RGBA image. This image can be alpha-blended with any background RGB image to create the final output. Developers can use any mechanism to blend the images on screen, but blending on the GPU yields the best performance.
Figure 3. BGS pipeline.
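The alpha blend described above can be sketched per channel as follows. This is a minimal CPU-side formula, assuming the segmented image uses straight (non-premultiplied) 8-bit alpha; the helper name is illustrative, not part of the SDK:

```cpp
#include <cstdint>

// Per-pixel, per-channel alpha blend of a segmented foreground pixel (RGBA,
// straight alpha) over a background pixel (RGB). Channel values are 0-255:
// out = (alpha * fg + (255 - alpha) * bg) / 255
static inline uint8_t blend_channel(uint8_t fg, uint8_t bg, uint8_t alpha) {
    return static_cast<uint8_t>((alpha * fg + (255 - alpha) * bg) / 255);
}
```

At alpha 255 the foreground pixel is kept, at alpha 0 the background shows through, and intermediate values (for example, along hair edges) mix the two.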
The following steps explain how to integrate 3D segmentation into a developer application.
1. Install the following components as part of the Intel RealSense SDK:
- Intel RealSense SDK core runtime
- Background Segmentation module
2. Use the web setup or standalone installer to install only the core and personify components. The runtime can be installed only with administrative (UAC) privileges.
intel_rs_sdk_runtime_websetup_x.x.x.xxxxxx --silent --no-progress --accept-license=yes --finstall=core,personify --fnone=all
You can detect which runtime is installed on the system by using the Intel RealSense SDK version-query API.
3. Create an instance to use the 3D camera. This creates a pipeline construct for running any 3D-based algorithm.
PXCSenseManager* pSenseManager = PXCSenseManager::CreateInstance();
4. Enable the middleware module that you need to use. It is recommended that you enable only the modules the application needs.
pxcStatus result = pSenseManager->Enable3DSeg();
5. Identify which profile your application needs. Running at higher resolutions and frame rates can impact performance. Pass the profile to request a specific stream from the camera.
PXC3DSeg* pSeg = pSenseManager->Query3DSeg();
6. Initialize the pipeline for the camera and pass the first frame to the middleware. This stage is required by all middleware and makes the pipeline work.
result = pSenseManager->Init();
7. Retrieve the segmented image from the camera. The middleware output image is RGBA and contains only the segmented portion.
8. Blend the segmented image with your own background.
Note: Blending has a significantly higher performance cost when done on the CPU rather than the GPU. The sample application blends on the CPU.
- You can use any technology to blend the background image pixels with the RGBA segmented image.
- You can use zero copy to move data to system memory using the GPU instead of the CPU.
- Direct3D* or OpenGL* can be used for blending based on preference.
Here is a code snippet for getting an image plane into system memory, where srcData is of type pxcBYTE*:
srcData = segmented_image_data.planes[0] + 0 * segmented_image_data.pitches[0];
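The snippet above indexes the first plane; note that the SDK reports a pitch (stride in bytes) that may be larger than width × 4, so copying into tightly packed system memory should go row by row. A minimal sketch, assuming 8-bit RGBA and using an illustrative helper name rather than an SDK call:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Copy a pitched RGBA plane (as exposed via the SDK's image data planes and
// pitches) into a tightly packed buffer. `pitch` is the stride in bytes of
// one source row and may exceed width * 4 due to alignment padding.
std::vector<uint8_t> copy_rgba_plane(const uint8_t* src, int width, int height, int pitch) {
    std::vector<uint8_t> dst(static_cast<size_t>(width) * height * 4);
    for (int y = 0; y < height; ++y) {
        std::memcpy(&dst[static_cast<size_t>(y) * width * 4],  // packed destination row
                    src + static_cast<size_t>(y) * pitch,      // pitched source row
                    static_cast<size_t>(width) * 4);           // copy only the pixels
    }
    return dst;
}
```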
Steps for Blending and Rendering
- Capture: Read streams for color and depth data from camera
- Segment: Discriminate between background and foreground pixels
- Copy color and segmented image (depth mask) into textures.
- Resize segmented image (depth mask) to the same resolution as the color image.
- (Optional) Load or update background image (if replacing) into a texture.
- Compile/load shader.
- Set color, depth, and (optional) background textures for shader use.
- Run shader and present.
- (For a videoconferencing application) Copy blended image to NV12 or YUY2 surface.
- (For videoconferencing application) Pass surface to Intel® Media SDK H.264 HW Encoder.
The application’s behavior is affected by three factors: stream resolution, frame rate, and whether rendering is done on the CPU or the GPU. The table below shows CPU utilization on a 5th generation Intel® Core™ i5 processor.
| | No Render | Render on CPU | Render on GPU |
|---|---|---|---|
| 720p/30fps | 29.20% | 43.49% | 31.92% |
| 360p/30fps | 15.39% | 25.29% | 16.12% |
| 720p/15fps | 17.93% | 28.29% | 18.29% |
To verify the impact of rendering on your own machine, run the sample application with and without the "-noRender" option.
BGS Technology Limitations
User segmentation is still evolving, and the quality is increasing with each new version of the SDK.
Points to remember while evaluating quality:
- Avoid wearing dark objects that closely match the background, for example, a black shirt against a black background.
- High-intensity light on head can impact hair quality.
- Lying on the couch or bed can result in a poor user experience. A sitting position is better for video conferencing.
- Translucent or transparent objects like a drinking glass won’t work as expected.
- Hand webbing is an issue; expect quality to vary.
- Hair on forehead may have segmentation issues.
- Avoid moving your hand or head very quickly; camera limitations impact quality.
Providing Feedback to Intel on BGS Technology
How can you help continue making the software better? The best way is to provide feedback. Reproducing a scenario under similar conditions can be difficult when a developer wants to re-test on a new Intel RealSense SDK release.
To minimize run-to-run variance, it is best to capture the input camera sequences used to replicate the issue, so you can check whether quality improves.
The Intel RealSense SDK ships with a sample application that can help collect sequences to replay with new drops:
- Important for providing feedback on quality
- Not for performance analysis
In a default installation, the sample application is located at C:\Program Files (x86)\Intel\RSSDK\bin\win32\FF_3DSeg.cs.exe. Start the application and follow the steps shown in the screenshots below:
You will see yourself with the background removed.
If you select Record mode, you can save a copy of your session. You can then reopen the FF_3DSeg.cs.exe application and select Playback mode to view the recording.
Intel RealSense technology background segmentation middleware brings a new immersive experience to consumers. These new usages include replacing the background with a video or picture, or creating a selfie from the segmented image.