
[KINECTSDK] Use Kinect as a green screen (III) Source Code: A Comprehensive Overview



An exciting new feature, the Background Removal API, was added to the Kinect for Windows SDK 1.8, which was released last week. Background removal, or green screening, is something many people have been using the Kinect for with varying degrees of success. Before the Background Removal API, getting a decent green screen effect out of the Kinect required a lot of heavy lifting and creative thinking, because the depth data from the Kinect is too noisy to produce a smooth player mask on its own.

Image: A photo taken using the default Kinect depth data, with no blur or special techniques.

The official Kinect SDK 1.8 Background Removal API is a great step forward for developers because it produces a great green screen effect with minimal work. However, a few important restrictions make it unusable right now for a multi-user photo experience:

  • The initial Background Removal API requires successful skeleton tracking to work.

  • The initial Background Removal API only allows the effect to be performed on one tracked player at a time. Update (10/3/2013): Joshua Blake writes in from Twitter, "By the way, you can do multiple people background removal in 1.8. You just need to create an instance per tracked skeleton." Thanks, Joshua.

Those criticisms aside, it would be awesome if Microsoft were able to decouple the Background Removal API from skeleton data, or add a mode where you can specify a depth threshold away from the camera instead of relying on detected skeletons.

Image: A photo taken using the Kinect SDK 1.8 Background Removal API.

For the Kinect green screen photo kiosk I ran at Maker Faire Detroit this summer, I invested about a month figuring out how to get a good mask out of the Kinect in real time. My approach was to ignore the skeleton data entirely and use only the depth data: I ran the raw depth through EMGU CV (.NET OpenCV) to do blob detection, took the detected blobs, and ran them through a point-by-point averaging algorithm based on work done in openFrameworks. I also used a simple shader-based blur effect available in Windows Presentation Foundation, which proved far faster than any other implementation I tested, including my own box blur and Gaussian blur. (A rough sketch of this kind of pipeline appears after the list of guesses below.) The results are not quite as good as the official SDK implementation, but they are pretty close and don't require detected skeletons. The project is open source, so you can check out my implementation here: _Kinect_GreenScreen_PhotoKiosk. Not requiring skeleton detection is a HUGE factor when groups of users are posing for photos.

Image: A photo taken using custom techniques and EMGU CV.

How I Think Microsoft's Background Removal API Works

The unfortunate part of Microsoft's Background Removal API is that it is closed source, so people like me who have been working on the same problem are left wondering exactly what is going on at a low level. I wish they would release a paper on the technique being used. Based on my own work, here are a few things I am going to guess Microsoft is doing:

  • I am pretty sure they are using some sort of frame averaging of the depth data. During my own work I found this concept presented by Karl Sanford in his early work on smoothing depth data. In my testing, averaging the depth data was too slow in managed C# code, and the results were not very good at producing smooth masks that fit the contours of subjects, so I threw the technique out. The tell in the 1.8 SDK is that when you wave your hands around or move quickly, the mask lags slightly as it follows you, which could come either from frame averaging or from an intentional slowdown of processing to improve end-user performance.

  • I am confident they are using external computer vision libraries, possibly with licensing restrictions that prevent them from being included in the official Kinect SDK DLLs. The tell here is that the Background Removal API oddly ships in separate 32-bit and 64-bit DLLs, apart from the main Kinect SDK DLL. EMGU CV likewise requires separate 32-bit and 64-bit DLLs because it reaches into low-level unmanaged code for much of its functionality. I'm willing to bet the Background Removal API takes advantage of unmanaged code to speed up real-time processing.

  • The skeleton data requirement may only be used to decide where to place the mask and may contribute nothing to Microsoft's background removal itself. In my own implementation the skeleton data was useless because it provides no detailed contextual data to help hug the contours of each person's body; it only provides stick-figure-like joint positions.

  • I am guessing the official 1.8 SDK background removal is limited to one player due to performance constraints. I don't know what CPU budget the Kinect SDK allows itself internally, but I would imagine Microsoft tries to keep it low enough that developers have no problem integrating the SDK into end applications. Background removal requires frame-by-frame processing, which is very processor intensive.

  • Microsoft spent time massaging the depth data around human hair and the head region. In the default depth data from the Kinect, human hair is always an issue and can sometimes badly degrade the mask around a person's head. This makes sense because the Kinect's depth data comes from a projected infrared pattern whose effective resolution is only about 320x240.

  • Some sort of blur is being applied, but it doesn't look like a conventional blur. Look at the edges of the photo from the Background Removal API: they have odd gaps in the pixels, almost as if some of the data is being dropped for faster processing, or as an attempt at softer refinement of the mask's edge.
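For illustration, here is a minimal sketch of the depth-threshold, blob-detection, and blur pipeline described above. The kiosk project itself is written in C# with EMGU CV; this sketch uses plain OpenCV in C++, and the function name makePlayerMask, the depth band, and the blob-size cutoff are illustrative assumptions rather than values from the actual project.

// A minimal sketch (not the kiosk's actual code, which is C# + EMGU CV) of a
// depth-threshold -> blob-detection -> blur pipeline, written against OpenCV.
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

cv::Mat makePlayerMask(const cv::Mat& depthMm)   // CV_16U, depth in millimeters
{
    // 1. Keep only pixels inside a "player" depth band instead of using skeletons.
    cv::Mat mask;
    cv::inRange(depthMm, cv::Scalar(800), cv::Scalar(2500), mask);   // 0.8 m .. 2.5 m, tune as needed

    // 2. Find blobs (contours) and keep only reasonably large ones to reject depth noise.
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(mask.clone(), contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    cv::Mat players = cv::Mat::zeros(mask.size(), CV_8UC1);
    for (size_t i = 0; i < contours.size(); ++i) {
        if (cv::contourArea(contours[i]) > 5000.0)   // ignore small speckle blobs
            cv::drawContours(players, contours, (int)i, cv::Scalar(255), cv::FILLED);
    }

    // 3. Soften the mask edge; the kiosk used a WPF shader blur, a Gaussian blur
    //    stands in for it here.
    cv::GaussianBlur(players, players, cv::Size(9, 9), 0);
    return players;
}

The returned mask can then be used as the alpha channel when compositing the color frame over a background image, which is essentially what the kiosk does with its WPF shader blur in place of the Gaussian blur shown here.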

I hope this article has provided some insight into background removal not seen elsewhere. I am looking forward to the next version of the Kinect hardware and appreciate Microsoft's continued commitment to the Kinect for Windows SDK.


Goals: Learn how to align color and depth images to get a colored point cloud.
Source: View Source
Download: 3_PointCloud.zip

Overview

There are several new steps we want to take in this tutorial. The most interesting part is that we're now working with 3D data! Creating a fully interactive system would be a bit too much code for this tutorial, though, so we settle for a simple rotating point cloud. This tutorial has three parts: first, we'll talk briefly about why point clouds are harder than you might think; then we'll show the Kinect SDK side of getting the right data; finally, we'll show some OpenGL tricks to make things easy to display.

Contents

  • Depth and RGB Coordinate Systems

  • Kinect Code

  • OpenGL Display

  • Putting it all together

Depth and RGB Coordinate Systems

Kinect Coordinate System

The Kinect uses a Cartesian coordinate system centered at the Kinect: the positive Y axis points up, the positive Z axis points where the Kinect is pointing, and the positive X axis is to the left.

Alignment

A naive way of making a point cloud would be to directly overlap the depth and color images, so that depth pixel (x,y) goes with image pixel (x,y). However, this gives a poor-quality mapping, where the borders of objects don't line up with their colors. This happens because the RGB camera and the depth camera are located at different spots on the Kinect; obviously, then, they aren't seeing quite the same things. Normally we would have to perform some kind of alignment of the two cameras (the formal term is registration) to map from one coordinate space to the other. Fortunately, Microsoft has already done this for us, so all we need to do is call the right functions.

Images: Direct overlap of RGB and Depth / Registered RGB and Depth.

Note that computer vision and robotics researchers are often unhappy with the quality of the built-in registration, so they frequently redo it themselves using something like OpenCV.

Kinect Code

A lot of this is just a combination of the code from the first two tutorials.

Kinect Initialization

There's nothing new in initialization. We simply need two image streams, one for depth and one for color.

HANDLE rgbStream;
HANDLE depthStream;
INuiSensor* sensor;

bool initKinect()
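The body of initKinect is not shown in this excerpt. As a reference point, here is a sketch that follows the initialization pattern from the first two tutorials in this series: one depth stream and one color stream, both assumed to be 640x480. Treat it as an approximation rather than the tutorial's exact code.

// Sketch of initKinect(), assuming 640x480 depth and color streams.
// Requires <Windows.h>, <Ole2.h>, and <NuiApi.h>; link against Kinect10.lib.
bool initKinect() {
    // Get a working Kinect sensor
    int numSensors;
    if (NuiGetSensorCount(&numSensors) < 0 || numSensors < 1) return false;
    if (NuiCreateSensorByIndex(0, &sensor) < 0) return false;

    // Initialize it and open the two image streams we need
    sensor->NuiInitialize(NUI_INITIALIZE_FLAG_USES_DEPTH | NUI_INITIALIZE_FLAG_USES_COLOR);
    sensor->NuiImageStreamOpen(NUI_IMAGE_TYPE_DEPTH, NUI_IMAGE_RESOLUTION_640x480,
        0,        // image stream flags, e.g. near mode
        2,        // number of frames to buffer
        NULL,     // event handle
        &depthStream);
    sensor->NuiImageStreamOpen(NUI_IMAGE_TYPE_COLOR, NUI_IMAGE_RESOLUTION_640x480,
        0, 2, NULL, &rgbStream);
    return sensor != NULL;
}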

Getting depth data from the Kinect

The Kinect SDK provides a function that tells you which pixel in the RGB image corresponds to a particular point in the depth image. We'll store this information in another global array, depthToRgbMap. In particular, for each depth pixel we store, in order, the column and row (i.e. the x and y coordinate) of the corresponding color pixel.

Now that we're dealing with 3D data, we want to think of the depth frame as a bunch of points in space rather than as a 640x480 image. So in our getDepthData function, we fill the buffer with the coordinates of each point (instead of the depth at each pixel). This means the buffer we pass in has to have size width*height*3*sizeof(float) for float-typed coordinates.

// Global Variables
long depthToRgbMap[width*height*2];
// ...
void getDepthData(GLubyte* dest) {
    // ...
    const USHORT* curr = (const USHORT*) LockedRect.pBits;
    float* fdest = (float*) dest;
    long* depth2rgb = (long*) depthToRgbMap;
    for (int j = 0; j < height; ++j) {
        // per-pixel loop over the depth frame; see the sketch below
    }
    // ...
}

Vector4 is Microsoft's 3D point type in homogeneous coordinates. If your linear algebra is rusty, don't worry about homogeneous coordinates: just treat it as a 3D point with x, y, z coordinates.

Two SDK functions do the work inside that loop:

  • NuiTransformDepthImageToSkeleton gives you the 3D coordinates of a particular depth pixel, in the Kinect-based coordinate system described above. There is also a version of this function that takes an additional resolution argument.

  • NuiImageGetColorPixelCoordinatesFromDepthPixelAtResolution takes a depth pixel (row, column, and depth value in the depth image) and gives you the row and column of the corresponding pixel in the color image. See the API reference page for details.
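Putting those two calls together, the per-pixel body of getDepthData presumably looks roughly like the sketch below. This is not the tutorial's exact code: in particular, whether the raw depth value needs to be unpacked with NuiDepthPixelToDepth or re-packed before these calls depends on how the depth stream was opened, and the 640x480 resolution constants are assumptions.

// Sketch of the per-pixel body of getDepthData (assumed, not the tutorial's
// exact code). curr walks the locked depth buffer, fdest the float position
// buffer, and depth2rgb the depthToRgbMap table declared above.
for (int j = 0; j < height; ++j) {
    for (int i = 0; i < width; ++i) {
        USHORT packed = *curr;   // packed depth value for this pixel (assumption)

        // 3D position of this depth pixel in the Kinect coordinate system
        Vector4 pos = NuiTransformDepthImageToSkeleton(i, j, packed);
        *fdest++ = pos.x / pos.w;
        *fdest++ = pos.y / pos.w;
        *fdest++ = pos.z / pos.w;

        // Column and row of the matching color pixel, saved for getRgbData
        NuiImageGetColorPixelCoordinatesFromDepthPixelAtResolution(
            NUI_IMAGE_RESOLUTION_640x480,   // color resolution (assumption)
            NUI_IMAGE_RESOLUTION_640x480,   // depth resolution (assumption)
            NULL,                            // optional view area
            i, j, packed,
            depth2rgb, depth2rgb + 1);
        depth2rgb += 2;
        ++curr;
    }
}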

Getting color data from the Kinect

Now that we are thinking in terms of points instead of rectangular grids, we want our color output to be associated with a particular depth point. In particular, the input to our getRgbData function, analogously to getDepthData, wants a buffer of size width*height*3*sizeof(float) to hold the red, green, and blue values for each point in our point cloud.

void getRgbData(GLubyte* dest) {
    // ...
    const BYTE* start = (const BYTE*) LockedRect.pBits;
    float* fdest = (float*) dest;
    long* depth2rgb = (long*) depthToRgbMap;
    for (int j = 0; j < height; ++j) {
        // per-point color lookup; see the sketch below
    }
    // ...
}
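Based on the fragments above and the depthToRgbMap layout described earlier, the lookup loop inside getRgbData presumably looks something like this sketch. The bounds check and the reversed channel order (the Kinect color stream is BGRA) are reconstructions, not the tutorial's exact code.

// Sketch of the per-point color lookup inside getRgbData (assumed). start
// points at the locked BGRA color buffer, fdest at the float color buffer,
// and depth2rgb at the depthToRgbMap table filled in by getDepthData.
for (int j = 0; j < height; ++j) {
    for (int i = 0; i < width; ++i) {
        // Color pixel coordinates recorded for this depth pixel
        long x = *depth2rgb++;
        long y = *depth2rgb++;
        if (x < 0 || y < 0 || x >= width || y >= height) {
            // No valid mapping: leave this point black
            for (int n = 0; n < 3; ++n) *fdest++ = 0.f;
        } else {
            // 4 bytes per pixel (BGRA); write R, G, B as floats in [0, 1]
            const BYTE* color = start + (x + width*y) * 4;
            for (int n = 0; n < 3; ++n) *fdest++ = color[2 - n] / 255.f;
        }
    }
}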

OpenGL Display

Windows users need GLEW to get at the OpenGL buffer functions used below:

  • Download and unzip the GLEW binaries.

  • Copy the contents of the Include/ and Lib/ directories you just unzipped into the appropriate Windows SDK directories, e.g. C:/Program Files/Microsoft SDKs/Windows/v7.0A/Include/ and C:/Program Files/Microsoft SDKs/Windows/v7.0A/Lib/ for Visual Studio 2010.

  • For Visual Studio 2012+, use C:/Program Files (x86)/Windows Kits/8.1/Include/um/ and C:/Program Files (x86)/Windows Kits/8.1/Lib/winv6.3/um/ instead.

  • Copy bin/x64/glew32.dll into C:/Windows/System32 and bin/x86/glew32.dll into C:/Windows/SysWOW64. If you have a 32-bit system, just move bin/x86/glew32.dll into C:/Windows/System32

  • Add glew32.lib to the SDL or OpenGL property sheet's Linker > Input > Additional Dependencies.

OpenGL Code

Since we're dealing with 3D data, we now also have to worry about camera settings. We use gluPerspective and gluLookAt to deal with that for us.

// Global variables:
GLuint vboId;   // Vertex buffer ID
GLuint cboId;   // Color buffer ID
// ...
    // OpenGL setup
    glClearColor(0,0,0,0);
    glClearDepth(1.0f);

    // Set up array buffers
    const int dataSize = width*height * 3 * 4;
    glGenBuffers(1, &vboId);
    glBindBuffer(GL_ARRAY_BUFFER, vboId);
    glBufferData(GL_ARRAY_BUFFER, dataSize, 0, GL_DYNAMIC_DRAW);
    glGenBuffers(1, &cboId);
    glBindBuffer(GL_ARRAY_BUFFER, cboId);
    glBufferData(GL_ARRAY_BUFFER, dataSize, 0, GL_DYNAMIC_DRAW);

    // Camera setup
    glViewport(0, 0, width, height);
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluPerspective(45, width /(GLdouble) height, 0.1, 1000);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    gluLookAt(0,0,0,0,0,1,0,1,0);

For display purposes, rather than a fully interactive setup, we just have a camera that rotates around the point 3 meters in front of the Kinect. See the code for details; a rough sketch of this helper appears at the end of this section.

Putting it all together

We wrote those nice functions getDepthData and getRgbData, but how do we use them? We allocate some memory on the GPU and then use our functions to copy our point cloud data there.

void getKinectData() {
    const int dataSize = width*height*3*sizeof(float);
    GLubyte* ptr;
    glBindBuffer(GL_ARRAY_BUFFER, vboId);
    ptr = (GLubyte*) glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
    if (ptr) getDepthData(ptr);
    glUnmapBuffer(GL_ARRAY_BUFFER);
    glBindBuffer(GL_ARRAY_BUFFER, cboId);
    ptr = (GLubyte*) glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
    if (ptr) getRgbData(ptr);
    glUnmapBuffer(GL_ARRAY_BUFFER);
}

Now we use the glDrawArrays function to draw our point cloud.

void drawKinectData() {
    getKinectData();
    rotateCamera();
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_COLOR_ARRAY);
    glBindBuffer(GL_ARRAY_BUFFER, vboId);
    glVertexPointer(3, GL_FLOAT, 0, NULL);
    glBindBuffer(GL_ARRAY_BUFFER, cboId);
    glColorPointer(3, GL_FLOAT, 0, NULL);
    glPointSize(1.f);
    glDrawArrays(GL_POINTS, 0, width*height);
    glDisableClientState(GL_VERTEX_ARRAY);
    glDisableClientState(GL_COLOR_ARRAY);
}

Note that we could just as well replace all the array buffer code with immediate-mode drawing:

// Global Variables
float colorarray[width*height*3];
float vertexarray[width*height*3];
// ...
void getKinectData() {
    getDepthData((GLubyte*) vertexarray);
    getRgbData((GLubyte*) colorarray);
}
void drawKinectData() {
    getKinectData();
    rotateCamera();
    glBegin(GL_POINTS);
    for (int i = 0; i < width*height; ++i) {
        glColor3f(colorarray[i*3], colorarray[i*3+1], colorarray[i*3+2]);
        glVertex3f(vertexarray[i*3], vertexarray[i*3+1], vertexarray[i*3+2]);
    }
    glEnd();
}
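The rotateCamera helper referenced above lives in the downloadable source. For reference, a camera that orbits the point 3 meters in front of the Kinect could be written roughly like this; the orbit radius matches the description above, but the angle step and the exact parameterization are assumptions.

// Sketch of a rotateCamera() helper (assumed; the tutorial's actual version
// is in the downloadable source). It orbits the modelview camera around the
// point 3 m in front of the Kinect along the Z axis.
#include <cmath>

void rotateCamera() {
    static double angle = 0.;
    static const double radius = 3.;                  // orbit radius in meters
    double x = radius * std::sin(angle);
    double z = radius * (1 - std::cos(angle));        // keeps (0,0,3) as the orbit center
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    gluLookAt(x, 0, z,      // eye position
              0, 0, 3,      // look at the point 3 m in front of the Kinect
              0, 1, 0);     // up vector
    angle += 0.002;         // small step per frame (assumption)
}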





