Kinect's AI Breakthrough Explained

Posted by Soulskill
from the expensive-hacker-toys dept.
mikejuk writes "Microsoft Research has just published a scientific paper (PDF) and a video showing how the Kinect body tracking algorithm works — it's almost as impressive as some of the uses the Kinect has been put to. This article summarizes how Kinect does it. Quoting: '... What the team did next was to train a type of classifier called a decision forest, i.e. a collection of decision trees. Each tree was trained on a set of features on depth images that were pre-labeled with the target body parts. That is, the decision trees were modified until they gave the correct classification for a particular body part across the test set of images. Training just three trees using 1 million test images took about a day using a 1000-core cluster.'"
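
To make the "collection of decision trees" idea concrete, here is a toy sketch of a decision forest trained on labeled feature vectors. It is not the paper's implementation: the function names (`grow_tree`, `train_forest`, `forest_predict`), the Gini split criterion, the bootstrap sampling, and the random candidate splits are all assumptions chosen for illustration, and it runs on small NumPy arrays rather than depth images.

```python
import numpy as np

def gini(labels, n_classes):
    """Gini impurity of an integer label array."""
    p = np.bincount(labels, minlength=n_classes) / len(labels)
    return 1.0 - np.sum(p ** 2)

def grow_tree(X, y, n_classes, depth, rng, n_candidates=20):
    """Grow one tree by trying a few random (feature, threshold) splits per node.

    A leaf is a NumPy array holding the class distribution of the samples
    that reached it; an internal node is a tuple (feature, threshold, lo, hi).
    """
    counts = np.bincount(y, minlength=n_classes)
    if depth == 0 or counts.max() == len(y):
        return counts / len(y)
    best = None
    for _ in range(n_candidates):
        f = rng.integers(X.shape[1])       # random feature index
        t = rng.choice(X[:, f])            # random threshold taken from the data
        left = X[:, f] < t
        if left.all() or not left.any():
            continue                       # split separates nothing, skip it
        score = (left.sum() * gini(y[left], n_classes)
                 + (~left).sum() * gini(y[~left], n_classes))
        if best is None or score < best[0]:
            best = (score, f, t, left)
    if best is None:
        return counts / len(y)
    _, f, t, left = best
    return (f, t,
            grow_tree(X[left], y[left], n_classes, depth - 1, rng, n_candidates),
            grow_tree(X[~left], y[~left], n_classes, depth - 1, rng, n_candidates))

def train_forest(X, y, n_classes, n_trees=3, depth=4, seed=0):
    """Train each tree on its own bootstrap sample of the labeled data."""
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(len(X), size=len(X))
        forest.append(grow_tree(X[idx], y[idx], n_classes, depth, rng))
    return forest

def tree_predict(tree, x):
    """Walk from the root to a leaf and return that leaf's class distribution."""
    while isinstance(tree, tuple):
        f, t, lo, hi = tree
        tree = lo if x[f] < t else hi
    return tree

def forest_predict(forest, x):
    """Average the per-tree class distributions."""
    return np.mean([tree_predict(tree, x) for tree in forest], axis=0)
```

Calling `train_forest(X, y, n_classes)` with an `(n_samples, n_features)` float array and integer labels returns a list of trees; `forest_predict(forest, x).argmax()` then gives the most likely class for a single feature vector `x`.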

  • by symes (835608) on Saturday March 26, 2011 @05:19PM (#35625190) Journal

    I don't think so this time. This is a reasonably well written formal paper sent for peer review. It is also quite nice to see this published openly.

  • by mikael (484) on Saturday March 26, 2011 @07:30PM (#35626096)

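    The function f = d(x + u/d(x)) - d(x + v/d(x)) compares the depth at two probe points offset from pixel x by u and v, which roughly gives the depth gradient around that pixel. Dividing the offsets by d(x) keeps them the same real-world size regardless of distance from the camera. It's possible to reconstruct a three-dimensional shape from a 2D image.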

    Then your problem is trying to match a human skeleton to the shape. If you know the curvature of the gradient at a particular point, you can eliminate some body parts. A head is mostly spherical and falls within a particular size range, while limbs and the torso are more cylindrical, with depth varying linearly along one axis. Look for that linearity, and you can determine that it is a limb and which direction it is aligned in.
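
    For anyone who wants to play with the depth-difference function quoted in this comment, here is a minimal sketch. The helper names (`probe`, `depth_feature`), the nearest-pixel rounding, and the large background constant for probes that land off the image are my own assumptions, not details taken from this thread. It also assumes d(x) is strictly positive.

```python
import numpy as np

BACKGROUND_DEPTH = 1e6  # assumed: off-image probes read as very far away

def probe(depth, pos):
    """Depth at the pixel nearest to pos = (row, col)."""
    r, c = int(round(pos[0])), int(round(pos[1]))
    if 0 <= r < depth.shape[0] and 0 <= c < depth.shape[1]:
        return float(depth[r, c])
    return BACKGROUND_DEPTH

def depth_feature(depth, x, u, v):
    """f = d(x + u/d(x)) - d(x + v/d(x)) for pixel x and offsets u, v."""
    x = np.asarray(x, dtype=float)
    d_x = probe(depth, x)  # must be > 0 for the division below
    p_u = x + np.asarray(u, dtype=float) / d_x
    p_v = x + np.asarray(v, dtype=float) / d_x
    return probe(depth, p_u) - probe(depth, p_v)
```

    A large positive or negative value means one probe fell on something much farther away than the other (for example, off the edge of the body), while a value near zero means both probes hit surfaces at similar depth.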

  • by gmaslov (1983830) on Saturday March 26, 2011 @08:20PM (#35626404) Homepage

    So they fed an LCS with some sample data? OK, par for the course. I'm far more interested in how they generated those '1 million' pre-labelled test images in the first place.

    I read the paper; it was clever. They used a standard motion capture setup with their actor(s) going through several hundred different movements. Since their algorithm is stateless, they could analyze the motion and produce many distinct poses from each movement. Each pose was then "retargeted" (a well-known technique in animation) onto many different 3D models of people of varying height, body type, etc., before finally being rendered into a perfectly labeled depth map.
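
    One way to picture "many distinct poses from each movement" is a greedy filter over the mo-cap frames that drops any frame too close to a pose already kept. This is only a sketch of the idea: the function name `select_distinct_poses`, the `(n_frames, n_joints, 3)` layout, and the max-joint-displacement distance are assumptions for illustration, not the exact procedure from the paper.

```python
import numpy as np

def select_distinct_poses(frames, min_dist):
    """Return indices of frames that differ enough from every kept frame.

    frames:   array of shape (n_frames, n_joints, 3), joint positions per frame
    min_dist: a frame is kept only if, compared with each kept frame,
              at least one joint has moved farther than this distance
    """
    kept = []
    for i, pose in enumerate(frames):
        # Largest single-joint displacement between this pose and each kept pose.
        if all(np.linalg.norm(pose - frames[j], axis=1).max() > min_dist
               for j in kept):
            kept.append(i)
    return kept
```

    Because each frame is treated independently of its neighbors in time, a slow movement contributes only a few poses while a fast, varied one contributes many.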

    They went through several iterations of this process:

    1. Train their algorithm on this huge data set
    2. Notice that it doesn't work so well in some situations
    3. Have their mo-cap actor(s) produce additional data to cover those situations
    4. Process the new mo-cap data into however many thousands of additional training poses
    5. GOTO 10
