
Kinect's AI Breakthrough Explained 97

Posted by Soulskill
from the expensive-hacker-toys dept.
mikejuk writes "Microsoft Research has just published a scientific paper (PDF) and a video showing how the Kinect body tracking algorithm works — it's almost as impressive as some of the uses the Kinect has been put to. This article summarizes how Kinect does it. Quoting: '... What the team did next was to train a type of classifier called a decision forest, i.e. a collection of decision trees. Each tree was trained on a set of features on depth images that were pre-labeled with the target body parts. That is, the decision trees were modified until they gave the correct classification for a particular body part across the test set of images. Training just three trees using 1 million test images took about a day using a 1000-core cluster.'"

  • by liquiddark (719647) on Saturday March 26, 2011 @06:11PM (#35625138)
    Layered classification nets have always struck me as the right approach, particularly as we learn more about how human senses work - it seems like a lot of our "thinking" is done much closer to our sense organs than we might have once imagined. Interesting that the less "organic" type, decision trees, was used rather than neural nets. One wonders if maybe it was more a matter of ease of phrasing/training/debugging than of classification itself that decided which type to use.
    • by hoytak (1148181) on Saturday March 26, 2011 @06:43PM (#35625328) Homepage

      Random forests have always been a nice classifier to use when working with really wacky data types. This is due in part to how easy it is to customize them; a lot of the ways they can be tweaked and tuned and customized have fairly intuitive effects on the outcome and behavior of the classifier. In my experience, while neural nets can also be pretty powerful, they are often much harder to work with as the parameters you have for tweaking can be really non-intuitive. We sometimes joke about neural nets being "black magic" because the training and tweaking can be really uninterpretable.

      However, the biggest reason random forests were used is probably because they are extremely fast on current chips, probably a couple orders of magnitude faster than neural nets when the trees are hard coded.
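
      To illustrate the "hard coded" point, here is a minimal sketch of a forest whose trees are plain nested comparisons. The feature names, thresholds, and body-part labels are invented for illustration, and the majority vote is a simplification; none of this is taken from the Kinect implementation.

```python
# Hypothetical sketch: three hard-coded decision trees voting on a label.
# Feature names, thresholds, and labels are assumptions, not Kinect's.
from collections import Counter


def tree_a(f):
    # f maps a feature name to a depth difference (arbitrary units)
    if f["above"] > 0.5:
        return "head"
    return "hand" if f["side"] > 0.2 else "torso"


def tree_b(f):
    if f["side"] > 0.3:
        return "hand"
    return "head" if f["above"] > 0.4 else "torso"


def tree_c(f):
    if f["below"] > 0.6:
        return "head"
    return "hand" if f["side"] > 0.25 else "torso"


FOREST = (tree_a, tree_b, tree_c)


def classify(features):
    """Majority vote over the forest; each tree costs a few comparisons."""
    votes = Counter(tree(features) for tree in FOREST)
    return votes.most_common(1)[0][0]
```

      Each tree evaluates at most two comparisons per input, which is the kind of branch-only workload the comment is alluding to.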

    • by Game_Ender (815505) on Saturday March 26, 2011 @06:52PM (#35625370)
      Yep, it's not exactly an AI breakthrough but it's really cool to see a practical application of machine learning in the consumer arena.
      • by Twinbee (767046) on Saturday March 26, 2011 @10:12PM (#35626682) Homepage

        Yes, now all they need to do is fix the lag which can be quite high, maybe even 200ms:
        http://www.youtube.com/watch?v=weZOjotbuSU [youtube.com]

        Something really low like 16ms or better is needed so that we don't notice, according to this article:
        http://www.sussex.ac.uk/Users/km3/hfes.pdf [sussex.ac.uk]

        • by amorsen (7485)

          The youtube video doesn't really prove anything. The lag could just as easily be introduced by the TV or the game.

      • Yep, it's not exactly an AI breakthrough but it's really cool to see a practical application of machine learning in the consumer arena.

        I suppose that's what the title "AI Breakthrough" means. Training decision trees in a random forest is not a breakthrough.

      • by hvm2hvm (1208954)
        Exactly... sometimes "good enough" is better than "it should work in theory but we don't have the required hardware/algorithmic/whatever capabilities yet". It probably won't work perfectly in some cases but for most applications it's great.

        I really think AI will be created in the same way. Once in a while a need appears for an AI-related task and someone finds a "good enough" solution. In time, someone will need a robot to have a serious conversation with and there will be enough knowledge lying around
    • Go search for Women Aspergers interview tony attwood.

      Keep listening until you get to the 'sixth sense' bit.

      You may not realize it, but that doesn't mean that other people aren't 100% aware... (e.g. I'm in the third person, it's pretty apparent that I don't make the spelling mistakes; I just tell my body by pushing a command out to write some stuff, and it cocks up sometimes).

      It does similar in the other direction, with various levels of indirection... and I can also push things further down for some real number c

  • by Gulah (1983618)
    Smells like Neural Networks thinking ...
  • by Anonymous Coward on Saturday March 26, 2011 @06:27PM (#35625234)

    - "What do you do for a living?"

    - "I train trees to make a decision forest that can see human limbs."

    - "Ah, I see. Makes sense. (WHAT THE FUCK???)"

  • From the summary it looks like they are basically using a classifier which they spent a lot of time training, and it works well. This is impressive, but I don't know if it meets the story title's claim of "AI breakthrough", since from the summary it sounds basically like, "researchers used classifier for classifying data and it worked!" Can someone summarize in a little more detail exactly what the "breakthrough" entails, other than basically standard use of classifiers for training on data sets?

  • by Anonymous Coward

    So they fed an LCS with some sample data? OK, par-for-the-course. I'm far more interested in how they generated those '1 million' pre-labelled test images in the first place.

    • by hedwards (940851)

      The same way that cybercriminals crack captchas, they just offered up a picture of a random boob to a random boob. The real problem was stopping at 1m pictures.

    • > I'm far more interested in how they generated those '1 million'
      > pre-labelled test images in the first place.

      Snapshots from the webcams attached to computers running Windows.

    • by gmaslov (1983830) <gmaslov@bootis.org> on Saturday March 26, 2011 @09:20PM (#35626404) Homepage

      So they fed an LCS with some sample data? OK, par-for-the-course. I'm far more interested in how they generated those '1 million' pre-labelled test images in the first place.

      I read the paper; it was clever. They used a standard motion capture setup with their actor(s) going through several hundred different movements. Since their algorithm is stateless, they could analyze the motion and produce many distinct poses from each movement. Each pose was then "retargeted" (a well known technique in animation; example [snu.ac.kr]) onto many different 3D models of people of varying height, body type, etc., before finally being rendered into a perfectly labeled depth map.

      They went through several iterations of this process:

      1. Train their algorithm on this huge data set
      2. Notice that it doesn't work so well in some situations
      3. Have their mo-cap actor(s) produce additional data to cover those situations
      4. Process the new mo-cap data into however many thousands of additional training poses
      5. GOTO 10
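
      The data-generation step described above can be sketched roughly as follows. This is an illustration under stated assumptions: the function names and record layout are hypothetical, and `render_depth_map` is a stand-in that returns a record rather than a real rendered image.

```python
# Hypothetical sketch of crossing mo-cap poses with body models.
# Names and data layout are assumptions for illustration only.
from itertools import product


def frames_to_poses(movement):
    # A stateless algorithm can treat every frame as a pose in its own right.
    return list(movement["frames"])


def render_depth_map(pose, body):
    # Stand-in for a renderer. A real one would emit a depth image whose
    # per-pixel body-part labels are known because the scene is synthetic.
    return {"pose": pose, "body": body}


def build_training_set(movements, bodies):
    """Retarget every pose onto every body model and 'render' each pair."""
    poses = [p for m in movements for p in frames_to_poses(m)]
    return [render_depth_map(p, b) for p, b in product(poses, bodies)]


movements = [{"frames": ["wave_0", "wave_1"]}, {"frames": ["kick_0"]}]
bodies = ["short", "tall", "broad"]
samples = build_training_set(movements, bodies)
```

      The sample count grows as poses times body models, which is how a few hundred captured movements could plausibly expand into a very large labelled set.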
      • What I find really interesting about this approach is that it's machine learning in a virtual environment.

        They essentially taught a game controller how to be a game controller by feeding it virtual players inside of a game.

        I suspect this is how we'll want to train all artificial intelligence agents. Why go through the trouble of building a robotic body for an AI to use when it can simply be provided a virtual world to live and grow in?

        I've also always been curious why more AI research doesn't take place i

  • by Anonymous Coward

    Ummm, all I've seen so far apart from this are pretty obvious uses of the depth sensor.

    What Microsoft has done is solved an extremely hard AI problem. Check out the body-part identification. I think more credit is due.

    • Re: (Score:2, Insightful)

      by Anonymous Coward

      Hum, no, actually, they just used a machine learning technique that has been known for years on a huge sample of data and it worked pretty well.
      From my point of view, there is no major breakthrough but it's still a nice solution.

  • Impressive. (Score:5, Funny)

    by Chocolate Teapot (639869) * on Saturday March 26, 2011 @06:59PM (#35625408) Journal

    Training just three trees using 1 million test images took about a day using a 1000-core cluster

    Trees have traditionally been trained in Entish, which although reliable, is such an un-hasty language.

    • Training just three trees using 1 million test images took about a day using a 1000-core cluster

      Trees have traditionally been trained in Entish, which although reliable, is such an un-hasty language.

      The 1K core cluster is mostly because it takes such a long time to say anything. Had they gone with a one core cluster, by day's end the system would have just managed to say 'good morning'. The end result is that it would have accomplished nothing. Thankfully, with this system, they can complete this statement in a thousandth of the time; in other words, it reduced the startup time to 28.8 seconds.

    • 640K ought to be enough chlorophyll for anyone.

  • Very impressive (Score:1, Flamebait)

    by artor3 (1344997)

    A lot of the MS-haters on Slashdot tried to write off the Kinect as a nice bit of third-party hardware with a crappy MS-made driver. I wonder how they'll respond to this. Microsoft has really outdone themselves here. I think Penny Arcade [penny-arcade.com] put it best. If only they could apply this sort of innovation to their more important products, they'd be back on top in no time.

    • Well, I really like my Mac and I think this is cool. So, stuff that data point in your decision forest. :-)

    • Yep - this is certainly a very impressive product. Microsoft have an absolutely world-class research lab/staff, but it seems rare so far for their work to make it into products (same as was the case with Xerox PARC).

    • by Clsid (564627)
      Yeah, they make nice products when they face competition, there is no doubt about it. But even then, some of the commercial practices are questionable and that's where most of the hate comes from. For instance, you buy an XBox360 and a PS3. On the XBox you have to pay a monthly fee to play online games, whereas on the PS3 it is completely free. If Microsoft were the only player in town in that particular case then we would be in a world of hurt. Luckily, having options pushes Microsoft to do the right thing, eve
    • The Penny Arcade strip was actually a send-up of all the ridiculous hype surrounding the device. It can't actually restore sight to... you know what, never mind. Yeah, the Kinect is the digital manifestation of the second coming. It is the apex of technological development for the human race.

  • by Anonymous Coward

    I haven't thoroughly read the paper yet, but calling this an AI breakthrough is inappropriate for a number of reasons. First, this is an application of machine learning, which is not the same thing as AI. Second, it seems to be a fairly incremental work building on very common techniques--very far from a breakthrough in any respect. If you don't believe me, see some of Jamie Shotton's other work [shotton.org], which is good work, but this is nothing extraordinary in comparison.

    • by Jeremi (14640)

      First, this is an application of machine learning, which is not the same thing as AI.

      That's the beauty and mystery of AI -- once a technique is actually made to work on computers in the real world, it loses its status as an "AI technique". The AI goalposts automatically move ahead to some other, harder problem that isn't solved yet. Eventually we will have HAL-9000 style computers everywhere, and people will continually piss them off by telling them the reasons they don't count as "real AI".

      • by Anonymous Coward

        Nah. twice, three times, max. then the people will start dying.

  • So...it can't see the forest for the limbs?
  • Neural Network / perceptrons.
  • "[..] the decision trees were modified until they gave the correct classification for a particular body part across the test set of images"

    This is called cheating in machine learning (you are not allowed to modify your model(s) based on the results on the test set).
    And of course it is not what they do.

    Nice piece of work, though IMHO not an AI breakthrough.
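
    The train/test distinction raised here can be shown with a toy example. This is a minimal sketch on made-up one-dimensional data with a single-threshold "stump"; the data, split sizes, and helper names are assumptions, and it is not the paper's procedure.

```python
# Hypothetical sketch: fit on a training split, report on a held-out split.
import random


def fit_stump(samples):
    """Pick the threshold with the fewest errors on (x, label) pairs."""
    best_t, best_err = None, None
    for t, _ in samples:
        err = sum((x > t) != label for x, label in samples)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def accuracy(threshold, samples):
    return sum((x > threshold) == label for x, label in samples) / len(samples)


rng = random.Random(0)
data = [(x, x > 0.5) for x in (rng.random() for _ in range(200))]
train, test = data[:150], data[150:]  # the test split is never used for fitting

threshold = fit_stump(train)
train_acc = accuracy(threshold, train)
test_acc = accuracy(threshold, test)
```

    Only `train` is passed to `fit_stump`; `test_acc` is an honest estimate precisely because the model was never adjusted against `test`.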

  • Being possessed of an enormously large penis, I am unable to use Kinect as it keeps detecting it as a third leg!

  • The method they are using is called Haar cascades, proposed by Viola and Jones. I have been using the same with OpenCV for a bit now. http://en.wikipedia.org/wiki/Haar-like_features [wikipedia.org] It's basically passing an image through progressive classifiers to get a final match weight. Microsoft may have done the training to generate the classifiers, but the method has been around for a while. "Decision tree"... Pfffft.
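
    For readers unfamiliar with the "progressive classifiers" idea, here is a minimal sketch of a generic cascade with early rejection. The stage functions and thresholds are invented for illustration; this is neither OpenCV's API nor a description of what Kinect does.

```python
# Hypothetical sketch of a classifier cascade with early rejection.
# Stage functions and thresholds are assumptions for illustration.


def run_cascade(window, stages):
    """Return (accepted, score). `stages` is a list of (score_fn, threshold)."""
    total = 0.0
    for score_fn, threshold in stages:
        score = score_fn(window)
        if score < threshold:
            return False, total  # rejected early; later stages never run
        total += score
    return True, total


def mean_brightness(w):
    # w is a flat list of pixel intensities in [0, 1]
    return sum(w) / len(w)


def top_bottom_contrast(w):
    # Crude stand-in for a two-rectangle feature: first half minus second half
    half = len(w) // 2
    return sum(w[:half]) / half - sum(w[half:]) / (len(w) - half)


STAGES = [(mean_brightness, 0.2), (top_bottom_contrast, 0.1)]
```

    Cheap stages go first so that most windows are discarded before the more expensive stages are evaluated.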
  • Why can't MS do stuff like this in all their departments? Are there not enough smart people to go around? You get truly cool things like this, juxtaposed with lame "us too!" attempts like WP7 and Bing.

    • "Us too!"?

      Well it's hard for them to do stuff like this in all departments when you don't acknowledge all the other times that they offer innovative or superior products.

      WP7 is in my opinion a far better thought out operating system from a user standpoint than any of the alternatives. So if by "Me Too!" you mean they released a great rewrite of their product which has been on the market longer than either Android or iOS then yes they too continued innovating. WinMo got sucky but when it was released it wa
