Kinect's AI Breakthrough Explained

Posted by Soulskill on Saturday March 26, 2011 @05:58PM from the expensive-hacker-toys dept.

mikejuk writes "Microsoft Research has just published a scientific paper (PDF) and a video showing how the Kinect body tracking algorithm works — it's almost as impressive as some of the uses the Kinect has been put to. This article summarizes how Kinect does it. Quoting: '... What the team did next was to train a type of classifier called a decision forest, i.e. a collection of decision trees. Each tree was trained on a set of features on depth images that were pre-labeled with the target body parts. That is, the decision trees were modified until they gave the correct classification for a particular body part across the test set of images. Training just three trees using 1 million test images took about a day using a 1000-core cluster.'"
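To make "a collection of decision trees" concrete, here is a minimal sketch in Python. It is not the Kinect implementation. Everything in it is an assumption for illustration: the two-number feature vectors, the "hand"/"head" labels, one-split trees (stumps), and a deterministic leave-some-out subsample per tree where a real random forest would typically draw random bootstrap samples. The forest classifies by majority vote over its trees.

```python
from collections import Counter

# Toy labeled data: (feature vector, label). Purely illustrative values.
SAMPLES = [
    ((-3.0, 1.0), "hand"), ((-2.0, 0.5), "hand"), ((-1.0, 2.0), "hand"),
    ((1.0, 1.5), "head"), ((2.0, 0.2), "head"), ((3.0, 1.1), "head"),
]

def train_stump(samples):
    """Find the single (feature, threshold) split with the fewest training errors."""
    best = None
    for f in range(len(samples[0][0])):
        for x, _ in samples:
            t = x[f]
            left = [y for xs, y in samples if xs[f] <= t]
            right = [y for xs, y in samples if xs[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]  # (feature index, threshold, left label, right label)

def predict_stump(stump, x):
    f, t, left_label, right_label = stump
    return left_label if x[f] <= t else right_label

def train_forest(samples, n_trees=3):
    """Train each tree on a different subset (simplified stand-in for bootstrapping)."""
    forest = []
    for k in range(n_trees):
        subset = [s for i, s in enumerate(samples) if i % n_trees != k]
        forest.append(train_stump(subset))
    return forest

def predict_forest(forest, x):
    """Majority vote over the individual trees."""
    votes = Counter(predict_stump(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]

forest = train_forest(SAMPLES)
print(predict_forest(forest, (-1.5, 1.0)))
```

On the query `(-1.5, 1.0)` the third tree votes "head" but is outvoted by the other two, which is the point of using a forest rather than a single tree.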

- Re: (Score:2)
  
  by Luckyo ( 1726890 ) writes:
  
  Any decent large data center will be happy to rent you one for a price?
  - Re: (Score:2)
    
    by davester666 ( 731373 ) writes:
    
    Why would MS rent/buy processor time? They've got the world's biggest botnet, and they even have the suckers pay MS to join it.
- Re: (Score:2)
  
  by metalmaster ( 1005171 ) writes:
  
  Wouldn't you still need software capable of using all of the resources?
- Re: (Score:2)
  
  by woolpert ( 1442969 ) writes:
  
  Forget the 1000-core cluster. I want to know where I can get 1,000,000 images of people with all the (major) body parts zoned and referenced.
  That's an impressive test corpus.
  - Re: (Score:1)
    
    by lrnj ( 1986582 ) writes:
    
    I would assume they just used an established motion tracking system in parallel with the Kinect sensor input.
    At 30 fps, that's about 10 hours of input.
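A quick check of that estimate, taking the commenter's assumption that every one of the 1 million images is a distinct frame from a 30 fps capture:

```python
FRAMES = 1_000_000   # images quoted in the summary
FPS = 30             # capture rate assumed in the comment above

seconds = FRAMES / FPS
hours = seconds / 3600
print(f"{hours:.2f} hours")
```

That is a little over 9 hours, so "about 10 hours" holds as a rough figure.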
- Re:More advertising masquerading as news (Score:5, Informative)
  
  by symes ( 835608 ) writes: on Saturday March 26, 2011 @06:19PM (#35625190) Journal
  
  I don't think so this time. This is a reasonably well written formal paper sent for peer review. It is also quite nice to see this published openly.
  
  - Re: (Score:3)
    
    by Raenex ( 947668 ) writes:
    
    It is also quite nice to see this published openly.
    And no doubt backed up by a dozen patents.
    - Re:More advertising masquerading as news (Score:4, Insightful)
      
      by Jeremi ( 14640 ) writes: on Saturday March 26, 2011 @11:15PM (#35626968) Homepage
      
      And no doubt backed up by a dozen patents.
      Of course. That's the purpose of patents, to encourage inventors to publish their inventions openly.
      
      - Re: (Score:2)
        
        by Raenex ( 947668 ) writes:
        
        I'd rather they kept their secrets and let somebody else figure it out than be granted a monopoly on an idea.
        
        Re: (Score:2)
        
        by X0563511 ( 793323 ) writes:
        
        Ah, so you want shorter patent terms and non-ridiculous licensing costs.
        Yell at the government regarding the former, and yell at the sellers regarding the latter.
Sounds like vision, all right (Score:5, Interesting)

by liquiddark ( 719647 ) writes: on Saturday March 26, 2011 @06:11PM (#35625138)

Layered classification nets have always struck me as the right approach, particularly as we learn more about how human senses work - it seems like a lot of our "thinking" is done much closer to our sense organs than we might have once imagined. Interesting that the less "organic" type, decision trees, were used rather than neural nets. One wonders if maybe it was more a matter of ease of phrasing/training/debugging than of classification itself that decided which type to use.

- Re:Sounds like vision, all right (Score:5, Insightful)
  
  by hoytak ( 1148181 ) writes: on Saturday March 26, 2011 @06:43PM (#35625328) Homepage
  
  Random forests have always been a nice classifier to use when working with really wacky data types. This is due in part to how easy it is to customize them; a lot of the ways they can be tweaked and tuned and customized have fairly intuitive effects on the outcome and behavior of the classifier. In my experience, while neural nets can also be pretty powerful, they are often much harder to work with as the parameters you have for tweaking can be really non-intuitive. We sometimes joke about neural nets being "black magic" because the training and tweaking can be really uninterpretable.
  However, the biggest reason random forests were used is probably because they are extremely fast on current chips, probably a couple orders of magnitude faster than neural nets when the trees are hard coded.
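For illustration, this is roughly what a hard-coded tree looks like: nothing but nested comparisons, so classifying one input costs at most as many branches as the tree is deep. The feature names, thresholds, and body-part labels below are made up for the sketch and are not taken from the paper.

```python
def classify_pixel(f1, f2, f3):
    """A depth-2 decision tree written out as plain branches (hypothetical values)."""
    if f1 < 0.5:
        if f2 < -0.2:
            return "hand"
        return "forearm"
    if f3 < 1.0:
        return "torso"
    return "head"

print(classify_pixel(0.0, -1.0, 0.0))
```

Each call performs exactly two comparisons and no arithmetic, which is why a forest of such trees can be cheap to evaluate per pixel.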
  
- Re:Sounds like vision, all right (Score:5, Interesting)
  
  by Game_Ender ( 815505 ) writes: on Saturday March 26, 2011 @06:52PM (#35625370)
  
  Yep, it's not exactly an AI breakthrough but it's really cool to see a practical application of machine learning in the consumer arena.
  
  - Re:Sounds like vision, all right (Score:5, Interesting)
    
    by Twinbee ( 767046 ) writes: on Saturday March 26, 2011 @10:12PM (#35626682)
    
    Yes, now all they need to do is fix the lag which can be quite high, maybe even 200ms:
    http://www.youtube.com/watch?v=weZOjotbuSU [youtube.com]
    Something really low like 16ms or better is needed so that we don't notice, according to this article:
    http://www.sussex.ac.uk/Users/km3/hfes.pdf [sussex.ac.uk]
    
    - Re: (Score:2)
      
      by amorsen ( 7485 ) writes:
      
      The youtube video doesn't really prove anything. The lag could just as easily be introduced by the TV or the game.
  - Re: (Score:2)
    
    by dominious ( 1077089 ) writes:
    
    Yep, it's not exactly an AI breakthrough but it's really cool to see a practical application of machine learning in the consumer arena.
    I suppose that's what the title "AI Breakthrough" means. Training decision trees in a random forest is not a breakthrough.
  - Re: (Score:3)
    
    by hvm2hvm ( 1208954 ) writes:
    
    Exactly... sometimes "good enough" is better than "it should work in theory but we don't have the required hardware/algorithmic/whatever capabilities yet". It probably won't work perfectly in some cases but for most applications it's great.
    
    I really think AI will be created in the same way. Once in a while a need appears for an AI-related task and someone finds a "good enough solution". In time, someone will need a robot to have a serious conversation with and there will be enough knowledge lying around
- Re: (Score:1)
  
  by oliverthered ( 187439 ) writes:
  
  Go search for Women Aspergers interview tony attwood.
  Keep listening until you get to the 'sixth sense' bit.
  You may not realize it, that doesn't mean that other people aren't 100% aware.. (e.g. I'm in the third person, it's pretty apparent that I don't make the spelling mistakes I just tell my body by pushing a command out to write some stuff, and it cocks up sometimes).
  It does similar in the other direction, with various levels of indirection... and I can also push things further down for some real number c
ANN? (Score:1)

by Gulah ( 1983618 ) writes:

Smells like Neural Networks thinking ...
Strange Descriptions... (Score:5, Funny)

by Anonymous Coward writes: on Saturday March 26, 2011 @06:27PM (#35625234)

- "What do you do for a living?"
- "I train trees to make a decision forest that can see human limbs."
- "Ah, I see. Makes sense. (WHAT THE FUCK???)"

- Re: (Score:1)
  
  by Beelzebud ( 1361137 ) writes:
  
  LOL Someone with points mod this up!
- Re: (Score:1)
  
  by Slutticus ( 1237534 ) writes:
  
  Sounds like an upcoming xkcd strip.
- Re: (Score:1)
  
  by Sal Zeta ( 929250 ) writes:
  
  -"Oh! So, you're the one who writes lyrics for Radiohead, then."
Need a more descriptive summary (Score:2)

by radarsat1 ( 786772 ) writes:

From the summary it looks like they are basically using a classifier which they spent a lot of time training, and it works well. This is impressive, but I don't know if it meets the story title's claim of "AI breakthrough", since from the summary it sounds basically like, "researchers used classifier for classifying data and it worked!" Can someone summarize in a little more detail exactly what the "breakthrough" entails, other than basically standard use of classifiers for training on data sets?
- - Re: (Score:2)
    
    by narcc ( 412956 ) writes:
    
    TFA says "it is all based on fairly standard classical pattern recognition"
    I'm a science reporter. I just want to clarify your above statement -- Are you saying that this is an unprecedented breakthrough in artificial intelligence research that will lead to "thinking machines" in the next year?
  - Re: (Score:2, Informative)
    
    by mikael ( 484 ) writes:
    
    The function: f=d(x+u/d(x))-d(x+v/d(x)) would calculate the depth gradient of the pixel. It's possible to reconstruct a three dimensional shape from a 2D image [weizmann.ac.il].
    Then your problem is trying to match a human skeleton to the shape. If you know the curvature of the gradient at a particular point, you can eliminate some body parts. A head is mostly spherical and within a particular maximum/minimum, limbs and the torso are more cylindrical with a linear depth along one axis. Look for that linearity, and you cou
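A minimal Python sketch of that function on a toy depth image, assuming `d(p)` is the depth at pixel `p`, `u` and `v` are 2D pixel offsets, and probes that land outside the image read as a large background depth (the `BACKGROUND` constant and the 4x5 image are assumptions of this sketch). Dividing the offsets by the depth at `x` shrinks them for distant pixels, so the two probes cover roughly the same real-world distance however far away the person stands.

```python
BACKGROUND = 1e6  # assumed depth value for probes that fall outside the image

def depth_at(depth, x, y):
    """Depth at pixel (x, y), or BACKGROUND when the probe is off-image."""
    row, col = int(round(y)), int(round(x))
    if 0 <= row < len(depth) and 0 <= col < len(depth[0]):
        return depth[row][col]
    return BACKGROUND

def depth_feature(depth, x, y, u, v):
    """f = d(x + u/d(x)) - d(x + v/d(x)); assumes the depth at (x, y) is > 0."""
    d = depth_at(depth, x, y)
    probe_u = depth_at(depth, x + u[0] / d, y + u[1] / d)
    probe_v = depth_at(depth, x + v[0] / d, y + v[1] / d)
    return probe_u - probe_v

# Toy depth image: a near object (depth 2.0) in front of a far wall (depth 4.0).
DEPTH = [
    [4.0, 4.0, 4.0, 4.0, 4.0],
    [4.0, 2.0, 2.0, 2.0, 4.0],
    [4.0, 2.0, 2.0, 2.0, 4.0],
    [4.0, 4.0, 4.0, 4.0, 4.0],
]

# One probe lands on the wall, the other stays on the object.
print(depth_feature(DEPTH, 2, 1, (4.0, 0.0), (-2.0, 0.0)))
```

A large positive or negative value means the two probes straddle a depth edge; a value near zero means both sit on the same surface.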
    - Re: (Score:3)
      
      by marcansoft ( 727665 ) writes:
      
      This has nothing to do with reconstructing a depth image from a 2D image. The Kinect is a depth camera and already gives you a real depth image (not a guess).
Focussing on the normal bit (Score:1)

by Anonymous Coward writes:

So they fed an LCS with some sample data? OK, par-for-the-course. I'm far more interested in how they generated those '1 million' pre-labelled test images in the first place.
- Re: (Score:3)
  
  by hedwards ( 940851 ) writes:
  
  The same way that cybercriminals crack captchas, they just offered up a picture of a random boob to a random boob. The real problem was stopping at 1m pictures.
- Re: (Score:3)
  
  by multipartmixed ( 163409 ) writes:
  
  > I'm far more interested in how they generated those '1 million'
  > pre-labelled test images in the first place.
  Snapshots from the webcams attached to computers running Windows.
- Re:Focussing on the normal bit (Score:4, Informative)
  
  by gmaslov ( 1983830 ) writes: <gmaslov@bootis.org> on Saturday March 26, 2011 @09:20PM (#35626404) Homepage
  So they fed an LCS with some sample data? OK, par-for-the-course. I'm far more interested in how they generated those '1 million' pre-labelled test images in the first place.
  I read the paper; it was clever. They used a standard motion capture setup with their actor(s) going through several hundred different movements. Since their algorithm is stateless, they could analyze the motion and produce many distinct poses from each movement. Each pose was then "retargeted" (a well known technique in animation; example [snu.ac.kr]) onto many different 3D models of people of varying height, body type, etc., before finally being rendered into a perfectly labeled depth map.
  They went through several iterations of this process:
  
  1. Train their algorithm on this huge data set
  2. Notice that it doesn't work so well in some situations
  3. Have their mo-cap actor(s) produce additional data to cover those situations
  4. Process the new mo-cap data into however many thousands of additional training poses
  5. GOTO 10
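The loop above can be sketched as follows. This is only a schematic: `train`, `find_weak_spots`, and `capture_more` are hypothetical stand-ins, and the "model" is just the set of pose names seen so far rather than a real classifier.

```python
REQUIRED = ["standing", "sitting", "crouching", "jumping"]  # hypothetical situations

def train(poses):
    """Stand-in for training: the 'model' is the set of poses it has seen."""
    return set(poses)

def find_weak_spots(model):
    """Stand-in for evaluation: situations the model does not cover yet."""
    return [p for p in REQUIRED if p not in model]

def capture_more(weak_spots):
    """Stand-in for a mo-cap session: cover one weak spot per round."""
    return weak_spots[:1]

def refine(initial_poses, max_rounds=10):
    """Train, look for failures, add data for them, and repeat until none remain."""
    poses = list(initial_poses)
    rounds = 0
    while rounds < max_rounds:
        model = train(poses)
        weak = find_weak_spots(model)
        if not weak:
            break
        poses.extend(capture_more(weak))
        rounds += 1
    return poses, rounds

print(refine(["standing"]))
```

The `max_rounds` cap is a design choice of the sketch so that the loop always terminates, unlike a literal GOTO.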
  - Re: (Score:2)
    
    by im_thatoneguy ( 819432 ) writes:
    
    What I find really interesting about this approach is that it's machine learning in a virtual environment.
    They essentially taught a game controller how to be a game controller by feeding it virtual players inside of a game.
    I suspect this is how we'll want to train all artificial intelligence agents. Why go through the trouble of building a robotic body for an AI to use when it can simply be provided a virtual world to live and grow in.
    I've also always been curious why more AI research doesn't take place i
"Almost as impressive"? (Score:1)

by Anonymous Coward writes:

Ummm, all I've seen so far apart from this are pretty obvious uses of the depth sensor.
What Microsoft has done is solved an extremely hard AI problem. Check out the body-part identification. I think more credit is due.
- Re: (Score:2, Insightful)
  
  by Anonymous Coward writes:
  
  Hum, no, actually, they just used a machine learning technique that has been known for years on a huge sample of data and it worked pretty well.
  From my point of view, there is no major breakthrough but still it's a nice solution.
Impressive. (Score:5, Funny)

by Chocolate Teapot ( 639869 ) * writes: on Saturday March 26, 2011 @06:59PM (#35625408) Homepage Journal

Training just three trees using 1 million test images took about a day using a 1000-core cluster
Trees have traditionally been trained in Entish, which although reliable, is such an un-hasty language.

- Re: (Score:1)
  
  by feedayeen ( 1322473 ) writes:
  
  Training just three trees using 1 million test images took about a day using a 1000-core cluster
  Trees have traditionally been trained in Entish, which although reliable, is such an un-hasty language.
  The 1K core cluster is mostly because it takes such a long time to say anything. Had they gone with a one core cluster, by day's end, the system would have just managed to say 'good morning'. The end result is that it would have accomplished nothing. Thankfully, with this system, they can complete this statement in a thousandth of the time; in other words, it reduced the startup time to 28.8 seconds.
- Re: (Score:1)
  
  by TheoMurpse ( 729043 ) writes:
  
  640K ought to be enough chlorophyll for anyone.
Very impressive (Score:1, Flamebait)

by artor3 ( 1344997 ) writes:

A lot of the MS-haters on Slashdot tried to write off the Kinect as a nice bit of third-party hardware with a crappy MS-made driver. I wonder how they'll respond to this. Microsoft has really outdone themselves here. I think Penny Arcade [penny-arcade.com] put it best. If only they could apply this sort of innovation to their more important products, they'd be back on top in no time.
- - Re: (Score:1)
    
    by Dr Max ( 1696200 ) writes:
    
    I for one welcome the day I no longer have to push analogue sticks around or furiously slide a mouse around a desktop to take out a guy in the latest fps. Give me a good head and gun tracking system and maybe a heads up display any time. I agree we aren't at nirvana yet but it won't be long before you're controlling your rts armies with hand signs from the heavens.
  - Re: (Score:2)
    
    by drinkypoo ( 153816 ) writes:
    
    As for the device and things like that in general, just like Eyetoy, these will never, ever replace a controller. Ever.
    I think it's fairly clear that the future involves both approaches, sometimes both in one game. Keeping gamepad support anywhere it is possible to do so keeps the game accessible for as many people as possible, e.g. the disabled. But I really enjoy the fact that the Wii gets me moving around. I imagine I'd enjoy the same thing about Kinect (but my 360's optical drive died and I have been extremely lazy about replacing it. I have all the pieces...)
- Re: (Score:2)
  
  by hoytak ( 1148181 ) writes:
  
  More important products.... yes....
  http://www.telegraph.co.uk/technology/microsoft/8287610/Xbox-Kinect-helps-Microsoft-beat-Wall-Street-profit-forecasts.html [telegraph.co.uk]
- Re: (Score:2)
  
  by Concerned Onlooker ( 473481 ) writes:
  
  Well, I really like my Mac and I think this is cool. So, stuff that data point in your decision forest. :-)
- - Re: (Score:2)
    
    by wjsteele ( 255130 ) writes:
    
    "IBM's BOCA RATON : created the first PC"
    
    And here all this time I thought the Apple computer was out before the IBM... silly me.
    
    Bill
- Re: (Score:2)
  
  by SpinyNorman ( 33776 ) writes:
  
  Yep - this is certainly a very impressive product. Microsoft have an absolutely world-class research lab/staff, but it seems rare so far for their work to make it into products (same as was the case with Xerox PARC).
- Re: (Score:2)
  
  by Clsid ( 564627 ) writes:
  
  Yeah, they make nice products when they face competition, there is no doubt about it. But even then, some of the commercial practices are questionable and that's where most of the hate comes from. For instance, you buy an XBox360 and a PS3. On the XBox you have to pay a monthly fee to play online games, whereas on the PS3 it is completely free. If Microsoft were the only player in town in that particular case then we would be in a world of hurt. Luckily, having options pushes Microsoft to do the right thing, eve
- Re: (Score:2)
  
  by ObsessiveMathsFreak ( 773371 ) writes:
  
  The Penny Arcade strip was actually a send up of all the ridiculous hype surrounding the device. It can't actually restore sight to... you know what, never mind. Yeah, the Kinect is the digital manifestation of the second coming. It is the apex of technological development for the human race.
Summary hyperbole (Score:1)

by Anonymous Coward writes:

I haven't thoroughly read the paper yet, but calling this an AI breakthrough is inappropriate for a number of reasons. First, this is an application of machine learning, which is not the same thing as AI. Second, it seems to be a fairly incremental work building on very common techniques--very far from a breakthrough in any respect. If you don't believe me, see some of Jamie Shotton's other work [shotton.org], which is good work, but this is nothing extraordinary in comparison.
- - Re: (Score:1)
    
    by Needlzor ( 1197267 ) writes:
    
    As Abe Othman and Ariel Procaccia said: "AI is whatever gets published at AAAI/IJCAI". Best definition of AI yet.
- Re: (Score:2)
  
  by Jeremi ( 14640 ) writes:
  
  First, this is an application of machine learning, which is not the same thing as AI.
  That's the beauty and mystery of AI -- once a technique is actually made to work on computers in the real world, it loses its status as an "AI technique". The AI goalposts automatically move ahead to some other, harder problem that isn't solved yet. Eventually we will have HAL-9000 style computers everywhere, and people will continually piss them off by telling them the reasons they don't count as "real AI".
  - Re: (Score:1)
    
    by Anonymous Coward writes:
    
    Nah. twice, three times, max. then the people will start dying.
- Re: (Score:2)
  
  by shriphani ( 1174497 ) writes:
  
  The sensor came from primasense. The algorithms in it are entirely from MSR.
  - Re: (Score:1)
    
    by citizenr ( 871508 ) writes:
    
    The sensor came from primasense. The algorithms in it are entirely from MSR.
    Why can you download SDK with this software from Primasense directly, but not from Microsoft if the algorithms are M$ property?
    Looks like M$ is just appropriating third party research.
    - Re: (Score:1)
      
      by shriphani ( 1174497 ) writes:
      
      Sorry I misspelled the name there. The company is PrimeSense. Here's where I see the paper beating the OpenNI SDK - 200 fps on consumer grade hardware. This is just what the paper claims it is - a simple machine learning technique that when applied correctly produced very good results and allowed them to launch a highly successful peripheral.
      Looks like M$ is just appropriating third party research.
      Splendid. Primesense are not complaining about this paper but you accuse MSR of stealing work?
- Re: (Score:2)
  
  by marcansoft ( 727665 ) writes:
  
  PrimeSense developed the sensor technology (hardware and firmware) that gives you a depth image. Microsoft took that depth image and created the algorithms that perform body tracking (software).
  PrimeSense also have their own body tracking solution (they call it NITE), but it's based on an entirely different concept and requires a calibration pose to "lock in" initially. Microsoft doesn't use NITE.
Kinect's Perspective (Score:1)

by AnotherAnonymousUser ( 972204 ) writes:

So...it can't see the forest for the limbs?
Summary in a few words (Score:1)

by elsJake ( 1129889 ) writes:

Neural Network / perceptrons.
TFA makes it sound like they're cheating (Score:2)

by L4z4ru5 ( 1705054 ) writes:

"[..] the decision trees were modified until they gave the correct classification for a particular body part across the test set of images"
this is called cheating in machine learning (you are not allowed to modify your model(s) based on the results on the test set).
and of course it is not what they do.
nice piece of work, tho IMHO not an AI breakthrough.
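For reference, the rule being described: fit on a training set, adjust the model against a separate tuning (validation) set, and touch the test set only for the final score. A minimal sketch, assuming the data has already been shuffled; `split_dataset` and the 80/10/10 proportions are illustrative choices, not something taken from the paper.

```python
def split_dataset(items, tune_percent=10, test_percent=10):
    """Split items into disjoint train / tune / test lists (input assumed pre-shuffled)."""
    n = len(items)
    n_tune = n * tune_percent // 100
    n_test = n * test_percent // 100
    n_train = n - n_tune - n_test
    train = items[:n_train]                    # used to fit the model
    tune = items[n_train:n_train + n_tune]     # used to adjust the model
    test = items[n_train + n_tune:]            # used once, for the final score
    return train, tune, test

train, tune, test = split_dataset(list(range(100)))
print(len(train), len(tune), len(test))
```

Because the three lists never overlap, a score reported on `test` is not inflated by any tweaking done against `tune`.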
- Re: (Score:2)
  
  by flyingkillerrobots ( 1865630 ) writes:
  
  That's what a tuning set is for. Do you trust the summaries here to give a perfect description of what is going on?
Re: (Score:1)

by account_deleted ( 4530225 ) writes:

Comment removed based on user account deletion
- Re: (Score:2)
  
  by Neil Boekend ( 1854906 ) writes:
  There are 2 solutions:
  
  Don't play naked.
  Don't lie about your penis size
- - Re: (Score:2)
    
    by aled ( 228417 ) writes:
    
    I don't know if there is a version of windows with support for more than 256 logical processors (whatever that means). http://www.microsoft.com/windowsserver2008/en/us/r2-scalability-reliability.aspx [microsoft.com]
Decision tree my a$$ (Score:1)

by sundru ( 709023 ) writes:

The method they are using is called Haar cascades, proposed by Viola and Jones. I have used the same with opencv for a bit now. http://en.wikipedia.org/wiki/Haar-like_features [wikipedia.org] It's basically passing an image through progressive classifiers to get a final weight of match. Microsoft may have done the training for generating the classifiers but the method has been around for a bit. "Decision tree".... Pfffft.
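A rough sketch of the cascade idea described here, for comparison with the decision forest in the summary: each stage computes a score, and a window is rejected at the first stage it fails. The two stage functions, the thresholds, and the 2x2 windows are hypothetical; `left_right_contrast` is a crude Haar-like feature (one rectangle sum minus another).

```python
def mean_brightness(window):
    """Average pixel value of the window."""
    pixels = [p for row in window for p in row]
    return sum(pixels) / len(pixels)

def left_right_contrast(window):
    """Haar-like feature: sum of the left half minus sum of the right half."""
    half = len(window[0]) // 2
    left = sum(sum(row[:half]) for row in window)
    right = sum(sum(row[half:]) for row in window)
    return left - right

# (score function, minimum score to pass) for each stage; values are made up.
STAGES = [(mean_brightness, 0.25), (left_right_contrast, 0.5)]

def run_cascade(window, stages=STAGES):
    """Return (accepted, accumulated score), stopping at the first failed stage."""
    total = 0.0
    for score_fn, threshold in stages:
        score = score_fn(window)
        if score < threshold:
            return False, total
        total += score
    return True, total

print(run_cascade([[1.0, 0.0], [1.0, 0.0]]))
print(run_cascade([[0.0, 0.0], [0.0, 0.0]]))
```

The early exit is what makes a cascade cheap: most windows are thrown out by the first, simplest stage.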
Why? (Score:1)

by MoeDrippins ( 769977 ) writes:

Why can't MS do stuff like this in all their departments? Are there not enough smart people to go around? You get truly cool things like this, juxtaposed with lame "us too!" attempts like WP7 and Bing.
- Re: (Score:2)
  
  by im_thatoneguy ( 819432 ) writes:
  
  "Us too!"?
  Well it's hard for them to do stuff like this in all departments when you don't acknowledge all the other times that they offer innovative or superior products.
  WP7 is in my opinion a far better thought out operating system from a user standpoint than any of the alternatives. So if by "Me Too!" you mean they released a great rewrite of their product which has been on the market longer than either Android or iOS then yes they too continued innovating. WinMo got sucky but when it was released it wa
