Kinect's AI Breakthrough Explained
mikejuk writes "Microsoft Research has just published a scientific paper (PDF) and a video showing how the Kinect body tracking algorithm works — it's almost as impressive as some of the uses the Kinect has been put to. This article summarizes how Kinect does it. Quoting: '... What the team did next was to train a type of classifier called a decision forest, i.e. a collection of decision trees. Each tree was trained on a set of features on depth images that were pre-labeled with the target body parts. That is, the decision trees were modified until they gave the correct classification for a particular body part across the test set of images. Training just three trees using 1 million test images took about a day using a 1000-core cluster.'"
Re: (Score:2)
Any decent large data center will be happy to rent you one for a price?
Re: (Score:2)
Why would MS rent/buy processor time? They've got the world's biggest botnet, and they even have the suckers pay MS to join it.
Re: (Score:2)
Forget the 1000-core cluster. I want to know where I can get 1,000,000 images of people with all the (major) body parts zoned and referenced.
That's an impressive test corpus.
Re: (Score:1)
I would assume they just used an established motion tracking system in parallel with the Kinect sensor input.
At 30 fps, that's about 10 hours of input.
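The 10-hour figure above is easy to check. This is only a sketch: it takes the 1 million image count from the summary and the 30 fps rate from the comment, and assumes every captured frame becomes one training image, which nothing in the thread confirms. The name `capture_hours` is made up for illustration.

```python
def capture_hours(frames: int, fps: float) -> float:
    """Hours of continuous capture needed to collect `frames` frames at `fps`."""
    seconds = frames / fps
    return seconds / 3600.0


hours = capture_hours(1_000_000, 30)
print(f"{hours:.2f} hours")  # 9.26 hours
```

So "about 10 hours" is a fair rounding under those assumptions.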
Re:More advertising masquerading as news (Score:5, Informative)
I don't think so this time. This is a reasonably well written formal paper sent for peer review. It is also quite nice to see this published openly.
Re: (Score:3)
> It is also quite nice to see this published openly.
And no doubt backed up by a dozen patents.
Re:More advertising masquerading as news (Score:4, Insightful)
> And no doubt backed up by a dozen patents.
Of course. That's the purpose of patents, to encourage inventors to publish their inventions openly.
Re: (Score:2)
I'd rather they kept their secrets and let somebody else figure it out than be granted a monopoly on an idea.
Re: (Score:2)
Ah, so you want shorter patent terms and non-ridiculous licensing costs.
Yell at the government regarding the former, and yell at the sellers regarding the latter.
Sounds like vision, all right (Score:5, Interesting)
Re:Sounds like vision, all right (Score:5, Insightful)
Random forests have always been a nice classifier to use when working with really wacky data types. This is due in part to how easy it is to customize them; a lot of the ways they can be tweaked and tuned and customized have fairly intuitive effects on the outcome and behavior of the classifier. In my experience, while neural nets can also be pretty powerful, they are often much harder to work with as the parameters you have for tweaking can be really non-intuitive. We sometimes joke about neural nets being "black magic" because the training and tweaking can be really uninterpretable.
However, the biggest reason random forests were used is probably because they are extremely fast on current chips, probably a couple orders of magnitude faster than neural nets when the trees are hard coded.
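To illustrate the "hard coded trees" point above, here is a minimal sketch of a forest whose trees are nothing but nested comparisons. Everything in it is hypothetical: the three trees, their thresholds, the feature vector layout, and the body-part labels are invented for the example, not taken from the paper. It also uses a plain majority vote for simplicity; a real forest may combine per-tree class probabilities instead.

```python
from collections import Counter


# Each hypothetical tree is a handful of comparisons on a feature vector f.
def tree_a(f):
    if f[0] < 0.5:
        return "head" if f[1] < 0.2 else "torso"
    return "hand"


def tree_b(f):
    if f[1] < 0.3:
        return "head"
    return "torso" if f[0] < 0.6 else "hand"


def tree_c(f):
    if f[2] > 0.7:
        return "hand"
    return "head" if f[0] < 0.4 else "torso"


FOREST = (tree_a, tree_b, tree_c)


def classify(f):
    """Return the label that most trees in the forest vote for."""
    votes = Counter(tree(f) for tree in FOREST)
    return votes.most_common(1)[0][0]


print(classify([0.1, 0.1, 0.0]))  # head
print(classify([0.9, 0.9, 0.9]))  # hand
```

Evaluating one sample costs only a few comparisons per tree, with no multiplications at all, which is why this kind of classifier can be so cheap at run time.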
Re:Sounds like vision, all right (Score:5, Interesting)
Re:Sounds like vision, all right (Score:5, Interesting)
Yes, now all they need to do is fix the lag which can be quite high, maybe even 200ms:
http://www.youtube.com/watch?v=weZOjotbuSU [youtube.com]
Something really low like 16ms or better is needed so that we don't notice, according to this article:
http://www.sussex.ac.uk/Users/km3/hfes.pdf [sussex.ac.uk]
Re: (Score:2)
The youtube video doesn't really prove anything. The lag could just as easily be introduced by the TV or the game.
Re: (Score:2)
Yep, it's not exactly an AI breakthrough but it's really cool to see a practical application of machine learning in the consumer arena.
I suppose that's what the title "AI Breakthrough" means. Training decision trees in a random forest is not a breakthrough.
Re: (Score:3)
I really think AI will be created in the same way. Once in a while a need appears for an AI-related task and someone finds a "good enough solution". In time, someone will need a robot to have a serious conversation with and there will be enough knowledge lying around
Re: (Score:1)
Go search for Women Aspergers interview tony attwood.
Keep listening until you get to the 'sixth sense' bit.
You may not realize it, that doesn't mean that other people aren't 100% aware.. (e.g. I'm in the third person, it's pretty apparent that I don't make the spelling mistakes I just tell my body by pushing a command out to write some stuff, and it cocks up sometimes).
It does similar in the other direction, with various levels of indirection... and I can also push things further down for some real number c
ANN? (Score:1)
Strange Descriptions... (Score:5, Funny)
- "What do you do for a living?"
- "I train trees to make a decision forest that can see human limbs."
- "Ah, I see. Makes sense. (WHAT THE FUCK???)"
Need a more descriptive summary (Score:2)
From the summary it looks like they are basically using a classifier which they spent a lot of time training, and it works well. This is impressive, but I don't know if it meets the story title's claim of "AI breakthrough", since from the summary it sounds basically like, "researchers used classifier for classifying data and it worked!" Can someone summarize in a little more detail exactly what the "breakthrough" entails, other than basically standard use of classifiers for training on data sets?
Re: (Score:2)
TFA says "it is all based on fairly standard classical pattern recognition"
I'm a science reporter. I just want to clarify your above statement -- Are you saying that this is an unprecedented breakthrough in artificial intelligence research that will lead to "thinking machines" in the next year?
Re: (Score:2, Informative)
The function: f=d(x+u/d(x))-d(x+v/d(x)) would calculate the depth gradient of the pixel. It's possible to reconstruct a three dimensional shape from a 2D image [weizmann.ac.il].
Then your problem is trying to match a human skeleton to the shape. If you know the curvature of the gradient at a particular point, you can eliminate some body parts. A head is mostly spherical and within a particular maximum/minimum, limbs and the torso are more cylindrical with a linear depth along one axis. Look for that linearity, and you cou
Re: (Score:3)
This has nothing to do with reconstructing a depth image from a 2D image. The Kinect is a depth camera and already gives you a real depth image (not a guess).
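The function quoted above, f = d(x+u/d(x)) - d(x+v/d(x)), can be read as a difference between two depth probes rather than a true gradient. A minimal sketch, assuming the depth image is a plain list of rows, that offsets are rounded to whole pixels, and that probes falling outside the image return a large constant (the value of `BACKGROUND_DEPTH` and all function names here are assumptions for illustration):

```python
BACKGROUND_DEPTH = 1e6  # assumed value for probes that land outside the image


def depth_at(depth, px, py):
    """Depth at pixel (px, py), or BACKGROUND_DEPTH when out of bounds."""
    if 0 <= py < len(depth) and 0 <= px < len(depth[0]):
        return depth[py][px]
    return BACKGROUND_DEPTH


def depth_feature(depth, x, y, u, v):
    """f = d(x + u/d(x)) - d(x + v/d(x)) for the pixel at column x, row y.

    u and v are (dx, dy) offsets. Dividing them by the depth at the pixel
    shrinks the probe distance for far-away pixels. Assumes depth > 0.
    """
    d = depth[y][x]
    ux, uy = x + round(u[0] / d), y + round(u[1] / d)
    vx, vy = x + round(v[0] / d), y + round(v[1] / d)
    return depth_at(depth, ux, uy) - depth_at(depth, vx, vy)


# Toy 5x5 depth image: everything at depth 2.0 except one farther pixel.
img = [[2.0] * 5 for _ in range(5)]
img[2][4] = 3.0

print(depth_feature(img, 2, 2, (4.0, 0.0), (0.0, -4.0)))  # 1.0
```

A single feature like this says very little on its own; the idea is that a tree combines many such comparisons, each with its own u and v.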
Focussing on the normal bit (Score:1)
So they fed an LCS with some sample data? OK, par-for-the-course. I'm far more interested in how they generated those '1 million' pre-labelled test images in the first place.
Re: (Score:3)
The same way that cybercriminals crack captchas, they just offered up a picture of a random boob to a random boob. The real problem was stopping at 1m pictures.
Re: (Score:3)
> I'm far more interested in how they generated those '1 million'
> pre-labelled test images in the first place.
Snapshots from the webcams attached to computers running Windows.
Re:Focussing on the normal bit (Score:4, Informative)
I read the paper; it was clever. They used a standard motion capture setup with their actor(s) going through several hundred different movements. Since their algorithm is stateless, they could analyze the motion and produce many distinct poses from each movement. Each pose was then "retargeted" (a well known technique in animation; example [snu.ac.kr]) onto many different 3D models of people of varying height, body type, etc., before finally being rendered into a perfectly labeled depth map.
They went through several iterations of this process:
Re: (Score:2)
What I find really interesting about this approach is that it's machine learning in a virtual environment.
They essentially taught a game controller how to be a game controller by feeding it virtual players inside of a game.
I suspect this is how we'll want to train all artificial intelligence agents. Why go through the trouble of building a robotic body for an AI to use when it can simply be provided a virtual world to live and grow in.
I've also always been curious why more AI research doesn't take place i
"Almost as impressive"? (Score:1)
Ummm, all I've seen so far apart from this are pretty obvious uses of the depth sensor.
What Microsoft has done is solved an extremely hard AI problem. Check out the body-part identification. I think more credit is due.
Re: (Score:2, Insightful)
Hum, no, actually, they just used a machine learning technique that has been known for years on a huge sample of data and it worked pretty well.
From my point of view, there is no major breakthrough but still it's a nice solution.
Impressive. (Score:5, Funny)
Trees have traditionally been trained in Entish, which although reliable, is such an un-hasty language.
Re: (Score:1)
> Trees have traditionally been trained in Entish, which although reliable, is such an un-hasty language.
The 1K core cluster is mostly because it takes such a long time to say anything. Had they gone with a one core cluster, by day's end, the system would have just managed to say 'good morning'. The end result is that it would have accomplished nothing. Thankfully, with this system, they can complete this statement in a thousandth of the time; in other words, it reduced the startup time to 28.8 seconds.
Re: (Score:1)
640K ought to be enough chlorophyll for anyone.
Very impressive (Score:1, Flamebait)
A lot of the MS-haters on Slashdot tried to write off the Kinect as a nice bit of third-party hardware with a crappy MS-made driver. I wonder how they'll respond to this. Microsoft has really outdone themselves here. I think Penny Arcade [penny-arcade.com] put it best. If only they could apply this sort of innovation to their more important products, they'd be back on top in no time.
Re: (Score:2)
As for the device and things like that in general, just like Eyetoy, these will never, ever replace a controller. Ever.
I think it's fairly clear that the future involves both approaches, sometimes both in one game. Keeping gamepad support anywhere it is possible to do so keeps the game accessible for as many people as possible, e.g. the disabled. But I really enjoy the fact that the Wii gets me moving around. I imagine I'd enjoy the same thing about Kinect (but my 360's optical drive died and I have been extremely lazy about replacing it. I have all the pieces...)
Re: (Score:2)
More important products.... yes....
http://www.telegraph.co.uk/technology/microsoft/8287610/Xbox-Kinect-helps-Microsoft-beat-Wall-Street-profit-forecasts.html [telegraph.co.uk]
Re: (Score:2)
Well, I really like my Mac and I think this is cool. So, stuff that data point in your decision forest. :-)
Re: (Score:2)
"IBM's BOCA RATON : created the first PC"
And here all this time I thought the Apple computer was out before the IBM... silly me.
Bill
Re: (Score:2)
Yep - this is certainly a very impressive product. Microsoft have an absolutely world-class research lab/staff, but it seems rare so far for their work to make it into products (same as was the case with Xerox PARC).
Re: (Score:2)
The Penny Arcade strip was actually a send up of all the ridiculous hype surrounding the device. It can't actually restore sight to... you know what, never mind. Yeah, the Kinect is the digital manifestation of the second coming. It is the apex of technological development for the human race.
Summary hyperbole (Score:1)
I haven't thoroughly read the paper yet, but calling this an AI breakthrough is inappropriate for a number of reasons. First, this is an application of machine learning, which is not the same thing as AI. Second, it seems to be a fairly incremental work building on very common techniques--very far from a breakthrough in any respect. If you don't believe me, see some of Jamie Shotton's other work [shotton.org], which is good work, but this is nothing extraordinary in comparison.
Re: (Score:2)
> First, this is an application of machine learning, which is not the same thing as AI.
That's the beauty and mystery of AI -- once a technique is actually made to work on computers in the real world, it loses its status as an "AI technique". The AI goalposts automatically move ahead to some other, harder problem that isn't solved yet. Eventually we will have HAL-9000 style computers everywhere, and people will continually piss them off by telling them the reasons they don't count as "real AI".
Re: (Score:1)
Nah. twice, three times, max. then the people will start dying.
Re: (Score:1)
The sensor came from PrimeSense. The algorithms in it are entirely from MSR.
Why can you download an SDK with this software from PrimeSense directly, but not from Microsoft, if the algorithms are M$ property?
Looks like M$ is just appropriating third party research.
Re: (Score:1)
> Looks like M$ is just appropriating third party research.
Splendid. PrimeSense are not complaining about this paper but you accuse MSR of stealing work?
Re: (Score:2)
PrimeSense developed the sensor technology (hardware and firmware) that gives you a depth image. Microsoft took that depth image and created the algorithms that perform body tracking (software).
PrimeSense also have their own body tracking solution (they call it NITE), but it's based on an entirely different concept and requires a calibration pose to "lock in" initially. Microsoft doesn't use NITE.
Kinect's Perspective (Score:1)
Summary in a few words (Score:1)
TFA makes it sound like they're cheating (Score:2)
"[..] the decision trees were modified until they gave the correct classification for a particular body part across the test set of images"
this is called cheating in machine learning (you are not allowed to modify your model(s) based on the results on the test set).
and of course it is not what they do.
nice piece of work, tho IMHO not an AI breakthrough.
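The train/test distinction raised above can be sketched in a few lines. This is a generic illustration of holding out a test set, not the paper's actual procedure; the function name `split_train_test`, the sample data, and the 80/20 split are all assumptions made for the example.

```python
import random


def split_train_test(samples, n_test, seed=0):
    """Shuffle a copy of samples and hold out n_test of them for testing."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for labeled depth images: just integer IDs here.
train_set, test_set = split_train_test(range(100), n_test=20)

# The model may be adjusted freely against train_set.
# test_set is looked at only once, to report the final accuracy.
print(len(train_set), len(test_set))  # 80 20
```

Tuning the trees until they score well on `test_set` would make that score meaningless, which is the "cheating" the comment describes.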
Decision tree my a$$ (Score:1)
Why? (Score:1)
Why can't MS do stuff like this in all their departments? Are there not enough smart people to go around? You get truly cool things like this, juxtaposed with lame "us too!" attempts like WP7 and Bing.
Re: (Score:2)
"Us too!"?
Well it's hard for them to do stuff like this in all departments when you don't acknowledge all the other times that they offer innovative or superior products.
WP7 is in my opinion a far better thought out operating system from a user standpoint than any of the alternatives. So if by "Me Too!" you mean they released a great rewrite of their product which has been on the market longer than either Android or iOS then yes they too continued innovating. WinMo got sucky but when it was released it wa