DeepMind Used YouTube Videos To Train Game-Beating Atari Bot (theregister.co.uk) 61
Artem Tashkinov shares a report from The Register: DeepMind has taught artificially intelligent programs to play classic Atari computer games by making them watch YouTube videos. Exploration games like 1984's Montezuma's Revenge are particularly difficult for AI to crack, because it's not obvious where you should go, which items you need and in which order, and where you should use them. That makes defining rewards difficult without spelling out exactly how to play the thing, which would defeat the point of the exercise. For example, Montezuma's Revenge requires the agent to direct a cowboy-hat-wearing character, known as Panama Joe, through a series of rooms and scenarios to reach a treasure chamber in a temple, where all the goodies are hidden. Pocketing a golden key, your first crucial item, takes about 100 steps and is equivalent to 100^18 possible action sequences.
To educate their code, the researchers chose three YouTube gameplay videos for each of the three titles: Montezuma's Revenge, Pitfall!, and Private Eye. Each game had its own agent, which had to map the actions and features of the title into a form it could understand. The team used two methods: temporal distance classification (TDC) and cross-modal temporal distance classification (CMC). The DeepMind code still relies on lots of small rewards of a kind, although they are referred to as checkpoints. While playing the game, every sixteenth video frame of the agent's session is taken as a snapshot and compared to a frame in a fourth video of a human playing the same game. If the agent's game frame closely matches the one in the human's video, the agent is rewarded. Over time, it imitates the way the game is played in the videos by carrying out a similar sequence of moves to match the checkpoint frames. In the end, the agent was able to exceed average human players and other RL algorithms: Rainbow, Ape-X, and DQfD. The researchers documented their method in a paper this week. You can view the agent in action here.
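To make the checkpoint mechanism concrete, here's a minimal sketch of the reward loop as the article describes it. The embedding function, similarity threshold, and helper names are placeholders chosen for illustration; in the actual work the embeddings come from the learned TDC/CMC networks rather than raw pixels.

    import numpy as np

    CHECKPOINT_STRIDE = 16  # per the article: every sixteenth frame is a checkpoint
    SIM_THRESHOLD = 0.9     # assumed cutoff for "closely matches"; not from the paper

    def embed(frame):
        # Stand-in for the learned TDC/CMC embedding: a unit-normalized pixel vector.
        v = frame.astype(np.float32).ravel()
        return v / (np.linalg.norm(v) + 1e-8)

    def checkpoint_reward(agent_frame, demo_frames, next_checkpoint):
        # Reward the agent when its current frame matches the next demo checkpoint,
        # then advance the checkpoint so rewards trace out the demonstrated route.
        if next_checkpoint >= len(demo_frames):
            return 0.0, next_checkpoint  # demonstration exhausted
        similarity = embed(agent_frame) @ embed(demo_frames[next_checkpoint])
        if similarity > SIM_THRESHOLD:
            return 1.0, next_checkpoint + CHECKPOINT_STRIDE
        return 0.0, next_checkpoint

The point of learning an embedding, rather than comparing raw pixels as this sketch does, is that YouTube videos and the agent's frames differ in color, resolution, and timing, so matches have to happen in a more abstract feature space.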
Mimicry (Score:2, Interesting)
So, essentially, little more than mimicry to bootstrap what any human who had never seen a video game would do intuitively and without instruction...
Winter is coming: https://blog.piekniewski.info/2018/05/28/ai-winter-is-well-on-its-way/
Captcha: outvote
Re: (Score:2)
Yet its very future depends on the ability to mimic the current human overlords.
I wonder if something similar could be used to make more realistic text-to-voice synthesizers.
Re: (Score:2)
...computer needs pen and paper...
That's one way to implement optical storage.
Re: (Score:2)
no amount of videos can make you automatically finish some games unless you are up to the required skills/reflexes
AI skills impress me. Here it sounds like the primary skill is mimicry, but others exist. AI reflexes should be up to almost any task.
Re: (Score:2)
I have been very impressed with how AI technology is progressing, and I often argue with those on Slashdot who think anything short of Skynet is not "real AI." But training an AI to win at Atari games is one story that just doesn't mean much to me. Those games are so basic that the work done to beat chess masters two decades ago still seems more difficult.
Once these AIs can beat professional StarCraft players, or someone builds an AI that can beat Deity-level human players in Civ 5 without cheating, then it will become meaningful.
Re: (Score:2)
But training AI to win at Atari games is one story that just doesn't mean much to me.
I figure there was a reason they couldn't train an AI to beat these games a few years ago. Must be harder than you'd assume. And if they can beat them now, that's a step of progress.
Re: (Score:2)
But training AI to win at Atari games is one story that just doesn't mean much to me.
I figure there was a reason they couldn't train an AI to beat these games a few years ago. Must be harder than you'd assume. And if they can beat them now, that's a step of progress.
Many times it is just because no one had tried yet. They thought it wasn't possible yet, then saw extreme success in another area and decided to take a crack at it.
Also, sometimes no one had done it because they didn't see the need. There are probably plenty of AI-related tests you could run that would make a good training exercise but wouldn't necessarily show anything new or novel. I'm not saying I could do this, just like I couldn't play in the NBA. But that doesn't mean it is news every time an NBA player makes a basket.
Monkey see, monkey do (Score:2)
So if an algorithm "watches" a video of a good player, the algorithm's play approaches the level of the human. Hooray?
Re: (Score:2)
So if an algorithm "watches" a video of a good player, the algorithm's play approaches the level of the human. Hooray?
ANNs use gradient descent, which is prone to converging on local minima. So for good performance, you need to start out in the right ballpark. This is why image recognizers often use auto-encoder pre-training. The same thing is going on here: learn to mimic the best human, then use that as the starting point for further optimization.
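A toy illustration of that point (the loss curve here is invented and has nothing to do with DeepMind's networks): plain gradient descent on a double-well loss settles into whichever basin it starts in, so the starting point decides which minimum you get.

    def grad(x):
        # derivative of the double-well loss 0.25*x**4 - 0.5*x**3 - x**2 + 2,
        # which has a shallow minimum near x = -0.85 and a deep one near x = 2.35
        return x**3 - 1.5 * x**2 - 2.0 * x

    def descend(x, lr=0.01, steps=5000):
        for _ in range(steps):
            x -= lr * grad(x)
        return x

    print(descend(-2.0))  # converges to the shallow local minimum near -0.85
    print(descend(1.0))   # a start in the right ballpark reaches the deep minimum near 2.35

Imitation pre-training plays the role of the good starting point: it parks the policy in a sensible region before further optimization refines it.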
Montezuma's Revenge Map (Score:2)
What do you mean, "not obvious where to go"? There are only 9 levels and the map is in the shape of a pyramid [symlink.dk] -- sections of the pyramid are blocked off for that level.
True, it isn't deterministic, but cry me a river. That's the _whole_ point of intelligence --- to make an intelligent decision!
Re:Montezuma's Revenge Map (Score:5, Insightful)
That's the _whole_ point of intelligence --- to make an intelligent decision!
The problem is that intelligence operates on previously recognized patterns. A human playing the game already knows the concept of a map, and a pyramid, and understands locked doors that can be opened with a key. The AI starts with absolutely zero knowledge.
100^18 possible action sequences (Score:3)
I'm pretty sure they must mean 18^100 possible sequences.
That is, if there are 18 possibilities for each step, then 100 steps would yield 18^100 possibilities.
Similar to how if there are two choices to make (go left or go right) and you make 100 of them, then there would be 2^100 possibilities.
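A quick sanity check of the two readings, assuming the standard 18-action Atari joystick space:

    import math

    steps, actions = 100, 18
    print(math.log10(actions ** steps))  # ~125.5: 18^100 is about 10^125 sequences
    print(math.log10(steps ** actions))  # 36.0: 100^18 is "only" 10^36

The two expressions differ by almost ninety orders of magnitude, so 18^100 is clearly the intended reading.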
Re: (Score:1)
Yeah exactly. And even 18^100 is not entirely correct, because first going left and then going right is (usually) the same as first going right and then going left. So the actual number of unique possibilities will be far lower.
Text adventures (Score:1)
Can they beat Zork yet?
Re: (Score:1)
Getting the Amulet? That's easy! I've had the Amulet dozens of times.
But there's the pesky Wizard, getting thrown back down several levels in Gehennom over and over, and the Plane of Air and the Astral Plane are crazy hard.
I say let an AI read the source, the spoilers, and the full RGRN archive. It still won't ascend.
Re: (Score:2)
Already been done. https://www.reddit.com/r/netha... [reddit.com]
Smoke and mirrors (Score:4, Informative)
So random control inputs are fired over and over until the AI's gameplay footage very closely matches that of an expert human player. There really is no intelligence to this at all. If the slightest bit of randomness occurs in the game then it will fail, because that would not match the game the human played originally.
the agent was able to exceed average human players
Uh, that's because the "average human" would suck at these kinds of games, and the AI has merely copied the exact gameplay of an expert human who played it originally. So an expert human at a given game exceeds the average human. I hope it didn't take too much research money to come to that conclusion.
Re: (Score:1)
Have you ever seen how a baby learns to do something? Sending muscle control signals over and over, until the output matches expectations.
Or does a baby just 'know' how to walk? Or even how to accurately put its thumb in its mouth?
Re: (Score:2)
It is how humans learn -- the vast majority of it is watching what others did that was successful (it's also the easiest way to change someone's mind, as it is the opposite of preaching).
Re: (Score:2)
It's not the same though. Animals don't make random spasmodic movements until one matches the desired output. Instead, they build a mental model correlating their actions with desired outcomes. That is entirely different, and it is why this approach is not AI at all. I was very excited. :-(
The real test is this: create a new screen in Montezuma's Revenge, and let a human player and an AI player both play that new screen. It sounds like this "AI" would simply stand there, since it did not have any input on what to do.
Re: (Score:1)
Alternatively, move one of the keys 25 pixels to the left or right. The human players will probably complete the level just the same and maybe not even notice. The AI will probably jump to where the key used to be.
Not necessarily. By studying the human games, it doesn't just memorize the exact screens and movements. It can also extract patterns that can be used somewhere else, like the image of a key.
Re: (Score:2)
Dan East's comments made me think otherwise. I can't tell from the linked article, and I don't see them giving it any novel inputs. Many versions of Montezuma's Revenge reset all the object positions when you exit and come back into the room, so you get a free "reset" each time.
Re: (Score:2)
It's almost like a toddler trying to mash different shaped objects in various holes, until it figures out that the cylinder goes in the circle.
Sufficiently advanced smoke and mirrors is indistinguishable from intelligence.
Re: (Score:1)
It's almost like a toddler trying to mash different shaped objects in various holes, until it figures out that the cylinder goes in the circle.
Sufficiently advanced smoke and mirrors is indistinguishable from intelligence.
No, it is not.
The toddler isn't copying a video of another toddler putting objects in holes. They are literally only rewarding this AI when it copies the video. This is a form of copying, not learning. It's an expensive mirror, not intelligence. Mirrors are not intelligent because they always mirror what is in front of them. You're way off.
I'll start worrying when... (Score:3)
They train one to get all 222 points in Leisure Suit Larry I.
Best I ever got was 221, lol.
A step back for DeepMind (Score:2)
This sounds like a step back for DeepMind. The whole point of their high-profile project AlphaZero was to learn to play games (go, chess, shogi) without any mimicry of human players, and it proved that such an approach could be successful. It soundly defeated the previous top go program, AlphaGo Master, which was trained with top-level human games and self-play. AlphaZero wasn't taught or shown any human patterns; rather, it discovered them through self-play starting from random moves.
The funny thing is, after AlphaZero, DeepMind is now back to learning from human play.
Re: (Score:2)
It's not really a step back, but rather a more difficult problem. The reason AlphaZero worked is that the consequences of a mistake are quickly visible, leading to a short feedback cycle of improvements.
With these particular games, there are too many choices, and too much delay between making a choice and the consequence, making it very hard to detect patterns between a specific action and the outcome.
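A toy experiment makes the delay problem concrete (the numbers are invented for scale, not taken from the paper): when reward only arrives after a specific multi-step sequence, undirected random exploration essentially never sees a single success to learn from.

    import random

    def random_episode(n_steps=10, n_actions=18):
        # success only if the one "correct" action is chosen at every step
        return all(random.randrange(n_actions) == 0 for _ in range(n_steps))

    trials = 1_000_000
    hits = sum(random_episode() for _ in range(trials))
    print(hits, "successes in", trials, "episodes")  # expected ~1e6 / 18^10, i.e. zero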
Re: (Score:2)
With these particular games, there are too many choices, and too much delay between making a choice and the consequence, making it very hard to detect patterns between a specific action and the outcome.
Humans have big problems with that too...
Later Levels (Score:2)
The later levels in Montezuma's Revenge were the same as the earlier ones, except they were blacked out. You have to have already memorized where to go.