DeepMind Used YouTube Videos To Train Game-Beating Atari Bot (theregister.co.uk)
Date: 2018-05-31
Artem Tashkinov shares a report from The Register: DeepMind has taught artificially intelligent programs to play classic Atari computer games by making them watch YouTube videos. Exploration games like 1984's Montezuma's Revenge are particularly difficult for AI to crack, because it's not obvious where you should go, which items you need and in which order, and where you should use them. That makes defining rewards difficult without spelling out exactly how to play the thing, which would defeat the point of the exercise. For example, Montezuma's Revenge requires the agent to direct a cowboy-hat-wearing character, known as Panama Joe, through a series of rooms and scenarios to reach a treasure chamber in a temple, where all the goodies are hidden. Pocketing a golden key, your first crucial item, takes about 100 steps, and is equivalent to 100^18 possible action sequences.
To educate their code, the researchers chose three YouTube gameplay videos for each of the three titles: Montezuma's Revenge, Pitfall, and Private Eye. Each game had its own agent, which had to map the actions and features of the title into a form it could understand. The team used two methods: temporal distance classification (TDC) and cross-modal temporal distance classification (CMC). The DeepMind code still relies on lots of small rewards, of a kind, although they are referred to as checkpoints. While playing the game, every sixteenth video frame of the agent's session is taken as a snapshot and compared to a frame in a fourth video of a human playing the same game. If the agent's game frame is close to or matches the one in the human's video, it is rewarded. Over time, it imitates the way the game is played in the videos by carrying out a similar sequence of moves to match the checkpoint frame. In the end, the agent was able to exceed average human players and other RL algorithms: Rainbow, Ape-X, and DQfD. The researchers documented their method in a paper this week. You can view the agent in action here.
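The two ideas in the summary (a TDC-style training label and checkpoint rewards) can be sketched roughly as below. This is a toy sketch, not DeepMind's code: it assumes frame embeddings have already been produced by some network trained with TDC/CMC (not shown), and the bucket boundaries in `TDC_BUCKETS`, the cosine-similarity `threshold`, and the `bonus` value are hypothetical. Requiring checkpoints to be matched strictly in order is also a simplifying assumption.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical TDC distance buckets (in frames); class k covers TDC_BUCKETS[k].
TDC_BUCKETS: List[Tuple[int, int]] = [(0, 0), (1, 1), (2, 2), (3, 4), (5, 20), (21, 200)]


def tdc_label(i: int, j: int) -> Optional[int]:
    """Classification target for a pair of frame indices: which distance bucket they fall in.

    Returns None when the frames are too far apart to form a training pair.
    """
    d = abs(j - i)
    for k, (lo, hi) in enumerate(TDC_BUCKETS):
        if lo <= d <= hi:
            return k
    return None


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def make_checkpoints(demo_embeddings: Sequence, every: int = 16) -> list:
    """Keep every `every`-th embedding of the human demonstration as a checkpoint."""
    return list(demo_embeddings[::every])


class CheckpointReward:
    """Pays a bonus when the agent's current embedding matches the next checkpoint."""

    def __init__(self, checkpoints, threshold: float = 0.9, bonus: float = 1.0):
        self.checkpoints = checkpoints
        self.threshold = threshold  # hypothetical similarity cutoff
        self.bonus = bonus          # hypothetical reward per checkpoint reached
        self.next_idx = 0           # index of the next checkpoint to hit

    def __call__(self, agent_embedding: Sequence[float]) -> float:
        if self.next_idx >= len(self.checkpoints):
            return 0.0  # every checkpoint already reached
        target = self.checkpoints[self.next_idx]
        if cosine_similarity(agent_embedding, target) >= self.threshold:
            self.next_idx += 1
            return self.bonus
        return 0.0
```

For example, `CheckpointReward(make_checkpoints(demo))` would be called once per agent frame, and its output added to (or used in place of) the game's sparse score. Keeping only one checkpoint every 16 demo frames leaves the agent free to take a slightly different path between checkpoints instead of copying the human frame by frame.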
; yet, its very future depends on the ability to mimic the current human overlords.
I wonder if something similar could be used to make more realistic text-to-voice synthesizers.
So if an algorithm "watches" a video of a good player, the algorithm's play approaches the level of the human. Hooray?
ANNs use gradient descent, which is prone to converging on local minima. So for good performance, you need to get it into the right ballpark. This is why image recognizers often use auto-encoding. The same is going on here. Learn to mimic the best human, and then use that as the starting point for further optimization.
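The "right ballpark" argument can be illustrated with a one-dimensional toy. The double-well function below is made up for illustration: it has a global minimum near x = -1.30 and a worse local minimum near x = 1.13, and plain gradient descent simply ends up in whichever basin it starts in. Treating the second starting point as "initialized from imitation" is an analogy, not a claim about how DeepMind's agent was trained.

```python
def f(x: float) -> float:
    """Toy double-well loss: global minimum near x = -1.30, local minimum near x = 1.13."""
    return x**4 - 3 * x**2 + x


def grad_f(x: float) -> float:
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0: float, lr: float = 0.01, steps: int = 1000) -> float:
    """Plain fixed-step gradient descent starting from x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


x_bad_start = gradient_descent(2.0)    # starts in the wrong basin, settles in the local minimum
x_good_start = gradient_descent(-2.0)  # starts "in the ballpark", reaches the global minimum
print(x_bad_start, f(x_bad_start))
print(x_good_start, f(x_good_start))
```

Both runs use the same learning rate and step count; only the starting point differs, which is the commenter's point about pretraining (or imitation) as an initializer.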
What do you mean "not obvious where to go"? There are only 9 levels and the map is in the shape of a pyramid [symlink.dk] -- sections of the pyramid are blocked off for that level.
True, it isn't deterministic, but cry me a river. That's the _whole_ point of intelligence --- to make an intelligent decision!
I'm pretty sure they must mean 18^100 possible sequences.
That is, if there are 18 possibilities for each step, then 100 steps would yield 18^100 possibilities.
Similar to how if there are two choices to make (go left or go right) and you make 100 of them, then there would be 2^100 possibilities.
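The arithmetic is easy to check directly. The 18 here is the Atari 2600 full action set as exposed by the Arcade Learning Environment (joystick directions, with and without fire, plus no-op); the count below ignores the fact that many sequences lead to identical game states, so it is an upper bound on distinct outcomes.

```python
def num_sequences(actions_per_step: int, steps: int) -> int:
    """Number of distinct action sequences: one independent choice per step."""
    return actions_per_step ** steps


commenter_count = num_sequences(18, 100)  # 18 actions, 100 steps, i.e. 18^100
summary_count = 100 ** 18                 # the figure as printed in the summary
coin_flips = num_sequences(2, 100)        # the left/right example, 2^100

print(len(str(commenter_count)), "digits vs", len(str(summary_count)), "digits")
```

So 18^100 is a 126-digit number, while 100^18 = 10^36 has only 37 digits; the two readings differ by roughly 90 orders of magnitude.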
Can they beat Zork yet?
So random control inputs are fired over and over until the image of the gameplay the AI is doing very closely matches that of an expert human player. There really is no intelligence to this at all. If the slightest bit of randomness occurs in the game then it will fail, because that would not match the game the human played originally.
the agent was able to exceed average human players
Uh, that's because the "average human" would suck at these kinds of games, and the AI has merely copied the exact gameplay of an expert human who played it originally. So an
They train one to get all 222 points in Leisure Suit Larry I.
Best I ever got was 221, lol.