
Uber has Cracked Two Classic '80s Video Games by Giving an AI Algorithm a New Type of Memory (technologyreview.com) 100

An algorithm that remembers previous explorations in Montezuma's Revenge and Pitfall! could make computers and robots better at learning how to succeed in the real world. From a report: A new kind of machine-learning algorithm just mastered a couple of throwback video games that have proved to be a big headache for AI. Those following along will know that AI algorithms have bested the world's top human players at the ancient, elegant strategy game Go, one of the most difficult games imaginable. But two pixelated classics from the era of 8-bit computer games -- Montezuma's Revenge and Pitfall! -- have stymied AI researchers. There's a reason for this seeming contradiction. Although deceptively simple, both Montezuma's Revenge and Pitfall! have been immune to mastery via reinforcement learning, a technique that's otherwise adept at learning to conquer video games.

DeepMind, a subsidiary of Alphabet focused on artificial intelligence, famously used it to develop algorithms capable of learning how to play several classic video games at an expert level. Reinforcement-learning algorithms mesh well with most games, because they tweak their behavior in response to positive feedback -- the score going up. The success of the approach has generated hope that AI algorithms could teach themselves to do all sorts of useful things that are currently impossible for machines. The problem with both Montezuma's Revenge and Pitfall! is that there are few reliable reward signals. Both titles involve typical scenarios: protagonists explore blockish worlds filled with deadly creatures and traps. But in each case, lots of behaviors that are necessary to advance within the game do not help increase the score until much later. Ordinary reinforcement-learning algorithms usually fail to get out of the first room in Montezuma's Revenge, and in Pitfall! they score exactly zero.
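
A minimal sketch of that sparse-reward failure mode, using a hypothetical chain world where the only reward sits twenty steps from the start; an agent exploring at random almost never reaches it, so a reward-driven learner gets nothing to reinforce:

import random

# Toy chain world: the agent starts at 0 and the only nonzero reward
# sits at position N. Every intermediate step scores nothing, so there
# is no feedback gradient to follow -- loosely like the first room of
# Montezuma's Revenge.
N = 20
EPISODES = 10_000
MAX_STEPS = 50

successes = 0
for _ in range(EPISODES):
    pos = 0
    for _ in range(MAX_STEPS):
        pos = max(0, pos + random.choice((-1, 1)))  # no reward signal biases the choice
        if pos == N:
            successes += 1
            break

# A pure random explorer reaches the reward in a tiny fraction of
# episodes, leaving a reward-driven learner almost nothing to learn from.
print(f"reward reached in {successes}/{EPISODES} episodes")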

  • by LostOne ( 51301 ) on Tuesday November 27, 2018 @01:50PM (#57709228) Homepage

    So researchers have discovered that short term gains can come at the expense of long term success? *gasp* Say it isn't so!

    Actually, that's been a known problem for a long time. You end up at a local maximum on the "score" function with no way to improve, so reinforcement learning just keeps you there, even though you might do substantially better if you accepted a temporary decrease in the "score" and ended up on the path to some other maximum on the function.
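
    A toy illustration of that trap, assuming a made-up score curve with a low peak at x=2 and a higher one at x=8; a purely greedy climber parks on the lower peak because every remaining step looks worse:

    def score(x):
        # Hypothetical score function with a local peak (x=2, score 4)
        # and a global peak (x=8, score 10).
        return -(x - 2) ** 2 + 4 if x < 5 else -(x - 8) ** 2 + 10

    x, step = 0.0, 0.1
    while score(x + step) > score(x):  # greedy: only ever accept improvements
        x += step

    # Stops near x=2 with score ~4; getting to the global peak near x=8
    # would require first accepting a lower score.
    print(f"stuck at x={x:.1f}, score={score(x):.2f}")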

    (Oh, and "Fr1st ps0t!", especially if it isn't.)

    • by lgw ( 121541 ) on Tuesday November 27, 2018 @01:59PM (#57709296) Journal

      This is more important than you make it out to be. The key to these games is that you have to make a map to succeed. That's not the kind of learning you get from "machine learning", as obvious as it might be to a human player.

      One of the many ways that AI is nothing like intelligence is the absence of any representational model of the real world. It's no accident that the neurological seat of human intelligence is an addition to our massive vision processing wetware - understanding the world in terms of objects precedes self awareness in the only example we can study. "AI" doesn't work that way, at least for the most part.

      I find it impressive that someone has managed to connect the idea of making a map with the internals of machine learning (which are completely arbitrary matrices that have no obvious connection to the result).

      • by Areyoukiddingme ( 1289470 ) on Tuesday November 27, 2018 @03:54PM (#57710178)

        One of the many ways that AI is nothing like intelligence is the absence of any representational model of the real world.

        There are many kinds of AI. Neural nets don't construct a representational model of the world from visual input but other AI techniques do. The Soar [umich.edu] framework used so successfully for the machine-controlled antagonists in Descent (among many other uses) supports chunking, reinforcement learning, episodic learning, and semantic learning. It is based on the unified theory of cognition. It has both a temporary and permanent representational memory. It's fundamentally rule-based, rather than a neural net.

        There was at one time a neural net version of Soar called Neuro-Soar, but it's not part of the mainstream Soar library.

      • by Kjella ( 173770 )

        One of the many ways that AI is nothing like intelligence is the absence of any representational model of the real world.

        I agree that could be a problem if the number of possible interactions is so large that you need a semantic understanding to whittle it down to a reasonable number. But if it can backtrack from random luck to identify the game mechanics that triggered it, that's a huge step up: for example, learning that to open the treasure chest it must first find the key, or that to cross the drawbridge it must first lower it, even though neither gives any score by itself. It's hard for an AI to pick out the "meaningful" actions from all the rest.

      • by DeVilla ( 4563 )

        understanding the world in terms of objects precedes self awareness in the only example we can study

        So you're saying they've laid the foundation for skynet, right? I like the idea that understanding Montezuma's Revenge can lead to self awareness.

    • Re: (Score:2, Insightful)

      So researchers have discovered that short term gains can come at the expense of long term success? *gasp* Say it isn't so!

      To be fair, many an MBA hasn't internalized this, so it is progress.

    • Well, the normal problem is that the longer you plan out, the more variations you need to figure in. Short-term success often leads us to a point where we can face the long-term problem, versus failing before we get to that point.
      The idea of the perfect AI algorithm has been available for generations: just simulate all possible next moves and pick the path to the best outcome. But the problem is that these steps take a massive amount of computational time, which grows exponentially the further out you go. A good AI design
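
      A sketch of that exponential blowup, counting the futures a full-width search would have to simulate (the branching factor and depths are arbitrary illustrative numbers):

      # With b legal moves per step, looking d steps ahead means
      # simulating b**d possible futures.
      b = 10  # hypothetical branching factor
      for d in (2, 5, 10, 15):
          print(f"depth {d:>2}: {b ** d:>20,} states to simulate")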

      • by DeVilla ( 4563 )

        The idea of the perfect AI algorithm has been available for generations. Just simulate all possible next solutions and pick the path to best success.

        When I was in AI classes, this wasn't considered AI. This was just an exhaustive search, a.k.a. brute force. The point of AI (back then) was to try to identify how people were able to intelligently avoid exhaustive searches of problem spaces while still finding a reasonably good solution, so we could better simulate it in software.

        Well that, and some people wanted to build C-3PO.

    • So researchers have discovered that short term gains can come at the expense of long term success?

      Unfortunately the AI was deleted before it could take control of the company and shut it down for the greater good of humanity.

  • by JMZero ( 449047 )

    These games aren't hard for a computer to play. You could write a fairly straightforward algorithm that would play them both well.

    What's hard is to develop a very general learning algorithm - one that doesn't know about the task - that just happens to pass the test of being able to learn these games.

    The approach here seems "cheaty". That's not to say their technique is useless (and maybe it's more generalizable than I'm giving it credit for) - but from the vague overview of the article it seems like they're effectively juicing their performance.

    • The approach here seems "cheaty". That's not to say their technique is useless (and maybe it's more generalizable than I'm giving it credit for) - but from the vague overview of the article it seems like they're effectively juicing their performance.

      Is teaching children cheating?

      • by JMZero ( 449047 )

        No, but telling them the answers to the test can be. To be fair, from the article it's hard to tell where on that spectrum they are.

  • by phantomfive ( 622387 ) on Tuesday November 27, 2018 @02:09PM (#57709374) Journal
    Here is the key quote from the article that vaguely describes the algorithm they used:

    The team’s new family of reinforcement-learning algorithms, dubbed Go-Explore, remember where they have been before, and will return to a particular area or task later on to see if it might help provide better overall results. The researchers also found that adding a little bit of domain knowledge, by having human players highlight interesting or important areas, sped up the algorithms’ learning and progress by a remarkable amount.
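
    A loose sketch of the remember-and-return loop that quote describes; the toy environment, the cell coarsening, and the snapshot/restore interface are illustrative assumptions, not Uber's published implementation:

    import random

    class ToyEnv:
        # 1-D stand-in for a game: the only reward sits at position 50.
        def reset(self):
            self.pos = 0
            return self.pos

        def snapshot(self):
            return self.pos  # enough state to restore the "game" exactly

        def restore(self, state):
            self.pos = state

        def step(self, action):
            self.pos = max(0, self.pos + action)
            done = self.pos == 50
            return self.pos, (1.0 if done else 0.0), done

    def cell_of(obs):
        return obs // 5  # coarsen states so nearby positions share a cell

    env = ToyEnv()
    obs = env.reset()
    archive = {cell_of(obs): (env.snapshot(), 0.0)}  # cell -> (state, score)

    for _ in range(2000):
        # Return: jump back to a remembered state instead of replaying
        # the whole game from the start.
        state, score = random.choice(list(archive.values()))
        env.restore(state)
        # Explore: a short random rollout, archiving new or improved cells.
        for _ in range(20):
            obs, reward, done = env.step(random.choice((-1, 1)))
            score += reward
            c = cell_of(obs)
            if c not in archive or score > archive[c][1]:
                archive[c] = (env.snapshot(), score)
            if done:
                break

    print(f"cells discovered: {len(archive)}; "
          f"reward found: {any(s > 0 for _, s in archive.values())}")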

    • So they've figured out that getting X to happen is rewarding, but hard, because X happens if you do something after W happens, which occurs if you do something after V happens, etc.

      • I think the essential difficulty they are running into is that a human will look at the screen and say, "Oh, that is a room, there is gravity, and it looks like I can jump." A human doesn't need to play through the scenario 10,000,000 times; humans learn in ways besides just reinforced repetition. I would even suggest that is not our primary form of learning, although it is a powerful one in some situations.

        It seems unlikely that a system that only learns through many repetitions will become a general intelligence.
        • A human doesn't need to play through the scenario 10,000,000 times; humans learn in ways besides just reinforced repetition

          Humans learn about things by experience. You have a lot of information you've learned through repetition--like gravity and pain. I stepped on a bee once because I'd gotten stung earlier and wanted to confirm that stepping on a bee caused pain; I now know that some animals can sting and don't need to mess with them to figure out they'll inflict pain. I learned that hornets and scorpions sting by repeatedly being stung by bees.

          I can work out how things work without doing them because I can simulate them

          • Did you understand what I said?
            • Yes. You said a human doesn't have to run through a scenario repeatedly to learn it; you seem to think each scenario is unique, and not comparable to different scenarios already experienced repeatedly--i.e. that most "new" scenarios are actually old scenarios.

              Do you know why you can't retain fluency in language just by a study of grammar and vocabulary?

              You can't just memorize words. I remember words--even English words--because every concept I want to convey links to thousands of phrases I've heard, s

              Yes. You said a human doesn't have to run through a scenario repeatedly to learn it; you seem to think each scenario is unique, and not comparable to different scenarios already experienced repeatedly--i.e. that most "new" scenarios are actually old scenarios.

                No, what he/she said is that, unlike AI, humans do NOT need to learn from repetition. Humans can learn by using logic (as an assumption).

                • Humans don't learn using logic; they apply logic to assess. Logic is a tool learned by repetition.

                  Logic also does not apply to the unknown: if I explain to you the R-Star target and the movement of equities, you can't tell me what should or shouldn't happen to the economy. It's completely logical, but you don't have prior knowledge to assess it.

                  Even then, when you get down to it, you're applying an array of tools that you built through repetition of experience. You're quite slow at getting a result s

                  • Still, you missed the poster's point. And yes, we can learn from logical conclusions without going through a scenario or repetitions.

                    Repetition is brute-force learning. We humans can still learn in a different way that doesn't have to be brute force. All you are talking about is just the obvious, basic kind of learning: first-hand experience. Logic is part of second-hand experience, because we can use knowledge proven by others to draw a conclusion; thus it is another type of learning.

                      yes, we can learn from logical conclusions without going through a scenario or repetitions.

                      You're not learning; you're applying previous knowledge.

                      If I say: 3x^2 + 2x = 164, solve for x, what do you do? You apply your knowledge of algebra to obtain the value of x.

                      What if you don't know algebra?

                      How did you get to know algebra?

                      You got to know algebra by repetition learning. You can't just read a book on algebra and have it memorized; you need to perform repetitive tasks.

                      Logic is part of second-hand experience, because we can use knowledge proven by others to draw a conclusion; thus it is another type of learning.

                      That's not learning. Obtaining a result by following a process is doing. You haven't learned until you can follow th

                      You are obviously not listening, or even thinking outside the box as many others do. I am not going to try to explain it to you any further. You keep using the same reasoning, which is the basic part of learning. Anyway, I should have left it as is (as others did, stopping after they insulted you). I now know why others don't want to reply to your objections. It was my fault for attempting to show you another way.

                      You are obviously not listening, or even thinking outside the box as many others do.

                      I'm listening; you're just wrong.

                      You keep using the same reasoning, which is the basic part of learning

                      Reasoning is spelled with letters, like "R". Learning starts with an "L".

                      You keep telling me I can't fly, but I put one foot in front of the other. Isn't that flying?

                      No, it's walking, just like reasoning is reasoning and learning is learning.

                      Learning is specifically the encoding of new knowledge so that it is retained and doesn't have to be reasoned out or observed again. It has to be recallable as its own thing or it isn't learned. Reasoning something out isn't know

              • If you understood it, then I'm sure you can think of your own examples of not needing many repetitions (like a neural network would) to learn something.

                Or maybe you don't understand AI well enough to reason logically about it.
                • Give me an example and I'll tell you what previously-learned implement you're actually using.

                    Idiot. I give you knowledge and you'd rather argue.
                    • Have you considered that maybe you're wrong?

                      Let's say I work at a fast food place and sell food. Someone orders a burrito ($2.89), milk shake ($1.93), and nachos ($1.28). I add them up and get $6.10.

                      You assert that I have learned that a burrito, milk shake, and nachos cost $6.10.

                      You're wrong.

                      Three customers later, someone orders a burrito, milk shake, and nachos. I remember this just happened a few minutes ago, but I don't remember how much they cost, together. I must re-compute these things usin

        • After falling off a drawbridge, next time we can slam on the gas or add map data that there is one.
          After driving off a pier, next time we can add "wait for ferry" to the map data.

        • "Oh, that is a room, there is gravity, and it looks like I can jump." A human doesn't need to play through the scenario 10,000,000 times, humans learn in ways besides just reinforced repetition. I would even suggest that is not our primary form of learning, although it is a powerful one in some situations. .

          I don't know about that. Humans spend the first few months of their lives randomly jerking their limbs around until they've figured out the basics of gravity. It takes a surprisingly long time for an infant to figure out how to roll over on its own, much less jump.

          All of that past repetition learning about gravity and geometry is factored in when you play this video game, so now it just *looks* like intuition.

    • by AvitarX ( 172628 )

      Isn't that how Deep Blue beat a human grandmaster the first time?

      Seems a good step.

        Isn't that how Deep Blue beat a human grandmaster the first time?

        Seems a good step.

        If your goal is building an algorithm that can beat a human, then it's a great first step. But if your goal is to create general AI, then it's not a step at all.

        These ancient games by themselves aren't particularly interesting, though; no one cares if a computer can beat a human. The only reason they are of interest is as a potential stepping stone to general AI.

        • by AvitarX ( 172628 )

          How effective is general purpose Real Intelligence without being guided?

          The Octopus is maybe an answer?

          • How effective is general purpose Real Intelligence without being guided?

            We can answer that question. If someone stuck you in the room with the game and no instructions, could you get higher than a score of zero? Could you beat the game? For indeed, you are a general purpose Real Intelligence.

            • by AvitarX ( 172628 )

              Sure, but I've had over 30 years of human guidance to get there.

              • Now you're intentionally being obtuse. If it were just a matter of time, we could upload our current AI algorithms into a Boston Dynamics piece of hardware and wait.
                • by AvitarX ( 172628 )

                  If this technique can lead to an AI capable of learning many previously unachievable tasks with minor human input, I would think it's progress in creating AI, and not "not a step at all".

                  If I played Pitfall! as a child, I wouldn't know what to do without context (money bags good, and whatnot), but an older sibling might say "go over there, there are money bags".

                  I remember being confused by Pitfall! and just dying a lot, actually.

                • by ceoyoyo ( 59147 )

                  That is an experiment that's being done. The idea is to learn how much behaviour can arise spontaneously with a reinforcement learning algorithm, some basic motivations, and real-world sensory input.

                  That IS how we do a great deal of our learning, almost exclusively when we're young. When we advance a little further we get some direct supervised learning mixed in.

              • Sure, but I've had over 30 years of human guidance to get there.

                Due to humans being so slow, that's still about 5 orders of magnitude less than the guidance that the software got.

                • by ceoyoyo ( 59147 ) on Tuesday November 27, 2018 @09:49PM (#57712246)

                  Humans are not slow. Hinton has computed the amount of sensory information that is processed by the human brain, using reasonable approximations for things like the effective sampling rate of the eyes and ears. It's enormous.

                  The *consciousness* that we subjectively experience is slow. We're also pretty horrible at tasks we have to consciously think about as we're doing them. Both of which suggest that "consciousness" might be considerably less important than many give it credit for.

                  • Humans are not slow. Hinton has computed the amount of sensory information that is processed by the human brain, using reasonable approximations for things like the effective sampling rate of the eyes and ears. It's enormous.

                    Humans can "process" maybe 1 high-res photo per second. Computers can do the same several thousand times faster. Humans *ARE* slow, and discard almost all sensory input in any given second (flash 1 million stills at a human in a single second and you'll be lucky if they manage to catch even one of those images).

                    The *consciousness* that we subjectively experience is slow.

                    In which case my point to the OP still stands - his "30 years of human guidance" can be sent to a NN in a weekend.

                    • by ceoyoyo ( 59147 )

                      No, it can't. The human neural network processes a large amount of data, not just vision but lots of other sensory input as well, and it does it a lot faster than any existing computer doing anything of even vaguely comparable sophistication. We don't know exactly how 30 years of a human observing and learning about the world translates into analogously training an artificial neural network, but it is definitely more than a weekend's worth. Reasonable estimates put it at more than 30 years worth, not to

                    • I have some experience with this.

                      So have I. AI hype restarted just a few years ago, so by now almost anyone interested has looked into it, played with it, etc. Some of us have even done postgrad work in it.

                      I very strongly suspect you do not. But you don't even have to believe me. Geoff Hinton has made the same argument, and he has some pretty reasonable qualifications.

                      And his numbers fail the most basic of tests: how much information can a human process in a single second? His numbers count each sensory input separately, while comparing to a NN looking at whole images. If we count each individual pixel in each image the NN receives, it is vastly faster than the human.

                      Looking at images alone, humans can't "see" more than a few distinct images per second, while the NN can see, react, and adjust to a few hundred thousand images per second. A 30 year period of hu

  • by Anonymous Coward

    So what does this (an article about Alphabet's deep learning) have to do with Uber???

  • by smoothnorman ( 1670542 ) on Tuesday November 27, 2018 @02:25PM (#57709510)
    Once again, this seems to be a case of "AI" research re-discovering some basic math: if the function has discontinuities or is otherwise non-differentiable ("behaviors that are necessary to advance within the game do not help increase the score until much later.") then its optimization is hard or dependent on fortunate starting conditions.

    As an aside, have we even developed an accepted definition for what properly qualifies as "AI"? Recently I was being flogged some software whose selling point was an "AI engine", which turned out to be little more than a previous version with a little bit of Bayesian statistics bolted on.

    • by ceoyoyo ( 59147 )

      Reinforcement learning basically exists to solve problems that have the properties you describe. Researchers in the field have been aware of them for a long time. Many modern reinforcement learning algorithms basically use artificial neural networks to estimate the trickier bits in a Q-learning framework. Q-learning was introduced in 1989 and the basic theory developed in the early nineties.
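
      For reference, the core tabular update from Watkins' 1989 Q-learning, which deep RL replaces with a neural-network estimate of Q; this helper assumes a plain dict as the Q table:

      # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
      def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
          best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
          old = Q.get((s, a), 0.0)
          Q[(s, a)] = old + alpha * (r + gamma * best_next - old)

      Q = {}
      q_update(Q, s=0, a="right", r=0.0, s_next=1, actions=("left", "right"))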

  • > protagonists explore blockish worlds filled with deadly creatures and traps.

    No, they ARE block rooms -- each room IS exactly 40x25 tiles (on the Apple ][ it only displays 40x24 tiles). The tiles just happen to be a) animated, and b) mega-tiles, such as ladders, which are three tiles wide.

    Also, here is a map of the world [symlink.dk] --- It makes a pyramid shape, go figure!

    Impressive that it could fit 99 rooms in less than 32 KB!
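
    The back-of-the-envelope numbers behind that, assuming a naive one byte per tile with no reuse (the byte-per-tile figure is an illustrative assumption):

    rooms, width, height = 99, 40, 25
    naive_bytes = rooms * width * height  # 99,000 bytes if every tile were stored raw
    print(naive_bytes / 1024)             # ~96.7 KB, roughly 3x a 32 KB ROM

    Reusing mega-tiles and per-room patterns is what closes that 3x gap.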

  • Well, why not just make building models of the environment, and reducing the surprise of observations compared to expectations, the optimization goal?

    Isn't that the point of "free energy principle" thinking?

  • Why wouldn't you assign points for a strategy not resulting in something negative (i.e., staying-alive points)? Seems like an easy tweak.
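
    That tweak is known as reward shaping; a minimal sketch of the survival-bonus version (the bonus size is an arbitrary illustrative value):

    def shaped_reward(raw_reward, done, survival_bonus=0.01):
        # Reward shaping: pay a small bonus for every step survived.
        # Known caveat: an agent can learn to idle forever, since standing
        # still is often the safest way to keep collecting the bonus.
        return raw_reward + (0.0 if done else survival_bonus)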
  • Uber already has one death; how many more before they get a safe auto-drive AI?

    • by mentil ( 1748130 )

      I dunno, how many more would it have to kill before YOU would consider it safe?
      Oh, wait...

  • by SuperKendall ( 25149 ) on Tuesday November 27, 2018 @04:02PM (#57710252)

    I couldn't find this link anywhere in the actual article Slashdot linked to or the summary - the blog post laying out what Go-Explore is in more detail:

    http://eng.uber.com/go-explore/ [uber.com]

  • Comment removed based on user account deletion
    • by mentil ( 1748130 )

      Wait that doesn't sound at all like... oh wait, I was thinking of Custer's Revenge. I'd wondered why they picked THAT game.

  • ... is unstructured. When you put an "algorithm" in a game, it has no awareness with which to make discoveries about goals and motivations. Take their mention of one of their algorithms not being able to get out of the first room...

    Now think of what that means: your AI has no sense of when to move on, because it doesn't realize there's nothing interesting going on in the space it finds itself in. When goals don't exist or are unstructured, you basically have to invent goals, aka come to the realization your wa

  • This reminds me in some ways of the chart parsers [wikipedia.org] I was playing around with in university for a paper on natural language processing. I think these days they are mostly used in the context of code compilation, but I must admit I don't know much about modern natural language processing tools, so I don't know if they're still a thing there.
  • The title mentions novel AI techniques by Uber involving a new type of memory. Cool!!

    The summary mentions neither Uber nor this new memory. What exactly is the news?

  • From TFA: "The researchers also found that adding a little bit of domain knowledge, by having human players highlight interesting or important areas, sped up the algorithms’ learning and progress by a remarkable amount." This defeats the whole purpose of autonomous independent exploration.
