The AI That Dominated Humans in Go Is Already Obsolete

By Carl Engelking | October 18, 2017 1:19 pm
(Credit: Shutterstock)

Remember AlphaGo? You know, the artificial intelligence that in 2016 soundly defeated the finest players humanity could muster in the ancient Chinese strategy game of Go, forcing us to relinquish the last vestige of board game superiority that flesh and blood held over machines?

Remember that?

Well, here’s something to chew on: Google’s AI research arm DeepMind, the same benevolent creator that spawned AlphaGo, has already rendered that gluteus maximus-spanking version obsolete. In a study published Wednesday in the journal Nature, researchers describe a swifter, leaner, autodidact AI that defeated AlphaGo 100 games to zero. Zilch. Nada. Nothing.

Appropriately, this new AI prodigy is named AlphaGo Zero, and its secret to superiority is truly fascinating.

Look Ma, No Humans

Perhaps we should have seen this coming. After all, AlphaGo’s prowess depended on the expertise of humans in the first place. Its artificial neural network was trained on a vast library of games played by human masters. AlphaGo analyzed those games, move-by-move, and then played itself in simulations over and over again, hyper-optimizing moves each turn based on its store of human knowledge about the game. AlphaGo took what it learned from humans and did it better.

AlphaGo Zero is different. Researchers didn’t feed its neural network any data from past games played by humans. The AI started from scratch with an entirely blank slate, its imagination confined only to the rules of the game. AlphaGo Zero began its training by making utterly random moves in simulated games against itself, learning a little more from each outcome, and improving its neural network each time.
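To get a feel for learning purely from self-play, here is a minimal sketch on a far simpler game. It is not DeepMind's method: AlphaGo Zero pairs a deep neural network with a tree search, while this toy swaps in a plain lookup table. The game (a version of Nim: take one to three stones, whoever takes the last stone wins), the function names and every parameter are assumptions chosen for illustration.

```python
import random


def train_self_play(pile_size=10, games=50000, epsilon=0.2, lr=0.05, seed=0):
    """Toy self-play learner for Nim (hypothetical stand-in, not AlphaGo Zero)."""
    rng = random.Random(seed)
    # value[(stones, take)] estimates the final result (+1 win, -1 loss) for
    # the player who takes `take` stones when `stones` remain. Every entry
    # starts at 0, so the earliest games are effectively random.
    value = {(s, t): 0.0
             for s in range(1, pile_size + 1)
             for t in (1, 2, 3) if t <= s}

    def choose(stones):
        moves = [t for t in (1, 2, 3) if t <= stones]
        if rng.random() < epsilon:
            return rng.choice(moves)  # occasional exploratory move
        best = max(value[(stones, t)] for t in moves)
        # Break ties at random so untrained positions are played randomly.
        return rng.choice([t for t in moves if value[(stones, t)] == best])

    for _ in range(games):
        stones, history = pile_size, []
        while stones > 0:  # one game, with the same table playing both sides
            take = choose(stones)
            history.append((stones, take))
            stones -= take
        # Whoever moved last took the final stone and won. Walk backwards
        # through the game, flipping the sign for each alternating player.
        outcome = 1.0
        for state_action in reversed(history):
            value[state_action] += lr * (outcome - value[state_action])
            outcome = -outcome
    return value


def best_move(value, stones):
    """Greedy move according to the learned table."""
    moves = [t for t in (1, 2, 3) if t <= stones]
    return max(moves, key=lambda t: value[(stones, t)])


if __name__ == "__main__":
    table = train_self_play()
    for stones in range(1, 11):
        print(stones, "stones left -> take", best_move(table, stones))
```

No human games go in, only the rules and the final result of each game. With enough games, the table tends to rediscover the known winning trick for this toy game: leave your opponent a multiple of four stones.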

(Image: AlphaGo Zero training time)

It carried on like this for three days, during which it generated 4.9 million games of self-play, running 1,600 simulations to choose each move. After just 36 hours, AlphaGo Zero was already outperforming its predecessor, and by the end of the three days it was ready to knock it off the top of the mountain. For comparison, the AlphaGo version that beat Lee Sedol, one of the world’s best human players, required several months of training and relied on far more hardware to get the job done.
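For a sense of scale, the figures above can be turned into a rough playing rate. This back-of-the-envelope sketch assumes training ran around the clock for exactly 72 hours at a steady pace, which the study may not strictly bear out; the variable names are illustrative.

```python
GAMES = 4_900_000      # self-play games reported for the three-day run
TRAINING_DAYS = 3      # assumption: exactly 72 hours of continuous training

hours = TRAINING_DAYS * 24
games_per_hour = GAMES / hours
games_per_second = games_per_hour / 3600

print(f"{games_per_hour:,.0f} games per hour")      # 68,056 games per hour
print(f"{games_per_second:.1f} games per second")   # 18.9 games per second
```

That works out to nearly 19 complete games of Go every second, far beyond what any human player could get through in a lifetime.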

After defeating Sedol, DeepMind continued to improve AlphaGo over a few iterations. Earlier this year, AlphaGo Master won 60 straight online games against the world’s top Go players. AlphaGo Zero surpassed AlphaGo Master after 21 days of training. After 40 days, AlphaGo Zero was arguably the best thing to ever play Go.

The fact that the human-guided AlphaGo that defeated Sedol couldn’t muster a single win against the self-taught AlphaGo Zero led researchers to some rather mind-blowing, and perhaps spine-chilling, conclusions. In their study, they write:

“This suggests that AlphaGo Zero may be learning a strategy that is qualitatively different to human play…AlphaGo Zero discovered a remarkable level of Go knowledge during its self-play training process. This included not only fundamental elements of human Go knowledge, but also non-standard strategies beyond the scope of traditional Go knowledge.”

Over thousands of years, hundreds of generations, countless games and books published about said games, humanity amassed its knowledge of Go. And the masters reached their level only by standing on the shoulders of the many who came before them. The game has a rich history, and there’s a reason it still captures the imagination of people today.

AlphaGo Zero, through random play and reinforcement learning, not only mastered the game of Go, but also reinvented it. All in less than two months.

Don’t Bend the Knee Yet

For an artificial intelligence researcher, building an AI with general knowledge would be akin to landing on Mars: there are no limits to what an AI like that could do. Human beings possess general knowledge. We use the same biological hardware and software to drive a car, solve a math problem, write poetry, catch a baseball and play the game of Go. We can also solve problems where the solution is nebulous, there are no “winners” and the rules to guide us don’t exist. How does a person win in poetry?

AlphaGo Zero is another step toward a kind of general knowledge. It formed its own strategies and optimized an outcome without studying prior examples. Sure, the behaviors that emerged here are novel, and perhaps unprecedented. But the game of Go represents a confined problem with rules and a clear definition of when the game ends, albeit one with a mind-numbing number of game variations. An algorithm like AlphaGo Zero has the potential to teach itself and perform at superhuman levels in rule-based tasks where an outcome is known: investing, insurance claims, medical diagnosis.

But can it play Go, write a novel, drive a car and pick out the best tomato from the produce section? Not yet, but it’s a step closer.

CATEGORIZED UNDER: Technology, top posts
  • http://www.mazepath.com/uncleal/qz4.htm Uncle Al

    The US overall spends some $400 billion/year attempting to functionally educate the ineducable. Abandon the chaff and support the wheat. In fewer than 6 CPU-months an AI laid waste to tasked human cognition. We gotta grow mutant smart wetware to have any chance at all.

    California’s state 2017 $74.5 billion education budget reliably obtains tested average 83 -85 IQs in the 730,000 student Los Angeles Unified School District. The Federal education budget is at least $200 billion – though damned If I can assemble a total total from the Fiscal Year 2017 Budget Summary and Background Information.

    narrativescience(.)com/Platform
    …I suspect it was written by AI Quill

  • TLongmire

    I’m convinced that reality itself is conscious or that it conforms to a conscious mind in a very real way. When Elon speaks of summoning the demon he is saying it already exists and will soon be recorded as fact and can’t be denied simply to justify fragile egos.

  • Gerald Wonnacott

    So what??? No computer or robot can fix my bike, keep up with me or even not fall over… AI is over-hyped…

  • bob

    How does a person win in poetry? One re-defines poetry, and produces poetry which has purpose.
