Machines Best Humans in Stanford’s Grueling Reading Test

By Carl Engelking | January 15, 2018 12:56 pm

The ability to read and understand a passage of text underpins the pursuit of knowledge, and was once a uniquely human cognitive activity. But 2018 marks the year that, by one measure, machines surpassed humans’ reading comprehension abilities.

Both Alibaba and Microsoft recently tested their respective artificial neural networks on the Stanford Question Answering Dataset (SQuAD), an arduous test of a machine’s natural language processing skills. The dataset consists of more than 100,000 questions drawn from more than 500 Wikipedia articles. Basically, it challenges algorithms to parse a passage of text and answer tricky questions by picking out the span of the passage that contains the answer.

The AIs, for example, might read a passage about geology and answer questions like “An igneous rock is a rock that crystallizes from what?” or “What changes the mineral content of a rock?” These questions are a level higher than simply scanning for basic facts, and they require algorithms to process a large amount of information regarding context, sequences and relationships before providing an accurate answer.
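To make the task concrete, here is a minimal Python sketch of what a SQuAD-style record might look like. The field names (context, question, answers, text, answer_start) follow the published SQuAD JSON layout, but the passage itself is invented for illustration and is not the actual dataset text. The helper extract_span is likewise hypothetical.

```python
# Illustrative record only: the passage is made up, not copied from SQuAD.
context = (
    "Rocks are classified by how they form. An igneous rock is a rock "
    "that crystallizes from melt, such as magma or lava."
)
answer_text = "melt"

record = {
    "context": context,
    "question": "An igneous rock is a rock that crystallizes from what?",
    # The answer is stored as a span: its text plus a character offset.
    "answers": [
        {"text": answer_text, "answer_start": context.index(answer_text)}
    ],
}


def extract_span(rec):
    """Recover the answer by slicing the passage at the stored offset."""
    ans = rec["answers"][0]
    start = ans["answer_start"]
    return rec["context"][start:start + len(ans["text"])]


print(extract_span(record))
```

The point of the sketch is that a system never writes free-form text. It only has to choose where in the passage the answer starts and ends.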

The algorithm developed by Alibaba’s Institute of Data Science Technologies, SLQA+, notched a score of 82.44 on the test, just a hair better than the 82.304 scored by humans. Alibaba claims it is the first time a machine has outperformed flesh-and-blood readers on the Exact Match metric of the Stanford test. Microsoft Research Asia also outdid humans: its R-NET+ scored 82.650.

Pranav Rajpurkar, a Stanford artificial intelligence researcher and designer of the test, wrote on Twitter that the achievement is a harbinger of more good things to come for AI in 2018. (Note: The F1 metric is the harmonic mean of precision and recall.)
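For readers curious how the Exact Match and F1 numbers are computed, the following Python sketch is modeled on the general approach of the SQuAD evaluation: answers are normalized (lowercased, with punctuation and English articles removed), then compared either as whole strings or token by token. It is a simplified illustration, not the official script. In particular, it scores against a single reference answer, and the function names are assumptions chosen for this example.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, strip punctuation and English articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, otherwise 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction, truth):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The melt.", "melt"))          # 1.0
print(f1_score("the magma or lava", "magma"))    # 0.5
```

In the second call, one of the three predicted tokens is correct (precision 1/3) and the single reference token is found (recall 1), which gives an F1 of 0.5 while Exact Match would be 0. Leaderboard figures such as 82.44 are scores of this kind averaged over all test questions and expressed as percentages.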

A machine that can provide useful answers to more complicated questions could be put to work in a wide variety of applications. Alibaba, for example, is already using its reading system to field customer service questions on Singles Day, China’s shopping bonanza that’s the largest in the world.

“The technology underneath can be gradually applied to numerous applications such as customer service, museum tutorials and online responses to medical inquiries from patients, decreasing the need for human input in an unprecedented way,” Luo Si, chief scientist at the Alibaba institute, said in a statement.

  • http://www.mazepath.com/uncleal/EquivPrinFail.pdf Uncle Al

    Thoughtcrime can be abolished by ending language comprehension that originates and receives it. Robust surveillance is insufficient. Facebook is so close, as is Twitter, to AI-ending all ideas at origination, conveyance, and analysis.

    • Michael Kovari

      But what about your favorite Twitter user?

  • Joseph Salmon

    Humbly, “What Will Our Society Look like When Artificial Intelligence Is Everywhere” Smithsonian Magazine Humbly, please inform me about 2018 “AI” textbooks based on this article? Humbly, please inform me about major movie projects based on this article?
