Jeopardy

By Clive Thompson
The New York Times, June 14, 2010

Edited by Andy Ross

The new IBM supercomputer system Watson can understand a question posed in natural language and respond with a precise, factual answer. This fall, the producers of the TV quiz show Jeopardy will pit Watson against some of the game's best former players.

To test Watson's capabilities against humans, IBM scientists have begun holding live Jeopardy tests. By the end of one day's testing, the human contestants were impressed, and even slightly unnerved, by Watson. Several made references to Skynet, the computer system in the Terminator movies that achieves consciousness and decides humanity should be destroyed.

IBM has a knack for pitting man against machine. In 1997, the company's supercomputer Deep Blue famously beat the grandmaster Garry Kasparov at chess. But this time, IBM wanted a grand challenge that would meet a real-world need.

When an IBM executive suggested taking on Jeopardy, he was immediately pooh-poohed. Deep Blue played chess well because the game is perfectly logical and can be reduced easily to math. But the rules of language are much trickier, and Jeopardy's witty, punning clues are harder still. And winning requires finding an answer in a few seconds.

David Ferrucci, IBM senior manager for its Semantic Analysis and Integration department, heads the Watson project. An AI researcher who has long specialized in question-answering systems, Ferrucci chafed at the slow progress in the field and craved an ambitious goal that would break new ground. Jeopardy fit the bill.

Computer scientists now use statistics to analyze huge piles of documents, like books and news stories. Algorithms take any subject and automatically learn what types of words are most strongly correlated with it. In theory, this sort of statistical computation has been possible for decades, but it was impractical. All that changed in the last ten years. Computer power became drastically cheaper and the amount of online text exploded.
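
To make the idea concrete, here is a minimal sketch in Python of the kind of word-subject correlation the article describes. The toy corpus, the subject labels, and the choice of pointwise mutual information as the statistic are all illustrative assumptions, not IBM's actual method.

    # A toy illustration (not IBM's pipeline): given documents labeled by
    # subject, score how strongly each word correlates with a subject
    # using pointwise mutual information (PMI).
    from collections import Counter
    import math

    # Hypothetical mini-corpus standing in for books and news stories.
    corpus = [
        ("chess", "kasparov played chess against the deep blue computer"),
        ("chess", "the grandmaster won the chess match with a queen sacrifice"),
        ("jeopardy", "contestants answer trivia clues on the quiz show"),
        ("jeopardy", "the quiz show rewards fast recall of trivia facts"),
    ]

    word_counts = Counter()     # occurrences of each word overall
    pair_counts = Counter()     # occurrences of each (subject, word) pair
    subject_counts = Counter()  # words seen under each subject
    total = 0

    for subject, text in corpus:
        for word in text.split():
            word_counts[word] += 1
            pair_counts[(subject, word)] += 1
            subject_counts[subject] += 1
            total += 1

    def pmi(subject, word):
        # High when the word co-occurs with the subject far more often
        # than chance would predict; -inf when it never co-occurs.
        p_pair = pair_counts[(subject, word)] / total
        if p_pair == 0:
            return float("-inf")
        p_subject = subject_counts[subject] / total
        p_word = word_counts[word] / total
        return math.log(p_pair / (p_subject * p_word))

    # The words most strongly correlated with "chess" in this toy corpus.
    scores = {w: pmi("chess", w) for w in word_counts}
    for word, score in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
        print(f"{word}: {score:.2f}")

On this tiny corpus, words like "kasparov" and "grandmaster" score high for "chess" while shared words like "the" score near zero, which is the statistical effect the article describes at vastly larger scale.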

In 2006, Ferrucci tested IBM's most advanced system by giving it 500 questions from previous Jeopardy shows. The results, plotted on a graph and compared with the scores of human Jeopardy winners, were dismal. But Ferrucci argued that with new hardware he could make faster progress than ever before. If the team could succeed at Jeopardy, IBM could bring the technology to market as a question-answering system. In 2007, his bosses gave him three to five years and a team of developers.

Watson has enormous speed and memory. Ferrucci's team fed millions of documents into Watson to build up its knowledge base: books, reference materials, dictionaries, thesauri, folksonomies, taxonomies, encyclopedias, novels, and so on.
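
One rough sketch of what "building up a knowledge base" can mean in practice is an index mapping words to the documents that contain them. The documents and the inverted-index design below are assumptions for illustration; the article does not describe Watson's actual storage layer.

    # A toy inverted index (an illustrative assumption, not Watson's real
    # storage): map each word to the documents that mention it, so that
    # candidate evidence can be looked up quickly at question time.
    from collections import defaultdict

    documents = {
        "doc1": "Deep Blue defeated Garry Kasparov at chess in 1997",
        "doc2": "Jeopardy is a quiz show with witty punning clues",
    }

    index = defaultdict(set)  # word -> ids of documents containing it
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)

    def lookup(word):
        # Return the ids of documents mentioning the word.
        return sorted(index[word.lower()])

    print(lookup("chess"))  # ['doc1']
    print(lookup("quiz"))   # ['doc2']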

Watson is fast enough to try thousands of parallel ways of tackling a Jeopardy clue. Ferrucci concluded that previous systems worked poorly because no single algorithm can simulate the human ability to parse language and facts. Instead, Watson runs many algorithms in parallel to analyze a question in different ways, generating hundreds of candidate answers, then ranks those candidates by assessing how likely each one is to answer the question.
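
That generate-then-rank architecture can be sketched as follows. The three analyzers and their confidence numbers are invented stand-ins; Watson's real components are far more numerous and sophisticated.

    # A hedged sketch of "many analyzers in parallel, then rank":
    # each analyzer proposes scored candidates, the scores are pooled,
    # and the merged list is sorted by total evidence.
    from concurrent.futures import ThreadPoolExecutor

    def keyword_lookup(clue):
        # Stand-in for a retrieval-based analyzer.
        return [("Garry Kasparov", 0.6), ("Deep Blue", 0.3)]

    def pattern_match(clue):
        # Stand-in for an analyzer that parses the clue's phrasing.
        return [("Garry Kasparov", 0.7), ("Bobby Fischer", 0.2)]

    def date_reasoner(clue):
        # Stand-in for an analyzer keying on dates in the clue.
        return [("Garry Kasparov", 0.5)]

    ANALYZERS = [keyword_lookup, pattern_match, date_reasoner]

    def answer(clue):
        # Run every analyzer in parallel on the same clue.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda a: a(clue), ANALYZERS))
        # Pool each candidate's evidence across analyzers, then rank.
        merged = {}
        for candidates in results:
            for name, score in candidates:
                merged[name] = merged.get(name, 0.0) + score
        return sorted(merged.items(), key=lambda kv: -kv[1])

    clue = "In 1997 this grandmaster lost a rematch to an IBM computer"
    for name, score in answer(clue):
        print(f"{name}: {score:.2f}")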

By 2008, Watson had edged into the Jeopardy winner's cloud on the graph. IBM executives called up Harry Friedman, the executive producer of the show, and suggested putting Watson on the air. Friedman quickly accepted the challenge: "Because it's IBM, we took it seriously."

Jeopardy will hold a special match pitting Watson against one or more famous winners from the past. If the contest includes the very best players, Watson may lose. It's pretty far up in the winner's cloud, but it's not yet at the top.

Ferrucci says his team will continue to fine-tune Watson, but improving its performance is getting harder. "When we first started, we'd add a new algorithm and it would improve the performance by 10 percent, 15 percent," he says. "Now it'll be like half a percent is a good improvement." Watson might lose merely because of bad luck.

IBM plans to sell versions of Watson to companies in the next year or two. Watson could help decision-makers sift through enormous piles of written material in seconds. Its speed and quality could make it part of rapid-fire decision-making, with users talking to Watson to guide their thinking process.

At first, a Watson system could cost several million dollars, because it needs to run on a big IBM supercomputer. But within ten years an artificial brain like Watson could run on a much cheaper server, affordable by any small firm, and later perhaps even on a laptop.

Watson-level AI could make it easier for citizens to get answers quickly from big bureaucracies. But critics wonder about the wisdom of relying on AI systems in the face of complex reality. And while service companies can save money by relying more on such systems, customers crave the ability to talk to a real human on the phone.

A lot of human knowledge isn't represented in words alone, and a computer won't learn that stuff just by encoding English language texts, as Watson does.

Watson can answer only questions asking for an objectively knowable fact. It cannot produce an answer that requires judgment. Watson doesn't come close to replicating human wisdom.
 

AR  I'm enthused. In 1991, at Springer, I edited a book on IBM research in question-answering systems (LNAI 546 on LILOG) and saw that there was real hope of cracking the natural dialog challenge. Then, at SAP, my team's TREX engine showed how close we were to having the hardware and the algorithms (for parallel statistical evaluations) to make working systems for answering questions. Other IBM projects over the years have been closing in on this goal. Now at last the prize is almost in our grasp: first Jeopardy and then ... Globorg!