Well before Ken Jennings and Brad Rutter, IBM’s design team grappled with a different challenge – getting beaten to the punch by someone else inventing a trivia-savvy artificial mind. Final Jeopardy discusses Watson’s early development and how this Q&A juggernaut overcame the “Basement Baseline.”
In the early days of 2007, before he agreed to head up a Jeopardy project, IBM’s David Ferrucci harboured two conflicting fears. The first of his nightmare scenarios was perfectly natural: A Jeopardy computer would fail, embarrassing the company and his team.
But his second concern, failure’s diabolical twin, was perhaps even more terrifying. What if IBM spent tens of millions of dollars and devoted centuries of researcher years to this project, played it up in the press, and then saw someone beat them to it? Ferrucci pictured a solitary hacker in a garage, cobbling together free software from the Web and maybe hitching it to Wikipedia and other online databases. What if the Jeopardy challenge turned out to be not too hard but too easy?
That would be worse, far worse, than failure. IBM would become the laughingstock of the tech world, an old-line company completely out of touch with the technology revolution – precisely what its corporate customers paid it billions of dollars to track. Ferrucci’s first order of business was to make sure that this could never happen. “It was due diligence,” he later said.
He had a new researcher on his team, James Fan, a young Chinese American with a fresh doctorate from the University of Texas. As a newcomer, Fan was free of institutional preconceptions about how Q-A systems should work. He had no history with the annual government-sponsored competitions, in which IBM’s technology routinely botched two questions for every one it got right. Trim and soft-spoken, his new IBM badge hanging around his neck, Fan was an outsider. And he now faced a singular assignment: to build a Jeopardy computer all by himself. He was given 500 Jeopardy clues to train his machine and one month to make it smart. His system would be known as Basement Baseline.
So on a February day in 2007, James Fan set out to program a Q-A machine from scratch. He started by drawing up an inventory of the software tools and reference documents he thought he’d need. First would be a so-called type system. This would help the computer figure out if it was looking for a person, place, animal, or thing. After all, if it didn’t know what it was looking for, finding an answer was little more than a crap-shoot; generating enough “confidence” to bet on that answer would be impossible. For humans, distinguishing President George Washington from the bridge named after him isn’t much of a challenge. Context makes it clear. Bridges don’t deliver inaugural addresses; presidents are rarely jammed at rush hour, with half-hour delays from Jersey. What’s more, when placed in sentences, people usually behave differently than roads or bridges.
But what’s simple for us involved hard work for Fan’s Q-A computer. It had to comb through the structure of the question, picking out the subjects, objects, and prepositions. Then it had to consult exhaustive reference lists that had been built up in the industry over decades, laying out hundreds of thousands of places, things and actions and the web of relationships among them. These were known as “ontologies”. Think of them as cheat sheets for computers. If a finger was a subject, for example, it fell into human anatomy and was related to the hand and the thumb and to verbs such as “to point” and “to pluck”. (Conversely, when “the finger” turned up as the object of the verb “to give”, a sophisticated ontology might steer the computer toward the neighbourhood of insults, gestures, and obscenities.)
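For readers who think in code, the “finger” example can be pictured as a lookup keyed on a word and its grammatical role. What follows is only a toy sketch: the `ONTOLOGY` table, its entries, and the `lookup` function are invented for illustration and are not drawn from any real ontology or from IBM’s systems.

```python
# Toy ontology. Every entry here is hypothetical and exists only to
# illustrate the "finger" example from the text.
ONTOLOGY = {
    ("finger", "default"): {
        "domain": "human anatomy",
        "related": ["hand", "thumb"],
        "verbs": ["point", "pluck"],
    },
    ("finger", "object_of:give"): {
        "domain": "insults and gestures",
        "related": ["insult", "gesture", "obscenity"],
        "verbs": ["give"],
    },
}


def lookup(word, role, verb=None):
    """Return the entry for a word, preferring a verb-specific sense.

    If the word is the object of a known verb, the verb-specific sense
    wins; otherwise the default sense is returned (or None if the word
    is not in the table at all).
    """
    if role == "object" and verb is not None:
        entry = ONTOLOGY.get((word, f"object_of:{verb}"))
        if entry is not None:
            return entry
    return ONTOLOGY.get((word, "default"))


if __name__ == "__main__":
    print(lookup("finger", "subject")["domain"])
    print(lookup("finger", "object", verb="give")["domain"])
```

A real ontology holds hundreds of thousands of entries and far richer relationships; the point here is only that the same word can land in different neighbourhoods depending on its role in the sentence.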
In any case, Fan needed both a type system and a knowledge base to understand questions and hunt for answers. He didn’t have either, so he took a hacker’s shortcut and used Google and Wikipedia. (While the true Jeopardy computer would have to store its knowledge in its “head”, prototypes like Fan’s were free to search the Web.) From time to time, Fan found, if he typed a clue into Google, it led him to a Wikipedia page – and the subject of the page turned out to be the answer. The following clue, for example, would confound even the most linguistically adept computer. In the category The Author Twitters, it reads: “Czech out my short story ‘A Hunger Artist’! Tweet done. Max Brod, pls burn my laptop.”
A good human Jeopardy player would see past the crazy syntax, quickly recognising the short story as one written by Franz Kafka, along with a reference to Kafka’s Czech nationality and his longtime associate Max Brod. In the same way, a search engine would zero in on those helpful key words and pay scant attention to the sentence surrounding them. When Fan typed the clue into Google, the first Wikipedia page that popped up was “Franz Kafka”, the correct answer. This was a primitive method. And Fan knew that a computer relying on it would botch the great majority of Jeopardy clues. It would crash and burn in a game against even ignorant humans, let alone Ken Jennings. But one or two times out of ten, it worked. For Fan, it was a start.
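The key-word shortcut can be imitated in a few lines. The sketch below is an assumption-laden stand-in, not Fan’s code: instead of calling Google it scores three made-up page summaries in a hypothetical `PAGES` table by how many clue words they share, and the `STOPWORDS` list, `tokens` and `best_page` are likewise invented for illustration.

```python
import re

# Hypothetical stand-ins for search results: page title -> short summary.
PAGES = {
    "Franz Kafka": (
        "Author born in Prague, in what is now the Czech Republic; "
        "wrote the short story A Hunger Artist; friend of Max Brod"
    ),
    "Laptop": "A laptop is a small portable personal computer",
    "Twitter": "A service on which users post short messages called tweets",
}

# A small, generic stopword list (an assumption, not a standard one).
STOPWORDS = {"a", "an", "the", "my", "out", "of", "in", "is", "on", "and",
             "what", "now", "which"}


def tokens(text):
    """Lowercase the text and keep only its non-stopword words, as a set."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def best_page(clue):
    """Return the title of the page sharing the most key words with the clue."""
    clue_words = tokens(clue)
    return max(PAGES, key=lambda title: len(clue_words & tokens(PAGES[title])))


CLUE = "Czech out my short story 'A Hunger Artist'! Tweet done. Max Brod, pls burn my laptop."

if __name__ == "__main__":
    print(best_page(CLUE))
```

Like the method it imitates, this ignores syntax entirely and simply rewards overlapping words, which is why it can stumble onto the right page for this clue and still fail on most others.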
The month passed. Fan added more features to Basement Baseline. But at the end, the system was still missing vital components. Most important, it had no mechanism for gauging its level of confidence in its answers. “I didn’t have time to build one,” Fan said. This meant the computer didn’t know what it knew. In a game, it wouldn’t have any idea when to buzz. In the end, Fan blew off game strategy entirely and focused simply on building a machine that could answer Jeopardy clues.
It was on a March morning at IBM labs in Hawthorne, NY, that James Fan’s Basement Baseline faced off against Big Blue’s in-house question-answering system, known as Piquant. The results, from Ferrucci’s perspective, were ideal. The Piquant system succeeded on only 30 per cent of the clues, far below the level needed for Jeopardy. It had high confidence on only 5 per cent of them, and of those it got only 47 per cent right. Fan’s Basement Baseline fared almost as well by a number of measures but was still woefully short of what was needed. Fan proved that a hacker’s concoction was far from Jeopardy standards – which was a relief. But by nearly matching the company’s state of the art in Q-A technology, he highlighted its inadequacies.
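To see how thin Piquant’s confident performance was, the two percentages can be multiplied together. The snippet below is a back-of-the-envelope sketch using only the figures quoted above; the function name is invented for illustration, and the calculation assumes the 47 per cent applies to exactly the 5 per cent of clues on which the system had high confidence.

```python
def confident_and_correct(high_confidence_rate, accuracy_when_confident):
    """Share of all clues answered both with high confidence and correctly."""
    return high_confidence_rate * accuracy_when_confident


# Figures quoted in the text for Piquant.
share = confident_and_correct(0.05, 0.47)

if __name__ == "__main__":
    print(f"Confident and correct on {share:.2%} of all clues")
```

On those assumptions, Piquant was both sure of itself and right on roughly 2.35 per cent of clues, or a little over two clues in every hundred.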
The Jeopardy challenge, it was clear, would require another program, another technology platform, and a far bolder approach. The job, Ferrucci said, called for “the most sophisticated intelligence architecture the world has ever seen.” He proceeded to tell his bosses that he would lead a team to assemble a Jeopardy machine – provided that they gave him the resources to build a big one.
Stephen Baker was BusinessWeek’s senior technology writer for a decade, based first in Paris and later in New York. He has also written for the Los Angeles Times, the Boston Globe, and the Wall Street Journal. Roger Lowenstein called his first book, THE NUMERATI, “an eye-opening and chilling book.” Baker blogs at finaljeopardy.net.
Excerpted from FINAL JEOPARDY: Man vs. Machine and the Quest to Know Everything by Stephen Baker. Copyright © 2011 by Stephen Baker. Used by permission of Houghton Mifflin Harcourt Publishing Company. All rights reserved.