Humans vs. Watson (Programmed by Humans): Who Has The Advantage?


DAY 3 UPDATE:  If you are a technology person, you had to be impressed.  We all know who won by now so I won’t belabor it.  Ken Jennings played better and made a game of it … at least for a while.  He seemed to anticipate the buzz a little bit better and got on a roll.

You may have noticed that Watson struggled in certain categories last night.  “Actors Who Direct” gave very short clues (or questions) like “The Great Debaters” for which the correct answer was “Who is Denzel Washington”.  For Watson, the longer the question, the better.  If it takes a longer time for Alex to read the question, Watson has more time to consider candidate answers, evidence scores and confidence rankings.  This is another reason why Watson does better in certain categories.  In an attempt to remain competitive in this situation, Watson has multiple ways to process clues or questions.  There is what is called the “short path” (to an answer).  This is used for shorter questions when Watson has less time to decide whether to buzz in or not.  Watson is more inconsistent when it has to answer faster.  As seen last night, he either chose not to answer or Ken and Brad beat him to it.

In the end, the margin of victory was decisive for Watson.  In total, $1.25 million was donated to charity and Ken and Brad took home parting gifts of $150,000 and $100,000 respectively … pretty good for all involved.  The real winners are science and technology.  This is a major advance in computing that could revolutionize the way we interact with computers … especially with questions and answers.  The commercial applications seem endless.

DAY 2 UPDATE:  Last night was compelling to watch.  I was at the Washington, DC viewing event with several hundred customers, partners and IBMers.  The atmosphere in the briefing center was electric.  When the game started with Watson taking command, the room erupted in cheers.  After Watson got on a roll, and steamrolled Brad and Ken for most of Double Jeopardy, the room began to grow silent in awe of what was happening. 

Erik Mueller (IBM Research) was our featured speaker.  He was bombarded … before, during and after the match with questions like “How does he know what to bet?”  “How does Watson process text?”  “How would this be used in medical research?”  “What books were in Watson’s knowledge base?”  “Can Watson hear?”  “Does he have to press a button like the human contestants?” and many more.

I was there as a subject matter expert and even though the spotlight was rightfully on Erik, I did get to answer a question on how some of Watson’s technology was being used today.  I explained how our IBM Content Analytics is used and how it is helping to power Watson’s natural language prowess.

When Watson incorrectly answered “What is Toronto????” in Final Jeopardy, the room audibly gasped (myself included).  As everyone seemed to hold their breath, I looked at Erik and he was smiling like a Cheshire cat … brimming with confidence.  The room cheered and applauded when Watson’s small bet was revealed … a seeming acknowledgement of the technological brilliance.  Applause for a wrong answer!

Afterwards, there were many ideas on how Watson could be applied.  My favorite was from a legal industry colleague who had a number of suggestions for how Watson could optimize document review and analysis that is currently a problem for judges and litigators.

Yesterday (below) I said the humans have a slight advantage.  And while Watson has built an impressive lead, I still feel that way.  Many of yesterday’s categories played to Watson’s fact-based strengths.  It could go the other way tonight and Brad and Ken could get right back into the match.  The second game will air tonight in its entirety and the scores from both games will be combined to determine the $1 million prize winner.  Watson is entering tonight with a more than $25,000 lead.  IBM is donating all prize winnings to charity and Ken Jennings and Brad Rutter are donating 50% to charity.

DAY 1 POST:  After Day 1, Watson is tied with Brad Rutter at $5,000 going into Double Jeopardy – which is pretty impressive.  Ken Jennings has yet to catch his stride.  Brad and Ken seemed a little shell shocked at first, but Brad rebounded right when Watson was faltering towards the end of the first round.  This got me to thinking I should go into a little more detail about who really has the advantage … Watson or the humans? 

If you watched it last night, you may have observed that Watson does very well with factual questions.  He did very well in the Beatles song category – they were mostly facts with contextual references to lyrics.  Answers that involve multiple facts, all of which are required to arrive at the correct response but are unlikely to be found in the same place, are much harder for Watson.  This is why Watson missed the Harry Potter question involving Lord Voldemort.  Watson also switched categories frequently, which is part of his game strategy.  You may have also noticed that Watson can’t see or hear.  He answered a question wrong even though Ken gave the same wrong answer seconds before.  More on this later in the post.

Here goes … my take on who has the advantage …

Question Understanding :  Advantage Humans

Humans:  Seemingly Effortless.  Almost instantly knows what is being asked, what is important and how it applies – very naturally gets focus, references, hints, puns, implications, etc.

Watson:  Hugely Challenging.  Has to be programmed to analyze enormous numbers of possibilities to get just a hint of the relevant meaning.  Very difficult due to variability, implicit context, ambiguity of structure and meaning in language.

Language Understanding:  Advantage Humans

Humans:  Seemingly Effortless.  Powerful, general, deep and fast in understanding language – reading, experiencing, summarizing, storing knowledge in natural language.  This information is written for human consumption so reading and understanding what it says is natural for humans.

Watson:  Hugely Challenging.  Answers need to be determined and justified in natural language sources like news articles, reference texts, plays, novels, etc.  Watson must be carefully programmed and automatically trained to deeply analyze even just tiny subsets of language effectively.  Very different from web search, must find a precise answer and understand enough of what it read to know if and why a possible answer may be correct.

Self‐Knowledge (Confidence):  Advantage Humans

Humans:  Seemingly Effortless.  Most often, and almost instantly, humans know if they know the answer.

Watson:  Hugely Challenging.  1000’s of algorithms run in parallel to find and analyze 1000’s of written texts for many different types of evidence.  The results are combined, scored and weighed for their relative importance – how much they justify a candidate answer.  This has to happen in 3 seconds to compute a confidence and decide whether or not to ring-in before it is too late.
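To make the "combined, scored and weighed" step a bit more concrete, here is a minimal sketch of how evidence scores for candidate answers could be folded into a single confidence that drives the ring-in decision.  This is an illustration only, not IBM's actual method: the evidence types, the weights, the simple weighted average and the 0.5 threshold are all assumptions I made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    answer: str
    evidence: dict[str, float]  # evidence type -> score in [0, 1]


# Hypothetical evidence types and weights (assumptions for illustration only).
WEIGHTS = {"passage_match": 0.5, "type_match": 0.3, "popularity": 0.2}
BUZZ_THRESHOLD = 0.5  # hypothetical confidence needed to ring in


def confidence(candidate: Candidate) -> float:
    """Weighted average of the candidate's evidence scores."""
    total_weight = sum(WEIGHTS.values())
    weighted = sum(w * candidate.evidence.get(name, 0.0) for name, w in WEIGHTS.items())
    return weighted / total_weight


def decide(candidates: list[Candidate], threshold: float = BUZZ_THRESHOLD):
    """Return (best answer, its confidence, whether to ring in)."""
    best = max(candidates, key=confidence)
    conf = confidence(best)
    return best.answer, conf, conf >= threshold


if __name__ == "__main__":
    candidates = [
        Candidate("Chicago", {"passage_match": 0.8, "type_match": 0.9, "popularity": 0.5}),
        Candidate("Toronto", {"passage_match": 0.3, "type_match": 0.2, "popularity": 0.6}),
    ]
    print(decide(candidates))
```

The point of the sketch is the shape of the decision: many weak signals per candidate, one combined number, and a threshold that decides whether buzzing is worth the risk.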

Breadth of Knowledge:  Advantage Humans

Humans:  Limited by self-contained memory.  Estimates of >1000’s of terabytes are all much higher than Watson’s memory capacity.  Ability to flexibly understand and summarize human relevance means that humans’ raw input capacity is even higher.

Watson:  Limited by self‐contained memory.  Roughly 1 Million books worth of content stored and processed in 15 Terabytes of working memory.  Weaker ability to meaningfully understand, relate and summarize human‐relevant content.  Must look at lots of data to compute statistical relevance.

Processing Speed:  Advantage Humans

Humans:  Fast Accurate Language Processing.  Native, strong, fast, language abilities.  Highly associative, highly flexible memory and speedy recall.  Very fast to speed read clue, accurately grasp question, determine confidence and answer – in just seconds. 

Watson:  Hugely Challenging.  On 1 CPU Watson can take over 2 hours to answer a typical Jeopardy! question.  Watson must be parallelized, perhaps in ways similar to the brain, to simultaneously use 1000’s of compute cores to compete against humans in the 3-5 second range.
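A quick back-of-the-envelope check shows why the numbers in this post hang together.  The sketch below assumes ideal linear speedup (work divides evenly across cores with no coordination overhead), which real systems never quite achieve, and treats "over 2 hours" as exactly 2 hours, so read the results as rough lower bounds.  The function names are just mine for illustration.

```python
import math

SINGLE_CPU_SECONDS = 2 * 60 * 60  # "over 2 hours" on 1 CPU, taken as exactly 2 hours
CORES = 2880                      # Watson's compute cores, per the Compute Power section


def ideal_parallel_seconds(serial_seconds: float, cores: int) -> float:
    """Answer time if the work split perfectly across all cores (an idealization)."""
    return serial_seconds / cores


def cores_needed(serial_seconds: float, target_seconds: float) -> int:
    """Fewest cores that would hit the target time under the same idealization."""
    return math.ceil(serial_seconds / target_seconds)


if __name__ == "__main__":
    # 7200 / 2880 = 2.5 seconds
    print(ideal_parallel_seconds(SINGLE_CPU_SECONDS, CORES))
    # 7200 / 3 = 2400 cores for a 3 second answer
    print(cores_needed(SINGLE_CPU_SECONDS, 3))
    # 7200 / 5 = 1440 cores for a 5 second answer
    print(cores_needed(SINGLE_CPU_SECONDS, 5))
```

So even in the best case, 2 hours of single-CPU work spread over 2,880 cores comes out to about 2.5 seconds, right at the edge of the 3-5 second window.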

Reaction Speed:  Toss-up

Humans:  Times the Buzz.  Slower raw reaction speed but potentially faster to the buzz.  Listens to clue and anticipates when to buzz in.  “Timing the buzz” like this provides humans with the fastest possible absolute response time.

Watson:  Fast Hand.  More consistently delivers a fast reaction time but ONLY IF and WHEN it can determine high enough confidence in time to buzz‐in.  Not able to anticipate when to buzz‐in based on listening to clue, which gives fastest possible response time to humans.  Also has to press same mechanical button as humans do.

Compute Power:  Won’t Impact Outcome

Humans:  Requires 1 brain that fits in a shoebox, can run on a tuna‐fish sandwich and be cooled with a hand‐held paper fan.

Watson:  Hugely Challenging.  Needs 2,880 compute cores (10 refrigerators worth in size and space) requiring about 80 kW of power and 20 tons of cooling.

Betting and Strategy:  Advantage Watson

Humans:  Slower, typically less precise.  Uses strategy and adjusts based on situation and game position.

Watson: Faster, more accurate calculations.  Uses strategy and adjusts based on situation and game position.

Emotions:  Advantage Watson

Humans:  Yes. Can slow down and/or confuse processing.

Watson:  No. Does NOT get nervous, tired, upset or psyched out (but the Watson programming team does!).

In-Game Learning:  Advantage Humans

Humans:  Learn very quickly from context, voice expression and (most importantly) right and wrong answers.

Watson:  Watson does not have the ability to hear (speech to text).  It is my understanding that Watson is “fed” the correct answer (in text) after each question so he can learn about the category even if he gets it wrong or does not answer.  However, I don’t believe he is “fed” the wrong answers.  This is a disadvantage for Watson.  As seen last night, it is not uncommon for him to answer with the same wrong answer as another contestant.  This also happened in the sparring rounds leading up to the taping of last night’s show.

As you can see things are closely matched but a slight advantage has to go to Ken and Brad.

And what about Watson’s face?

Another observation I made was how cool Watson’s avatar was.  It actually expresses what he is thinking (or processing).  The Watson avatar shares the graphic structure and tonality of the IBM Smarter Planet marketing campaign; a global map projection with a halo of “thought rays.”  The avatar features dozens of differentiated animation states that mirror the many stages of Jeopardy! gameplay – from choosing categories and answering clues, to winning and losing, to making Daily Double wagers and playing Final Jeopardy!.  Even Watson’s level of confidence – the numeric threshold that determines whether or not Watson will buzz in to answer – is made visible.  Watson’s stage presence is designed to depict the interior processes of the advanced computing system that powers it.  A significant portion of the avatar consists of colored threads orbiting around a central core.  The threads and thought rays that make up Watson’s avatar change color and speed depending on what happens during the game.  For example, when Watson feels confident in an answer the rays on the avatar turn green; they turn orange when Watson gets the answer wrong.  You will see the avatar speed up and activate when Watson’s algorithms are working hard to answer a clue.

I’ll be glued to the TV tonight and tomorrow.  Regardless of the outcome, this whole experience has been fascinating to me … so much so that I just published a new podcast on ECM, Content Analytics and Watson.

You can also visit my previous blog postings on Watson at: IBM at 100:  A Computer Called Watson, “What is Content Analytics?, Alex”, 10 Things You Need to Know About the Technology Behind Watson and Goodbye Search … It’s About Finding Answers … Enter Watson vs. Jeopardy!

19 thoughts on “Humans vs. Watson (Programmed by Humans): Who Has The Advantage?”

  1. Excellent post, Craig!

    And thanks for answering the question my brother Bill asked me last night about how much power Watson needs. Bill also wondered if there was a spike in power consumption during the average 3 seconds it takes to do the analysis.

  3. Excellent comparison! I wonder how much compute power will have to be continually added to keep up with the new emotions, intelligence, categories and information continually created, gathered and consumed by humans. How fast is Watson updating his knowledge? Is he current? I heard it uses 15TB of RAM. Fascinating!

  4. Craig… excellent comments of advantages and disadvantages. Whether Watson loses at Jeopardy or not – IBM and the future of Content Analytics still wins out. ;-)

  5. Excellent post, thank you!

    My excitement about three game nights – oh, through the roof to say the least, but can you tell us a bit about the folks at IBM: how much excitement is in the air over there? Are you guys even able to sleep at night? How is it to be in the audience and watch Watson go through these games? Does it feel like it’s your child out there competing?!

  6. Interested to see round 2 as well. One thing I noticed was that there is a good programming ‘finish’ on Watson, such as the ability to pronounce French words properly (I forget which one it was, I just remember it was done smoothly). It would be interesting to see how the game would play if there were speech to text and/or voice stress analysis: when the opponents were stressing, Watson would lower the acceptable confidence level threshold a bit to press the buzzer and answer more often, thus inducing additional stress in the opponents and throwing them off their game. Not that getting tied or beaten by a machine wouldn’t be stressful enough.

  7. Excellent post Craig! My favorite line was “Humans: Requires 1 brain that fits in a shoebox, can run on a tuna‐fish sandwich and be cooled with a hand‐held paper fan.”

  8. Your analysis of the buzzer advantage is very skimpy, and simply does not answer the question – how and when does Watson receive the “input” that “it is now permitted to ring in”? It has been emphatically stated that Watson cannot “hear” or “see”. So, I think that there are only two possibilities:

    1. Watson begins a cycle of pushing the buzzer as soon as an answer has been reached. Humans appear to do this, hoping to time it “just right”. If this is the case, then Jeopardy hardware engineers should release information on how long the “lockout” period is, and how a “ring-in” signal is parsed. Is it a momentary contact switch? (it probably is), and if so, then the lockout period is presumably tied to the beginning of the click (which still has a defined duration). If the buzzer is an old-style (analog) switch that remains in the “on” (closed) position as long as the threshold pressure is applied, then presumably the lockout period begins when the thumb releases. Either way, there is a very precise optimal frequency for the cycle of depressing the button. It can be engineered to perfection, in a way that a human could never fully mimic.

    2. Alternatively – and this seems slightly more likely – just as the question is provided through an electronic input… the “go” signal (the lights on the side of the video monitor screen(s)) is likewise provided as an electronic input to Watson. Since most good contestants have arrived at an answer well before the question has been fully read aloud, responding to this “go” signal clearly can be engineered so that the machine can always win. If a human can respond within two milliseconds, then a machine can ring in within one millisecond. If a human can ring in within FIVE _tenths_ of a millisecond (unlikely), then a machine can be engineered to respond in under FOUR.

    I think that scenario two provides the best way to create a reasonable simulation of a level playing field – make Watson “watch for” the “go” signal (i.e., the light to come on at the periphery of the video monitor screen(s)). And as a corollary, impose on Watson a random “blink rate” comparable to a spectrum of actual contestants. A blink rate would mean that for any given interval of “x milliseconds”, Watson might have a random chance of “not having its eyes open” at precisely the minimum interval after the go signal begins.

    One final, and mostly unrelated point. On various web sites, some commenters have suggested that Watson _should_ be required to “read” the question, using a visual character recognition approach. This is pointless, I am pretty sure. Since the character type face (font) is pretty much always the same, visual character recognition would be child’s play to translate what appears on the screen into text – in literally sub-millisecond time. No need to require Watson to “read” the question(s).

    Anyone who is not impressed by what the IBM research team produced for this project really does not understand computers, nor analytics.

    But the robotics won the game – at the buzzer.

    In some contests, a machine would always win… weightlifting? javelin throw? A competition of humans against machine would be silly. But ice dancing? mogul skiing? At least for the next few decades, robotics will be unable to match humans.

    • Rex – Sorry for the “skimpiness”. Let me try to clarify. When the question is revealed, all contestants can “read” the text. The text is fed to Watson at the same time the humans see it. None of the contestants can answer until Alex has finished reading the question and the board lights up. Since Watson cannot see or hear, he cannot take advantage of visual or auditory clues to help determine when to buzz in. He cannot anticipate when Alex will stop speaking and when the light signaling it’s OK to answer will come on. Brad and Ken had an enormous advantage with their ability to anticipate the light and buzz in ahead of Watson. In short, Watson does have to wait for the “go” light.

      Watson’s advantage is in consistency of the buzz. He consistently buzzes in quickly if his confidence is high enough but has the same mechanical limitations of pushing the button. A human like Ken, who is known to be fast at buzzing has a serious advantage if his timing is on.

      Watson has multiple ways to process the questions. There is what is called the “short” path (to an answer). This is used for shorter questions when Watson has less time to decide whether to buzz in or not. When the question is longer, Watson has more time to consider candidate answers and evidence / confidence rankings. This is another reason why Watson does better in certain categories. For Watson, the longer the question, the better.

      • I don’t think you have improved your answer.

        They ALL have to wait for the “go” light. The ability to anticipate when it will illuminate doesn’t mean that the human can get a buzz-in even one microsecond BEFORE the go light comes on.

        So, the question remains unanswered – since Watson cannot see, how is the input provided to say “the go light is now on”? There are only two possible answers: 1) Watson gets an electronic input, or 2) Watson NEVER knows when it goes on, and just starts signaling at some (optimum?) frequency as soon as an answer has met the target confidence value.

        If Watson receives the input electronically, then the path from input ("seeing the light" for a human) to actuating the buzzer is inherently shorter for machine than for human – from retina to visual cortex, to motor cortex, to thumb... requires a physiologically defined time interval. It is probably measured in milliseconds – some humans might average 2-3, others 5-10, others slower still, maybe no better than 20 or even 50 even if they are completely confident they have the correct response. But no human can push that time interval lower and lower and lower, with practice. If the fastest human on earth ultimately gets the response time down to a few hundred microseconds, that skill hits a threshold – it can NOT go sub-microsecond, sub-nanosecond, etc.

        But Watson could easily have an algorithm (separate from the language processing and scoring the candidate responses) that completely trumps humans every time: "go" light input [true] AND confidence threshold [sufficient] => actuate the buzzer – this really can be in the sub-microsecond range.

        I remain convinced that making this playing field level requires the introduction of a “blink rate”. Require sensing of the actual “go” light, and make it vulnerable – just as humans are – to “not having your eyes open” at that precise microsecond window when the light illuminates. For humans, the eyelid is a physical barrier to initiating the retina-to-brain-to-thumb sequence.

        Watson needs to have eyelids too; otherwise, the power of the language processing and response scoring — the skill that humans call “instant recall” is being artificially magnified by the inherently faster efferent-afferent loop, created by engineers.

        I bet anything that the same IBM team can create a system that can consistently beat all humans at “rock-paper-scissors” where language processing and response scoring are NOT part of the competition – in the final upswing/downswing of the human hand, the intended “play” will be visually discernible, if the video sampling rate is high enough. Substituting a choice that is based on “knowing” (by “seeing” the human opponent’s hand start to form its choice) can be done, by a mechanical system with microsecond (or sub-microsecond) speed that no human could ever match.

      • Yes Rex – they all have to wait for the “go” light. Watson gets an electronic signal when it comes on. Brad and Ken can buzz when they see the light, or try to anticipate the light, and gain an advantage by buzzing in ahead of Watson. Since this cannot be precisely measured, due to human variances in response, it is estimated Watson’s response time is roughly the same as human contestants. Thanks for your input.

  9. craigrhinehart: Yes Rex – they all have to wait for the “go” light. Watson gets an electronic signal when it comes on. Brad and Ken can buzz when they see the light, or try to anticipate the light, and gain an advantage by buzzing in ahead of Watson. Since this cannot be precisely measured, due to human variances in response, it is estimated Watson’s response time is roughly the same as human contestants. Thanks for your input.

    I guess that we will simply have to agree to disagree.

    You assert that…

    “it is estimated Watson’s response time is roughly the same as human contestants”

    I ask… “estimated by whom?”

    It seems equally valid for me to assert that…

    “it is estimated Watson’s response time is always faster than human contestants”

    I do not understand the basis for believing that the millisecond timing involved cannot be measured. Of course it can.

    • Rex – Thanks for your opinions on the topic. As I explained before, Watson’s timing is constant … and yes it is fast. It is not always the fastest though and it would be inaccurate to say that. Humans are inconsistent … they are sometimes faster at buzzing and sometimes not. Ken Jennings has acknowledged publicly he is able to beat Watson to the buzz. I guess he might know. Besides, if Watson is as unfairly fast as you claim, he would have beaten them to the buzz every single time. This obviously did not happen.

      In my opinion, the buzzing part is not what is impressive anyway. Answering such a high percentage of questions accurately is though. Even on questions where Brad or Ken buzzed in ahead of Watson, you could see that Watson had the right answer in many cases.

  10. I do not normally watch this sort of television, but this provided a very new twist to “reality TV”. Computing technology has made significant progress, as demonstrated by its ability to process natural language at speeds near or better than humans. He was not perfect, but then humans do the same thing: rely on incomplete or inaccurate information to formulate decisions. BRAVE NEW WORLD!
