Watson and The Future of ECM

In the past, I have whipped out my ECM powered crystal ball to pontificate about the future of Enterprise Content Management.  These are always fun to write and share (see Top 10 ECM Pet Peeve Predictions for 2011  and Crystal Ball Gazing … Enterprise Content Management 2020).  This one is a little different though …  on the eve of the AIIM International Conference and Expo at info360, I find myself wondering … what are we going to do with all this new social content … all of these content based conversations in all of their various forms?

We’ve seen the rise of the Systems of Engagement concept and a number of new systems that enable social business.  We’re adopting new ways to work together leveraging technologies like collaborative content, wikis, communities, RSS and much more.  All of this new content being generated is text based and expressed in natural language.  I suggest you read AIIM’s report Systems of Engagement and the Future of Enterprise IT: A Sea Change in Enterprise for a perspective on the management aspects of the future of ECM.  It lays out how organizations must think about information management, control, and governance in order to deal with social technologies.

Social business is not just inside the firewall though.  Blogs, wikis and social network conversations are giving consumers and businesses a voice and power they’ve never had before … again based in text and expressed in natural language.  This is a big deal.  770 million people worldwide visited a social networking site last year (according to a comScore report titled Social Networking Phenomenon) … and amazingly, over 500 billion impressions annually are being made about products and services (according to a new book Empowered written by Josh Bernoff and Ted Schadler).

But what is buried in these text based natural language conversations?  There is an amazing amount of information trapped inside.  With all these conversations happening between colleagues, customers and partners … what can we learn from our customers about product quality, customer experience, price, value, service and more?  What can we learn from our internal conversations as well?  What is locked in these threads and related documents about strategy, projects, issues, risks and business outcomes?

We have to find out!  We have to put this information to work for us.

But guess what?  The old tools don’t work.  Data analysis is a powerful thing but don’t expect today’s business intelligence tools to understand language and threaded conversations.  When you analyze data … a 5 is always a 5.  You don’t have to understand what a 5 is or figure out what it means.  You just have to calculate it against other numeric indicators and metrics.

Content … and all of the related conversations aren’t numeric.  You must start by understanding what it all means, which is why understanding natural language is key.  Historically, computers have failed at this.  New tools and techniques are needed because content is a whole different challenge.  A very big challenge.  Think about it … a “5” represents a value, the same value, every single time.  There is no ambiguity.  In natural language, the word “premiere” could be a noun, verb or adjective.  It could be a title of a person, an action or the first night of a theatre play.  Natural language is full of ambiguity … it is nuanced and filled with contextual references.  Subtle meaning, irony, riddles, acronyms, idioms, abbreviations and other language complexities all present unique computing challenges not found with structured data.  This is precisely why IBM chose Jeopardy! as a way to showcase the Watson breakthrough.

IBM Watson (DeepQA) is the world’s most advanced question answering machine that uncovers answers by understanding the meaning buried in the context of a natural language question.  By combining advanced Natural Language Processing (NLP) and DeepQA automatic question answering technology, Watson represents the future of content and data management, analytics, and systems design.  IBM Watson leverages core content analysis, along with a number of other advanced technologies, to arrive at a single, precise answer within a very short period of time.  The business applications for this technology are limitless starting with clinical healthcare, customer care, government intelligence and beyond.

You can read some of my other blog postings on Watson (see “What is Content Analytics?, Alex”, 10 Things You Need to Know About the Technology Behind Watson and Goodbye Search … It’s About Finding Answers … Enter Watson vs. Jeopardy!) … or better yet … if you want to know how Watson actually works, hear it live at my AIIM / info360 main stage session IBM Watson and the Impact on ECM this coming Wednesday 3/23 at 9:30 am.

BLOG UPDATE:  Here is a link to the slides used at the AIIM / info360 keynote.

Back to my crystal ball … my prediction is that natural language based computing and related analysis is the next big wave of computing and will shape the future of ECM.  Watson is an enabling breakthrough and is the start of something big.  With all this new information, we’ll want to understand what is being said, and why, in all of these conversations.  Most of all, we’ll want to leverage this new found insight for business advantage.  One compelling and obvious example is being able to answer age old customer questions like “Are our customers happy with us?” “How happy?” “Are they so happy, we should try to sell something else?” … or … “Are our customers unhappy?” “Are they so unhappy, we should offer them something to prevent churn?” Understanding the customer trends and emerging opportunities across a large set of text based conversations (letters, calls, emails, web postings and more) is now possible.

Who wouldn’t want to understand their customers, partners, constituents and employees better?  Beyond this, Watson will be applied to industries like healthcare to help doctors more effectively diagnose diseases and this is just the beginning.  Organizations everywhere will want to unlock the insights trapped in their enterprise content and leverage all of these conversations … in ways we haven’t even thought of yet … but I’ll save that for the next time I use my ECM crystal ball.

As always … leave me your thoughts and ideas here and hope to see you Wednesday at The AIIM International Conference and Expo at info360 http://www.aiimexpo.com/.

IBM at 100: UPC … The Transformation of Retail

In my continuing series of IBM at 100 achievements … this is one of my favorites of all the ones I plan to republish here. The humble Universal Product Code (UPC), also known as the bar code, along with the related deployment of scanners, fundamentally changed many of the practices of retailers and all organizations that buy and move things, from large industrial equipment to pencils purchased in stationery stores. These two technologies led to the use of in-store information processing systems in almost every industry around the world, applied to millions of types of goods and items. UPC is planet Earth’s most pervasive inventory tracking tool.

N. Joseph Woodland, later an IBMer but then working at Drexel Institute of Technology, applied for the first patent on bar code technology on October 20, 1949, and along with Bernard Silver, received the patent on October 7, 1952. And there it sat for more than two decades. In those days there was no way to read the codes, until the laser became a practical tool. About 1970 at IBM Research Triangle Park, George Laurer went to work on how to scan labels and to develop a digitally readable code. Soon a team formed to address the issue, including Woodland. Their first try was a bull’s-eye bar code; nobody was happy with it because it took up too much space on a carton.

Meanwhile, the grocery industry in post-war America was adapting to the boom in suburban supermarkets–seeking to automate checkout at stores to increase speed, drive down the cost of hiring so many checkout clerks and systematize in-store inventory management. Beginning in the 1960s, various industry task forces went to work defining requirements and technical specifications. In time the industry issued a request to computer companies to submit proposals.

IBM’s team had also reworked its design going to the now familiar rows of bars each containing multiple copies of data. Woodland, who had helped create the original bull’s-eye design, then later worked on the bar code, writing IBM’s response to the industry’s proposal. Another group of IBMers at the Rochester, Minnesota Laboratory built a prototype scanner using optics and lasers. In 1973, the grocery industry’s task force settled on a standard that very closely paralleled IBM’s approach. The industry wanted a standard that all grocers and their suppliers could use.

IBM was well positioned and became one of the earliest suppliers of scanning equipment to the supermarket world. On October 11, 1973, IBM became one of the earliest vendors to market with a system, called the IBM 3660. In time it became a workhorse in the industry. It included a point-of-sale terminal (digital cash register) and checkout scanner that could read the UPC symbol. The grocery industry compelled its suppliers of products in boxes and cans to start using the code, and IBM helped suppliers acquire the technology to work with the UPC.

On June 26, 1974, the first swipe was done at a Marsh’s supermarket in Troy, Ohio, which the industry had designated as a test facility. The first product swiped was a pack of Wrigley’s Juicy Fruit chewing gum, now on display at the Smithsonian’s National Museum of American History in Washington, D.C. Soon, grocery stores began adopting the new scanners, while customers were slowly educated on their accuracy in quoting prices.

If there had been any doubts about the new system’s prospects, they were gone by the end of the 1970s. The costs of checking out customers went down; the accuracy of transactions went up; checkouts sped up by some 40 percent; and in-store inventory systems dramatically improved management of goods on hand, on order or in need of replenishment. And that was just the beginning. An immediate byproduct was the ability of stores to start tracking the buying habits of customers in general and, later, down to the individual, scanning bar coded coupons and frequent shopper cards. In the four years between 1976 and 1980, the number of grocery stores using this technology jumped from 104 to 2,207, and they were spreading to other countries.

In the 1980s, IBM and its competitors introduced the new technology to other industries (including variations of the American standard bar codes that were adopted in Western Europe). And IBM Raleigh kept improving the technology. In December 1980, IBM introduced the 3687 scanner that used holographic technologies—one of the first commercial applications of this technology. In October 1987, the IBM 7636 Bar Code Scanner was introduced–and as a result, throughout the 1980s factories adopted the IBM bar code to track in-process inventory. Libraries used it to do the same with books. In the 1990s, hand-held scanners made it easier to apply bar codes to things beyond cartons and cans and to scan them, eventually using wireless technology. Meanwhile innovation expanded in the ability of a bar code to hold more information.

These technologies make it possible for all kinds of organizations, schools, universities and companies in all industries to leverage the power of computers to manage their inventories. In many countries, almost every item now purchased in a retail store has a UPC printed on it, and is scanned. UPC led to the retirement of the manual and electro-mechanical cash registers which, as a technology, had been around since the 1880s. By the early 2000s, bar code technologies had become a $17 billion business, scanned billions of times each day.

The full text of this article can be found on IBM at 100: http://www.ibm.com/ibm100/us/en/icons/upc/

Humans vs. Watson (Programmed by Humans): Who Has The Advantage?

DAY 3 UPDATE:  If you are a technology person, you had to be impressed.  We all know who won by now so I won’t belabor it.  Ken Jennings played better and made a game of it … at least for a while.  He seemed to anticipate the buzz a little bit better and got on a roll.

You may have noticed that Watson struggled in certain categories last night.  “Actors Who Direct” gave very short clues (or questions) like “The Great Debaters” for which the correct answer was “Who is Denzel Washington”.  For Watson, the longer the question, the better.  If it takes a longer time for Alex to read the question, Watson has more time to consider candidate answers, evidence scores and confidence rankings.  This is another reason why Watson does better in certain categories.  In an attempt to remain competitive in this situation, Watson has multiple ways to process clues or questions.  There is what is called the “short path” (to an answer).  This is used for shorter questions when Watson has less time to decide whether to buzz in or not.  Watson is more inconsistent when it has to answer faster.  As seen last night, he either chose not to answer or Ken and Brad beat him to it.

In the end, the margin of victory was decisive for Watson.  In total, $1.25 million was donated to charity and Ken and Brad took home parting gifts of $150,000 and $100,000 respectively … pretty good for all involved.  The real winners are science and technology.   This is a major advance in computing that could revolutionize the way we interact with computers … especially with questions and answers.  The commercial applications seem endless.

DAY 2 UPDATE:  Last night was compelling to watch.  I was at the Washington, DC viewing event with several hundred customers, partners and IBMers.  The atmosphere in the briefing center was electric.  When the game started with Watson taking command, the room erupted in cheers.  After Watson got on a roll, and steamrolled Brad and Ken for most of Double Jeopardy, the room began to grow silent in awe of what was happening. 

Erik Mueller (IBM Research) was our featured speaker.  He was bombarded … before, during and after the match with questions like “How does he know what to bet?”  “How does Watson process text?”  “How would this be used in medical research?”  “What books were in Watson’s knowledge base?”  “Can Watson hear?” “Does he have to press a button like the human contestants?” and many more.

I was there as a subject matter expert and even though the spotlight was rightfully on Erik, I did get to answer a question on how some of Watson’s technology was being used today.  I explained how our IBM Content Analytics is used and how it is helping to power Watson’s natural language prowess.

When Watson incorrectly answered “What is Toronto????” in Final Jeopardy, the room audibly gasped (myself included).  As everyone seemed to hold their breath, I looked at Erik and he was smiling like a Cheshire cat … brimming with confidence.  The room cheered and applauded when Watson’s small bet was revealed … a seeming acknowledgement of the technological brilliance.  Applause for a wrong answer!

Afterwards, there were many ideas on how Watson could be applied.  My favorite was from a legal industry colleague who had a number of suggestions for how Watson could optimize document review and analysis that is currently a problem for judges and litigators.

Yesterday (below) I said the humans have a slight advantage.  And while Watson has built an impressive lead, I still feel that way.  Many of yesterday’s categories played to Watson’s fact based strengths.  It could go the other way tonight and Brad and Ken could get right back into the match.  The second game will air tonight in its entirety and the scores from both games will be combined to determine the $1 million prize winner.  Watson is entering tonight with a more than $25,000 lead.  IBM is donating all prize winnings to charity and Ken Jennings and Brad Rutter are donating 50% to charity.

DAY 1 POST:  After Day 1, Watson is tied with Brad Rutter at $5,000 going into Double Jeopardy – which is pretty impressive.  Ken Jennings has yet to catch his stride.  Brad and Ken seemed a little shell shocked at first, but Brad rebounded right when Watson was faltering towards the end of the first round.  This got me to thinking I should go into a little more detail about who really has the advantage … Watson or the humans? 

If you watched it last night, you may have observed that Watson does very well with factual questions.  He did very well in the Beatles song category – they were mostly facts with contextual references to lyrics.  Clues that involve multiple facts, all of which are required to arrive at the correct response but are unlikely to be found in the same place, are much harder for Watson.  This is why Watson missed the Harry Potter question involving Lord Voldemort.  Watson also switched categories frequently which is part of his game strategy.  You may have also noticed that Watson can’t see or hear.  He answered a question wrong even though Ken gave the same wrong answer seconds before.  More on this later in the post.

Here goes … my take on who has the advantage …

Question Understanding :  Advantage Humans

Humans:  Seemingly Effortless.  Almost instantly knows what is being asked, what is important and how it applies – very naturally gets focus, references, hints, puns, implications, etc.

Watson:  Hugely Challenging.  Has to be programmed to analyze enormous numbers of possibilities to get just a hint of the relevant meaning.  Very difficult due to variability, implicit context, ambiguity of structure and meaning in language.

Language Understanding:  Advantage Humans

Humans:  Seemingly Effortless.  Powerful, general, deep and fast in understanding language – reading, experiencing, summarizing, storing knowledge in natural language.  This information is written for human consumption so reading and understanding what it says is natural for humans.

Watson:  Hugely Challenging.  Answers need to be determined and justified in natural language sources like news articles, reference texts, plays, novels, etc.  Watson must be carefully programmed and automatically trained to deeply analyze even just tiny subsets of language effectively.  Very different from web search, must find a precise answer and understand enough of what it read to know if and why a possible answer may be correct.

Self‐Knowledge (Confidence):  Advantage Humans

Humans:  Seemingly Effortless.  Most often, and almost instantly, humans know if they know the answer.

Watson:  Hugely Challenging.  1000’s of algorithms run in parallel to find and analyze 1000’s of written texts for many different types of evidence.  The results are combined, scored and weighed for their relative importance – how much they justify a candidate answer.  This has to happen in 3 seconds to compute a confidence and decide whether or not to ring-in before it is too late.
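To make the “combined, scored and weighed” step a little more concrete, here is a minimal sketch of one way evidence scores could be merged into a single confidence.  This is my own illustration, not Watson’s actual code: the evidence types, the weights, the bias and the choice of a logistic function are all assumptions made up for the example.

```python
import math

# Hypothetical evidence types and weights, invented for illustration only.
WEIGHTS = {"passage_match": 2.0, "answer_type_match": 1.5, "source_reliability": 0.5}
BIAS = -2.0  # hypothetical offset so that weak evidence yields low confidence


def combine_evidence(scores):
    """Weigh each evidence score by its relative importance and squash the
    weighted sum into a confidence between 0 and 1 (logistic function)."""
    weighted_sum = BIAS + sum(WEIGHTS[name] * value for name, value in scores.items())
    return 1.0 / (1.0 + math.exp(-weighted_sum))


strong = combine_evidence(
    {"passage_match": 0.9, "answer_type_match": 1.0, "source_reliability": 0.8}
)
weak = combine_evidence(
    {"passage_match": 0.1, "answer_type_match": 0.0, "source_reliability": 0.8}
)
print(f"strong evidence: {strong:.2f}, weak evidence: {weak:.2f}")
```

The point is only that many small, differently weighted signals end up as one number that can be compared against a ring-in threshold.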

Breadth of Knowledge:  Advantage Humans

Humans:  Limited by self-contained memory.  Estimates of >1000’s of terabytes are all much higher than Watson’s memory capacity.  Ability to flexibly understand and summarize human relevance means that humans’ raw input capacity is even higher.

Watson:  Limited by self‐contained memory.  Roughly 1 Million books worth of content stored and processed in 15 Terabytes of working memory.  Weaker ability to meaningfully understand, relate and summarize human‐relevant content.  Must look at lots of data to compute statistical relevance.

Processing Speed:  Advantage Humans

Humans:  Fast Accurate Language Processing.  Native, strong, fast, language abilities.  Highly associative, highly flexible memory and speedy recall.  Very fast to speed read clue, accurately grasp question, determine confidence and answer – in just seconds. 

Watson:  Hugely Challenging.  On 1 CPU Watson can take over 2 hours to answer a typical Jeopardy! question.  Watson must be parallelized, perhaps in ways similar to the brain, to simultaneously use 1000’s of compute cores to compete against humans in the 3-5 second range.
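A quick back-of-the-envelope calculation shows why the parallelization matters.  The sketch below assumes a perfectly linear speedup across the 2,880 cores quoted in this post, which is an idealization (real systems lose time to coordination), and the function name is simply one I made up for the example.

```python
# Idealized estimate only: assumes the work divides evenly across all cores.
SINGLE_CPU_SECONDS = 2 * 60 * 60  # "over 2 hours" on 1 CPU, taken as a 2 hour floor
CORES = 2880                      # compute core count quoted in this post


def ideal_parallel_seconds(serial_seconds, cores):
    """Time per question if the serial work were split perfectly across cores."""
    return serial_seconds / cores


estimate = ideal_parallel_seconds(SINGLE_CPU_SECONDS, CORES)
print(f"{estimate:.1f} seconds")  # prints "2.5 seconds"
```

Even under that generous assumption the result only just lands inside the 3-5 second window, which gives a feel for how little slack there is.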

Reaction Speed:  Toss-up

Humans:  Times the Buzz.  Slower raw reaction speed but potentially faster to the buzz.  Listens to clue and anticipates when to buzz in.  “Timing the buzz” like this gives humans the fastest possible absolute response time.

Watson:  Fast Hand.  More consistently delivers a fast reaction time but ONLY IF and WHEN it can determine high enough confidence in time to buzz‐in.  Not able to anticipate when to buzz‐in based on listening to clue, which gives fastest possible response time to humans.  Also has to press same mechanical button as humans do.

Compute Power:  Won’t Impact Outcome

Humans:  Requires 1 brain that fits in a shoebox, can run on a tuna‐fish sandwich and be cooled with a hand‐held paper fan.

Watson:  Hugely Challenging.  Needs 2,880 compute cores (10 refrigerators’ worth in size and space) requiring about 80 kW of power and 20 tons of cooling.

Betting and Strategy:  Advantage Watson

Humans:  Slower, typically less precise.  Uses strategy and adjusts based on situation and game position.

Watson: Faster, more accurate calculations.  Uses strategy and adjusts based on situation and game position.

Emotions:  Advantage Watson

Humans:  Yes. Can slow down and /or confuse processing.

Watson:  No. Does NOT get nervous, tired, upset or psyched out (but the Watson programming team does!).

In-Game Learning:  Advantage Humans

Humans:  Learn very quickly from context, voice expression and (most importantly) right and wrong answers.

Watson:  Watson does not have the ability to hear (speech to text).  It is my understanding that Watson is “fed” the correct answer (in text) after each question so he can learn about the category even if he gets it wrong or does not answer.  However, I don’t believe he is “fed” the wrong answers.  This is a disadvantage for Watson.  As seen last night, it is not uncommon for him to answer with the same wrong answer as another contestant.  This also happened in the sparring rounds leading up to the taping of last night’s show.

As you can see things are closely matched but a slight advantage has to go to Ken and Brad.

And what about Watson’s face?

Another observation I made was how cool Watson’s avatar was.  It actually expresses what he is thinking (or processing).  The Watson avatar shares the graphic structure and tonality of the IBM Smarter Planet marketing campaign; a global map projection with a halo of “thought rays.”  The avatar features dozens of differentiated animation states that mirror the many stages of Jeopardy! gameplay – from choosing categories and answering clues, to winning and losing, to making Daily Double wagers and playing Final Jeopardy!.  Even Watson’s level of confidence – the numeric threshold that determines whether or not Watson will buzz in to answer – is made visible.  Watson’s stage presence is designed to depict the interior processes of the advanced computing system that powers it.  A significant portion of the avatar consists of colored threads orbiting around a central core.  The threads and thought rays that make up Watson’s avatar change color and speed depending on what happens during the game.  For example, when Watson feels confident in an answer the rays on the avatar turn green; they turn orange when Watson gets the answer wrong.  You will see the avatar speed up and activate when Watson’s algorithms are working hard to answer a clue.

I’ll be glued to the TV tonight and tomorrow.  Regardless of the outcome, this whole experience has been fascinating to me … so much so that I just published a new podcast on ECM, Content Analytics and Watson.

You can also visit my previous blog postings on Watson at: IBM at 100:  A Computer Called Watson, “What is Content Analytics?, Alex”, 10 Things You Need to Know About the Technology Behind Watson and Goodbye Search … It’s About Finding Answers … Enter Watson vs. Jeopardy!

IBM at 100: A Computer Called Watson

Watson is an efficient analytical engine that pulls many sources of data together in real-time, leverages natural language processing, discovers an insight, and deciphers a degree of confidence.

In my continuing series of IBM at 100 achievements, I saved the Watson achievement posting for today. In an historic event beginning tonight, in February 2011 IBM’s Watson computer will compete on Jeopardy! against the TV quiz show’s two biggest all-time champions. Watson is a supercomputer running software called DeepQA, developed by IBM Research. While the grand challenge driving the project is to win on Jeopardy!, the broader goal of Watson was to create a new generation of technology that can find answers in unstructured data more effectively than standard search technology.

Watson does a remarkable job of understanding a tricky question and finding the best answer. IBM’s scientists have been quick to say that Watson does not actually think. “The goal is not to model the human brain,” said David Ferrucci, who spent 15 years working at IBM Research on natural language problems and finding answers amid unstructured information. “The goal is to build a computer that can be more effective in understanding and interacting in natural language, but not necessarily the same way humans do it.”

Computers have never been good at finding answers. Search engines don’t answer a question–they deliver thousands of search results that match keywords. University researchers and company engineers have long worked on question answering software, but the very best could only comprehend and answer simple, straightforward questions (How many Oscars did Elizabeth Taylor win?) and would typically still get them wrong nearly one third of the time. That wasn’t good enough to be useful, much less beat Jeopardy! champions.

The questions on this show are full of subtlety, puns and wordplay—the sorts of things that delight humans but choke computers. “What is The Black Death of a Salesman?” is the correct response to the Jeopardy! clue, “Colorful fourteenth century plague that became a hit play by Arthur Miller.” The only way to get to that answer is to put together pieces of information from various sources, because the exact answer is not likely to be written anywhere.

Watson leverages IBM Content Analytics for part of the natural language processing. Watson runs on a cluster of IBM Power 750 computers: ten racks holding 90 servers, for a total of 2880 processor cores. It’s really a room lined with black cabinets stuffed with thousands of processor cores plus storage systems that can hold the equivalent of about one million books worth of information. Over a period of years, Watson was fed mountains of information, including text from commercial sources, such as the World Book Encyclopedia, and sources that allow open copying of their content, such as Wikipedia and books from Project Gutenberg.  Learn more about the technology under the covers on my previous posting 10 Things You Need to Know About the Technology Behind Watson.

When a question is put to Watson, more than 100 algorithms analyze the question in different ways, and find many different plausible answers–all at the same time. Yet another set of algorithms ranks the answers and gives them a score. For each possible answer, Watson finds evidence that may support or refute that answer. So for each of hundreds of possible answers it finds hundreds of bits of evidence and then with hundreds of algorithms scores the degree to which the evidence supports the answer. The answer with the best evidence assessment will earn the most confidence. The highest-ranking answer becomes the answer. However, during a Jeopardy! game, if the highest-ranking possible answer isn’t rated high enough to give Watson enough confidence, Watson decides not to buzz in and risk losing money if it’s wrong. The Watson computer does all of this in about three seconds.
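The last part of that flow, picking the highest-ranking answer and deciding whether to buzz in, can be sketched in a few lines.  This is a simplified illustration under my own assumptions rather than DeepQA itself: the function name, the threshold value and the confidence numbers are all hypothetical.

```python
from typing import Dict, Optional

BUZZ_THRESHOLD = 0.5  # hypothetical cut-off; the real value is not given in this post


def choose_answer(confidences: Dict[str, float],
                  threshold: float = BUZZ_THRESHOLD) -> Optional[str]:
    """Return the highest-confidence candidate, or None to stay silent
    when even the best candidate falls below the buzz threshold."""
    if not confidences:
        return None
    best_answer, best_confidence = max(confidences.items(), key=lambda item: item[1])
    return best_answer if best_confidence >= threshold else None


# Made-up confidence values for two imaginary clues.
confident = choose_answer({"Moscow": 0.97, "Saint Petersburg": 0.02})
unsure = choose_answer({"Answer A": 0.30, "Answer B": 0.25})
print(confident, unsure)  # prints "Moscow None"
```

Returning nothing at all is the interesting design choice here: a wrong answer costs money, so staying silent is sometimes the best move.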

By late 2010, in practice games at IBM Research in Yorktown Heights, N.Y., Watson was good enough at finding the correct answers to win about 70 percent of games against former Jeopardy! champions. Then in early 2011, Watson went up against Jeopardy! superstars Ken Jennings and Brad Rutter.

Watson’s question-answering technology is expected to evolve into a commercial product. “I want to create something that I can take into every other retail industry, in the transportation industry, you name it,” John Kelly, who runs IBM Research, told The New York Times. “Any place where time is critical and you need to get advanced state-of-the-art information to the front decision-makers. Computers need to go from just being back-office calculating machines to improving the intelligence of people making decisions.”

When you’re looking for an answer to a question, where do you turn? If you’re like most people these days, you go to a computer, phone or mobile device, and type your question into a search engine. You’re rewarded with a list of links to websites where you might find your answer. If that doesn’t work, you revise your search terms until able to find the answer. We’ve come a long way since the time of phone calls and visits to the library to find answers.

But what if you could just ask your computer the question, and get an actual answer rather than a list of documents or websites? Question answering (QA) computing systems are being developed to understand simple questions posed in natural language, and provide the answers in textual form. You ask “What is the capital of Russia?” The computer answers “Moscow,” based on the information that has been loaded into it.

IBM is taking this one step further, developing the Watson computer to understand the actual meaning behind words, distinguish between relevant and irrelevant content, and ultimately demonstrate confidence to deliver precise final answers. Because of its deeper understanding of language, it can process and answer more complex questions that include puns, irony and riddles common in natural language. On February 14–16, 2011, IBM’s Watson computer will be put to the test, competing in three episodes of Jeopardy! against the two most successful players in the quiz show’s history: Ken Jennings and Brad Rutter.

The full text of this article can be found on IBM at 100: http://www.ibm.com/ibm100/us/en/icons/watson/

As for me … I am anxiously waiting to see what happens starting tonight.  See my previous blog postings on Watson at:  “What is Content Analytics?, Alex”, 10 Things You Need to Know About the Technology Behind Watson and Goodbye Search … It’s About Finding Answers … Enter Watson vs. Jeopardy!

Good luck tonight to Watson, Ken Jennings and Brad Rutter … may the best man win (so to speak)!

Introducing IBM at 100: Patents and Innovation

With the looming Jeopardy! challenge competition involving IBM Watson, I am feeling proud of my association with IBM.  In part because IBM is an icon of business.  As a tribute, I plan to re-post a few of the notable achievements by IBM and IBMers from the past 100 years as an attempt to put the company’s contributions over the years into perspective.   Has IBM made a difference in our world … on our planet?  What kind of impact has IBM had on the world?  Is it really a smarter planet as a result of the past 100 years?

I hope to answer these and other questions through these posts.  A dedicated website has these postings and much more about IBM’s past 100 years.   There is also a great overview video.  Check back often.  New stories will be added throughout the centennial year.  Let’s start with Patents and Innovation … a cornerstone of IBM’s heritage and reputation.

IBM’s 100 Icons of Progress

In the span of a century, IBM has evolved from a small business that made scales, time clocks and tabulating machines to a globally integrated enterprise with 400,000 employees and a strong vision for the future. The stories that have emerged throughout our history are complex tales of big risks, lessons learned and discoveries that have transformed the way we work and live. These 100 iconic moments—these Icons of Progress—demonstrate our faith in science, our pursuit of knowledge and our belief that together we can make the world work better.

Patents and Innovation

By hiring engineer and inventor James W. Bryce in 1917, Thomas Watson Sr. showed his commitment to pure inventing. Bryce and his team established IBM as a long-term leader in the development and protection of intellectual property. By 1929, 90 percent of IBM’s products were the result of Watson’s investments in R&D. In 1940, the team invented a method for adding and subtracting using vacuum tubes, a basic building block of the fully electronic computers that transformed business in the 1950s. This pattern of using innovation to create intellectual property shaped IBM’s history.

On January 26, 1939, James W. Bryce, IBM’s chief engineer, dictated a two-page letter to Thomas J. Watson, Sr., the company’s president. It was an update on the research and patents he had been working on. Today, the remarkable letter serves as a window into IBM’s long-held role as a leader in the development and protection of intellectual property.

Bryce was one of the most prolific inventors in American history, racking up more than 500 U.S. and foreign patents by the end of his career. In his letter to Watson, he described six projects, each of which would be considered a signature life achievement for the average person. They included research into magnetic recording of data, an investigation into the use of light rays in computing and plans with Harvard University for what would become one of the first digital computers. But another project was perhaps most significant. Wrote Bryce: “We have been carrying on an investigation in connection with the development of computing devices which do not employ the usual adding wheels, but instead use electronic effects and employ tubes similar to those used in radio work.”

The investigation bore fruit. On January 15, 1940, Arthur H. Dickinson, Bryce’s top associate and a world-beating inventor in his own right, submitted an application for a patent for “certain improvements in accounting apparatus.” In fact, the patent represented a turning point in computing history. Dickinson, under Bryce’s supervision, had invented a method for adding and subtracting using vacuum tubes—a basic building block of the fully electronic computers that began to appear in the 1940s and transformed the world of business in the 1950s.

This pattern—using innovation to create intellectual property—is evident throughout IBM’s history. Indeed, intellectual property has been strategically important at IBM since before it was IBM.

The full text of this article can be found on IBM at 100: http://www.ibm.com/ibm100/us/en/icons/patents/

“What is Content Analytics?, Alex”

“The technology behind Watson represents the future of data management and analytics.  In the real world, this technology will help us uncover insights in everything from traffic to healthcare.”

– John Cohn, IBM Fellow, IBM Systems and Technology Group

How can the same technology used to play Jeopardy! give you better business insight?

Why Watson matters

You have to start by understanding that IBM Watson DeepQA is the world’s most advanced question answering machine.  It uncovers answers by understanding the meaning buried in the context of a natural language question.  By combining advanced Natural Language Processing (NLP) and DeepQA automatic question answering technology, Watson represents the future of content and data management, analytics, and systems design.  IBM Watson leverages core content analysis, along with a number of other advanced technologies, to arrive at a single, precise answer within a very short period of time.  The business applications for this technology are limitless, starting with clinical healthcare, customer care, government intelligence and beyond.  I covered the technology side of Watson in my previous posting 10 Things You Need to Know About the Technology Behind Watson.

Amazingly, Watson works like the human brain to analyze the content of a Jeopardy! question.  First, it tries to understand the question to determine what is being asked, which means analyzing the natural language text.  Next, it tries to find reasoned answers by analyzing a wide variety of disparate content, mostly in the form of natural language documents.  Finally, Watson assigns each answer it found a confidence rating, which reflects the relative likelihood that the answer is correct.

A great example of the challenge is described by Stephen Baker in his book Final Jeopardy: Man vs. Machine and the Quest to Know Everything: ‘When 60 Minutes premiered, this man was U.S. President.’  Traditionally it’s been difficult for a computer to understand what ‘premiered’ means and that it’s associated with a date.  To a computer, ‘premiere’ could also mean ‘premier’.  Is the question about a person’s title or a production opening?  Then it has to figure out the date when an entity called ‘60 Minutes’ premiered, and then find out who was the ‘U.S. President’ at that time.  In short, it requires a ton of contextual understanding.
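To make the two hops in that clue concrete, here is a minimal Python sketch.  It is not how Watson works internally; it assumes the hard part (recognizing that ‘premiered’ points to a date) is already done, and it uses two small hand-built lookup tables, PREMIERE_DATES and PRESIDENTIAL_TERMS, that I made up for illustration.

```python
from datetime import date

# Hypothetical, hand-built fact tables for illustration only.
PREMIERE_DATES = {"60 Minutes": date(1968, 9, 24)}

# (name, first day in office, first day of the successor's term)
PRESIDENTIAL_TERMS = [
    ("Lyndon B. Johnson", date(1963, 11, 22), date(1969, 1, 20)),
    ("Richard Nixon", date(1969, 1, 20), date(1974, 8, 9)),
]


def president_on(day):
    """Return the president whose term covers the given day, or None."""
    for name, start, end in PRESIDENTIAL_TERMS:
        if start <= day < end:
            return name
    return None


def answer_premiere_clue(show):
    """Resolve the clue in two hops: show -> premiere date -> president."""
    premiere = PREMIERE_DATES.get(show)
    if premiere is None:
        return None  # the first hop failed, so there is nothing to look up
    return president_on(premiere)


print(answer_premiere_clue("60 Minutes"))  # prints "Lyndon B. Johnson"
```

The lookups are trivial once the clue has been understood.  The contextual understanding needed to get that far is where the real difficulty sits.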

I am not talking about search here.  This is far beyond what search tools can do.  A recent Forrester report, Take Control Of Your Content, states that 45% of the US workforce spends three or more hours a week just searching for information.  This is completely inefficient.  See my previous posting Goodbye Search … It’s About Finding Answers … Enter Watson vs. Jeopardy! for more on this topic.

Natural Language Processing (NLP) can be leveraged in any situation where text is involved. Besides answering questions, it can help improve enterprise search results or even develop an understanding of the insight hidden in the content itself.  Watson leverages the power of NLP as the cornerstone to translate interactions between computers and human (natural) languages.

NLP involves a series of steps that make text understandable (or computable).  A critical step, lexical analysis, is the process of converting a sequence of characters into a set of tokens.  Subsequent steps leverage these tokens to perform entity extraction (people, places, things), concept identification (person A belongs to organization B) and the annotation of documents with this and other information.  A feature of IBM Content Analytics (known as LanguageWare) performs the lexical analysis function in Watson as part of natural language processing.
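The first two steps above, lexical analysis and entity extraction, can be sketched in a few lines of Python.  This is a toy illustration only, not LanguageWare or anything IBM ships: the regular expression, the GAZETTEER lookup table and the function names are all my own assumptions.

```python
import re

# Hypothetical gazetteer: a tiny lookup of known entity names and their types.
GAZETTEER = {
    "ibm": "ORGANIZATION",
    "watson": "SYSTEM",
    "ken jennings": "PERSON",
    "brad rutter": "PERSON",
}

# A word (optionally with an apostrophe suffix) or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?|[^\sA-Za-z0-9]")


def tokenize(text):
    """Lexical analysis: turn a string of characters into a list of tokens."""
    return TOKEN_PATTERN.findall(text)


def extract_entities(tokens, max_len=2):
    """Entity extraction: greedily match the longest token run in the gazetteer."""
    entities = []
    i = 0
    while i < len(tokens):
        matched = False
        for n in range(max_len, 0, -1):  # try longer phrases first
            phrase = " ".join(tokens[i:i + n])
            label = GAZETTEER.get(phrase.lower())
            if label is not None:
                entities.append((phrase, label))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return entities


tokens = tokenize("Watson will play Ken Jennings and Brad Rutter!")
print(tokens)
print(extract_entities(tokens))
```

Real systems replace the lookup table with dictionaries, grammar rules and statistical models, but the shape is the same: characters become tokens, and tokens become annotations.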

Why this matters to your business

Jeopardy! poses a similar set of contextual information challenges as those found in the business world today:

  • Over 80 percent of information being stored is unstructured (that is, text based).
  • Understanding that 80 plus percent isn’t simple.  Like Jeopardy! … subtle meaning, irony, riddles, acronyms, abbreviations and other complexities all make it hard to derive meaning and insight, and present computing challenges not found with structured data. This is where natural language processing (NLP) comes in.

The same core NLP technology used in Watson is available now to deliver business value by unlocking the insights trapped in the massive amounts of unstructured information spread across your many systems and formats.  Understanding the content, context and value of this unstructured information presents an enormous opportunity for your business.  This is already being done today in a number of industries by leveraging IBM Content Analytics.

IBM Content Analytics (ICA) itself is a platform to derive rapid insight.  It can transform raw information into business insight quickly without building models or deploying complex systems, enabling all knowledge workers to derive insight in hours or days … not weeks or months.  It helps address industry specific problems such as healthcare treatment effectiveness, fraud detection, product defect detection, public safety concerns, customer satisfaction and churn, crime and terrorism prevention and more.  Here are some actual customer examples:

Healthcare Research – Like most healthcare providers, BJC Healthcare had a treasure trove of historical information trapped in unstructured clinical notes and diagnostic reports, containing essential information for the study of disease progression, treatment effectiveness and long-term outcomes.  Their existing Biomedical Informatics (BMI) resources were disjointed and non-interoperable, available only to a small fraction of researchers, and frequently redundant, with no capability to tap into that wealth of unstructured research information.

With IBM Content Analytics, BJC and university researchers are now able to analyze unstructured information to answer key questions that previously could not be answered.  Questions like: Does the patient smoke?  How often and for how long?  If smoke free, for how long?  What home medications is the patient taking?  What is the patient sent home with?  What was the diagnosis and what procedures were performed on the patient?  BJC now has deeper insight into medical information and can uncover trends and patterns within their content, to provide better healthcare to their patients.

Customer Satisfaction – Identifying customer satisfaction trends about products, services and personnel is critical to most businesses.  The Hertz Corporation and Mindshare Technologies, a leading provider of enterprise feedback solutions, are using IBM Content Analytics software to examine customer survey data, including text messages, to better gauge car and equipment rental performance and pinpoint the adjustments needed to improve customer satisfaction.

By using IBM Content Analytics, companies like Hertz can drive new marketing campaigns or modify their products and services to meet the demands of their customers. “Hertz gathers an amazing amount of customer insight daily, including thousands of comments from web surveys, emails and text messages. We wanted to leverage this insight at both the strategic level and the local level to drive operational improvements,” said Joe Eckroth, Chief Information Officer, the Hertz Corporation.

For more information about ICA at Hertz: http://www-03.ibm.com/press/us/en/pressrelease/32859.wss

Research Analytics – To North Carolina State University, the essence of a university is more than education – it is the advancement and dissemination of knowledge in all its forms.  One of the main issues faced by NC State was dealing with the vast number of data sources available to them.  The university sought a solution to efficiently mine and analyze vast quantities of data to better identify companies that could bring NC State’s research to the public.  The objective was a solution designed to parse the content of thousands of unstructured information sources, perform data and text analytics and produce a focused set of useful results.

Using IBM Content Analytics, NC State was able to reduce the time needed to find target companies from months to days.  The result is the identification of new commercialization opportunities, with tests yielding a 300 percent increase in the number of candidates.  By obtaining insight into their extensive content sources, NC State’s Office of Technology Transfer was able to find more effective ways to license technologies created through research conducted at the university. “What makes the solution so powerful is its ability to go beyond conventional online search methods by factoring context into its results.” – Billy Houghteling, executive director, NC State Office of Technology Transfer.

For more information about ICA at NC State: http://www-01.ibm.com/software/success/cssdb.nsf/CS/SSAO-8DFLBX?OpenDocument&Site=software&cty=en_us

You can put the technology of tomorrow to work for you today, by leveraging the same IBM Content Analytics capability helping to power Watson.  To learn more about all the IBM ECM products utilizing Watson technology, please visit these sites:

IBM Content Analytics: http://www-01.ibm.com/software/data/content-management/analytics/

IBM Classification Module: http://www-01.ibm.com/software/data/content-management/classification/

IBM eDiscovery Analyzer: http://www-01.ibm.com/software/data/content-management/products/ediscovery-analyzer/

IBM OmniFind Enterprise Edition: http://www-01.ibm.com/software/data/enterprise-search/omnifind-enterprise/

You can also check out the IBM Content Analytics Resource Center or watch the “what it is and why it matters” video.

I’ll be at the Jeopardy! viewing party in Washington, DC on February 15th and 16th … hope to see you there.  In the meantime, leave me your thoughts and questions below.

10 Things You Need to Know About the Technology Behind Watson

What is so fascinating about a Computer System vs. Quiz Show?  The popularity of America’s favorite quiz show, Jeopardy!, stems from the unique challenges it poses to its contestants: the breadth of topics; the puns, metaphors, and slang in the questions; the speed it takes to buzz and answer.

These factors make Jeopardy! the perfect testing ground for Watson, the IBM computing system that can understand the complexities of human language and return a single, precise answer to a question.

Next month, IBM’s Watson will play Jeopardy! (on live network TV) with two of the all-time champions.  IBM offered a press sneak peek this week at a practice round that included Alex Trebek.  After seeing the clips, I am getting excited and am convinced this technology breakthrough is something special.  Here’s what you need to know:

1.  What is Watson?

Watson is the name for IBM’s Question Answering (QA) computing system, built by a team of IBM Research scientists and university collaborators who set out to accomplish a grand challenge – to build a computing system that rivals a human’s ability to answer questions posed in natural language with speed, accuracy and confidence. It leverages Natural Language Processing (or NLP) to process extreme volumes of text.

Watson is powered by an IBM POWER7 platform to handle the massive analytics at the speeds required to analyze complex language and deliver correct responses to natural language clues.  The system is a combination of current and new IBM technologies optimized to meet the specialized demands of processing an enormous number of concurrent tasks and a large amount of content while analyzing that content in real time.

2.  What is Natural Language Processing?

Natural language processing (NLP) is a field of computer science and linguistics concerned with the interactions between computers and human (natural) languages. It describes a set of linguistic, statistical, and machine learning techniques that allow text to be analyzed and key information extracted for other uses such as Question Answering or Content Analytics.

3.  What are QA and DeepQA?

Question Answering (QA) is the task of automatically answering a question posed in natural language. It involves first trying to understand the question to determine what is being asked, then analyzing a wide variety of disparate content, mostly in the form of natural language documents, to find reasoned answers, and finally assessing, based on the evidence, the relative likelihood that the answers found are correct. Collections can vary from small local document collections, to internal organization documents, to compiled newswire reports, to the World Wide Web. QA is regarded as the next step beyond current search engines.
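As a rough illustration of those three steps (understand the question, gather evidence, score the candidate answers), here is a deliberately naive Python sketch.  It scores candidates by simple keyword overlap, which is my own stand-in and not the DeepQA method; the STOPWORDS list, the function names and the sample evidence are all assumptions made for the example.

```python
import re

# Hypothetical stopword list; a real system would use a far richer analysis.
STOPWORDS = {"the", "a", "an", "of", "is", "was", "what", "which", "who",
             "in", "to", "and", "this"}


def keywords(text):
    """Step 1 (very roughly): reduce text to its informative lowercase words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def rank_answers(question, evidence):
    """Steps 2 and 3: score each candidate by keyword overlap with its
    supporting passages, then normalize the scores into relative confidences.

    `evidence` maps each candidate answer to a list of passages.
    """
    q = keywords(question)
    raw = {
        candidate: sum(len(q & keywords(p)) for p in passages)
        for candidate, passages in evidence.items()
    }
    total = sum(raw.values())
    if total == 0:
        return []  # no evidence at all, so no answer can be ranked
    ranked = [(candidate, score / total) for candidate, score in raw.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


evidence = {
    "Deep Blue": ["Deep Blue was an IBM chess system",
                  "Deep Blue beat a chess grandmaster"],
    "Watson": ["Watson is an IBM question answering system"],
}
print(rank_answers("Which IBM system beat a chess grandmaster?", evidence))
# prints [('Deep Blue', 0.75), ('Watson', 0.25)]
```

Keyword overlap like this breaks down quickly on puns, ambiguity and awkward phrasing, which is exactly the gap a deeper analysis has to close.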

DeepQA goes well beyond simple question reformulation or keyword analyses. Queries that include disambiguation, unfamiliar syntax, spatially or temporally constrained questions – or simply bad question framing – require a deeper level of content and text analysis.

4.  What is unique about the QA implementation for Watson?

Competing with humans on Jeopardy! poses an additional set of challenges, including the variety of question types and styles, the broad and varied range of topics, and the demand for high degrees of confidence and speed.  Together, these required a whole new approach to the problem.

5.  How does QA technology compare to document search?

The key difference between QA technology and document search is that document search takes a keyword query and returns a list of documents, ranked in order of relevance to the query (often based on popularity and page ranking), while QA technology takes a question expressed in natural language, seeks to understand it in much greater detail, and returns a precise answer to the question.

I touched on the frustrations of search in my previous posting Goodbye Search … It’s About Finding Answers … Enter Watson vs. Jeopardy!

6.  How does Watson compare to the chess-playing system, Deep Blue?

Deep Blue demonstrated that computers can solve problems once thought the exclusive domain of human intelligence, albeit in perhaps very different ways than humans do.  Deep Blue was an amazing achievement in the application of compute power to an extraordinarily challenging but computationally well-defined and well-bounded game.  By searching and evaluating a huge space of possible chess board configurations, Deep Blue had the compute power to beat a grandmaster.

Watson faces a challenge that is entirely open-ended and defies the sort of well-bounded mathematical formulation that fits a game like chess.  Watson has to operate in the near limitless, ambiguous and highly contextual domain of human language and knowledge.  Ultimately Watson’s scientific goal is to demonstrate how computers can get at the meaning behind a natural language question and infer precise answers from huge volumes of content, with justifications that ultimately make sense to humans.

Rather than challenging the human to search a vast mathematical space, the Watson project challenges the computer to operate in human terms.  Watson strives to understand and answer human questions and to know when it does and doesn’t know the answer.  The capability to assess its own knowledge and abilities, something humans find relatively easy, is exceedingly difficult for computers.
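The idea of knowing when it doesn’t know can be pictured as a simple confidence gate.  The Python sketch below is only my illustration of the concept: the function name and the 0.5 threshold are hypothetical, and the real decision logic in Watson is certainly more involved.

```python
def decide_to_answer(ranked_answers, threshold=0.5):
    """Return the most confident answer only if it clears the threshold.

    `ranked_answers` is a list of (answer, confidence) pairs with
    confidences between 0 and 1. Returning None means "stay silent".
    The default threshold is an arbitrary value chosen for illustration.
    """
    if not ranked_answers:
        return None
    best, confidence = max(ranked_answers, key=lambda pair: pair[1])
    return best if confidence >= threshold else None


print(decide_to_answer([("Deep Blue", 0.75), ("Watson", 0.25)]))  # prints "Deep Blue"
print(decide_to_answer([("Answer A", 0.40), ("Answer B", 0.35)]))  # prints "None"
```

In a game where wrong answers cost money, declining to answer at low confidence can be as valuable as answering correctly at high confidence.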

7.  How would this QA technology be used in a business setting?

DeepQA technology provides humans with a powerful tool for their information gathering and decision support.  One of many possible scenarios could be for the end user to enter their question in natural language form, much as if they were asking another person, and for the system to sift through vast amounts of potential evidence to return a ranked list of the most compelling, precise answers along with links to supporting or refuting evidence.  Other important scenarios will use DeepQA to analyze a collection of content and data representing a problem, for example a technical support problem or a medical case.  DeepQA will start to search for a solution by gathering and assessing evidence from many disparate data sources, engaging human users to help provide the missing pieces of information that can help arrive at a solution or, in the case of medicine, a differential diagnosis.

In addition, these answers would include summaries of their justifying or supporting evidence, allowing the user to quickly assess the evidence and select the correct answer.

Business applications include Customer Relationship Management, Regulatory Compliance, Contact Centers, Help Desks, Web Self-Service, Business Intelligence and more.  These applications will demand a deep understanding of users’ questions and analysis of huge volumes of natural language, structured and semi-structured content to rapidly deliver and justify precise, succinct, and high-confidence answers.

8.  What is the role of Unstructured Information Management Architecture (UIMA) in DeepQA and the Watson project?

Unstructured Information Management Architecture (UIMA) is the IBM developed open-source framework for analysis of unstructured content, such as natural language text, speech, images and video, which Watson uses to integrate and deploy a broad collection of deep analysis algorithms over vast amounts of content.

A number of IBM ECM products are based on and leverage UIMA today. IBM Content Analytics, IBM OmniFind Enterprise Edition, IBM eDiscovery Analyzer and IBM Classification Module all are powered by, or benefit from, natural language processing and UIMA.

9.  Are any Enterprise Content Management (ECM) technologies actually part of Watson?

Yes, IBM Content Analytics is part of Watson.  After the question is asked, the text needs to be processed using natural language processing.  IBM Content Analytics (LanguageWare) and other techniques (secret sauce) are used to process the text, and understand the question, as part of the complex processing required to fully answer questions with confidence.  I will tackle this issue in more detail in my next blog posting.

IBM Content Analytics (or ICA) is a content analysis platform used to derive rapid insight from content and data.  It can transform raw information into business insight quickly without building models or deploying complex systems, enabling businesses to derive insight in hours or days … not weeks or months.  It’s easy to use and designed for any knowledge worker who needs to search and explore content.  ICA can be extended for deeper insights by integrating with Cognos, SPSS, InfoSphere, Netezza and other Business Intelligence, Analytics and Data Warehouse systems.

The ICA product itself includes tooling (LanguageWare) which is used to customize NLP processing and build industry or customer specific models and solutions.  This capability is at the core of natural language processing and is the very same ICA capability that is used in Watson.

10.  Who is going to win on February 14-16th?

My prediction … Watson is.

I watched the video of the practice rounds yesterday, and Watson performed impressively against Ken Jennings and Brad Rutter (the two contestants and the all-time champions).

So … who won the Jeopardy! practice round?

Watson won handily …

http://www.youtube.com/watch?v=12rNbGf2Wwo

Watson’s score was $4,400, beating Jennings by $1,000 and nearly quadrupling Rutter’s score.

IBM will donate 100% of Watson’s winnings to charity, while Rutter and Jennings said they will each donate 50% of their prizes. 

I am going to host a viewing party for colleagues, friends and family.  This is going to be exciting and fun … I can’t wait.  As an IBMer, I’ll be rooting for Watson to win but not for the obvious reason.  My rooting is really about my passion for the amazing technology breakthrough and the power of content analytics. 

Who do you think will win?  Leave me your thoughts below.