“What is Content Analytics?, Alex”

“The technology behind Watson represents the future of data management and analytics.  In the real world, this technology will help us uncover insights in everything from traffic to healthcare.”

– John Cohn, IBM Fellow, IBM Systems and Technology Group

How can the same technology used to play Jeopardy! give you better business insight?

Why Watson matters

You have to start by understanding that IBM Watson DeepQA is the world’s most advanced question answering machine.  It uncovers answers by understanding the meaning buried in the context of a natural language question.  By combining advanced Natural Language Processing (NLP) and DeepQA automatic question answering technology, Watson represents the future of content and data management, analytics, and systems design.  IBM Watson leverages core content analysis, along with a number of other advanced technologies, to arrive at a single, precise answer within a very short period of time.  The business applications for this technology are limitless, starting with clinical healthcare, customer care, government intelligence and beyond.  I covered the technology side of Watson in my previous posting 10 Things You Need to Know About the Technology Behind Watson.

Amazingly, Watson works like the human brain to analyze the content of a Jeopardy! question.  First, it tries to understand the question to determine what is being asked, which means analyzing the natural language text.  Next, it tries to find reasoned answers by analyzing a wide variety of disparate content, mostly in the form of natural language documents.  Finally, Watson assigns each candidate answer a confidence rating and uses it to determine the relative likelihood that the answer is correct.

A great example of the challenge is described by Stephen Baker in his book Final Jeopardy: Man vs. Machine and the Quest to Know Everything: ‘When 60 Minutes premiered, this man was U.S. President.’  Traditionally it’s been difficult for a computer to understand what ‘premiered’ means and that it’s associated with a date.  To a computer, ‘premiere’ could also mean ‘premier’.  Is the question about a person’s title or a production opening?  Then it has to figure out the date when an entity called ‘60 Minutes’ premiered, and then find out who was the ‘U.S. President’ at that time.  In short, it requires a ton of contextual understanding.
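To make the last step of that process concrete, here is a minimal sketch of ranking candidate answers by confidence. It is not Watson's actual scoring code. The evidence names (`date_match`, `title_match`), the weights, the bias, the threshold and the scores themselves are all invented for illustration; the only thing taken from the discussion above is the idea that each candidate gets a confidence rating and the most likely one wins.

```python
import math

# Hypothetical weights for two made-up evidence types (assumptions, not Watson's).
WEIGHTS = {"date_match": 2.0, "title_match": 1.5}
BIAS = -2.0


def confidence(evidence, weights=WEIGHTS, bias=BIAS):
    """Combine evidence scores into a 0..1 confidence with a logistic function."""
    z = bias + sum(weights[name] * score for name, score in evidence.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_candidates(candidates, threshold=0.5):
    """Return candidates sorted by confidence, plus the best answer.

    The best answer is None when even the top candidate falls below the
    threshold, i.e. the system should not "buzz in".
    """
    ranked = sorted(
        ((confidence(evidence), answer) for answer, evidence in candidates.items()),
        reverse=True,
    )
    top_confidence, top_answer = ranked[0]
    return ranked, (top_answer if top_confidence >= threshold else None)


# Invented evidence scores for the "60 Minutes" clue.
candidates = {
    "Lyndon B. Johnson": {"date_match": 1.0, "title_match": 1.0},
    "Richard Nixon": {"date_match": 0.2, "title_match": 1.0},
}

ranked, best = rank_candidates(candidates)
for conf, answer in ranked:
    print(f"{answer}: {conf:.2f}")
print("Answer:", best)
```

Both candidates held the title of U.S. President, so only the date evidence separates them, which is exactly the kind of contextual understanding the clue demands.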

I am not talking about search here.  This is far beyond what search tools can do.  A recent Forrester report, Take Control Of Your Content, states that 45% of the US workforce spends three or more hours a week just searching for information.  This is completely inefficient.  See my previous posting Goodbye Search … It’s About Finding Answers … Enter Watson vs. Jeopardy! for more on this topic.

Natural Language Processing (NLP) can be leveraged in any situation where text is involved. Besides answering questions, it can help improve enterprise search results or even develop an understanding of the insight hidden in the content itself.  Watson leverages the power of NLP as the cornerstone to translate interactions between computers and human (natural) languages.

NLP involves a series of steps that make text understandable (or computable).  A critical step, lexical analysis, is the process of converting a sequence of characters into a set of tokens.  Subsequent steps leverage these tokens to perform entity extraction (people, places, things), concept identification (person A belongs to organization B) and the annotation of documents with this and other information.  A feature of IBM Content Analytics (known as LanguageWare) performs the lexical analysis function in Watson as part of natural language processing.
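For readers who want to see what those first two steps look like, here is a minimal sketch of lexical analysis followed by dictionary-based entity extraction. It does not use LanguageWare or any IBM API. The token pattern, the tiny dictionary (`GAZETTEER`) and its labels are assumptions made up for this example, and a real system would rely on far richer language resources.

```python
import re

# Dotted abbreviations such as "U.S." stay whole; otherwise split into
# words/numbers and single punctuation marks. (Illustrative pattern only.)
TOKEN_PATTERN = re.compile(r"(?:[A-Za-z]\.){2,}|\w+|[^\w\s]")

# Hypothetical dictionary mapping lowercased token sequences to entity labels.
GAZETTEER = {
    ("60", "minutes"): "PROGRAM",
    ("u.s.", "president"): "TITLE",
}


def tokenize(text):
    """Lexical analysis: turn a string of characters into a list of tokens."""
    return TOKEN_PATTERN.findall(text)


def extract_entities(tokens, gazetteer=GAZETTEER):
    """Entity extraction: find the longest dictionary match at each position."""
    lowered = [token.lower() for token in tokens]
    longest = max(len(key) for key in gazetteer)
    entities = []
    i = 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in gazetteer:
                entities.append((" ".join(tokens[i:i + n]), gazetteer[key]))
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return entities


clue = "When 60 Minutes premiered, this man was U.S. President."
tokens = tokenize(clue)
print(tokens)
print(extract_entities(tokens))
```

Later steps, such as working out that "premiered" implies a date, would build on annotations like these rather than on the raw characters.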

Why this matters to your business

Jeopardy! poses a set of contextual information challenges similar to those found in the business world today:

  • Over 80 percent of information being stored is unstructured (that is, text based).
  • Understanding that 80-plus percent isn’t simple.  As in Jeopardy!, subtle meaning, irony, riddles, acronyms, abbreviations and other complexities all present computing challenges not found with structured data when you try to derive meaning and insight.  This is where natural language processing (NLP) comes in.

The same core NLP technology used in Watson is available now to deliver business value by unlocking the insights trapped in the massive amounts of unstructured information spread across your many systems and formats.  Understanding the content, context and value of this unstructured information presents an enormous opportunity for your business.  This is already being done in a number of industries by leveraging IBM Content Analytics.

IBM Content Analytics (ICA) itself is a platform to derive rapid insight.  It can transform raw information into business insight quickly without building models or deploying complex systems, enabling all knowledge workers to derive insight in hours or days … not weeks or months.  It helps address industry-specific problems such as healthcare treatment effectiveness, fraud detection, product defect detection, public safety concerns, customer satisfaction and churn, crime and terrorism prevention and more.  Here are some actual customer examples:

Healthcare Research – Like most healthcare providers, BJC Healthcare had a treasure trove of historical information trapped in unstructured clinical notes and diagnostic reports, containing essential information for the study of disease progression, treatment effectiveness and long-term outcomes.  Their existing Biomedical Informatics (BMI) resources were disjointed and non-interoperable, available only to a small fraction of researchers, and frequently redundant, with no capability to tap into the wealth of research information trapped in unstructured clinical notes, diagnostic reports and the like.

With IBM Content Analytics, BJC and university researchers are now able to analyze unstructured information to answer key questions that previously could not be answered.  Questions like: Does the patient smoke? How often and for how long? If smoke free, for how long? What home medications is the patient taking? What is the patient sent home with? What was the diagnosis, and what procedures were performed on the patient?  BJC now has deeper insight into medical information and can uncover trends and patterns within their content to provide better healthcare to their patients.

Customer Satisfaction – Identifying customer satisfaction trends about products, services and personnel is critical to most businesses.  The Hertz Corporation and Mindshare Technologies, a leading provider of enterprise feedback solutions, are using IBM Content Analytics software to examine customer survey data, including text messages, to better identify car and equipment rental performance levels so they can pinpoint and make the adjustments needed to improve customer satisfaction levels.

By using IBM Content Analytics, companies like Hertz can drive new marketing campaigns or modify their products and services to meet the demands of their customers. “Hertz gathers an amazing amount of customer insight daily, including thousands of comments from web surveys, emails and text messages. We wanted to leverage this insight at both the strategic level and the local level to drive operational improvements,” said Joe Eckroth, Chief Information Officer, the Hertz Corporation.

For more information about ICA at Hertz: http://www-03.ibm.com/press/us/en/pressrelease/32859.wss

Research Analytics – To North Carolina State University, the essence of a university is more than education – it is the advancement and dissemination of knowledge in all its forms.  One of the main issues faced by NC State was dealing with the vast number of data sources available to them.  The university sought a solution to efficiently mine and analyze vast quantities of data to better identify companies that could bring NC State’s research to the public.  The objective was a solution designed to parse the content of thousands of unstructured information sources, perform data and text analytics and produce a focused set of useful results.

Using IBM Content Analytics, NC State was able to reduce the time needed to find target companies from months to days.  The result is the identification of new commercialization opportunities, with tests yielding a 300 percent increase in the number of candidates.  By obtaining insight into their extensive content sources, NC State’s Office of Technology Transfer was able to find more effective ways to license technologies created through research conducted at the university. “What makes the solution so powerful is its ability to go beyond conventional online search methods by factoring context into its results.” – Billy Houghteling, executive director, NC State Office of Technology Transfer.

For more information about ICA at NC State: http://www-01.ibm.com/software/success/cssdb.nsf/CS/SSAO-8DFLBX?OpenDocument&Site=software&cty=en_us

You can put the technology of tomorrow to work for you today by leveraging the same IBM Content Analytics capability that helps power Watson.  To learn more about all the IBM ECM products utilizing Watson technology, please visit these sites:

IBM Content Analytics: http://www-01.ibm.com/software/data/content-management/analytics/

IBM Classification Module: http://www-01.ibm.com/software/data/content-management/classification/

IBM eDiscovery Analyzer: http://www-01.ibm.com/software/data/content-management/products/ediscovery-analyzer/

IBM OmniFind Enterprise Edition: http://www-01.ibm.com/software/data/enterprise-search/omnifind-enterprise/

You can also check out the IBM Content Analytics Resource Center or watch the “what it is and why it matters” video.

I’ll be at the Jeopardy! viewing party in Washington, DC on February 15th and 16th … hope to see you there.  In the meantime, leave me your thoughts and questions below.

16 thoughts on “What is Content Analytics?, Alex”

    1. I am glad the Watson project is generating the type of interest it is. This has become a platform for Stephen, and others, to weigh in with various points of view … and add to the dialogue. All boats rise with a high tide. Stephen Baker presents the most accurate (and unbiased) view of what Watson is, and isn’t, in his book Final Jeopardy: Man vs. Machine and the Quest to Know Everything. At a minimum, Wolfram’s view is incorrect on his timing … Watson has been under development a lot longer than the 1.5 years he asserts. I am also not sure I get his point of view about search. Watson is clearly not a search tool … glorified or otherwise. Think about it … win or lose, Watson will accomplish feats of natural language understanding, accuracy, speed and relevancy never seen before. It may even forever change the computing landscape … when was the last time you said, or thought, that about a search tool?

      1. Thanks, Craig, for an informed reply.

        A very quick question: are you currently at the Lotusphere event in Orlando? I haven’t seen you at IBM’s ECM stand.

  1. Hi Craig … I cannot help but feel this is hype, hype and more hype! Do you really expect me to believe that this system can answer “any question” on the planet? I am reasonably familiar with research on NLP, and most of the “General Knowledge” questions on Jeopardy! are answerable using sources like Wikipedia and a few others. Certainly, there is some engineering challenge in stringing all this together to get an answer in 3 seconds, but to position this as some major advancement in Natural Language Processing or Question Answering or even Search is an over-reach. I am willing to admit that IBM has built a specific system to do well on Jeopardy! and that is no mean task. But I am skeptical about your claims that this will help solve Healthcare, Customer Satisfaction and the like. Let us call Watson what it actually is: a glorified (albeit cool) toy to win a game.

    1. John, you are welcome to your opinion. I suggest you watch some of the videos on YouTube about the real science behind Watson and the real computing challenges they are attempting to solve before relegating Watson to the toy chest 🙂

      It’s a little long but very thorough:
