IBM at 100: TAKMI, Bringing Order to Unstructured Data


As most of you know … I have been periodically posting some of the really fascinating top 100 innovations of the past 100 years as part of IBM’s Centennial celebration.

This one is special to me as it represents what is possible for the future of ECM.  I wasn’t around for tabulating machines and punch cards but have long been fascinated by the technology developments in the management and use of content.  As impressive as Watson is … it is only the most recent step in a long journey IBM has been pursuing to help computers better understand natural language and unstructured information.

As most of you probably don’t know … this journey started over 50 years ago in 1957 when IBM published the first research on this subject, entitled A Statistical Approach to Mechanized Encoding and Searching of Literary Information. Finally … something in this industry older than I am!

Unstructured Information Management Architecture (UIMA)

Another key breakthrough by IBM in this area was the invention of UIMA.  Now an Apache Open Source project and OASIS standard, UIMA is an open, industrial-strength platform for unstructured information analysis and search.  It is the only open standard for text-based processing and applications.  I plan to write more on UIMA in a future blog post, but I mention it here because it was an important step forward for the industry, Watson and TAKMI (now known as IBM Content Analytics).

TAKMI

In 1997, IBM researchers at the company’s Tokyo Research Laboratory pioneered a prototype for a powerful new tool capable of analyzing text. The system, known as TAKMI (for Text Analysis and Knowledge Mining), was a watershed development: for the first time, researchers could efficiently capture and utilize the wealth of buried knowledge residing in enormous volumes of text. The lead researcher was Tetsuya Nasukawa.

Over the past 100 years, IBM has had a lot of pretty important inventions but this one takes the cake for me.  Nasukawa-san once said,

“I didn’t invent TAKMI to do something humans could do, better.  I wanted TAKMI to do something that humans could not do.”

In other words, he wanted to invent something humans couldn’t see or do on their own … and isn’t that the whole point and value of technology anyway?

By 1997, text was searchable, if you knew what to look for. But the challenge was to understand what was inside these growing information volumes and know how to take advantage of the massive textual content that you could not read through and digest.

The development of TAKMI quietly set the stage for the coming transformation in business intelligence. Prior to 1997, the field of analytics dealt strictly with numerical and other “structured” data—the type of tagged information that is housed in fixed fields within databases, spreadsheets and other data collections, and that can be analyzed by standard statistical data mining methods.

The technological clout of TAKMI lay in its ability to read “unstructured” data—the data and metadata found in the words, grammar and other textual elements comprising everything from books, journals, text messages and emails, to health records and audio and video files. Analysts today estimate that 80 to 90 percent of any organization’s data is unstructured. And with the rising use of interactive web technologies, such as blogs and social media platforms, churning out ever-expanding volumes of content, that data is growing at a rate of 40 to 60 percent per year.

The key to its success was natural language processing (NLP) technology. Most data mining researchers were treating English text data as a bag of words, extracting words from character strings by splitting on white space. However, since Japanese text does not use white space as a word separator, IBM researchers in Tokyo applied NLP to extract words, analyze their grammatical features, and identify relationships among words. Such in-depth analysis led to better results in text mining. That’s why the leading-edge text mining technology originated in Japan.
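To make that difference concrete, here is a minimal Python sketch. It is only an illustration and not how TAKMI itself works: the sample sentences, the toy lexicon and the `longest_match_segment` helper are all my own assumptions, and greedy dictionary matching is a deliberately simple stand-in for real morphological analysis.

```python
from collections import Counter


def bag_of_words(text):
    """Count tokens obtained by splitting on white space only."""
    return Counter(text.lower().split())


def longest_match_segment(text, lexicon):
    """Greedy longest-match segmentation against a toy lexicon.

    Hypothetical helper for illustration; unknown characters fall
    back to single-character tokens.
    """
    tokens = []
    max_len = max(len(word) for word in lexicon)
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Assumed sample data, not taken from the TAKMI project.
english = "the printer is broken and the screen is broken"
japanese = "プリンターが壊れた"  # "The printer broke"
toy_lexicon = {"プリンター", "が", "壊れた"}

english_counts = bag_of_words(english)
japanese_whitespace_tokens = japanese.split()
japanese_segmented = longest_match_segment(japanese, toy_lexicon)

print(english_counts.most_common(3))
print(japanese_whitespace_tokens)
print(japanese_segmented)
```

Splitting on white space works tolerably for the English sentence but returns the whole Japanese sentence as a single "word". Even the toy segmenter needs linguistic knowledge (here, a lexicon) before it can count anything, which is the kind of pressure that pushed the Tokyo team toward deeper language analysis.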

The complete article on TAKMI can be found at http://www.ibm.com/ibm100/us/en/icons/takmi/

Fast forward to today.  IBM has since commercialized TAKMI as IBM Content Analytics (ICA), a platform to derive rapid insight.  It can transform raw information into business insight quickly, without building models or deploying complex systems, enabling all knowledge workers to derive insight in hours or days … not weeks or months.  It helps address industry-specific problems such as healthcare treatment effectiveness, fraud detection, product defect detection, public safety concerns, customer satisfaction and churn, crime and terrorism prevention and more.

I’d like to personally congratulate Nasukawa-san and the entire team behind TAKMI (and ICA) for such an amazing achievement … and for making the list.  Selected team members who contributed to TAKMI are Tetsuya Nasukawa, Kohichi Takeda, Hideo Watanabe, Shiho Ogino, Akiko Murakami, Hiroshi Kanayama, Hironori Takeuchi, Issei Yoshida, Yuta Tsuboi and Daisuke Takuma.

It’s a shining example of the best form of innovation … the kind that enables us to do something not previously possible.  Being recognized along with other amazing achievements like the UPC code, the floppy disk, magnetic stripe technology, laser eye surgery, the scanning tunneling microscope, fractal geometry and human genomics mapping is a real honor.

This type of enabling innovation is the future of Enterprise Content Management.  It will be fun and exciting to see if TAKMI (Content Analytics) has the same kind of impact on computing as the UPC code has had on retail shopping … or as laser eye surgery has had on vision care.

What do you think?  As always, leave your thoughts and comments.

Other similar postings:

Watson and The Future of ECM

“What is Content Analytics?, Alex”

10 Things You Need to Know About the Technology Behind Watson

Goodbye Search … It’s About Finding Answers … Enter Watson vs. Jeopardy! 

5 thoughts on “IBM at 100: TAKMI, Bringing Order to Unstructured Data”

  1. Craig,

    As usual, an interesting and thoughtful piece.

    I think the test for TAKMI (AKA Content Analytics or ICA) and IBM will be: Can ICA be as widely adopted as, say, Google Analytics?

    While IBM has traditionally focused on enterprise class business cases, Microsoft has proven with SharePoint as Apple has with the iPhone that if you get the user experience nailed down and make it easy for any individual to acquire and use, then eventually the enterprise will follow.

    We know IBM has a top down strategy. Does IBM also have a bottom up strategy for ICA?

    Gary

    • Gary – According to Google’s website …

      “Google Analytics is the enterprise-class web analytics solution that gives you rich insights into your website traffic and marketing effectiveness”

      To me … this is a completely different challenge. Most organizations use some technology, or perhaps manual analysis, to determine marketing effectiveness. Has Google figured out a better way to do this? I’ll leave it to the marketing gurus to judge.

      The original TAKMI breakthrough was doing something not previously possible … by humans … or by any other means. To me, this is real innovation. The ability to derive insight from large volumes of unstructured information has come a long way since 1997 when the TAKMI project started. As an example, the use of content analysis in the eDiscovery process alone is a major breakthrough. Doing a large eDiscovery project today without analytics is inconceivable to many. The thought of social media analysis without the ability to analyze the content is also inconceivable. TAKMI came at a time when none of these things existed. I think we’re trying to compare apples and oranges.

      TAKMI has already made contributions to a number of IBM technologies and solutions since its introduction, including some by partners. We’ll continue to invest and innovate in this area as we have done with UIMA, TAKMI, Watson and more. We’ll also continue to invest in matching go-to-market strategies and channels that help us reach the markets, industries and customers we’re trying to serve (as we have done over the past 100 years) 🙂

  2. Knowledge management has been an elusive goal for most of my 27 years as a RIM professional. There have been many imitators but none have been successful.
    Some organizations have attempted to develop knowledge management programs but these efforts have been doomed to failure for the following reasons:

    a) The focus has been on retrieving vital knowledge from departing employees who had no desire or incentive to pass on their knowledge.

    b) The process of collecting knowledge usually meant designing forms with specific fields that needed to be filled out.

    c) Existing employees did not have the time or motivation to fill out the forms, which may ask the wrong questions.

    Because of these difficulties, I had been resigned to the idea that knowledge management is unobtainable. However, I now believe that knowledge management is a reality. My reasoning is three-fold: effective tools have been developed to facilitate the collection of information and opinions, the culture of employees has changed, and a method to analyze and make sense of the vast store of information has been developed. Let me explain my reasoning.

    First, the advent of social media has provided an effective set of tools for individuals to record their comments and conversations.

    Second, the ‘university’ culture of new, younger employees demands access to these tools. These employees are accustomed to using the internet to research and discuss. They are now demanding that the companies that employ them provide the same access. The result is a huge growth in electronic information that organizations must now manage and control.

    Lastly and most important, the development of information analytics software has provided a useful and effective tool to make sense of the seemingly unrelated treasure trove of information.

    The development of knowledge information programs within organizations is now a reality. I commend IBM for their recognition of the importance of information analytics, and their efforts in promoting this tool within our industry.

    • Sam – I also think that the early solutions that touted themselves as “knowledge management” solutions were ill-suited to the purpose – and compounded the problem you highlight. Let’s hope innovations like TAKMI, Watson and whatever comes next continue to enable the development of knowledge information programs.

  3. Pingback: TV Re-runs, Watson and My Blog « Craig Rhinehart's ECM Insights
