IBM at 100: TAKMI, Bringing Order to Unstructured Data

As most of you know … I have been periodically posting some of the really fascinating top 100 innovations of the past 100 years as part of IBM’s Centennial celebration.

This one is special to me as it represents what is possible for the future of ECM.  I wasn’t around for tabulating machines and punch cards but have long been fascinated by the technology developments in the management and use of content.  As impressive as Watson is … it is only the most recent step in a long journey IBM has been pursuing to help computers better understand natural language and unstructured information.

As most of you probably don’t know … this journey started over 50 years ago, in 1957, when IBM published the first research on this subject, entitled “A Statistical Approach to Mechanized Encoding and Searching of Literary Information.”  Finally … something in this industry older than I am!

Unstructured Information Management Architecture (UIMA)

Another key breakthrough by IBM in this area was the invention of UIMA.  Now an Apache open source project and OASIS standard, UIMA is an open, industrial-strength platform for unstructured information analysis and search.  It is the only open standard for text-based processing and applications.  I plan to write more on UIMA in a future post, but I mention it here because it was an important step forward for the industry, for Watson and for TAKMI (now known as IBM Content Analytics).


In 1997, IBM researchers at the company’s Tokyo Research Laboratory pioneered a prototype for a powerful new tool capable of analyzing text. The system, known as TAKMI (for Text Analysis and Knowledge Mining), was a watershed development: for the first time, researchers could efficiently capture and utilize the wealth of buried knowledge residing in enormous volumes of text. The lead researcher was Tetsuya Nasukawa.

Over the past 100 years, IBM has had a lot of pretty important inventions, but this one takes the cake for me.  Nasukawa-san once said,

“I didn’t invent TAKMI to do something humans could do, better.  I wanted TAKMI to do something that humans could not do.”

In other words, he wanted to invent something humans couldn’t see or do on their own … and isn’t that the whole point and value of technology anyway?

By 1997, text was searchable, if you knew what to look for. But the challenge was to understand what was inside these growing information volumes and know how to take advantage of the massive textual content that you could not read through and digest.

The development of TAKMI quietly set the stage for the coming transformation in business intelligence. Prior to 1997, the field of analytics dealt strictly with numerical and other “structured” data—the type of tagged information that is housed in fixed fields within databases, spreadsheets and other data collections, and that can be analyzed by standard statistical data mining methods.

The technological clout of TAKMI lay in its ability to read “unstructured” data—the data and metadata found in the words, grammar and other textual elements comprising everything from books, journals, text messages and emails, to health records and audio and video files. Analysts today estimate that 80 to 90 percent of any organization’s data is unstructured. And with the rising use of interactive web technologies, such as blogs and social media platforms, churning out ever-expanding volumes of content, that data is growing at a rate of 40 to 60 percent per year.
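To put the 40 to 60 percent annual growth figure in perspective, the short Python sketch below computes how long it takes a body of content to double at a constant compound growth rate. It assumes, purely for illustration, that the growth rate stays constant year over year; the function name `doubling_time_years` is hypothetical and not part of any IBM product.

```python
import math

def doubling_time_years(annual_growth_rate: float) -> float:
    """Years for a quantity to double at a constant compound annual growth rate.

    Assumption (illustrative only): the rate is constant every year.
    Solves (1 + r) ** t == 2 for t.
    """
    return math.log(2) / math.log(1 + annual_growth_rate)

# The low and high ends of the growth range quoted in the text.
for rate in (0.40, 0.60):
    print(f"{rate:.0%} per year -> doubles in about {doubling_time_years(rate):.1f} years")
```

Under that assumption, unstructured content doubles roughly every two years at 40 percent growth, and in about a year and a half at 60 percent.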

The key to its success was natural language processing (NLP) technology. Most data mining researchers were treating English text as a bag of words, extracting words from character strings by splitting on white space. However, since Japanese text does not use white space as a word separator, IBM researchers in Tokyo applied NLP to extract words, analyze their grammatical features and identify relationships among them. Such in-depth analysis led to better text mining results, which is why this leading-edge text mining technology originated in Japan.
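The whitespace-based “bag of words” approach described above can be sketched in a few lines of Python. This is a minimal illustration, not TAKMI’s actual implementation; the function name `whitespace_bag_of_words` and the two sample sentences are hypothetical. It shows why the approach works tolerably for English yet breaks down for Japanese, where the whole sentence comes back as a single “word.”

```python
from collections import Counter

def whitespace_bag_of_words(text: str) -> Counter:
    """Naive bag of words (hypothetical helper): split on white space, lowercase, count."""
    return Counter(token.lower() for token in text.split())

# Hypothetical sample sentences with roughly the same meaning.
english = "The car was clean and the staff was friendly"
japanese = "車はきれいでスタッフは親切でした"  # no spaces between words

# English splits into individual words, so counts are meaningful.
print(whitespace_bag_of_words(english))
# Japanese has no white space, so the entire sentence is one token.
print(whitespace_bag_of_words(japanese))
```

Getting usable words out of the Japanese sentence requires a morphological analyzer that segments the text and tags parts of speech, which is the kind of NLP step the paragraph credits the Tokyo team with applying.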

The complete article on TAKMI can be found at

Fast forward to today.  IBM has since commercialized TAKMI as IBM Content Analytics (ICA), a platform for deriving rapid insight.  It can transform raw information into business insight quickly, without building models or deploying complex systems, enabling all knowledge workers to derive insight in hours or days … not weeks or months.  It helps address industry-specific problems such as healthcare treatment effectiveness, fraud detection, product defect detection, public safety concerns, customer satisfaction and churn, crime and terrorism prevention and more.

I’d like to personally congratulate Nasukawa-san and the entire team behind TAKMI (and ICA) for such an amazing achievement … and for making the list.  Selected team members who contributed to TAKMI are Tetsuya Nasukawa, Kohichi Takeda, Hideo Watanabe, Shiho Ogino, Akiko Murakami, Hiroshi Kanayama, Hironori Takeuchi, Issei Yoshida, Yuta Tsuboi and Daisuke Takuma.

It’s a shining example of the best form of innovation … the kind that enables us to do something not previously possible.  Being recognized alongside other amazing achievements like the UPC code, the floppy disk, magnetic stripe technology, laser eye surgery, the scanning tunneling microscope, fractal geometry and human genomics mapping is really something.

This type of enabling innovation is the future of Enterprise Content Management.  It will be fun and exciting to see if TAKMI (Content Analytics) has the same kind of impact on computing as the UPC code has had on retail shopping … or as laser eye surgery has had on vision care.

What do you think?  As always, leave your thoughts and comments.

Other similar postings:

Watson and The Future of ECM

“What is Content Analytics?, Alex”

10 Things You Need to Know About the Technology Behind Watson

Goodbye Search … It’s About Finding Answers … Enter Watson vs. Jeopardy! 

Content in Motion: The Voice of Your Customer

Do you listen to your customers?

No, really!  Of course, everyone answers “yes” when asked this question.  So much so … that the question really isn’t worth asking anymore.  The real question to ask is “What are you doing about it?”

Your customers write about your services, prices, product quality and their experiences with you in social media.  They write you letters (yes, letters on paper do exist), they send you emails, they call your call centers and even participate in surveys you conduct … Again I ask, what are you doing about it?

How are you translating all that information across all those input channels into action?  All of that content (you already have) in the form of customer interactions is just waiting to be leveraged (hhmmmm).

In three separate C-level studies (CIO, CFO, CEO) … the number one executive imperative was to “Reinvent Customer Relationships.”  Across the three studies, the key findings were to:

  • Get closer to customers (top need)
  • Better understand what customers need
  • Deliver unprecedented customer service

Can anyone think of a better way to accomplish this than by examining all of that customer interaction content so you can do something about it?  I bet there are loads of trends, patterns and new insights just waiting to be explored and discovered in those interactions … something demanding your attention and needing action.  This is one of the thoughts I had in mind when I blogged about “Content at Rest or Content in Motion? Which is Better?” a few weeks ago.  Clearly, identifying customer satisfaction trends about products, services and personnel is critical to any business.

The Hertz Corporation is doing this today.  They are using IBM Content Analytics software to examine customer interaction content, to better gauge car and equipment rental performance, and to pinpoint and make the adjustments needed to improve customer satisfaction.  Insights derived from enterprise content enable companies like Hertz to drive new marketing campaigns or modify their products and services to meet the demands of their customers.

“Hertz gathers an amazing amount of customer insight daily, including thousands of comments from web surveys, emails and text messages. We wanted to leverage this insight at both the strategic level and the local level to drive operational improvements,” said Joe Eckroth, Chief Information Officer, the Hertz Corporation.

Hertz isn’t just listening … they are taking action … by putting their content in motion.

Again I ask, what are you doing about it?  Why not test drive Hertz’s idea in your business?  You’ve already got the content to do so.

I welcome your input as always.  I recently bylined articles on Hertz and IBM Content Analytics for and entitled  “Insights into Action – Improving Service by Listening to the Voices of your Customers”.  For a more detailed profile on ICA at Hertz visit:

IBM … 100 Years Later

Nearly all the companies our grandparents admired have disappeared.  Of the top 25 industrial corporations in the United States in 1900, only two remained on that list at the start of the 1960s.  And of the top 25 companies on the Fortune 500 in 1961, only six remain there today.  Some of the leaders of those companies that vanished were dealt a hand of bad luck.  Others made poor choices. But the demise of most came about because they were unable simultaneously to manage their business of the day and to build their business of tomorrow.

IBM was founded in 1911 as the Computing Tabulating Recording Corporation through a merger of four companies: the Tabulating Machine Company, the International Time Recording Company, the Computing Scale Corporation, and the Bundy Manufacturing Company.  CTR adopted the name International Business Machines in 1924.  The distinctive culture and product branding have given IBM the nickname Big Blue.

As you read this, IBM begins its 101st year.  Looking back at the last century, the path that led us to this remarkable anniversary has been both rich and diverse.  The innovations IBM has contributed include products ranging from cheese slicers to calculators to punch cards, all the way up to game-changing systems like Watson.

But what stands out to me is what has remained unchanged.  IBM has always been a company of brilliant problem-solvers.  IBMers use technology to solve business problems.  We invent it, we apply it to complex challenges, and we redefine industries along the way.

This has led to some truly game-changing innovation.  Just look at industries like retail, air travel, and government.  Where would we be without UPC codes, credit cards and ATM machines, SABRE, or Social Security?  Visit the IBM Centennial site to see profiles on 100 years of innovation.

We haven’t always been right though … remember OS/2, the PCjr and Prodigy?

100 years later, we’re still tackling the world’s most pressing problems.  It’s incredibly exciting to think about the ways we can apply today’s innovation – new information based systems leveraging analytics to create new solutions, like Watson – to fulfill the promise of a Smarter Planet through smarter traffic, water, energy, and healthcare.  This promise of the future … is incredibly exciting and I look forward to helping IBM pave the way for continued innovation.

Watch the IBM Centennial film “Wild Ducks” or read the Centennial book.  IBM officially released a book last week celebrating the Centennial, “Making the World Work Better: The Ideas that Shaped a Century and a Company.”  The book consists of three original essays by leading journalists. They explore how IBM has pioneered the science of information, helped reinvent the modern corporation and changed the way the world actually works.

As for me … I’ve been with IBM since the 2006 acquisition of FileNet and am proud to be associated with such an innovative and remarkable company.

IBM at 100: SAGE, The First National Air Defense Network

This week was a reminder of how technology can aid in our nation’s defense, as we struck a major blow against terrorism.  Most people don’t realize how many ways IBM has contributed to our nation’s defense.  Here is just one example, from 1949.

When the Soviet Union detonated its first atomic bomb on August 29, 1949, the United States government concluded that it needed a real-time, state-of-the-art air defense system.  It turned to Massachusetts Institute of Technology (MIT), which in turn recruited companies and other organizations to design what would be an online system covering all of North America using many technologies, a number of which did not exist yet.  Could it be done?  It had to be done.  Such a system had to observe, evaluate and communicate incoming threats much the way a modern air traffic control system monitors flights of aircraft.

This marked the beginning of SAGE (Semi-Automatic Ground Environment), the national air defense system implemented by the United States to warn of and intercept airborne attacks during the Cold War.  The heart of this digital system—the AN/FSQ-7 computer—was developed, built and maintained by IBM.  SAGE was the largest computer project in the world during the 1950s and took IBM squarely into the new world of computing.  Between 1952 and 1955, it generated 80 percent of IBM’s revenues from computers, and by 1958, more than 7000 IBMers were involved in the project.  SAGE spun off a large number of technological innovations that IBM incorporated into other computer products.

IBM’s John McPherson led the early conversations with MIT, and senior management quickly realized that this could be one of the largest data processing opportunities since winning the Social Security bid in the mid-1930s.  Thomas Watson, Jr., then lobbying his father and other senior executives to move into the computer market quickly, recalled in his memoirs that he wanted to “pull out all the stops” to be a central player in the project.  “I worked harder to win that contract than I worked for any other sale in my life.”  So did a lot of other IBMers: engineers designing components, then the computer; sales staff pricing the equipment and negotiating contracts; senior management persuading MIT that IBM was the company to work with; other employees collaborating with scores of companies, academics and military personnel to get the project up and running; and yet others who installed, ran and maintained the IBM systems for SAGE for a quarter century.

The online features of the system demonstrated that a new world of computing was possible and that, in the 1950s, IBM knew the most about this kind of data processing.  As the ability to develop reliable online systems became a reality, other government agencies and private companies began talking to IBM about possible online systems for them.  Some of those projects transpired in parallel, such as the development of the Semi-Automated Business Research Environment (Sabre), American Airlines’ online reservation system, also built using IBM staff located in Poughkeepsie, New York.

In 1952, MIT selected IBM to build the computer to be the heart of SAGE. MIT’s project leader, Jay W. Forrester, reported later that the company was chosen because “in the IBM organization we observed a much higher degree of purposefulness, integration and ‘esprit de corps’ than in other firms,” and because of “evidence of much closer ties between research, factory and field maintenance at IBM.”  The technical skills to do the job were also there, thanks to prior experience building advanced electronics for the military.

IBM quickly ramped up, assigning about 300 full-time IBMers to the project by the end of 1953. Work was centered in IBM’s Poughkeepsie and Kingston, NY facilities and in Cambridge, Massachusetts, home of MIT.  New memory systems were needed; MITRE and the Systems Development Corporation (part of RAND Corporation) wrote software, and other vendors supplied components.  In June 1956, IBM delivered the prototype of the computer to be used in SAGE.  The press release called it an “electronic brain.”  It could automatically calculate the most effective use of missiles and aircraft to fend off attack, while providing the military commander with a view of an air battle. Although this seems routine in today’s world, it was an enormous leap forward in computing.  When fully deployed in 1963, SAGE included 23 centers, each with its own AN/FSQ-7 system, which really consisted of two machines (one for backup), both operating in coordination.  Ultimately, 54 systems were installed, all collaborating with each other. The SAGE system remained in service until January 1984, when it was replaced with a next-generation air defense network.

Its innovative technological contributions to IBM and the IT industry as a whole were significant.  These included magnetic-core memories, which worked faster and held more data than earlier technologies; a real-time operating system (a first); highly disciplined programming methods; overlapping computing and I/O operations; real-time transmission of data over telephone lines; use of CRT terminals and light pens (a first); redundancy and backup methods and components; and the highest reliability of computer systems (uptime) of the day.  It was the first geographically distributed, online, real-time application of digital computers in the world.  Because many of the technological innovations spun off from this project were ported over to new IBM computers in the second half of the 1950s by the same engineers who had worked on SAGE, the company was quickly able to build on lessons learned in how to design, manufacture and maintain complex systems.

Fascinating to be sure … the full article can be accessed at

IBM at 100: The 1401 Mainframe

In my continuing series of IBM at 100, I turn to our data processing heritage with the IBM 1401 Data Processing System (which was long before my time).

While the IBM 1401 Data Processing System wasn’t a great leap in power or speed, that was never the point. “It was a utilitarian device, but one that users had an irrational affection for,” wrote Paul E. Ceruzzi in his book, A History of Modern Computing.

There were several keys to the popularity of the 1401 system. It was one of the first computers to run completely on transistors—not vacuum tubes—and that made it smaller and more durable. It rented for US$2500 per month, and was touted as the first affordable general-purpose computer. It was also the easiest machine to program at the time. The system’s software, wrote Dag Spicer, senior curator at the Computer History Museum, “was a big improvement in usability.”

This more accessible computer unleashed pent-up demand for data processing. IBM was shocked to receive 5200 orders for the 1401 computer in just the first five weeks after introducing it—more than was predicted for the entire life of the machine. Soon, business functions at companies that had been immune to automation were taken over by computers. By the mid-1960s, more than 10,000 1401 systems were installed, making it by far the best-selling computer to date.

More importantly, it marked a new generation of computing architecture, causing business executives and government officials to think differently about computing. A computer didn’t have to be a monolithic machine for the elite. It could fit comfortably in a medium-size company or lab. In the world’s top corporations, different departments could have their own computers.

A computer could even wind up operating on an army truck in the middle of a forest. “There was not a very good grasp or visualization of the potential impact of computers—certainly as we know them today—until the 1401 came along,” said Chuck Branscomb, who led the 1401 design team. The 1401 system made enterprises of all sizes believe a computer was useful, and even essential.

By the late 1950s, computers had experienced tremendous changes. Clients drove a desire for speed. Vacuum-tube electronics replaced the electro-mechanical mechanisms of the tabulating machines that dominated information processing in the first half of the century. First came the experimental ENIAC, then Remington Rand’s Univac and the IBM 701, all built on electronics. Magnetic tape and then the first disk drives changed ideas about the accessibility of information. Grace Hopper’s compiler and John Backus’s FORTRAN programming language gave computer experts new ways to instruct machines to do ever more clever and complex tasks. Systems that arose out of those coalescing developments were a monumental leap in computing capabilities.

Still, the machines touched few lives directly. Installed and working computers numbered barely more than 1000. The world, in fact, was ready for a more accessible computer.

The first glimpse of that next generation of computing turned up in an unexpected place: France. “In the mid-1950s, IBM got a wake-up call,” said Branscomb, who ran one of IBM’s lines of accounting machines at the time. French computer upstart Machines Bull came out with its Gamma computers, small and fast compared to goliaths like the IBM 700 series. “It was a competitive threat,” Branscomb recalled.

Bull made IBM and others realize that entities with smaller budgets wanted computers. IBM scrambled together resources to try to make a competing machine. “It was 1957 and IBM had no new machine in development,” Branscomb said. “It was a real problem.”

During June and July 1957, IBM engineers and planners gathered in Germany to propose several accounting machine designs. The anticipated product of this seven-week conference was known thereafter as the Worldwide Accounting Machine (WWAM), although no particular design was decided upon.

In September 1957, Branscomb was assigned to run the WWAM project. In March 1958, after Thomas Watson, Jr. expressed dissatisfaction with the WWAM project in Europe, the Endicott proposal for a stored-program WWAM was given formal approval as the company’s approach to meeting the need for an electronic accounting machine. The newly assigned project culminated in the announcement of the 1401 Data Processing System (although for a time it carried the acronym SPACE).

The IBM 1401 Data Processing System—comprising a variety of card and tape models with a range of core memory sizes, and configured for stand-alone use and peripheral service for larger computers—was announced in October 1959.

Branscomb’s group set a target rental cost of US$2500 per month, well below a 700 series machine, and hit it. They also decided the computer had to be simple to operate. “We knew it was time for a dramatic change, a discontinuity,” Branscomb added. And indeed it was. The 1401 system extended computing to a new level of organization and user, driving information technology deeper into everyday life.

The full article can be accessed at

IBM at 100: UPC … The Transformation of Retail

In my continuing series of IBM at 100 achievements … this is one of my favorites of all the ones I plan to republish here. The humble Universal Product Code (UPC), also known as the bar code, along with the related deployment of scanners, fundamentally changed many of the practices of retailers and all organizations that buy and move things, from large industrial equipment to pencils purchased in stationery stores. These two technologies led to the use of in-store information processing systems in almost every industry around the world, applied to millions of types of goods and items. UPC is planet Earth’s most pervasive inventory tracking tool.

N. Joseph Woodland, later an IBMer but then working at Drexel Institute of Technology, applied for the first patent on bar code technology on October 20, 1949, and along with Bernard Silver, received the patent on October 7, 1952. And there it sat for more than two decades. In those days there was no way to read the codes, until the laser became a practical tool. About 1970 at IBM Research Triangle Park, George Laurer went to work on how to scan labels and to develop a digitally readable code. Soon a team formed to address the issue, including Woodland. Their first try was a bull’s-eye bar code; nobody was happy with it because it took up too much space on a carton.

Meanwhile, the grocery industry in post-war America was adapting to the boom in suburban supermarkets–seeking to automate checkout at stores to increase speed, drive down the cost of hiring so many checkout clerks and systematize in-store inventory management. Beginning in the 1960s, various industry task forces went to work defining requirements and technical specifications. In time the industry issued a request to computer companies to submit proposals.

IBM’s team had also reworked its design, moving to the now-familiar rows of bars, each containing multiple copies of data. Woodland, who had helped create the original bull’s-eye design, later worked on the bar code and wrote IBM’s response to the industry’s request for proposals. Another group of IBMers at the Rochester, Minnesota Laboratory built a prototype scanner using optics and lasers. In 1973, the grocery industry’s task force settled on a standard that very closely paralleled IBM’s approach. The industry wanted a standard that all grocers and their suppliers could use.

IBM was well positioned to become one of the earliest suppliers of scanning equipment to the supermarket world. On October 11, 1973, it came to market with a system called the IBM 3660, which in time became a workhorse in the industry. It included a point-of-sale terminal (digital cash register) and a checkout scanner that could read the UPC symbol. The grocery industry compelled its suppliers of products in boxes and cans to start using the code, and IBM helped suppliers acquire the technology to work with the UPC.

On June 26, 1974, the first scan took place at a Marsh’s supermarket in Troy, Ohio, which the industry had designated as a test facility. The first product scanned was a pack of Wrigley’s Juicy Fruit chewing gum, now on display at the Smithsonian’s National Museum of American History in Washington, D.C. Soon grocery stores began adopting the new scanners, while customers gradually came to trust the scanners’ accuracy in pricing.

If there had been any doubts about the new system’s prospects, they were gone by the end of the 1970s. The costs of checking out customers went down; the accuracy of transactions went up; checkouts sped up by some 40 percent; and in-store inventory systems dramatically improved management of goods on hand, on order or in need of replenishment. And that was just the beginning. An immediate byproduct was the ability of stores to start tracking the buying habits of customers in general and, later, down to the individual, scanning bar coded coupons and frequent shopper cards. In the four years between 1976 and 1980, the number of grocery stores using this technology jumped from 104 to 2,207, and they were spreading to other countries.

In the 1980s, IBM and its competitors introduced the new technology to other industries (including variations of the American standard bar codes that were adopted in Western Europe). And IBM Raleigh kept improving the technology. In December 1980, IBM introduced the 3687 scanner, which used holographic technology in one of the first commercial applications of holography. In October 1987, IBM introduced the 7636 Bar Code Scanner. Throughout the 1980s, factories adopted the IBM bar code to track in-process inventory, and libraries used it to do the same with books. In the 1990s, hand-held scanners made it easier to apply bar codes to things beyond cartons and cans and to scan them, eventually using wireless technology. Meanwhile, bar codes kept improving in how much information they could hold.

These technologies make it possible for all kinds of organizations, schools, universities and companies in all industries to leverage the power of computers to manage their inventories. In many countries, almost every item now purchased in a retail store has a UPC printed on it and is scanned. UPC led to the retirement of manual and electro-mechanical cash registers, which, as a technology, had been around since the 1880s. By the early 2000s, bar code technologies had become a $17 billion business, with codes scanned billions of times each day.

The full text of this article can be found on IBM at 100: