Amputations or Analytics … a Call to Action for Entrepreneurs and Intrapreneurs Alike!

Doctor George Shearer practiced medicine in central Pennsylvania from 1825 to 1878 (in the Dillsburg area). He was a pillar of the community and is believed to have been an active surgeon during the Civil War. He was 61 at the time of the Gettysburg battle.

According to the National Library of Medicine, the exact number is not known, but approximately 60,000 surgeries, about three quarters of all of the operations performed during the Civil War, were amputations. Although seemingly drastic, the operation was intended to prevent deadly complications such as gangrene. There were no antibiotics during this era.

Back then, amputation was the recommended treatment for major injuries, such as damage from gunshots or cannonballs. These amputations were performed with a handsaw, like the one Doctor Shearer used (shown below). During the war, surgeons prided themselves on the speed at which they could operate, some claiming to be able to remove a leg in under one minute. Ouch! Literally!


(Photo: Doctor George Shearer’s Actual Surgical Kit)

Keep in mind that local anesthetics were not invented until the 1880s and many procedures were performed without ether or chloroform … the only real anesthetics during the era.

In 1861, this was the best standard of care for those injuries. I think we can reasonably conclude that better treatment options (and outcomes) exist today.

Recently, The Mayo Clinic published an eye-opening report entitled, A Decade of Reversal: An Analysis of 146 Contradicted Medical Practices. The report focuses on published medical practices and how effective they are. Things must have improved since 1861 … right?

The report examines articles published in prominent medical journals on new and established medical practices (such as treatment guidelines or therapies) over a recent 10-year period (2001-2010). 2044 medical practice articles were reviewed. The findings are fascinating but one section of the report jumped off the page at me. Of the 363 articles that tested an existing standard of care, 40.2% reversed the original standard of care … and only 38.0% reaffirmed the original standard of care. The rest were inconclusive.
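To put those percentages in absolute terms, here is a quick back-of-the-envelope sketch. The 363 articles and the 40.2% and 38.0% shares are the figures quoted above; the rounded counts and the size of the "inconclusive" remainder are my own arithmetic, not numbers taken from the report.

```python
# Back-of-the-envelope arithmetic on the figures quoted above.
TOTAL_TESTED = 363          # articles that tested an existing standard of care
REVERSED_SHARE = 0.402      # share that reversed the standard
REAFFIRMED_SHARE = 0.380    # share that reaffirmed the standard


def breakdown(total, reversed_share, reaffirmed_share):
    """Return approximate article counts for each outcome."""
    reversed_count = round(total * reversed_share)
    reaffirmed_count = round(total * reaffirmed_share)
    # Whatever is left over is treated as inconclusive (an assumption).
    inconclusive_count = total - reversed_count - reaffirmed_count
    return {
        "reversed": reversed_count,
        "reaffirmed": reaffirmed_count,
        "inconclusive": inconclusive_count,
    }


counts = breakdown(TOTAL_TESTED, REVERSED_SHARE, REAFFIRMED_SHARE)
for outcome, count in counts.items():
    print(f"{outcome}: about {count} articles ({count / TOTAL_TESTED:.1%})")
```

That works out to roughly 146 reversals (which lines up with the 146 in the report's title), about 138 reaffirmations, and about 79 inconclusive articles.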

In other words (in this case study), the current published medical standards of care are wrong MORE often than they are correct. Wow!

I do feel obligated to point out that this is a very limited slice of the overall published standards of care … but still. Is it just me … or is this mind-blowing?

I am not talking about gulping down some Jack Daniels so I don’t feel my leg being sawed off. These are researched and tested medical standards of care from within the last 13 years. And yet … over 40% of the time, they’re WRONG. In fairness I should point out that they were right 38% of the time. No wonder the US Healthcare system checks in as the 37th best worldwide despite outspending everyone else by a huge margin (per capita).

It’s 150 years later. Has the standard of care improved enough? We may not be sawing legs off at the same rate these days, but maybe it’s time for a new approach. Why are other industries so much farther ahead in leveraging their data with analytics to improve quality, reduce costs and improve outcomes? What could be more important than saving life and limb?

Years of data have been piling up in electronic medical records systems. Genomics is not new anymore. Isn’t it about time we brought analytics to this set of opportunities?

Some leading organizations already are … innovative solutions and companies are popping up to meet this opportunity. Entrepreneurs like Scott Megill, co-founder and CEO of Coriell Life Sciences, are a great example. Coriell Life Sciences is an offshoot of the Coriell Institute for Medical Research, a 60-year-old non-profit research organization. In 2007, the Institute launched an effort to bring genomic information to bear on health management. Coriell Life Sciences was established to commercialize the results of that research. Vast amounts of genetic information about individual patients have been available for a number of years, but the information has been difficult and expensive to get at. “This company bridges the gap,” said Dr. Michael Christman, the Institute’s CEO.

Coriell’s approach is so innovative, they recently walked away with the coveted “IBM Entrepreneur of the Year” award.

Intrapreneurs at IBM have been busy commercializing the breakthrough innovation, IBM Watson – that originally debuted on Jeopardy! in 2011. Watson is based on a cognitive computing model.

Grabbing a few fewer headlines is IBM Patient Similarity Analytics, which uses traditional data-driven predictive analysis combined with new similarity algorithms and new visualization techniques to identify personalized patient intervention opportunities (that were not previously possible).

These are a couple of obvious examples for me, but in reality we are just at the beginning of leveraging big data. New analytics and visualization tools must become the “handsaw” of today. We need these tools to be at the root of today’s modern standards of care.   If Dr. Shearer were alive today, you can bet his old surgical kit would be on the shelf, having been replaced by analytics that he could bring to the point of care.

For many Entrepreneurs and Intrapreneurs, the journey is just beginning, but there is a long way to go. A 2011 McKinsey report estimated that the healthcare industry can realize as much as $300 billion in annual value through analytics. Yowza!

What are you waiting for?

As always, leave me your thoughts below.

Playing The Healthcare Analytics Shell Game

When I think of how most healthcare organizations are analyzing their clinical data today … I get a mental picture of the old depression era shell game – one that takes place in the shadows and back alleys. For many who were down and out, those games were their only means of survival.

The shell game (also known as Thimblerig) is a game of chance. It requires three walnut shells (or thimbles, plastic cups, whatever) and a small round ball, about the size of a pea, or even an actual pea. It is played on almost any flat surface. This conjures images of depression era men huddled together … each hoping to win some money to buy food … or support their vices. Can you imagine playing a shell game just to win some money so you could afford to eat? A bit dramatic I know – but not too far off the mark.

The person perpetrating the game (called the thimblerigger, operator, or shell man) started the game by putting the pea under one of the shells. The shells were quickly shuffled or slid around to confuse and mislead the players as to which of the shells the pea was actually under … and the betting ensued. We now know that the games were usually rigged. Many people were conned and never had a chance to win at all. The pea was often palmed or hidden, and not under any of the shells … in other words, there were no winners.

Many healthcare analytics systems and projects are exactly like that today – lots of players and no pea. The main component needed to win (or gain the key insight) is missing.  The “pea” … in this case, is unstructured data. And while it’s not a con game, finding the pea is the key to success … and can literally be the difference between life and death. Making medical decisions about a patient’s health is pretty important stuff. I want my care givers using all of the available and relevant information (medical evidence) as part of my care.

In healthcare today, most analytics initiatives and research efforts are done by using structured data only (which only represents 20% of the available data). I am not kidding.

This is like betting on a shell game without playing with the pea – it’s not possible to win and you are just wasting your money. In healthcare, critical clinical information (or the pea) is trapped in the unstructured data, free text, images, recordings and other forms of content. Nurse’s notes, lab results and discharge summaries are just a few examples of unstructured information that should be analyzed but in most cases … are not.

The reason for not doing this used to be … it’s too hard, too complicated, too costly, not good enough or some combination of the above. This was a show stopper for many.

Well guess what … those days are over. The technology needed to do this is available today and the reasons for inaction no longer apply.

In fact – this is now a healthcare imperative! Consider that over 80% of information is unstructured. Why would you even want to do analysis on only 1/5th of your available information?

I’ve written about the value of analyzing unstructured data in the past with Healthcare and ECM – What’s Up Doc? (part 1) and Healthcare and ECM – What’s Up Doc? (part 2).

Let’s look at the results from an actual project involving the analysis of both structured and unstructured data to see what is now possible (when you play “with the pea”).

Seton Family Healthcare is analyzing both structured and unstructured clinical (and operational) data today. Not surprisingly, they are ranked as the top health care system in Texas and among the top 100 integrated health care systems in the country. They are currently featured in a Forbes article describing how they are transforming healthcare delivery with the use of IBM Content and Predictive Analytics for Healthcare. This is a new “smarter analytics” solution that leverages unstructured data with the same natural language processing technology found in IBM Watson.

Seton’s efforts are focused on preventing hospital readmissions of Congestive Heart Failure (CHF) patients through analysis and visualization of newly created evidence based information. Why CHF?  (see the video overview)

Heart disease has long been the leading cause of death in the United States. The most recent data from the CDC shows that heart disease accounted for over 27% of overall mortality in the U.S. The overall costs of treating heart disease are also on the rise – estimated to have been $183 billion in 2009. This is expected to increase to $186 billion in 2023. In 2006 alone, Medicare spent $24 billion on heart disease. Yikes!

Combine those staggering numbers with the fact that CHF patients are the leading cause of readmissions in the United States. One in five patients suffers from preventable readmissions, according to the New England Journal of Medicine. Preventable readmissions also represent a whopping $17.4 billion in expenditures from the current $102.6 billion Medicare budget. Wow! How can they afford to pay for everything else?

They can’t … beginning in 2012, those hospitals with high readmission rates will be penalized. Given the above numbers, it shouldn’t be a shock that the new Medicare penalties will start with CHF readmissions. I imagine every hospital is paying attention to this right now.

Back to Seton … the work at Seton really underscores the value of analyzing your unstructured data. Here is a snapshot of some of the findings:

The Data We Thought Would Be Useful … Wasn’t

In some cases, the unstructured data is more valuable and more trustworthy than the structured data:

  • Left Ventricle Ejection Fraction (LVEF) values are found in both places but originate in text based lab results/reports. This is a test measurement of how much blood your left ventricle is pumping. Values of less than 50% can be an indicator of CHF. These values were found in just 2% of the structured data from patient encounters and 74% of the unstructured data from the same encounters.
  • Smoking Status indicators are also found in both places. I’ve written about this exact issue before in Healthcare and ECM – What’s Up Doc? (part 2). Indicators that a patient was smoking were found in 35% of the structured data from encounters and 81% of the unstructured data from the same encounters. But here’s the kicker … the structured data values were only 65% accurate and the unstructured data values were 95% accurate.

You tell me which is more valuable and trustworthy.
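One rough way to answer that is to multiply coverage by accuracy, which gives the share of encounters where a smoking status is both present and correct. The coverage and accuracy figures are the ones quoted above; the combined "usable share" is my own illustrative metric (not something Seton reported), and it assumes the accuracy figure applies evenly across the encounters that had a value.

```python
# Illustrative only: "usable share" = coverage x accuracy.
# Coverage and accuracy are the smoking-status figures quoted above;
# the combined metric is an assumption made for this sketch.

def usable_share(coverage, accuracy):
    """Fraction of encounters with a value that is present AND correct."""
    return coverage * accuracy


sources = {
    "structured": {"coverage": 0.35, "accuracy": 0.65},
    "unstructured": {"coverage": 0.81, "accuracy": 0.95},
}

shares = {name: usable_share(**figures) for name, figures in sources.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%} of encounters have a correct smoking status")

ratio = shares["unstructured"] / shares["structured"]
print(f"unstructured yields about {ratio:.1f}x more usable values")
```

Under those assumptions, roughly 23% of encounters yield a correct smoking status from structured data versus roughly 77% from unstructured data.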

In other cases, the key insights could only be found in the unstructured data … as there was no structured data at all, or not enough to be meaningful. This is equally powerful.

  • Living Arrangement indicators were found in <1% of the structured data from the patient encounters. It was the unstructured data that revealed these insights (in 81% of the patient encounters). These unstructured values were also 100% accurate.
  • Drug and Alcohol Abuse indicators … same thing … 16% and 81% respectively.
  • Assisted Living indicators … same thing … 0% and 13% respectively. Even though only 13% of the encounters had a value, it was significant enough to rank in the top 18 of all predictors for CHF readmissions.

What this means … is that without including the unstructured data in the analysis, the ability to make accurate predictions about readmissions is highly compromised. In other words, it significantly undermines (or even prevents) the identification of the patients who are most at risk of readmission … and the most in need of care. HINT – Don’t play the game without the pea.

New Unexpected Indicators Emerged … CHF is a Highly Predictive Model

We started with 113 candidate predictors from structured and unstructured data sources. This list was expanded when new insights were surfaced like those mentioned above (and others). With the “right” information being analyzed the accuracy is compelling … the predictive accuracy was 49% at the 20th percentile and 97% at the 80th percentile. This means predictions about CHF readmissions should be pretty darn accurate.

18 Top CHF Readmission Predictors and Some Key Insights

The goal was not to find the top 18 predictors of readmissions … but to find the ones where taking a coordinated care approach makes sense and can change an outcome. Even though these predictors are specific to Seton’s patient population, they can serve as a baseline for others to start from.

  • Many of the highest indicators of CHF are not high predictors of 30-day readmissions. One might think LVEF values and Smoking Status are also high indicators of the probability of readmission … they are not. This could  only be determined through the analysis of both structured and unstructured data.
  • Some of the 18 predictors cannot impact the ability to reduce 30-day readmissions. At least six fall into this category and examples include … Heart Disease History, Heart Attack History and Paid by Medicaid Indicator.
  • Many of the 18 predictors can impact the ability to reduce 30-day readmissions and represent an opportunity to improve care through coordinated patient care. At least six fall into this category and examples include … Self Alcohol / Drug Use Indicator, Assisted Living Indicator, Lack of Emotion Support Indicator and Low Sodium Level Indicator. Social factors weigh heavily in determining those at risk of readmission and represent the best opportunity for coordinated/transitional care or ongoing case management.
  • The number one indicator came out of left field … Jugular Venous Distention Indicator. This was not one of the original 113 candidate indicators and only surfaced through the analysis of both structured and unstructured data (or finding the pea). For the non-cardiologists out there … this is when the jugular vein protrudes due to the associated pressure. It can be caused by a fluids imbalance or being “dried out”. This is a condition that would be observed by a clinician and would now be a key consideration of when to discharge a patient. It could also factor into any follow-up transitional care/case management programs.

But Wait … There’s More

Seton also examined other scenarios including resource utilization and identifying key waste areas (or unnecessary costs). We also studied Patient X – a random patient with 6 readmission encounters over an eight-month period. I’ll save Patient X for my next posting.

Smarter Analytics and Smarter Healthcare

It’s easy to see why Seton is ranked as the top health care system in Texas and among the top 100 integrated health care systems in the country. They are a shining example of an organization on the forefront of the healthcare transformation. The way they have put their content in motion with analytics to improve patient care, reduce unnecessary costs and avoid the Medicare penalties is something all healthcare organizations should strive for.

Perhaps most impressively, they’ve figured out how to play the healthcare analytics shell game and find the pea every time.  In doing so … everyone wins!

As always, leave me your comments and thoughts.

Healthcare and ECM – What’s Next Doc? (part 2 of 2)

In my last blog posting Healthcare and ECM – What’s Up Doc?, I wrote about using ECM based content analytics technology to help accelerate decision making in an industry in transition.

But why stop there … how powerful would it be to turn those new insights (from unstructured information) into action by combining content analytics with predictive analytics or other business analytics?

This is transformational … by unlocking the 80% of information not currently being leveraged (explained in part 1) we unlock new ways to use information. More compellingly, we unlock never seen before trends and patterns in both clinical and operational data.

Think about it … do we know everything we need to know about healthcare and how to identify and treat diseases? Or can we benefit from new insights? The answer is obvious.

Combining content and predictive analytics enables:

  • Accurate extraction of medical facts and relationships from unstructured data in clinical and operational sources – not easy, cost effective, or even possible in the past.
  • Never seen before trends, patterns and anomalies are revealed – connections or relationships between diseases, patients and outcomes (and even costs) are now able to be surfaced and acted upon. Think of the medical research possibilities!
  • The ability to predict future outcomes based on past and present scenarios – optimizing resource allocation and patient outcomes. One organization reduced cardiac surgery patient morbidity from 2.9% to 1.3% by doing this.
  • New insights can be surfaced to any clinical or operational knowledge worker based on their respective role – this could be through dashboards, case management/care coordination system, EMR, claims processing or any number of other ways – enabling better decision making and action across the organization.
  • The ability to leverage these new insights with other systems such as data warehouses, master patient data – maximizing and benefiting from the use of other systems.
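To make the first bullet concrete, here is a deliberately tiny sketch of pulling discrete facts out of free text. It uses plain regular expressions, which is NOT how IBM's content analytics products work (they rely on full natural language processing). The sample note, the patterns and the function name are all made up for illustration, and LVEF here stands for left ventricle ejection fraction, a value that typically lives in text-based reports.

```python
import re

# Hypothetical clinical note; not real patient data.
NOTE = (
    "Pt is a 67 y/o male admitted with shortness of breath. "
    "Echo shows LVEF of 35%. Former smoker, quit 2005. Lives alone."
)

# Toy patterns, assumed for this sketch. Real notes are far messier.
LVEF_PATTERN = re.compile(r"\bLVEF\b\D{0,20}?(\d{1,2})\s*%", re.IGNORECASE)
SMOKING_PATTERN = re.compile(
    r"\b(current smoker|former smoker|never smoked|non-?smoker)\b", re.IGNORECASE
)
LIVES_ALONE_PATTERN = re.compile(r"\blives alone\b", re.IGNORECASE)


def extract_facts(note):
    """Return a dict of facts found in a free-text note (None if absent)."""
    lvef = LVEF_PATTERN.search(note)
    smoking = SMOKING_PATTERN.search(note)
    return {
        "lvef_percent": int(lvef.group(1)) if lvef else None,
        "smoking_status": smoking.group(1).lower() if smoking else None,
        "lives_alone": bool(LIVES_ALONE_PATTERN.search(note)),
    }


facts = extract_facts(NOTE)
print(facts)
```

Once facts like these are in a structured form, they can sit alongside the existing structured fields and feed the same dashboards or predictive models. The hard part, which this sketch skips entirely, is handling negation, abbreviations and context ("denies smoking", "no longer lives alone"), and that is where real NLP earns its keep.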

In my last posting, I commented that it was now an imperative to leverage clinical information and operational data in new ways … and that there are obvious things to do to improve quality of care, patient satisfaction and business efficiency.

There are at least nine areas where this opportunity exists. The clinical scenarios are:

  • Diagnostic Assistance: Highly correlated symptom to health/disease analysis issues visualized with predictive guidance on diagnosis to improve treatment and outcomes … with predicted or forecasted costs.
  • Clinical Treatment Effectiveness: Examine patient-specific factors against the effectiveness of a healthcare organization’s specific treatment options and protocols … including comparisons to industry wide outcomes and best practices.
  • Critical Care Intervention: Early detection of unmanageable or high risk cases in the hospital that leads to interventions to reduce costs and maintain or improve clinical conditions … including case based interventions.
  • Research for Improved Disease Management: Perform analysis and predict outcomes by extracting discrete facts from text, such as: patient smoking status, patient diet and patient exercise regime to find new and better treatment options … use as a mechanism for differentiation or to secure research grants.

Operational scenarios include:

  • Claims Management: All claims involve unstructured data and manually intensive analysis. Analyze claims information documented in cases, forms and web content to understand new trends and patterns to identify areas … perfect for process improvement, cost reduction and optimal service delivery.
  • Fraud Detection and Prevention: Uncover eligibility, false assertions and fraud patterns trapped in the unstructured data to reduce risk before payments are made … usually represented by a word or combination of words in text that can’t be detected with just structured data.
  • Voice of the Patient: Include unstructured data and sentiment analysis from surveys and web forms in analysis of patient and member satisfaction … this will be key as the industry moves to a value based model.
  • Prevention of Readmissions: Discover key indicators which are indicative of readmission to alert healthcare organizations to these so that protocols can be altered to avoid readmission … this is key as new Medicare payment penalties go into effect in 2012.
  • Patient Discharge and Follow-up Care: Understand and monitor patient behavior to proactively inform preventative and follow-up care coordinators before situations get worse.

According to the New England Journal of Medicine, one in five patients suffers from preventable readmissions. This represents $17.4 billion of the current $102.6 billion Medicare budget. Beginning in 2012, hospitals will be penalized for high readmission rates with reductions in Medicare discharge payments. Seton Healthcare Family is already ahead of the game.

“IBM Content and Predictive Analytics for Healthcare uses the same type of natural language processing as IBM Watson, enabling us to leverage our unstructured information in new ways not possible before,” said Charles J. Barnett, FACHE, President/Chief Executive Officer, Seton Healthcare Family. “With this solution, we can access an integrated view of relevant clinical and operational information to drive more informed decision making. For example, by predicting readmission candidates, we can reduce costly and preventable readmissions, decrease mortality rates, and ultimately improve the quality of life for our patients.”

This week at IOD … IBM is launching a new solution specifically designed to reveal clinical and operational insights in the high impact overlap between clinical and operational use cases – enabling low cost accountable care.

IBM Content and Predictive Analytics for Healthcare, a synergistic solution to IBM Watson, helps transform healthcare clinical and operational decision making for improved outcomes by uniquely applying multiple analytics services to derive and act on new insights in ways not previously possible … which is exactly what Seton Healthcare Family is doing.  Dr. David Ramirez, Medical Director at Seton shares his perspective here.

IBM Content and Predictive Analytics for Healthcare (ICPA) is Watson Ready and is designed to complement and leverage IBM Watson for Healthcare through the ability to analyze and visualize the past, understand the present, and predict future outcomes.

ICPA, as the first Watson Ready offering, not only provides assurance of Watson solution interoperability but extends the value ultimately delivered to clients. For example, using input from ICPA outcomes, IBM Watson will be able to provide better diagnostic recommendation and treatment protocols as well as learn from the confidence based responses.

The press release is available here for those seeking more information. I will be doing a high level main stage demo of ICPA on Wednesday which will be streamed live. I will post the replay when available.

But it’s not just healthcare … every industry is impacted by the explosion of information and has the same opportunity to leverage the 80+ percent that is unstructured to turn insights into action.

As always, leave me your thoughts and comments here.

TV Re-runs, Watson and My Blog

When I was a wee lad … back in the 60s … I used to rush home from elementary school to watch the re-runs on TV.  This was long before middle school and girls.  HOMEWORK, SCHMOMEWORK !!!  … I just had to see those re-runs before anything else.  My favorites were I Love Lucy, Batman, Leave It To Beaver and The Munsters.  I also watched The Patty Duke Show (big time school boy crush) but my male ego prevents me from admitting I liked it.  Did you know the invention of the re-run is credited to Desi Arnaz?  The man was a genius even though Batman was always my favorite.  Still is.  I had my priorities straight even back then.

I am reminded of this because I have that same Batman-like re-run giddiness as I think about the upcoming re-runs of Jeopardy! currently scheduled to air September 12th – 14th.

You’ve probably figured out why I am so excited, but in case you’ve been living in a cave, not reading this blog, or both … IBM Watson competed (and won) on Jeopardy! in February against the two most accomplished Grand Champions in the history of the game show (Ken Jennings and Brad Rutter).  Watson (DeepQA) is the world’s most advanced question answering machine that uncovers answers by understanding the meaning buried in the context of a natural language question.  By combining advanced Natural Language Processing (NLP) and DeepQA automatic question answering technology, IBM was able to demonstrate a major breakthrough in computing.

Unlike traditional structured data, human natural language is full of ambiguity … it is nuanced and filled with contextual references.  Subtle meaning, irony, riddles, acronyms, idioms, abbreviations and other language complexities all present unique computing challenges not found with structured data.  This is precisely why IBM chose Jeopardy! as a way to showcase the Watson breakthrough.

Appropriately, I’ve decided that this posting should be a re-run of my own Watson and content analysis related postings.  So in the spirit of Desi, Lucy, Batman and Patty Duke … here we go:

  1. This is my favorite post of the bunch.  It explains how the same technology used to play Jeopardy! can give you better business insight today.  “What is Content Analytics?, Alex”
  2. I originally wrote this a few weeks before the first match was aired to explain some of the more interesting aspects of Watson.  10 Things You Need to Know About the Technology Behind Watson
  3. I wrote this posting just before the three day match was aired live (in February) and updated it with comments each day.  Humans vs. Watson (Programmed by Humans): Who Has The Advantage?
  4. Watson will be a big part of the future of Enterprise Content Management and I wrote this one in support of a keynote I delivered at the AIIM Conference.   Watson and The Future of ECM  (my slides from the same keynote are posted here).
  5. This was my most recent posting.  It covers another major IBM Research advancement in the same content analysis technology space.  TAKMI and Watson were recognized as part of IBM’s Centennial as two of the top 100 innovations of the last 100 years.  IBM at 100: TAKMI, Bringing Order to Unstructured Data
  6. I wrote a similar IBM Centennial posting about IBM Research and Watson.  IBM at 100: A Computer Called Watson
  7. This was my first Watson related post.  It introduced Watson and was posted before the first match was aired.  Goodbye Search … It’s About Finding Answers … Enter Watson vs. Jeopardy!

Desi Arnaz may have been a genius when it came to TV re-runs but the gang at IBM Research have made a compelling statement about the future of computing.  Jeopardy! shows what is possible and my blog postings show how this can be applied already.  The comments from your peers on these postings are interesting to read as well.

Don’t miss either re-broadcast.  Find out where and when Jeopardy! will be aired in your area.  After the TV re-broadcast, I will be doing some events including customer and public presentations.

On the web …

  • I will be presenting IBM Watson and the Future of Enterprise Content Management on September 21, 2011 (replay here).
  • I will be speaking on Content Analytics in a free upcoming AIIM UK webinar on September 30, 2011 (replay here).

Or in person …

You might also want to check out the new Smarter Planet interview with Manoj Saxena (IBM Watson Solutions General Manager)

As always, your comments and thoughts are welcome here.

IBM at 100: TAKMI, Bringing Order to Unstructured Data

As most of you know … I have been periodically posting some of the really fascinating top 100 innovations of the past 100 years as part of IBM’s Centennial celebration.

This one is special to me as it represents what is possible for the future of ECM.  I wasn’t around for tabulating machines and punch cards but have long been fascinated by the technology developments in the management and use of content.  As impressive as Watson is … it is only the most recent step in a long journey IBM has been pursuing to help computers better understand natural language and unstructured information.

As most of you probably don’t know … this journey started over 50 years ago in 1957 when IBM published the first research on this subject entitled A Statistical Approach to Mechanized Encoding and Searching of Literary Information. Finally … something in this industry older than I am!

Unstructured Information Management Architecture (UIMA)

Another key breakthrough by IBM in this area was the invention of UIMA.  Now an Apache Open Source project and OASIS standard, UIMA is an open, industrial-strength platform for unstructured information analysis and search.  It is the only open standard for text based processing and applications.  I plan to write more on UIMA in a future blog but I mention it here because it was an important step forward for the industry, Watson and TAKMI (now known as IBM Content Analytics).


In 1997, IBM researchers at the company’s Tokyo Research Laboratory pioneered a prototype for a powerful new tool capable of analyzing text. The system, known as TAKMI (for Text Analysis and Knowledge Mining), was a watershed development: for the first time, researchers could efficiently capture and utilize the wealth of buried knowledge residing in enormous volumes of text. The lead researcher was Tetsuya Nasukawa.

Over the past 100 years, IBM has had a lot of pretty important inventions but this one takes the cake for me.  Nasukawa-san once said,

“I didn’t invent TAKMI to do something humans could do, better.  I wanted TAKMI to do something that humans could not do.”

In other words, he wanted to invent something humans couldn’t see or do on their own … and isn’t that the whole point and value of technology anyway?

By 1997, text was searchable, if you knew what to look for. But the challenge was to understand what was inside these growing information volumes and know how to take advantage of the massive textual content that you could not read through and digest.

The development of TAKMI quietly set the stage for the coming transformation in business intelligence. Prior to 1997, the field of analytics dealt strictly with numerical and other “structured” data—the type of tagged information that is housed in fixed fields within databases, spreadsheets and other data collections, and that can be analyzed by standard statistical data mining methods.

The technological clout of TAKMI lay in its ability to read “unstructured” data—the data and metadata found in the words, grammar and other textual elements comprising everything from books, journals, text messages and emails, to health records and audio and video files. Analysts today estimate that 80 to 90 percent of any organization’s data is unstructured. And with the rising use of interactive web technologies, such as blogs and social media platforms, churning out ever-expanding volumes of content, that data is growing at a rate of 40 to 60 percent per year.

The key to its success was natural language processing (NLP) technology. Most data mining researchers at the time treated English text as a bag of words, extracting words from character strings by splitting on white space. However, since Japanese text does not use white space as a word separator, IBM researchers in Tokyo applied NLP to extract words, analyze their grammatical features, and identify relationships among words. Such in-depth analysis led to better results in text mining. That’s why leading-edge text mining technology originated in Japan.
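To make the white space problem concrete, here is a minimal Python sketch. It is only an illustration and not how TAKMI works: the sample sentences, the tiny `VOCAB` dictionary and the greedy longest-match function are all my own assumptions, standing in for the far richer linguistic analysis the Tokyo team built.

```python
from collections import Counter


def whitespace_bag_of_words(text):
    """Bag of words: split on white space and count tokens, ignoring word order."""
    return Counter(text.lower().split())


def longest_match_segment(text, vocabulary):
    """Toy segmenter: at each position take the longest vocabulary entry.

    Falls back to a single character when nothing in the vocabulary matches.
    Real NLP systems use morphological analysis, not a lookup like this.
    """
    max_len = max(len(word) for word in vocabulary)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical sample sentences (not taken from any IBM data set).
english = "the customer called about the defect"
japanese = "顧客が欠陥について電話した"  # roughly the same sentence in Japanese

# Hypothetical six-entry dictionary, just large enough for the sample sentence.
VOCAB = {"顧客", "が", "欠陥", "について", "電話", "した"}

english_bag = whitespace_bag_of_words(english)
japanese_naive = japanese.split()  # no spaces, so the whole sentence is one "word"
japanese_tokens = longest_match_segment(japanese, VOCAB)

print(english_bag)
print(japanese_naive)
print(japanese_tokens)
```

Splitting on white space gives a usable word count for the English sentence but returns the entire Japanese sentence as a single token. Only after a segmentation step do individual words appear, and that is the point where grammatical analysis can begin.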

The complete article on TAKMI can be found at

Fast forward to today.  IBM has since commercialized TAKMI as IBM Content Analytics (ICA), a platform to derive rapid insight.  It can transform raw information into business insight quickly, without building models or deploying complex systems, enabling all knowledge workers to derive insight in hours or days … not weeks or months.  It helps address industry-specific problems such as healthcare treatment effectiveness, fraud detection, product defect detection, public safety concerns, customer satisfaction and churn, crime and terrorism prevention and more.

I’d like to personally congratulate Nasukawa-san and the entire team behind TAKMI (and ICA) for such an amazing achievement … and for making the list.  Selected team members who contributed to TAKMI are Tetsuya Nasukawa, Kohichi Takeda, Hideo Watanabe, Shiho Ogino, Akiko Murakami, Hiroshi Kanayama, Hironori Takeuchi, Issei Yoshida, Yuta Tsuboi and Daisuke Takuma.

It’s a shining example of the best form of innovation … the kind that enables us to do something not previously possible.  Being recognized along with other landmark achievements like the UPC code, the floppy disk, magnetic stripe technology, laser eye surgery, the scanning tunneling microscope, fractal geometry and human genomics mapping is really amazing.

This type of enabling innovation is the future of Enterprise Content Management.  It will be fun and exciting to see if TAKMI (Content Analytics) has the same kind of impact on computing as the UPC code has had on retail shopping … or as laser eye surgery has had on vision care.

What do you think?  As always, leave me your thoughts and comments.

Other similar postings:

Watson and The Future of ECM

“What is Content Analytics?, Alex”

10 Things You Need to Know About the Technology Behind Watson

Goodbye Search … It’s About Finding Answers … Enter Watson vs. Jeopardy! 

IBM … 100 Years Later

Nearly all the companies our grandparents admired have disappeared.  Of the top 25 industrial corporations in the United States in 1900, only two remained on that list at the start of the 1960s.  And of the top 25 companies on the Fortune 500 in 1961, only six remain there today.  Some of the leaders of those companies that vanished were dealt a hand of bad luck.  Others made poor choices. But the demise of most came about because they were unable simultaneously to manage their business of the day and to build their business of tomorrow.

IBM was founded in 1911 as the Computing-Tabulating-Recording Company (CTR) through a merger of four companies: the Tabulating Machine Company, the International Time Recording Company, the Computing Scale Company, and the Bundy Manufacturing Company.  CTR adopted the name International Business Machines in 1924.  The distinctive culture and product branding have given IBM the nickname Big Blue.

As you read this, IBM begins its 101st year.  As I look back at the last century, the path that led us to this remarkable anniversary has been both rich and diverse.  The innovations IBM has contributed include products ranging from cheese slicers to calculators to punch cards – all the way up to game-changing systems like Watson.

But what stands out to me is what has remained unchanged.  IBM has always been a company of brilliant problem-solvers.  IBMers use technology to solve business problems.  We invent it, we apply it to complex challenges, and we redefine industries along the way.

This has led to some truly game-changing innovation.  Just look at industries like retail, air travel, and government.  Where would we be without UPC codes, credit cards and ATMs, SABRE, or Social Security?  Visit the IBM Centennial site to see profiles on 100 years of innovation.

We haven’t always been right though … remember OS/2, the PCjr and Prodigy?

100 years later, we’re still tackling the world’s most pressing problems.  It’s incredibly exciting to think about the ways we can apply today’s innovation – new information-based systems leveraging analytics to create new solutions, like Watson – to fulfill the promise of a Smarter Planet through smarter traffic, water, energy, and healthcare.  I look forward to helping IBM pave the way for continued innovation.

Watch the IBM Centennial film “Wild Ducks” or read the book.  IBM officially released a book last week celebrating the Centennial, “Making the World Work Better: The Ideas that Shaped a Century and a Company”.  The book consists of three original essays by leading journalists. They explore how IBM has pioneered the science of information, helped reinvent the modern corporation and changed the way the world actually works.

As for me … I’ve been with IBM since the 2006 acquisition of FileNet and am proud to be associated with such an innovative and remarkable company.

IBM at 100: SAGE, The First National Air Defense Network

This week was a reminder of how technology can aid in our nation’s defense as we struck a major blow against terrorism.  Most people don’t realize how many ways IBM has contributed to our nation’s defense.  Here is just one example from 1949.

When the Soviet Union detonated its first atomic bomb on August 29, 1949, the United States government concluded that it needed a real-time, state-of-the-art air defense system.  It turned to the Massachusetts Institute of Technology (MIT), which in turn recruited companies and other organizations to design what would be an online system covering all of North America using many technologies, a number of which did not exist yet.  Could it be done?  It had to be done.  Such a system had to observe, evaluate and communicate incoming threats much the way a modern air traffic control system monitors flights of aircraft.

This marked the beginning of SAGE (Semi-Automatic Ground Environment), the national air defense system implemented by the United States to warn of and intercept airborne attacks during the Cold War.  The heart of this digital system—the AN/FSQ-7 computer—was developed, built and maintained by IBM.  SAGE was the largest computer project in the world during the 1950s and took IBM squarely into the new world of computing.  Between 1952 and 1955, it generated 80 percent of IBM’s revenues from computers, and by 1958, more than 7000 IBMers were involved in the project.  SAGE spun off a large number of technological innovations that IBM incorporated into other computer products.

IBM’s John McPherson led the early conversations with MIT, and senior management quickly realized that this could be one of the largest data processing opportunities since winning the Social Security bid in the mid-1930s.  Thomas Watson, Jr., then lobbying his father and other senior executives to move into the computer market quickly, recalled in his memoirs that he wanted to “pull out all the stops” to be a central player in the project.  “I worked harder to win that contract than I worked for any other sale in my life.”  So did a lot of other IBMers: engineers designing components, then the computer; sales staff pricing the equipment and negotiating contracts; senior management persuading MIT that IBM was the company to work with; other employees collaborating with scores of companies, academics and military personnel to get the project up and running; and yet others who installed, ran and maintained the IBM systems for SAGE for a quarter century.

The online features of the system demonstrated that a new world of computing was possible—and that, in the 1950s, IBM knew the most about this kind of data processing.  As the ability to develop reliable online systems became a reality, other government agencies and private companies began talking to IBM about possible online systems for them.  Some of those projects transpired in parallel, such as the development of the Semi-Automated Business Research Environment (Sabre), American Airlines’ online reservation system, also built using IBM staff located in Poughkeepsie, New York.

In 1952, MIT selected IBM to build the computer to be the heart of SAGE. MIT’s project leader, Jay W. Forrester, reported later that the company was chosen because “in the IBM organization we observed a much higher degree of purposefulness, integration and ‘esprit de corps’ than in other firms,” and “evidence of much closer ties between research, factory and field maintenance at IBM.”  The technical skills to do the job were also there, thanks to prior experience building advanced electronics for the military.

IBM quickly ramped up, assigning about 300 full-time IBMers to the project by the end of 1953. Work was centered in IBM’s Poughkeepsie and Kingston, NY facilities and in Cambridge, Massachusetts, home of MIT.  New memory systems were needed; MITRE and the Systems Development Corporation (part of RAND Corporation) wrote software, and other vendors supplied components.  In June 1956, IBM delivered the prototype of the computer to be used in SAGE.  The press release called it an “electronic brain.”  It could automatically calculate the most effective use of missiles and aircraft to fend off attack, while providing the military commander with a view of an air battle. Although this seems routine in today’s world, it was an enormous leap forward in computing.  When fully deployed in 1963, SAGE included 23 centers, each with its own AN/FSQ-7 system, which really consisted of two machines (one for backup), both operating in coordination.  Ultimately, 54 systems were installed, all collaborating with each other. The SAGE system remained in service until January 1984, when it was replaced with a next-generation air defense network.

Its innovative technological contributions to IBM and the IT industry as a whole were significant.  These included magnetic-core memories, which worked faster and held more data than earlier technologies; a real-time operating system (a first); highly disciplined programming methods; overlapping computing and I/O operations; real-time transmission of data over telephone lines; use of CRT terminals and light pens (a first); redundancy and backup methods and components; and the highest reliability of computer systems (uptime) of the day.  It was the first geographically distributed, online, real-time application of digital computers in the world.  Because many of the technological innovations spun off from this project were ported over to new IBM computers in the second half of the 1950s by the same engineers who had worked on SAGE, the company was quickly able to build on lessons learned in how to design, manufacture and maintain complex systems.

Fascinating to be sure … the full article can be accessed at