Playing The Healthcare Analytics Shell Game


When I think of how most healthcare organizations are analyzing their clinical data today … I get a mental picture of the old depression era shell game – one that takes place in the shadows and back alleys. For many who were down and out, those games were their only means of survival.

The shell game (also known as Thimblerig) is a game of chance. It requires three walnut shells (or thimbles, plastic cups, whatever) and a small round ball, about the size of a pea, or even an actual pea. It is played on almost any flat surface. This conjures images of depression era men huddled together … each hoping to win some money to buy food … or support their vices. Can you imagine playing a shell game just to win some money so you could afford to eat? A bit dramatic I know – but not too far off the mark.

The person perpetrating the game (called the thimblerigger, operator, or shell man) started the game by putting the pea under one of the shells. The shells were quickly shuffled or slid around to confuse and mislead the players as to which of the shells the pea is actually under … and the betting ensued. We now know, that the games were usually rigged. Many people were conned and never had a chance to win at all. The pea was often palmed or hidden, and not under any of the shells … in other words, there were no winners.

Many healthcare analytics systems and projects are exactly like that today – lots of players and no pea. The main component needed to win (or gain the key insight) is missing.  The “pea” … in this case, is unstructured data. And while it’s not a con game, finding the pea is the key to success … and can literally be the difference between life and death. Making medical decisions about a patient’s health is pretty important stuff. I want my care givers using all of the available and relevant information (medical evidence) as part of my care.

In healthcare today, most analytics initiatives and research efforts are done by using structured data only (which only represents 20% of the available data). I am not kidding.

This is like betting on a shell game without playing with the pea – it’s not possible to win and you are just wasting your money. In healthcare, critical clinical information (or the pea) is trapped in the unstructured data, free text, images, recordings and other forms of content. Nurse’s notes, lab results and discharge summaries are just a few examples of unstructured information that should be analyzed but in most cases … are not.

The reason used to be (for not doing this) … it’s too hard, too complicated, too costly, not good enough or some combination of the above. This was a show stopper for many.

Well guess what … those days are over. The technology needed to do this is available today and the reasons for inaction no longer apply.

In fact – this is now a healthcare imperative! Consider that over 80% of information is unstructured. Why would you even want to do analysis on only 1/5th of your available information?

I’ve written about the value of analyzing unstructured data in the past with Healthcare and ECM – What’s Up Doc? (part 1) and Healthcare and ECM – What’s Up Doc? (part 2).

Let’s look at the results from an actual project involving the analysis of both structured and unstructured data to see what is now possible (when you play “with the pea”).

Seton Family Healthcare is analyzing both structured and unstructured clinical (and operational) data today. Not surprisingly, they are ranked as the top health care system in Texas and among the top 100 integrated health care systems in the country. They are currently featured in a Forbes article describing how they are transforming healthcare delivery with the use of IBM Content and Predictive Analytics for Healthcare. This is a new “smarter analytics” solution that leverages unstructured data with the same natural language processing technology found in IBM Watson.

Seton’s efforts are focused on preventing hospital readmissions of Congestive Heart Failure (CHF) patients through analysis and visualization of newly created evidence based information. Why CHF?  (see the video overview)

Heart disease has long been the leading cause of death in the United States. The most recent data from the CDC shows that heart disease accounted for over 27% of overall mortality in the U.S. The overall costs of treating heart disease are also on the rise – estimated to have been $183 billion in 2009. This is expected to increase to $186 billion in 2023. In 2006 alone, Medicare spent $24 billion on heart disease. Yikes!

Combine those staggering numbers with the fact that CHF patients are the leading cause of readmissions in the United States. One in five patients suffer from preventable readmissions, according to the New England Journal of Medicine. Preventable readmissions also represent a whopping $17.4 billion in expenditures from the current $102.6 billion Medicare budget. Wow! How can they afford to pay for everything else?

They can’t … beginning in 2012, those hospitals with high readmission rates will be penalized. Given the above numbers, it shouldn’t be a shock that the new Medicare penalties will start with CHF readmissions. I imagine every hospital is paying attention to this right now.

Back to Seton … the work at Seton really underscores the value of analyzing your unstructured data. Here is a snapshot of some of the findings:

The Data We Thought Would Be Useful … Wasn’t

In some cases, the unstructured data is more valuable and more trustworthy then the structured data:

  • Left Ventricle Ejection Fraction (LVEF) values are found in both places but originate in text based lab results/reports. This is a test measurement of how much blood your left ventricle is pumping. Values of less than 50% can be an indicator of CHF. These values were found in just 2% of the structured data from patient encounters and 74% of the unstructured data from the same encounters.
  • Smoking Status indicators are also found in both places. I’ve written about this exact issue before in Healthcare and ECM – What’s Up Doc? (part 2). Indicators that a patient was smoking were found in 35% of the structured data from encounters and 81% of the unstructured data from the same encounters. But here’s the kicker … the structured data values were only 65% accurate and the unstructured data values were 95% accurate.

You tell me which is more valuable and trustworthy.

In other cases, the key insights could only be found from the unstructured data … as was no structured data at all or enough to be meaningful. This is equally as powerful.

  • Living Arrangement indicators were found in <1% of the structured data from the patient encounters. It was the unstructured data that revealed these insights (in 81% of the patient encounters). These unstructured values were also 100% accurate.
  • Drug and Alcohol Abuse indicators … same thing … 16% and 81% respectively.
  • Assisted Living indicators … same thing … 0% and 13% respectively. Even though only 13% of the encounters had a value, it was significant enough to rank in the top 18 of all predictors for CHF readmissions.

What this means … is that without including the unstructured data in the analysis, the ability to make accurate predictions about readmissions is highly compromised. In other words, it significantly undermines (or even prevents) the identification of the patients who are most at risk of readmission … and the most in need of care. HINT – Don’t play the game without the pea.

New Unexpected Indicators Emerged … CHF is a Highly Predictive Model

We started with 113 candidate predictors from structured and unstructured data sources. This list was expanded when new insights were surfaced like those mentioned above (and others). With the “right” information being analyzed the accuracy is compelling … the predictive accuracy was 49% at the 20th percentile and 97% at the 80th percentile. This means predictions about CHF readmissions should be pretty darn accurate.

18 Top CHF Readmission Predictors and Some Key Insights

The goal was not to find the top 18 predictors of readmissions … but to find the ones where taking a coordinated care approach makes sense and can change an outcome. Even though these predictors are specific to Seton’s patient population, they can serve as a baseline for others to start from.

  • Many of the highest indicators of CHF are not high predictors of 30-day readmissions. One might think LVEF values and Smoking Status are also high indicators of the probability of readmission … they are not. This could  only be determined through the analysis of both structured and unstructured data.
  • Some of the 18 predictors cannot impact the ability to reduce 30-day admissions. At least six fall into this category and examples include … Heart Disease History, Heart Attack History and Paid by Medicaid Indicator.
  • Many of the 18 predictors can impact the ability to reduce 30-day admissions and represent an opportunity to improve care through coordinated patient care. At least six fall into this category and examples include … Self Alcohol / Drug Use Indicator, Assisted Living Indicator, Lack of Emotion Support Indicator and Low Sodium Level Indicator. Social factors weigh heavily in determining those at risk of readmission and represent the best opportunity for coordinated/transitional care or ongoing case management.
  • The number one indicator came out of left field … Jugular Venous Distention Indicator. This was not one of the original 113 candidate indicators and only surfaced through the analysis of both structured and unstructured data (or finding the pea). For the non-cardiologists out there … this is when the jugular vein protrudes due to the associated pressure. It can be caused by a fluids imbalance or being “dried out”. This is a condition that would be observed by a clinician and would now be a key consideration of when to discharge a patient. It could also factor into any follow-up transitional care/case management programs.

But Wait … There’s More

Seton also examined other scenarios including resource utilization and identifying key waste areas (or unnecessary costs). We also studied Patient X – a random patient with 6 readmission encounters over an eight-month period. I’ll save Patient X for my next posting.

Smarter Analytics and Smarter Healthcare

It’s easy to see why Seton is ranked as the top health care system in Texas and among the top 100 integrated health care systems in the country. They are a shining example of an organization on the forefront of the healthcare transformation. The way they have put their content in motion with analytics to improve patient care, reduce unnecessary costs and avoid the Medicare penalties is something all healthcare organizations should strive for.

Perhaps most impressively, they’ve figured out how to play the healthcare analytics shell game and find the pea every time.  In doing so … everyone wins!

As always, leave me your comments and thoughts.

10 thoughts on “Playing The Healthcare Analytics Shell Game

  1. Pingback: From MDM to Ready for Watson | Mastering Data Management

  2. Pingback: Big Data Value Potential Index « Harvesting "Information Excellence"

  3. Craig, excellent post, and I’m happy to see the progress of both healthcare organizations in being able to tap into the vast knowledge (formerly) trapped in unstructured content, and in IBM’s progress in bringing Watson capabilities to the marketplace.

  4. There is never going to be enough data captured to make predictors 100% accurate. However healthcare organizations have to start somewhere to indentify any trends with the ultimate goal of providing better/smarter healthcare for limited resources. Finally data cannot capture the human interaction – patients not being compliant, physicians not being able to access medical records from a variety of healthcare organizations or providers, and the well-being of the patient once discharged to an environment outside the provider’s control.

  5. Hi Craig, really liked your informative post. However, I have a question – do you think products like IBM Content Analytics can supplement or add value to CIS or EPR systems like Cerner Millennium , etc? Doesn’t Cerner or other CIS applications handles very little unstructured data – and content like discharge summary and clinical notes are handled as structured documents. If this scenario is true, then how can IBM Content and Predictive Analytics solution complement these healthcare information systems with native intelligence capabilities and very little unstructured content

    • Manzoor – Structured documents contain loads of unstructured data. As an example .. Most discharge instructions are several pages of text. Even though the content is well organized, it is not structured data (aka – database rows and columns). The IBM Content Analytics based solutions should be seen as complementary to Cerner or any EMR / clinical information system.

  6. Doing some market research: Who are the leading suppliers of health care analytics, such as Truven, and what percent of the market do they own? How do they fit into the emergence of accountable care organizations, medical homes and incentive payment systems? Thanks

    • Market share questions are best left to the analyst firms (IDC, Gartner, Forrester, etc.) who track such things. There are specialty vendors such as MDdatacor, Truven who have solutions as well as the large analytics vendors such as my employer (IBM).

  7. This sort of healthcare discussion is one that needs to be happening more regularly. I think it is important to state actual analysis of facts, rather than just say that the government is responsible for the coming costs of healthcare. I think that we need to analyse all the factors involved. Thank you for the post.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s