How Do Data Loopholes Slow Down the Treatment of Breast Cancer?

Considering it’s Breast Cancer Awareness Month, I hope the timing of this post helps a very important cause.  For reasons I won’t go into here, I’ve recently become more familiar with breast cancer than I would have otherwise.  When confronted with a new topic of interest, it’s my nature to dig in and learn everything I can about it.

The National Cancer Institute provides a wealth of information on breast cancer but, being a “software guy” … the way mammogram results combined with a clinical breast exam can detect early signs of cancer stood out to me as an important information issue.

I began to wonder where that information was captured and stored (after the test and examination) … and how it was ultimately used in follow-up care with the patient.  I didn’t expect to learn what I did.

The American College of Radiology (ACR) has established a uniform way for radiologists to describe mammogram findings.  The system is called BI-RADS and includes standardized structured codes or values.  Each BI-RADS code has a follow-up plan associated with it to help radiologists and other physicians manage a patient’s care.  These values are often used to trigger notifications of the findings or other follow-up steps.  This makes perfect sense to me except there is a (big data loophole) problem.

The BI-RADS findings (or values) are typically found on a text-based report … or determined by the examining physician.  They are then captured or manually transcribed into the EMR as free-text notes … unstructured data living in a structured data environment.  This is the loophole!  The value is technically there but cannot be used.
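To make the loophole concrete, here is a minimal sketch of what “structuring” a free-text note could look like.  The pattern and the function name `extract_birads` are my own illustration (not part of any EMR or vendor product), and real reports vary far more than a single regular expression can handle.

```python
import re
from typing import Optional

# Hypothetical pattern: matches phrases such as "BI-RADS 4", "BIRADS: 2",
# or "BI-RADS Category 0". It is an assumption about how notes are worded.
BIRADS_PATTERN = re.compile(
    r"\bBI-?RADS\b\s*(?:assessment\s*)?(?:category\s*)?[:\-]?\s*([0-6])(?!\d)",
    re.IGNORECASE,
)


def extract_birads(report_text: str) -> Optional[int]:
    """Return the first BI-RADS category (0-6) found in the text, or None."""
    match = BIRADS_PATTERN.search(report_text)
    return int(match.group(1)) if match else None


# Example: a value buried in a free-text note becomes a structured integer.
note = "IMPRESSION: Suspicious mass in left breast. BI-RADS Category 4."
category = extract_birads(note)
```

Once the value exists as a number rather than a phrase inside a note, it can be stored in a structured field and used by whatever rules the system already applies to structured data.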

Sometimes this step can be missed completely and the results are not put into the EMR system at all (human error) … or, more likely, the BI-RADS value is not transcribed in the right place as a structured data field.  These are just two of the reasons this loophole can occur.

You may not be aware, but an Electronic Medical Records (EMR) system is generally optimized for structured data.  Most EMRs don’t leverage text based unstructured data (test results, physician notes, observations, findings, etc.) in ways that they could.  It’s a known weakness of many of today’s EMR systems.

To net this out … it’s entirely possible that cancer is detected using the BI-RADS value but the information does not find its way into the right place in the EMR system because it’s text based and the EMR cannot recognize it.  Because of this limitation, the EMR system has no way of determining what the text-based information is, or how to use it.

The impact of this is staggering.  Let’s think about this in terms of timely follow-up on cancer detection.  A system that is not able to use the BI-RADS value could mean patients are not being followed up on properly (or at all), even though they are diagnosed with breast cancer.  Yes, this can actually happen if the value is buried in the text and not being used by the EMR.  The unstructured data loophole is a big deal!

Don’t take my word for it.  University of North Carolina Health Care (UNCH) has announced new findings from mining clinical data to improve the accuracy of its 2012 Physician Quality Reporting System (PQRS) measures, achieving double digit quality improvements in the areas of mammogram, colon cancer and pneumonia screening.  They are taking steps to close data loopholes.

The new findings indicate mammogram values are present in structured data 52% of the time … and present in unstructured data 48% of the time.  In other words, almost half the time the value sits in unstructured text and is not presented with the rest of the structured data.  Ouch, that’s a big data loophole.

The new findings also indicate CRC screening (colon cancer) values are present in structured data just 17% of the time … and present in unstructured data 83% of the time.  As a man of a certain age, this scares me in words that can’t be published.  Another big data loophole.

Thankfully leading organizations like UNCH are closing these data loopholes today with solutions that understand unstructured data and can “structure it” for use in EMR systems … pasted from an IBM press release dated today:

Timely Follow-up of Abnormal Cancer Screening Results:  Follow-up care for patients with abnormal tests is often delayed because the results are buried in electronic medical records.  Using IBM Content Analytics, UNCHC can extract abnormal results from cancer screening reports such as mammograms and colonoscopies and store the results as structured data.  The structured results are used to generate alerts immediately for physicians to proactively follow-up with patients that have abnormal cancer screening results.
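The flow described in the release (extract, store as structured data, alert) can be sketched in a few lines.  This is only my illustration of the idea, not how IBM Content Analytics or UNCH’s system works.  The follow-up wording is a summary of the BI-RADS assessment categories, and the names `ScreeningResult`, `ALERT_CATEGORIES`, and `build_alerts`, as well as the choice of which categories trigger an alert, are assumptions.

```python
from dataclasses import dataclass
from typing import List

# Summarized BI-RADS assessment categories; the wording is illustrative.
FOLLOW_UP = {
    0: "Incomplete: additional imaging needed",
    1: "Negative: routine screening",
    2: "Benign: routine screening",
    3: "Probably benign: short-interval follow-up",
    4: "Suspicious: biopsy should be considered",
    5: "Highly suggestive of malignancy: appropriate action should be taken",
    6: "Known biopsy-proven malignancy",
}

# Assumption: these categories should prompt a physician alert.
ALERT_CATEGORIES = {0, 3, 4, 5}


@dataclass
class ScreeningResult:
    """A mammogram result once it is stored as structured data."""
    patient_id: str
    birads: int


def build_alerts(results: List[ScreeningResult]) -> List[str]:
    """Return one alert message per result that needs follow-up."""
    return [
        f"Patient {r.patient_id}: BI-RADS {r.birads} - {FOLLOW_UP[r.birads]}"
        for r in results
        if r.birads in ALERT_CATEGORIES
    ]


results = [ScreeningResult("A-001", 1), ScreeningResult("B-002", 4)]
alerts = build_alerts(results)
```

The point is that none of this is possible while the value lives only inside a text note.  The alert logic is trivial once the number is in a structured field.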

This is an example of what IBM calls Smarter Care … where advanced analytics and cognitive computing can enable a more holistic approach to individuals’ care, and can lead to an evolution in care delivery, with the potential for more effective outcomes and lower costs.  If an ounce of prevention is worth a pound of cure, an ounce of perspective extracted from a ton of data is priceless in potential savings.  IBM Content Analytics is part of the IBM Patient Care and Insights solution suite.

I’ve written several previous blogs on related topics that you might find interesting:

I am also speaking at the PCPCC Annual Fall Conference next Monday October 14th at 10am and will be discussing Smarter Care, UNCH’s findings and more.  Hope to see you there.

As always, leave me your feedback, questions and suggestions.

4 thoughts on “How Do Data Loopholes Slow Down the Treatment of Breast Cancer?”

  1. Hey Craig,

    Great post on big data loopholes. Per our many conversations on this topic, you know I completely agree with your blog’s premise that unstructured text and data often holds vital information that never informs a clinician’s diagnosis or care plan. The idea that this incomplete picture of the patient could lead to misdiagnosis and the wrong treatment is troubling.

    A recent article on Cancer Misdiagnosis that appeared in Boston Magazine cites a BMJ Quality and Safety Journal study that suggests as many as 28% of cancer cases are incorrectly diagnosed. When physicians were surveyed as to why the number might be so high, the top reason was “fragmented or missing information across medical information systems” (38.5 percent).

    The fact that technology, available today from IBM and others, is being implemented at forward thinking hospitals such as UNCH is fantastic and I applaud IBM’s efforts in deploying Watson and Watson-like solutions to help lower health risks associated with medical errors.

    Meanwhile, I guess I’m just a little impatient. I see so much potential in the healthcare space for cognitive computing and analytics solutions beyond what is being done today. Putting Watson to work for consumers to personalize wellness programs is just one example.

    I also see the critical lack of funding for integration of healthcare information data silos which makes me sick at heart when so much emphasis and money is being placed on the importance of EMR solutions which, as you have pointed out, are often woefully inadequate for aggregating EHR and other forms of unstructured patient data.

    I understand IBM is addressing the healthcare data integration issue on a case by case basis with healthcare visionaries that already “get it”. But overall, there is a healthcare data dis-integration crisis in this country today that shows signs of intractability – for a variety of reasons be they policy, process or financial related.

    How do you think IBM and others can help accelerate the process of data integration industry wide?


    • Gary – I don’t agree with your characterization of IBM’s data integration focus 🙂 In my opinion, it takes a village to tackle issues like this. Vendors, customers, standards (e.g., HL7), consultants, ROI all play a role in driving the integration of silos. IBM is a leader in data integration already and we plan to play a role here … including working with others to help tackle these challenges. These data loopholes need to be closed once and for all.

  2. Unstructured data loopholes definitely need to be closed. But if history teaches us anything, pushing standards like HL7 is not going to speed the process.

    IBM is attacking the problem from the top down, interacting with the innovators and the bleeding edge companies in the healthcare space. I don’t think that’s an unfair characterization of IBM’s approach nor is it a judgment on my part. It’s good business for IBM.

    I just feel like the process needs to be supercharged. You mention HL7, which has been around for more than a decade and has had little or no positive impact on interoperability for unstructured data in the form of continuity of care documents and other key text-based information sources.

    Meanwhile, schema-less or schema-agnostic databases are able to ingest virtually any kind of document format and in the process are leapfrogging the need for standards such as HL7 while enabling document interoperability leveraging XML, JSON and other much more broadly used and accepted standards.

    It seems to me that the healthcare industry feels the need to add unnecessary layers of complexity – for whatever reason – while the rest of the world, comparatively speaking, is traveling at warp speed. Thus, data loopholes are harder to close and improving quality and outcomes continues to be more of a challenge than it should be.

    That’s my view.
