Introducing IBM at 100: Patents and Innovation

With the looming Jeopardy! challenge competition involving IBM Watson, I am feeling proud of my association with IBM, in part because IBM is an icon of business.  As a tribute, I plan to re-post a few of the notable achievements by IBM and IBMers from the past 100 years in an attempt to put the company’s contributions over those years into perspective.  Has IBM made a difference in our world … our planet?  What kind of impact has IBM had on the world?  Is it really a smarter planet as a result of the past 100 years?

I hope to answer these and other questions through these posts.  A dedicated website has these postings and much more about IBM’s past 100 years.   There is also a great overview video.  Check back often.  New stories will be added throughout the centennial year.  Let’s start with Patents and Innovation … a cornerstone of IBM’s heritage and reputation.

IBM’s 100 Icons of Progress

In the span of a century, IBM has evolved from a small business that made scales, time clocks and tabulating machines to a globally integrated enterprise with 400,000 employees and a strong vision for the future. The stories that have emerged throughout our history are complex tales of big risks, lessons learned and discoveries that have transformed the way we work and live. These 100 iconic moments—these Icons of Progress—demonstrate our faith in science, our pursuit of knowledge and our belief that together we can make the world work better.

Patents and Innovation

By hiring engineer and inventor James W. Bryce in 1917, Thomas Watson Sr. showed his commitment to pure inventing. Bryce and his team established IBM as a long-term leader in the development and protection of intellectual property. By 1929, 90 percent of IBM’s products were the result of Watson’s investments in R&D. In 1940, the team invented a method for adding and subtracting using vacuum tubes—a basic building block of the fully electronic computers that transformed business in the 1950s. This pattern—using innovation to create intellectual property—shaped IBM’s history.

On January 26, 1939, James W. Bryce, IBM’s chief engineer, dictated a two-page letter to Thomas J. Watson, Sr., the company’s president. It was an update on the research and patents he had been working on. Today, the remarkable letter serves as a window into IBM’s long-held role as a leader in the development and protection of intellectual property.

Bryce was one of the most prolific inventors in American history, racking up more than 500 U.S. and foreign patents by the end of his career. In his letter to Watson, he described six projects, each of which would be considered a signature life achievement for the average person. They included research into magnetic recording of data, an investigation into the use of light rays in computing and plans with Harvard University for what would become one of the first digital computers. But another project was perhaps most significant. Wrote Bryce: “We have been carrying on an investigation in connection with the development of computing devices which do not employ the usual adding wheels, but instead use electronic effects and employ tubes similar to those used in radio work.”

The investigation bore fruit. On January 15, 1940, Arthur H. Dickinson, Bryce’s top associate and a world-beating inventor in his own right, submitted an application for a patent for “certain improvements in accounting apparatus.” In fact, the patent represented a turning point in computing history. Dickinson, under Bryce’s supervision, had invented a method for adding and subtracting using vacuum tubes—a basic building block of the fully electronic computers that began to appear in the 1940s and transformed the world of business in the 1950s.

This pattern—using innovation to create intellectual property—is evident throughout IBM’s history. Indeed, intellectual property has been strategically important at IBM since before it was IBM.

The full text of this article can be found on IBM at 100: http://www.ibm.com/ibm100/us/en/icons/patents/

“What is Content Analytics, Alex?”

“The technology behind Watson represents the future of data management and analytics.  In the real world, this technology will help us uncover insights in everything from traffic to healthcare.”

– John Cohn, IBM Fellow, IBM Systems and Technology Group

How can the same technology used to play Jeopardy! give you better business insight?

Why Watson matters

You have to start by understanding that IBM Watson DeepQA is the world’s most advanced question answering machine.  It uncovers answers by understanding the meaning buried in the context of a natural language question.  By combining advanced Natural Language Processing (NLP) and DeepQA automatic question answering technology, Watson represents the future of content and data management, analytics, and systems design.  IBM Watson leverages core content analysis, along with a number of other advanced technologies, to arrive at a single, precise answer within a very short period of time.  The business applications for this technology are limitless, starting with clinical healthcare, customer care, government intelligence and beyond.  I covered the technology side of Watson in my previous posting 10 Things You Need to Know About the Technology Behind Watson.

Amazingly, Watson works like the human brain to analyze the content of a Jeopardy! question.  First, it tries to understand the question to determine what is being asked.  In doing so, it needs to analyze the natural language text.  Next, it tries to find reasoned answers by analyzing a wide variety of disparate content, mostly in the form of natural language documents.  Finally, Watson assesses the answers it finds and determines the relative likelihood that each is correct, expressed as a confidence rating.
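To make those three stages concrete, here is a minimal, illustrative sketch in Python.  It is emphatically not Watson’s DeepQA implementation; the tiny document set, the keyword matching and the confidence math are hypothetical stand-ins for the question analysis, candidate generation and confidence scoring steps described above.

```python
# A toy three-stage pipeline: analyze the clue, generate candidate passages,
# score them with a naive confidence. Illustrative only -- not DeepQA.
import re

DOCUMENTS = [
    "60 Minutes premiered on CBS in 1968.",
    "Lyndon B. Johnson was U.S. President in 1968.",
    "Richard Nixon became U.S. President in 1969.",
]

def analyze_question(question):
    """Stage 1: crude analysis -- pull out the key terms of the clue."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", question) if len(t) > 2]

def generate_candidates(terms, documents):
    """Stage 2: find passages that share vocabulary with the clue."""
    candidates = []
    for doc in documents:
        doc_terms = set(t.lower() for t in re.findall(r"[A-Za-z0-9]+", doc))
        overlap = len(doc_terms & set(terms))
        if overlap:
            candidates.append((doc, overlap))
    return candidates

def score_candidates(candidates):
    """Stage 3: turn raw evidence into a naive confidence between 0 and 1."""
    total = sum(overlap for _, overlap in candidates) or 1
    return sorted(((doc, overlap / total) for doc, overlap in candidates),
                  key=lambda pair: pair[1], reverse=True)

clue = "When 60 Minutes premiered, this man was U.S. President."
ranked = score_candidates(generate_candidates(analyze_question(clue), DOCUMENTS))
for passage, confidence in ranked:
    print(f"{confidence:.2f}  {passage}")
```

Watson, of course, does this against millions of documents with hundreds of scoring algorithms running in parallel; the sketch only shows the shape of the pipeline.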

A great example of the challenge is described by Stephen Baker in his book Final Jeopardy: Man vs. Machine and the Quest to Know Everything: “When 60 Minutes premiered, this man was U.S. President.”  Traditionally, it has been difficult for a computer to understand what “premiered” means and that it is associated with a date.  To a computer, “premiere” could also mean “premier.”  Is the question about a person’s title or a production opening?  Then it has to figure out the date when an entity called “60 Minutes” premiered, and then find out who was the U.S. President at that time.  In short, it requires a ton of contextual understanding.

I am not talking about search here.  This is far beyond what search tools can do.  A recent Forrester report, Take Control Of Your Content, states that 45% of the US workforce spends three or more hours a week just searching for information.  This is completely inefficient.  See my previous posting Goodbye Search … It’s About Finding Answers … Enter Watson vs. Jeopardy! for more on this topic.

Natural Language Processing (NLP) can be leveraged in any situation where text is involved. Besides answering questions, it can help improve enterprise search results or even develop an understanding of the insight hidden in the content itself.  Watson leverages the power of NLP as its cornerstone for handling interactions between computers and human (natural) language.

NLP involves a series of steps that make text understandable (or computable).  A critical step, lexical analysis, is the process of converting a sequence of characters into a set of tokens.  Subsequent steps leverage these tokens to perform entity extraction (people, places, things), concept identification (person A belongs to organization B) and the annotation of documents with this and other information.  A component of IBM Content Analytics, known as LanguageWare, performs the lexical analysis function in Watson as part of natural language processing.
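For illustration only, here is a minimal Python sketch of those steps: lexical analysis (tokenization), simple entity extraction and annotation of the document.  It does not use LanguageWare or IBM Content Analytics; the token pattern and the entity lists are hypothetical.

```python
# Toy NLP pipeline: tokenize, extract entities, annotate. Illustrative only.
import re

KNOWN_PEOPLE = {"James W. Bryce", "Thomas J. Watson"}
KNOWN_ORGS = {"IBM", "Harvard University"}

def tokenize(text):
    """Lexical analysis: convert a sequence of characters into tokens."""
    return re.findall(r"[A-Za-z0-9']+|[.,;!?]", text)

def extract_entities(text):
    """Entity extraction: spot known people and organizations in the text."""
    annotations = []
    for name in KNOWN_PEOPLE:
        if name in text:
            annotations.append({"type": "Person", "text": name})
    for org in KNOWN_ORGS:
        if org in text:
            annotations.append({"type": "Organization", "text": org})
    return annotations

def annotate(text):
    """Annotation: attach tokens and entities to the original document."""
    return {"text": text, "tokens": tokenize(text), "entities": extract_entities(text)}

doc = annotate("James W. Bryce planned a digital computer with Harvard University for IBM.")
print(doc["tokens"][:6])
print(doc["entities"])
```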

Why this matters to your business

Jeopardy! poses a similar set of contextual information challenges as those found in the business world today:

  • Over 80 percent of the information being stored is unstructured (text based).
  • Understanding that 80-plus percent isn’t simple.  As with Jeopardy!, subtle meaning, irony, riddles, acronyms, abbreviations and other complexities all present unique computing challenges, not found with structured data, when trying to derive meaning and insight.  This is where natural language processing (NLP) comes in.

The same core NLP technology used in Watson is available now to deliver business value by unlocking the insights trapped in the massive amounts of unstructured information spread across the many systems and formats you have today.  Understanding the content, context and value of this unstructured information presents an enormous opportunity for your business.  This is already being done in a number of industries by leveraging IBM Content Analytics.

IBM Content Analytics (ICA) itself is a platform for deriving rapid insight.  It can transform raw information into business insight quickly without building models or deploying complex systems, enabling all knowledge workers to derive insight in hours or days … not weeks or months.  It helps address industry-specific problems such as healthcare treatment effectiveness, fraud detection, product defect detection, public safety concerns, customer satisfaction and churn, crime and terrorism prevention and more.  Here are some actual customer examples:

Healthcare Research – Like most healthcare providers, BJC Healthcare had a treasure trove of historical information trapped in unstructured clinical notes and diagnostic reports containing essential information for the study of disease progression, treatment effectiveness and long-term outcomes.  Their existing Biomedical Informatics (BMI) resources were disjointed and non-interoperable, available only to a small fraction of researchers, and frequently redundant, with no capability to tap into the wealth of research information trapped in unstructured clinical notes, diagnostic reports and the like.

With IBM Content Analytics, BJC and university researchers are now able to analyze unstructured information to answer key questions that were previously unanswerable.  Questions like: Does the patient smoke?  How often and for how long?  If smoke free, for how long?  What home medications is the patient taking?  What is the patient sent home with?  What was the diagnosis and what procedures were performed on the patient?  BJC now has deeper insight into medical information and can uncover trends and patterns within their content to provide better healthcare to their patients.

Customer Satisfaction – Identifying customer satisfaction trends about products, services and personnel is critical to most businesses.  The Hertz Corporation and Mindshare Technologies, a leading provider of enterprise feedback solutions, are using IBM Content Analytics software to examine customer survey data, including text messages, to better identify car and equipment rental performance levels and to pinpoint and make the adjustments necessary to improve customer satisfaction.

By using IBM Content Analytics, companies like Hertz can drive new marketing campaigns or modify their products and services to meet the demands of their customers. “Hertz gathers an amazing amount of customer insight daily, including thousands of comments from web surveys, emails and text messages. We wanted to leverage this insight at both the strategic level and the local level to drive operational improvements,” said Joe Eckroth, Chief Information Officer, the Hertz Corporation.

For more information about ICA at Hertz: http://www-03.ibm.com/press/us/en/pressrelease/32859.wss

Research Analytics – To North Carolina State University, the essence of a university is more than education – it is the advancement and dissemination of knowledge in all its forms.  One of the main issues faced by NC State was dealing with the vast number of data sources available to them.  The university sought a solution to efficiently mine and analyze vast quantities of data to better identify companies that could bring NC State’s research to the public.  The objective was a solution designed to parse the content of thousands of unstructured information sources, perform data and text analytics and produce a focused set of useful results.

Using IBM Content Analytics, NC State was able to reduce the time needed to find target companies from months to days.  The result is the identification of new commercialization opportunities, with tests yielding a 300 percent increase in the number of candidates.  By obtaining insight into their extensive content sources, NC State’s Office of Technology Transfer was able to find more effective ways to license technologies created through research conducted at the university. “What makes the solution so powerful is its ability to go beyond conventional online search methods by factoring context into its results.” – Billy Houghteling, executive director, NC State Office of Technology Transfer.

For more information about ICA at NC State: http://www-01.ibm.com/software/success/cssdb.nsf/CS/SSAO-8DFLBX?OpenDocument&Site=software&cty=en_us

You can put the technology of tomorrow to work for you today by leveraging the same IBM Content Analytics capability that helps power Watson.  To learn more about all the IBM ECM products utilizing Watson technology, please visit these sites:

IBM Content Analytics: http://www-01.ibm.com/software/data/content-management/analytics/

IBM Classification Module: http://www-01.ibm.com/software/data/content-management/classification/

IBM eDiscovery Analyzer: http://www-01.ibm.com/software/data/content-management/products/ediscovery-analyzer/

IBM OmniFind Enterprise Edition: http://www-01.ibm.com/software/data/enterprise-search/omnifind-enterprise/

You can also check out the IBM Content Analytics Resource Center or watch the “what it is and why it matters” video.

I’ll be at the Jeopardy! viewing party in Washington, DC on February 15th and 16th … hope to see you there.  In the meantime, leave me your thoughts and questions below.

WikiLeaks Disclosures … A Wakeup Call for Records Management

Earlier in my professional career, I used to hit the snooze button 4 or 5 times every morning when the alarm went off. I did this for years until I realized it was the root cause of being late to work and getting my wrists slapped far too often. It seems simple, but we all hit the snooze button even though we know the repercussions. Guess what … the repercussions are getting worse.

For years, the federal government has been hitting the snooze button on electronic records management. The GAO has been critical of the Federal Government’s ability to manage records and information, saying there’s “little assurance that [federal] agencies are effectively managing records, including e-mail records, throughout their life cycle.” During the past few administrations, similar GAO reports and/or embarrassing public information mismanagement incidents have reminded us (and not in a good way) of the importance of good recordkeeping and document control. You may recall incidents over missing emails involving both the Bush and Clinton administrations. Now we have WikiLeaks blabbing to the world with embarrassing disclosures of State Department and military documents. This is taking the impact of information mismanagement to a whole new level of public embarrassment, exposure and risk. Although it should not be surprising to anyone that this is happening, considering the previous incidents and GAO warnings, it has still caused quite a stir and had a measurable impact. Corporations should see this as a cautionary tale and a sign of things to come … so start preparing now.

Start by asking yourself: what would happen if your sensitive business records were made publicly available and the entire world was talking, blogging and tweeting about it? For most organizations, this is a very scary thought. Fortunately, there are solutions and best practices available today to protect enterprises from these scenarios.

Implement Electronic Records Management: Update your document control policies to include the handling of sensitive information, including official records. Do you even have an Information Lifecycle Governance strategy today? Start by getting the key stakeholders from Legal, Records and IT involved, at a minimum, and ensure you have top-down executive support. Implement an electronic records program and system based on an ECM repository you can trust (see my two earlier blogs on trusting repositories). This will put the proper controls, security and policy enforcement in place to govern information over its lifespan, including defensible disposition. Getting rid of things when you are supposed to dramatically reduces the risk of improper disclosure. Although implementing a records management system has many benefits, including reducing eDiscovery costs and risks, it is also the cornerstone of preventing information from falling into the wrong hands. Standards (DoD 5015.02-STD, ISO 15489), best practices (ARMA GARP) and communities (CGOC) exist to guide and accelerate the process. Records management can be complemented by Information Rights Management and/or Data Loss Prevention (DLP) technology for enhanced security and control options.
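To make the defensible disposition point concrete, here is a minimal sketch, assuming a hypothetical retention schedule and field names, of the basic rule an electronic records system enforces: nothing is destroyed until its retention period has expired and no legal hold applies.

```python
# Toy disposition check: retention expired AND no legal hold. Illustrative only;
# real records systems add policy, audit and approval machinery around this.
from datetime import date, timedelta

RETENTION_SCHEDULE = {              # hypothetical retention periods by record class
    "invoice": timedelta(days=7 * 365),
    "email": timedelta(days=3 * 365),
    "press_release": timedelta(days=365),
}

def eligible_for_disposition(record_class, declared_on, legal_holds, today=None):
    """Return True only if retention has expired and no legal hold applies."""
    today = today or date.today()
    retention = RETENTION_SCHEDULE.get(record_class)
    if retention is None:
        return False                # unknown class: keep by default
    if legal_holds:
        return False                # any hold suspends disposition
    return declared_on + retention <= today

print(eligible_for_disposition("email", date(2006, 5, 1), legal_holds=[]))             # True
print(eligible_for_disposition("email", date(2006, 5, 1), legal_holds=["case-123"]))   # False
```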

Leverage Content Analytics: Use content analytics to understand employee sentiment as well as detect any patterns of behavior that could lead to intentional disclosure of information. These technologies leverage text and content analytics to identify disgruntled employees before an incident occurs, enabling proactive investigation and management of potentially troublesome situations. They can also serve as background for any investigation that may happen in the event of an incident. Enterprises should proactively monitor for these risks and situations … as an ounce of prevention is worth a pound of cure. Content analytics can also be extended with predictive analytics to evaluate the probability of an incident and the associated exposure.
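As a purely illustrative example of the idea, the sketch below scores messages for hypothetical signals of disgruntlement so they can be flagged for proactive review; real content analytics products use far richer linguistic and behavioral models than keyword counting, and the keyword list, weights and threshold here are made up.

```python
# Toy risk scoring over messages. Illustrative only -- not a real analytics model.
NEGATIVE_SIGNALS = {"unfair": 2, "furious": 3, "leak": 5, "quit": 2, "revenge": 5}

def risk_score(message):
    """Sum the weights of any negative signals found in the message."""
    words = message.lower().split()
    return sum(weight for term, weight in NEGATIVE_SIGNALS.items() if term in words)

messages = [
    "Lunch at noon?",
    "This review was unfair and I am furious enough to leak everything.",
]
for msg in messages:
    flag = "REVIEW" if risk_score(msg) >= 5 else "ok"
    print(f"{flag:>6}  {msg}")
```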

Leverage Advanced Case Management: Investigating and remediating any risk or fraud scenario requires advanced case management. These case-centric investigations are almost always ad hoc processes with unpredictable twists and turns. You need the ad hoc and collaborative nature of advanced case management to serve as a process backbone as the case proceeds and ultimately concludes. Having built-in audit trails, records management and governance ensures transparency into the process and minimizes the chance of any hanky-panky. Enterprises should consider advanced case management solutions that integrate with ECM repositories and records management for any content-centric investigation.

This adds up to one simple call to action … stop hitting the snooze button and take action. Any enterprise could be a target and ultimately a victim. The stakes are higher than ever before. Leverage solutions like records management, content analytics and advanced case management to improve your organization’s ability to secure, control and retain documents while monitoring for and remediating potentially risky disclosure situations.

Leave me your thoughts and ideas. I’ll read and respond later … after I am done hitting the snooze button a few times (kidding of course).

Top 10 ECM Pet Peeve Predictions for 2011

It’s that time of the year when all of the prognosticators, futurists and analysts break out the crystal balls and announce their predictions for the coming year.  Not wanting to miss the fun, I am taking a whack at it myself but with a slightly more irreverent approach … with a Top 10 of my own.  I hope this goes over as well as the last time I pontificated about the future with Crystal Ball Gazing … Enterprise Content Management 2020.

I don’t feel the need to cover all of the cool or obvious technology areas that my analyst friends would.  A number of social media, mobile computing and cloud computing topics would be on any normal ECM predictions list for 2011.  I do believe that social media, combined with mobile computing and delivered from the cloud, will forever change the way we interact with content, but this list is more about my own technology pet peeves.  I’ve decided to avoid that set of topics as there is plenty being written about all three already.  I’ve also avoided all of the emerging fringe ECM technology topics such as video search, content recommendation engines, sentiment analysis and many more.  There is plenty of time to write about those topics in the future.  Getting this list to just 10 items wasn’t easy … I really wanted to write something more specific on how lousy most ECM metadata is but decided to keep the list to these 10 items.  As such, ECM metadata quality is on the cutting room floor.  So without further ado … Craig’s Top 10 Pet Peeve Predictions for 2011:

 
Number 10:  Enterprise Search Results Will Still Suck
Despite a continuing increase in software sales and an overall growing market, many enterprises haven’t figured out that search is the ultimate garbage-in, garbage-out model.  Most end users are frustrated at their continued inability to find what they need when they need it.  Just ask any room full of people.  Too many organizations simply decide to index everything, thinking that’s all you need to do … bad idea.  There is no magic pill here; search results will ultimately improve when organizations (1) eliminate the unnecessary junk that keeps cluttering up search results and (2) consistently classify information, based on good metadata, to improve findability.  Ultimately, enterprise search deployments with custom relevance models can deliver high-quality results, but that’s a pipedream for most organizations today.  The basics need to be done first and there is a lot of ignorance on this topic.  Unfortunately, very little changes in 2011, but we can hope.
 
Number 9:  Meaning Based Technologies Are Not That Meaningful
Meaningful to whom?  It’s the user, business or situation context that determines what is meaningful.  Any vendor with a machine-based technology claiming it can figure out meaning without understanding the context of the situation is stretching the truth.  Don’t be fooled by this brand of snake oil.  Without the ability to customize to specific business and industry situations, these “meaning” based approaches don’t work … or are of limited value.  Vendors currently making these claims will “tone down” their rhetoric in 2011 as the market becomes more educated and sophisticated on this topic.  People will realize that the emperor has no clothes in 2011.
 
Number 8:  Intergalactic Content Federation Is Exposed As A Myth
The ability to federate every ECM repository for every use case is wishful thinking.  Federation works very well when trying to access, identify, extract and re-use content for applications like search, content analytics, or LOB application access.  It works poorly or inconsistently when trying to directly control content in foreign repositories for records management and especially eDiscovery.  There are too many technology hurdles, such as security models, administrator access, lack of API support and incompatible data models, that make this very hard.  For use cases like eDiscovery, many repositories don’t even support placing a legal hold.  Trying to do unlimited full records federation or managing enterprise legal holds in place isn’t realistic yet … and may never be.  It works well in certain situations only.  I suppose all of this could be solved with enough time and money, but you could say that about anything – it’s simply not practical to try to use content federation for every conceivable use case, and that won’t change in 2011.  This is another reason why we need the Content Management Interoperability Standard (CMIS).
 
Number 7:  CMIS Adoption Grows, Will Be Demanded From All Content, Discovery and Archive Vendors
Good segue, huh?  If federation is the right approach (it is), but current technology prevents it from becoming a reality, then we need a standard we can all invest in and rely on.  CMIS already has significant market momentum and adoption.  Originally introduced and sponsored by IBM, EMC, Alfresco, OpenText, SAP and Oracle, it is now an OASIS standard whose list of members has expanded to many other vendors.  IBM is already shipping CMIS-enabled solutions and repositories, as are many others.  However, some vendors still need encouragement.  None of the archiving or eDiscovery point solution vendors have announced support for CMIS yet.  I expect to see market pressure in 2011 on any content-related vendor not supporting CMIS … so get ready, Autonomy, Symantec, Guidance Software and others who are not yet supporting CMIS.  The days of closed proprietary interfaces are over.
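To show why a standard matters, here is a minimal sketch of querying a repository through CMIS, assuming the Apache Chemistry cmislib Python client; the service URL, credentials and query are placeholders.  The point is that the same few lines work against any repository exposing a CMIS endpoint, regardless of vendor.

```python
# Vendor-neutral repository access via CMIS, using the Apache Chemistry cmislib client.
# URL, credentials and query below are placeholders, not a real endpoint.
from cmislib import CmisClient

client = CmisClient("http://repository.example.com/cmis/atom", "admin", "admin")
repo = client.defaultRepository

# CMIS Query Language: a SQL-like syntax defined by the OASIS standard.
for result in repo.query("SELECT * FROM cmis:document WHERE cmis:name LIKE 'contract%'"):
    print(result.properties["cmis:name"])
```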
 
Number 6:  ACM Blows Up BPM (in a good way)
Advanced Case Management will forever change the way we build, deploy and interact with process- and content-centric (or workflow, if you are stuck in the ’90s) applications.  Whether you call it Advanced Case Management, Adaptive Case Management or something else, it’s only a matter of time before the old “wait months for your application” model is dead.  Applications will be deployed in days and customized in hours or even minutes.  IT and business will have a shared success model in the adoption and use of these applications.  This one is a no-brainer.  ACM takes off in a big way in 2011.
 
Number 5:  Viral ECM Technologies without Adequate Governance Models Get Squeezed
In general, convenience seems to trump governance, but not this year.  The viral deployment model is both a blessing and a curse.  IT needs to play a stronger role in governing how these collaborative sites get deployed, used and eventually decommissioned.  There is far too much cost associated with eDiscovery and the inability to produce documents when needed for this not to happen.  There are way too many unknown collaborative sites containing important documents and records.  Many of these have been abandoned, causing increased infrastructure costs and risk.  The headaches associated with viral deployments force IT to put its foot down in 2011.  The lack of governance around these viral collaborative sites becomes a major blocker to their deployment starting in 2011.
 
Number 4:  Scalable and Trusted Content Repositories Become Essential
Despite my criticism of AIIM’s labeling of the “Systems of Engagement” concept in my last blog, they’ve nailed the basic idea.  “Systems or Repositories of Record” will be recognized as essential starting in 2011.  Information is expected to grow 44-fold over the next 10 years, with 85% of it unstructured … yikes!  We’re going to need professional, highly scalable, trusted, defensible repositories of record to support the expected volume and governance requirements, especially as ECM applications embrace content outside the firewall.  Check out my two postings earlier this year on trusted content repositories for more on this topic (Learning How To Trust … and Step 1 – Can You Trust Your Repository?).
 
Number 3:  Classification Technology Is Recognized As Superior To Human Based Approaches
For years, I’ve listened to many, many debates on human classification versus machine-based classification.  Information is growing so out of control that it’s simply not possible to even read it all … much less decide how it should be classified and actually do it correctly.  The facts are simple: studies show humans are 92% accurate at best.  The problem is that humans opt out sometimes.  We get busy, get sick, have to go home or simply refuse to do certain things.  When it comes to classification, we participate about 33% of the time on average.  Overall, this makes our effective accuracy more like 30%, not 92%.  Technology-based approaches that use context have consistently hit 70-80% over the years, and recently we’ve seen accuracy levels as high as 98.7%.  Technology approaches cost less too.  2011 is the year of auto-classification.
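The arithmetic behind that effective-accuracy claim is simple enough to show in a few lines (the 92% and 33% figures are the estimates cited above):

```python
# Effective accuracy = accuracy when we do the work x how often we actually do it.
human_accuracy = 0.92
human_participation = 0.33
effective_accuracy = human_accuracy * human_participation
print(f"Effective human accuracy: {effective_accuracy:.0%}")   # ~30%
```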
 
Number 2:  Business Intelligence Wakes Up – The Other 85% Does Matter
It’s a well-known fact that ~85% of the information being stored today is unstructured.  Most BI or data warehouse deployments focus on structured data (or only 15% of the available information to analyze).  What about the rest of it?  The explosion of content analysis tools over the last few years has made that 85% more understandable and easier to analyze than ever before, and that will continue into 2011.  BI, data warehouse and analytics solutions will increasingly include all forms of enterprise content, whether inside or outside the firewall.
 
Number 1:  IT Waste Management Becomes a Top Priority
The keep-everything-forever model has failed.  Too many digital dumpsters litter the enterprise.  It’s estimated that over 90% of the information being stored today is duplicated at least once and that 70% is already past its retention date.  It turns out buying more storage isn’t cheaper, once you add in the management staff, admin costs, training, power and so forth.  One customer told me they’d have to build a new data center every 18 months just to keep storing everything.  In 2011, I expect every organization to more aggressively start assessing and decommissioning unnecessary content as well as the associated systems.  The new model is keep what you need to keep … for only as long as you need to keep it based on value and/or obligation … and defensibly dispose of the rest.
 
I hope you enjoyed reading this as much as I enjoyed writing it.  I hope you agree with me on most of these.  If not, let me know where you think I am wrong or list a few predictions or technology pet peeves of your own.  
 

It’s Back to the Future, Not Crossing the Chasm When it Comes to AIIM’s “Systems of Record” and “Systems of Engagement”

Pardon the interruption from the recent Information Lifecycle Governance theme of my postings, but I felt the need to comment on this topic.  I even had to break out my flux capacitor for this posting, as I was certain I had seen this before.

Recently at the ARMA Conference, and currently in the AIIM Community at large, there is a flood of panels, webinars, blog postings and tweets on a “new” idea from Geoffrey Moore (noted author and futurist) differentiating “Systems of Record” from “Systems of Engagement.” This idea results from a project at AIIM where Geoffrey Moore was hired as a consultant to give the ECM industry a new identity, among other things. One of the drivers of the project has been the emergence and impact of social media on ECM. The new viewpoint being advocated is that there is a new and revolutionary wave of spending emerging on “Systems of Engagement” – a wave focused directly on knowledge worker effectiveness and productivity.

Let me start by saying that I am in full agreement with the premise behind the idea that there are separate “Systems of Record” and “Systems of Engagement.” I am also a big fan of Geoffrey Moore. I’ve read most of his books and have drunk the Chasm, Bowling Alley, Tornado and Gorilla flavors of his Kool-Aid. In fact, Crossing the Chasm is mandatory reading on my staff.

Most of the work from the AIIM project involving Moore has been forward thinking, logical and on target. However, this particular outcome does not sit well with me. My issue isn’t whether Moore and AIIM are right or wrong (they are right). My issue is that this concept isn’t a new idea. At best, Geoffrey has come up with a clever new label. The concept of “System of Record” is nothing new and a “System of Engagement” is a catchy way of referring to those social media systems that make it easier to create, use, and interact with content.

Here is where AIIM and Moore are missing the point. Social media is just the most recent “System of Engagement,” not the first. Like the engagement systems that came before it, it is not capable of also being a “System of Record” … so we need both … we’ve always needed both. It’s been this way for years. Apparently though, we needed a new label, as everyone seems to have jumped on the bandwagon except me.

Let me point out some of the other “Systems of Engagement” over the years. For years, we’ve all been using something called Lotus Notes and/or Microsoft Exchange as a primary system to engage with our inner and outer worlds. This engagement format is called email … you may have heard of it. Kidding aside, we use email socially and always have. We use email to engage with others. We use email as a substitute for content management. Ever send an email confirming a lunch date? Ever communicate project details in the body of an email? Ever keep your documents in your email system as attachments so you know where they are? You get the idea. Email is not exactly a newfangled idea and no one can claim these same email systems also serve any legitimate record keeping purpose. There are enough case law decisions and standards to fill a warehouse on that point (pardon the paper pun). More recently, instant messaging has even supplanted email for some of those same purposes, especially as a way to quickly engage and collaborate to resolve issues. No one is confused about the purpose of instant messaging systems. It can even be argued that certain structured business systems like SAP are used in the same model when coupled with ECM to manage key business processes such as accounts payable. The point being, you engage in one place and keep records or content in another place. Use the tool best suited to the purpose.

Using technology like email and instant messaging to engage with, collaborate and communicate on content-related topics with people is not a new idea. Social media is just the next thing in the same model. On one hand, giving social media and collaboration systems a proper label is a good thing. On the other hand, give me a break … any Records Manager doing electronic records embraced the concept of “record making applications” and “record keeping systems” a long time ago. It’s a long-standing, proven model for managing information. Let’s call it what it is.

I applaud AIIM and Moore for putting this idea out there, but I also think they have both missed the mark. “Systems of Engagement” is a bigger, different and already proven idea than the way both are currently talking about it. Maybe I am a Luddite, but this seems to me like a proven idea that simply got a fresh coat of paint.

As AIIM and Moore use words like “revolution” and “profound implications” in their promotional materials, I think I’ll break out my Back to the Future DVD and stay a little more grounded.  Like a beloved old movie, I am still a fan of both Moore and AIIM.  However, I recommend you see this particular movie for yourself and try to separate the hype from the idea itself.  If you do, let me know whether you agree … is this an original idea or simply a movie sequel?

Why Information Lifecycle Management (ILM) Failed But Needs an Updated Look

If you know me, you know I advocate something called Information Lifecycle Governance (ILG) as the proper model for managing information over its lifespan.  I was reminded recently (at IOD) during a conversation with Sheila Childs, a top Gartner analyst in this subject area, of a running dialogue we have on the differences between governance at the storage layer and using records management and retention models as an alternative approach.  This got me thinking about the origins of the ILG model, and I decided to take a trip in the “way-back” machine for this posting.

 

According to Wikipedia as of this writing, Information Lifecycle Management refers to a wide-ranging set of strategies for administering storage systems on computing devices.  Searchstorage.com (an online storage magazine) offers the following explanation:  Information life cycle management (ILM) is a comprehensive approach to managing the flow of an information system’s data and associated metadata from creation and initial storage to the time when it becomes obsolete and is deleted. Unlike earlier approaches to data storage management, ILM involves all aspects of dealing with data, starting with user practices, rather than just automating storage procedures, as, for example, hierarchical storage management (HSM) does. Also in contrast to older systems, ILM enables more complex criteria for storage management than data age and frequency of access. ILM products automate the processes involved, typically organizing data into separate tiers according to specified policies, and automating data migration from one tier to another based on those criteria. As a rule, newer data, and data that must be accessed more frequently, is stored on faster but more expensive storage media, while less critical data is stored on cheaper but slower media. However, the ILM approach recognizes that the importance of any data does not rely solely on its age or how often it’s accessed. Users can specify different policies for data that declines in value at different rates or that retains its value throughout its life span. A path management application, either as a component of ILM software or working in conjunction with it, makes it possible to retrieve any data stored by keeping track of where everything is in the storage cycle.

If you were able to get all the way through that (I had to read it three times), you probably concluded that (1) it was way too complicated, (2) it was very storage centric and likely too costly, and (3) it was incomplete.  These are all reasons why this concept never took hold and is widely considered a failed concept.

But hold on … let’s not throw the baby out with the bath water quite yet.  The underlying idea is sound but needs modification.  In my opinion, here is what was wrong with the notion of ILM when it came to prominence in 2002 or so:

It’s incomplete:  Frequency of access does not determine the usefulness of information.  Any set of policies needs to include the value of the information to the business itself and the legal and regulatory obligations.  Only calculating how recently files were accessed and used is an incomplete approach.  Wouldn’t it make sense to understand all of the relevant facets of information value (and obligations) along with frequency of access?

It’s inefficient and leads to error:  Managing policies at the device level is a bad idea.  As an example, many storage devices require setting the retention policy at the device itself.  This seems crazy to me as a general principle.  Laws and obligations change, policies change, humans make errors … all of which leads to a very manual, time-consuming and error-prone policy administration process.  Wouldn’t a centrally managed policy layer make more sense?

It’s not well understood and can be too costly:  This model has led to the overbuying of storage.  Many organizations have purchased protected storage when it was not necessary.  These devices are referred to as NENR (Non Erasable, Non Rewritable) or WORM (Write Once, Read Many).  They come in multiple flavors (WORM Optical, WORM Tape and Magnetic Disk WORM subsystems) and can include multiple disks with tiered tape support.  Sample vendors and products include EMC Centera, Hitachi HCAP, IBM DR550, NetApp SnapLock and IBM Information Archive.  This class of storage costs more than other forms of storage, primarily because of the perception of safety.  Certain storage vendors (who will remain nameless) have latched onto this market confusion and even today try to “oversell” storage devices as a substitute for good governance, often to uninformed or ill-advised buyers.  The fact is, only the SEC 17a-4 regulation requires WORM storage.  Using WORM for applications other than SEC 17a-4 usually means you are paying too much for storage and creating retention conflicts (more on this in a future posting).  The point is … only buy protected storage when appropriate to your requirements or obligations.
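Pulling the critique together, here is a minimal sketch of what a centrally managed policy layer could look like: one hypothetical policy table that considers record class (value and obligation) alongside access frequency, with WORM placement reserved for records that actually require it.  The classes, retention periods and tiers are made up for illustration.

```python
# Toy central policy layer: one table drives retention and tier placement,
# instead of setting policy device by device. Classes/periods are hypothetical.
from datetime import timedelta

POLICIES = {
    "broker_dealer_record": {"retention": timedelta(days=6 * 365), "worm": True},
    "contract":             {"retention": timedelta(days=10 * 365), "worm": False},
    "working_draft":        {"retention": timedelta(days=180), "worm": False},
}

def placement(record_class, days_since_last_access):
    """Decide retention and storage tier from central policy plus access frequency."""
    policy = POLICIES.get(record_class, {"retention": None, "worm": False})
    if policy["worm"]:
        tier = "WORM"                  # only where an obligation actually requires it
    elif days_since_last_access < 30:
        tier = "fast disk"
    elif days_since_last_access < 365:
        tier = "capacity disk"
    else:
        tier = "tape / archive"
    return {"tier": tier, "retention": policy["retention"]}

print(placement("broker_dealer_record", days_since_last_access=400))
print(placement("working_draft", days_since_last_access=400))
```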

If we could just fix those issues, is the ILM concept worth revisiting?  It’s really not that hard a concept.  Over 90% of information is born digital, and over 95% of it eventually expires and needs to be disposed of.  Here is a simple concept to consider:

A simple model for governing information over its lifespan

I will go deeper into this concept (and model) in my next posting.  In the meantime, leave me your thoughts on the topic.

I am also curious to know if you have been approached by an overly zealous vendor trying to sell you WORM-based storage as a replacement for good governance or records management.  I will publish the results.

IBM Acquires PSS Systems – You Might Be Asking Why?

In case you missed it, IBM announced today the acquisition of PSS Systems.

You might be asking why?  Organizations are striving for rigorous discovery, more effective information retention and legally defensible data disposal because of rising eDiscovery pressures and exponential information growth.  According to Information Week, a whopping 17% – and rising – of organizations’ IT budgets is now spent on storage.  A new Compliance, Governance and Oversight Council (CGOC) Benchmark Report on information governance revealed that fewer than 25% of organizations are able to dispose of data because they lack rigorous legal hold practices or effective record retention programs.  eDiscovery costs average over $3 million per case, yet an estimated 70% of information is needlessly retained; as with escalating IT costs, the root cause of escalating eDiscovery cost is the inability to dispose of information when it is no longer needed.

Organizations struggle with these issues.  Two things have been missing until now: 1) a way to coordinate policy decisions for legal hold and retention management across stakeholders; and 2) a way to systematically execute those policy decisions on high volumes of information that often reside in disparate systems.  To effectively determine what is eligible for disposal, organizations must determine the legal obligations and specific business value of information and associate them with information assets.  With multiple stakeholders, litigation intensity and information diversity across the enterprise, it is essential to coordinate and formalize policy decisions in real time, as they are made by legal, records and business groups, and to automate the execution of those policies on information across the enterprise.

These problems are of high importance to legal and IT executives; 57% have established executive committees to drive better legal and lifecycle governance outcomes, but fewer than one-third of organizations have achieved the desired cost and risk reduction results.

Organizations lack sufficient internal competency or resources to quantify the cost and risk business case and define the program structures necessary to achieve their defensible disposal goals.  While 98% of organizations cite defensible disposal as the result they are seeking, only 17% believe they have the right people at the table.  Analysts predict that the market for these kinds of governance solutions will experience significant growth through 2014, but they also point out that internal cooperation and competencies are barriers today.

Now, with the acquisition of PSS Systems, only IBM provides a comprehensive and integrated enterprise solution for legal and information lifecycle governance, along with the business expertise that customers need to reduce legal risk and lower discovery, information and content management costs. The PSS Atlas legal information governance solutions complement and extend IBM’s existing Information Lifecycle Governance strategy and integrated suite of solutions.  This joint solution and approach is unlike others that address only a single silo, such as legal, which fail to systematically link legal decisions to corresponding information assets and therefore don’t fully mitigate risk, and may actually increase the cost of compliance.

Until now, organizations’ choices were limited and reinforced their problems by failing to systematically link legal obligations and business value to information assets. Often, the initial selection of tactical eDiscovery applications left organizations with high risk, high compliance cost and no path forward to defensible disposal, because these tactical applications don’t integrate holistically with records and retention management, email archiving, advanced classification and enterprise content management systems and infrastructure.

Those days are over!  If you can’t tell … I am excited about the future of how we plan to help customers tackle these problems in concert with our new colleagues from PSS Systems.