Spring Cleaning for Information

I find myself wondering (as I plan to clean out the garage today) what time of year we’re supposed to throw out all that unnecessary information we keep around.  Since cleaning out the garage doesn’t qualify as fun in my book, it would sure be easier just to add space to my garage.  That way, I’d never have to throw anything away.  It would cost a lot … and make it much harder to find important stuff among all of the clutter, but it would be easier.  Maybe I should just call a contractor (5 minutes) rather than actually clean out the garage (at least an hour or more).  Hhhmmm …

It’s funny that when it comes to this aspect of information management we always seem to take the path of least resistance.  I’ve lost count of how many times I’ve heard “storage is cheap” or other reasons why organizations don’t properly manage the lifespan of their information.  Most organizations don’t have a responsible program to properly dispose of electronically stored information.  How is this possible when those same organizations usually have good control over, and properly dispose of, paper-based information?

Sure, it’s harder to properly organize, retain and dispose of electronically stored information, but the “keep everything forever” model has failed.  Buying more storage is not the answer.  Storage already consumes (on average) 17% of IT budgets and information will continue to explode … eventually gobbling up increasing percentages of IT budgets.  When does it end?  It won’t by itself.  Left unattended, this information explosion will eventually consume all remaining IT budget dollars and cripple or prevent any strategic investments by IT.

If that weren’t sobering enough, valued information is already buried beneath too much unnecessary information.  Much of it is over-retained, irrelevant and duplicated.  This is causing runaway storage and infrastructure costs and exacerbating power, space and budget challenges.  It’s also creating an inability to find and produce critical information, especially under punitive scenarios and deadlines.  How can anyone find and leverage the useful and trustworthy information lost among all the junk?

This sounds exactly like my garage … the power went out the other night and I was desperate to find that really cool flashlight I bought last year in case of a power outage.  Couldn’t find it, which ended up being my motivation to clean out the garage and throw out all of the unnecessary stuff that is piling up.  No garage extension for me!  No offsite storage facility either!  The fact is, I don’t want to spend more money on simply storing random unnecessary stuff.  I have higher value activities to spend my budget on … like golf 🙂

Isn’t it time every organization did their own information spring cleaning?  It would reduce storage/infrastructure costs, improve findability of information, reduce legal risks and increase usefulness and re-use of information.

Maybe you are already planning to clean out your garage of enterprise information.  Leave me your thoughts on the topic or visit us at the upcoming National Conference on Managing Electronic Records in Chicago.  We’ll be doing a special session on Content Assessment and how to use Content Analytics to identify and defensibly decommission and routinely dispose of unnecessary information.
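A content assessment can begin with two mechanical checks that need no content analytics at all: exact duplicates and items held past their retention period. The Python sketch below is a minimal illustration under stated assumptions: the `ContentItem` record, its fields and the `assess` function are hypothetical, retention periods are assumed to come from an existing retention schedule, and “irrelevant” content (which does need classification or analytics) is not addressed. Nothing is deleted; expired items are only flagged for disposition review, and anything under legal hold is always kept.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class ContentItem:
    """Hypothetical inventory record for one stored file or document."""
    path: str
    content: bytes
    created: date
    retention_years: int      # assumed to come from a retention schedule
    legal_hold: bool = False  # items under legal hold must never be disposed of


def assess(items, today):
    """Sort items into 'keep', 'duplicate' and 'expired' buckets.

    'expired' means past its retention period and therefore a candidate
    for disposition review; this function deletes nothing.
    """
    seen = {}  # SHA-256 of content -> path of the first copy kept
    result = {"keep": [], "duplicate": [], "expired": []}
    for item in items:
        digest = hashlib.sha256(item.content).hexdigest()
        if item.legal_hold:
            result["keep"].append(item.path)
            seen.setdefault(digest, item.path)
            continue
        age_years = (today - item.created).days / 365.25
        if age_years > item.retention_years:
            result["expired"].append(item.path)
        elif digest in seen:
            result["duplicate"].append(item.path)  # byte-identical copy
        else:
            seen[digest] = item.path
            result["keep"].append(item.path)
    return result


items = [
    ContentItem("projects/contract.pdf", b"signed v2", date(2020, 1, 1), 7),
    ContentItem("share/contract-copy.pdf", b"signed v2", date(2021, 6, 1), 7),
    ContentItem("old/memo.doc", b"lunch menu", date(2001, 1, 1), 3),
]
print(assess(items, date(2024, 1, 1)))
```

With these sample items the contract is kept, its copy is flagged as a duplicate and the 2001 memo is flagged as expired. Hashing only catches exact duplicates; near-duplicates and versions need the kind of content analytics the session describes.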

Crystal Ball Gazing … Enterprise Content Management 2020

With a few big ECM-related announcements over the past couple of weeks (Microsoft SharePoint 2010 and IBM Advanced Case Management topping the list), I thought I would do a little crystal ball gazing and set my sights on the future.  This is always fun and a bit risky at the same time.  Consider Thomas J. Watson, Sr., who, despite very scant evidence, is widely credited with saying (in 1943): “I think there is a world market for maybe five computers”.  I suppose we’ll never know for sure if Watson said it or not, but despite the risk of being wrong, here is my perspective on what ECM will look like in 10 years.

In today’s ECM … You find the document you need

The Document is King

ECM as we know it today started as a way to control paper and evolved to cover electronic documents.  From there it grew into something that could support the document sharing and creation processes and then into electronic business process creation, management and optimization.  It has been increasingly enhanced and expanded in a number of ways … most notably with better search, process and compliance technologies.  Even today, nearly 30 years after the founding of FileNet (which is largely credited with inventing the industry in the 1980s), the document is the center of the universe, with everything else in supporting roles.  It’s become so easy to create and share documents that we’ve lost a critical value point along the way … the context.  Endless hours are spent searching for documents without the ability to know which versions are trusted and are an accurate representation of the business context.  We’re at a tipping point in how businesses understand the utility of content: while critical to the success of a business, a document is still only a communication medium, useful for tracking thinking and providing historical value.  A document is a form of communication, not the end goal of a business.  Workers, in their roles, are what support the end goal of a business today.  Successful businesses and their employees need technology to support their roles and to enable delivery of savings and new revenue.  Today … these same workers are expected to produce results AND manage / find documents.  This dynamic will change over the next 10 years, driven in part by advances in collaborative, social, case management and other ECM-related technologies.

The Context (or worker role) is King

In 2020 … the document you need finds you.  ECM will evolve and will exist only minimally as we understand it today.  The concept of ECM will have changed because it’s no longer about “content”; rather, “worker role” or “context” is the central planning aspect.  ECM as we knew it in 2010 will have become more than content repositories, process, records and searching.  Workers and their business roles are the central aspect, with all processes and communication flows either inputs or outputs in context to these roles.  Some examples are:

  • Inbound and outbound communication are expanded to include voice and text from any source (unified communications becomes table-stakes).
  • Processes are defined, developed and optimized around supporting role execution.
  • Composite information dashboards are the interface across systems and processes and deliver in-business-context information on demand.
  • Information is organized, trusted, proactively managed and self described.
  • Certain basic ECM problems have been “fixed” and are no longer top-priority ECM issues (retention, disposition, eDiscovery, search result quality).
  • Institutional knowledge is managed as knowledge, not documents.
  • Content Analytics is the norm where Ontologies describe trusted semantic relationships from both internal and external information sources.
  • Processes and system interactions become truly dynamic and can “learn” from historical execution to recommend streamlining options.
  • On-premise, appliance, cloud and hybrid delivery models all interoperate and are invisible to end-users.

I could go on for pages, but if anything remotely like this crystal ball vision comes true … the implications for all ECMers are significant.  Technologies like Microsoft SharePoint, IBM Lotus Quickr, IBM Advanced Case Management and IBM Content Analytics are among those that will drive the next generation of ECM usage and adoption within the business context of processes, workers and roles.

Time will tell if I am right or not.  In the meantime, leave me your feedback … what does ECM look like in 2020 in your crystal ball?  I’d love to see what everyone else thinks about the future of ECM (right or wrong) … after all, it was the same Thomas Watson who said: “The way to succeed is to double your error rate”.

Creating and Managing Trusted Content

I remember when I was in elementary school (don’t laugh) that my best friend Danny tried to change one of the grades on his quarterly report card.  We used to walk home from school together, and on this day we stopped at the corner drug store, where he bought some office supplies and went about “altering” his report card.  Ahhh … the things you think you can get away with in 5th grade … so foolish.  It was a great plan for Danny right up until the point his Mom spotted the obvious change.  Needless to say, Danny’s report cards could no longer be trusted as an accurate representation of his school performance.  It completely backfired, and his report cards got more scrutiny than he could have ever wanted, all the way through high school.  I think he makes fake passports today (kidding).  He actually works for a large financial institution (not kidding).

This one incident made Danny’s parents suspicious of the entire school grade reporting process, and they never trusted report cards again.  He ruined it for his younger sister too.  It’s the same with documents.  We need a better process (and technology) to ensure our documents and records can be trusted for business decision making.  The implications in business are far more catastrophic.

Consider the large distributor that has multiple versions of contracts and supplier agreements.  The business fails to reference the correct version of a contract addendum that materially changes key terms and conditions between the parties.  This results in a dispute, with trickle-down effects: disrupted shipments, customer complaints and cancelled orders … all because someone used the wrong content.  In short, it’s paramount to have trust in our content.

Here are three strategies you can take to bring trust to your content:

Clean-up the backlog … assess and separate trusted content from suspect content.  Decommission and dispose of what is not necessary to keep.  Preserve and exploit your trusted content from your trusted content repositories (discussed in a previous posting).

Instrument ad-hoc and controlled document creation and approval processes … establish event and process based steps (or KPIs) to measure, trigger, review and monitor the accuracy of content that is designated as trusted.

Enhance meta data and leverage master data … to clean up dirty document meta data and reference trusted data sources within the enterprise.  Ensure an accurate 360-degree view of all information assets and meta data.

Obviously there are a number of ways to make content quality better and improve document based decision making.  The trick is … how to do it without burdening the business users.  Manual methods are thought to be easy but always fail as human beings are inconsistent, sometimes inaccurate and can refuse to cooperate.  In some rare cases … humans take matters into their own hands.  Don’t take a “Danny” approach to trusting your content. 

Choose one or more of the above paths and increase the accuracy of content based decisions in your organization.  If you don’t, I may have to send Danny’s Mom out to have a talk with you.

Learning How To Trust …

Before my digression last posting into a perspective on ECM systems integrators … I was describing the characteristics of trusted ECM repositories (see Step 1 – Can You Trust Your Repository?).  Picking up from there …

Since choosing the right repository or content storage location is so important, how can we objectively evaluate the repositories we have?  Use this scoring model to assess and designate your content storage options (including ECM repositories) as Trusted Content Repositories (TCRs).


Level 0 – Missing key capabilities like security, basic content services and APIs.  This category represents file shares, CDs and other relatively insecure locations.  These environments are flexible and useful, but the missing capabilities cause us to lose confidence (or trust) in the content we keep there.  Imagine building an application that delivers critical documents only to have an end-user delete the underlying files.

Level 1 – Missing key capabilities like repository governance and lineage.  This category represents SharePoint, wikis, blogs and other environments with user-controlled governance.  These environments are fantastic for collaboration and are easy to deploy, but they lack essential capabilities when the environment itself can’t be properly governed and secured in accordance with IT standards (including the ability to meet SLAs).  Imagine building an application that depends on critical documents only to have an end-user retire the SharePoint site that used to contain the needed documents or records.

Level 2 – Missing a few key capabilities needed to instrument and automate workflows, like event management and content federation.  This category represents most ECM repositories from major vendors like IBM, EMC, OpenText and selected others.  The missing capabilities are what give us confidence that the right documents are designated as “trusted” so they can be found, automated and consumed with confidence.

Level 3 – Has all of the key capabilities.  This is the optimal level for trusted content applications.  Only IBM FileNet P8 has all of these characteristics today.
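Because each level is defined by the capabilities it lacks, the model can be read as a cumulative checklist: a repository only reaches a level if it also meets every lower one. A minimal Python sketch, assuming capabilities are recorded as simple string labels; the labels, `LEVEL_REQUIREMENTS` and `tcr_level` are hypothetical, and the sample environments are illustrative rather than assessments of real products.

```python
# Capabilities that must be present to reach each level. The labels are
# illustrative names for the gaps described in the scoring model above.
LEVEL_REQUIREMENTS = [
    (1, frozenset({"security", "basic_content_services", "apis"})),
    (2, frozenset({"repository_governance", "lineage"})),
    (3, frozenset({"event_management", "content_federation"})),
]


def tcr_level(capabilities):
    """Return the highest level (0-3) whose requirements, and those of
    every lower level, are all present in `capabilities`."""
    have = set(capabilities)
    level = 0
    for next_level, required in LEVEL_REQUIREMENTS:
        if required <= have:
            level = next_level
        else:
            break  # levels are cumulative, so stop at the first gap
    return level


file_share = {"apis"}
wiki = {"security", "basic_content_services", "apis"}
ecm_repository = wiki | {"repository_governance", "lineage"}
full = ecm_repository | {"event_management", "content_federation"}

for name, caps in [("file share", file_share), ("wiki", wiki),
                   ("ECM repository", ecm_repository), ("full", full)]:
    print(name, tcr_level(caps))
```

The cumulative rule matters: an environment with event management but no repository governance still scores Level 1, because the gap at Level 2 is what undermines trust.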

Remember … if you can’t trust your repository, you can’t trust what is in it, can you?  Critical content must be stored in Trusted Content Repositories … it’s that simple.  Next time we’ll explore what it takes to create and maintain trusted content.  In the meantime, leave me your feedback on the model.

ECM Integrators … Should They Be Vendor Neutral Any More?

I am taking a one-week break from the 4-step governance approach to comment on a related topic.

An ECM consulting firm I hold in high regard (name withheld) recently published an article justifying a vendor-neutral ECM consulting / system integrator strategy.  As I read this article, it struck me as a very 1980s point of view.  Back when the vendor landscape had hundreds of firms and the technology was less mature, this may have been an enlightened perspective and strategy.  In this article, the firm laid out all the reasons why their vendor-neutral strategy made sense but failed to point out the reasons why it no longer does.

I’ll focus on a single reason … access to information and certifications.  Having access to essential information is critical to successful ECM solution delivery and value creation. 

Let me explain … I would imagine any vendor is willing to make a certain level of information available to any consulting or system integration firm that inquires.  In the case of IBM, that information is limited to what is publicly available (as you might expect).  Firms that have “official” relationships with IBM are entitled to another level of information, much of which is confidential and not publicly available for obvious reasons.  IBM partners are entitled to, and depend on (to deliver customer value), access to detailed product plans, training materials and most importantly product and solution certifications.

Customers tell us they only want to deal with certified partners.  They insist on partners having access to the latest plans and technical info … and they prefer those integrators who have invested in skilled and certified personnel to ensure high quality and high value solutions.  When deployments become problematic, or fail, it almost always is due to lack of knowledge or skill by the integrating firm.  This might seem obvious but it happens.

I know this particular firm protects itself against scenarios of this nature somehow, but I still fail to see how any “vendor neutral” firm can provide proper guidance to any customer without access to critical information such as detailed product plans, technical resources and most importantly … product and solution certifications.  What do they do … make it up?  Guess?

Of course you can still make “vendor neutral” recommendations IF you partner with the ECM vendors you make recommendations about.  That way, you have access to information, tools and resources and an informed point of view.

It might seem harsh but from where I sit, what was once enlightened is no longer so.  The consulting firms and integrators that deliver true value to customers have access to the latest information and are certified on the solutions they recommend and deliver. 

I loved the 80s but times have changed … the market and vendors have consolidated … technologies are much more mature … it’s time to move on.  Whether your vendor is IBM or any of the other viable ECM vendors, only use certified consultants and system integrators.

Step 1 – Can You Trust Your Repository?

The 4 steps to enable ECM to participate in information governance start with choosing the right repository.  In short, Trusted Information needs to reside in trusted environments.  In the case of content, this means Trusted Content Repositories.  By definition, if you can’t trust the environment, you can’t trust the information itself.  Imagine building an application that delivers critical information (including documents or images) only to have an end-user delete the underlying files.  This can easily happen if your reference documents live on file systems or in improperly governed environments.  Unlike with structured applications and databases, users have most of the control over content storage environments.  Imagine an end-user decommissioning a SharePoint team room only to have applications “break” that need to access the content that was residing in the now missing environment.  This can happen when important content is stored on file systems, wikis and other systems with inadequate governance and security controls.  Critical content must be stored in Trusted Content Repositories.

Some Key Actions to Take:

  • Evaluate your candidate content repositories to determine viability for use as a repository of record or Trusted Content Repository (TCR)
  • Designate Trusted Content Repositories for use in essential applications and only store critical content in TCRs.
  • Update operational practices to increase confidence and assurance of trusted information including a Trusted Content strategy supported by TCRs.

But how do we define what a Trusted Content Repository (TCR) is?  The following are characteristics of TCRs:

Performance, Scalability and HA/DR

  • Support for billions of objects and thousands of users across the enterprise.
  • Support for SLA levels of disaster recovery and business continuity.

Preservation and Lineage Capabilities

  • Confident and assured immutability of content, structure, lineage and context over time.

Interoperability and Extensibility

  • Support for industry leading RDBMSs, application servers and operating systems.
  • Open and robust APIs including support for CMIS.

Content Capabilities

  • Basic ECM capabilities including versioning, meta data management, classification, content based retrieval, content transformation, etc.

Repository Governance

  • Deployments can be managed and controlled to protect against information supply chain breakdowns.

Information Lifecycle Governance

  • Support for all lifecycle events and processes including eDiscovery and records disposition.

Security, Access and Monitoring

  • Controls to promote access to authorized users and controls to prevent unauthorized access.
  • Auditing and monitoring for all related activities.

Physical Capabilities

  • Ability to support references to physical objects and entities.

Federation and Replication

  • Federation capabilities to provide a common meta data catalog across multiple repositories.

Business Process Management

  • Integrated business process management.

Events Based Architecture

  • Internal and external event support with trigger and subscription model.
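The trigger-and-subscription model in the last item can be illustrated with a minimal in-process publish/subscribe sketch in Python. The `EventBus` class, the `document.updated` event name and the review-queue handler are all hypothetical; a real repository would deliver events through its own event framework, often asynchronously and durably, rather than through direct function calls.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class EventBus:
    """Minimal in-process publish/subscribe bus (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        # Register a handler to be called whenever event_type is published
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Deliver the event synchronously to every subscriber, in order
        for handler in self._subscribers[event_type]:
            handler(payload)


review_queue: List[str] = []


def queue_review_if_trusted(event: dict) -> None:
    # Trigger: any change to content designated as trusted is re-reviewed
    if event.get("trusted"):
        review_queue.append(event["doc_id"])


bus = EventBus()
bus.subscribe("document.updated", queue_review_if_trusted)
bus.publish("document.updated", {"doc_id": "contract-42", "trusted": True})
bus.publish("document.updated", {"doc_id": "draft-7", "trusted": False})
print(review_queue)  # ['contract-42']
```

The point of the pattern is decoupling: the repository only announces that something happened, and governance processes (reviews, records declaration, notifications) subscribe to the events they care about.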

All of these capabilities are required to enable content to participate in meeting Information Governance requirements.  Next we’ll explore what it takes to create and maintain trusted content.  In the meantime, do you agree with these characteristics?

A 4 Step Model for Trusted Content

Continuing in my recent theme of information governance and trusted information including enterprise content … we know that unstructured information (or enterprise content) is inherently different and requires a slightly different approach within the traditional data/information governance context.  Organizations need to take 4 key steps to include the unstructured side of things:

  1. Identify and designate trusted ECM Repositories of record
  2. Create, control and maintain trusted content
  3. Consume, leverage and exploit trusted information
  4. Govern the information lifecycle including archiving, recording and preserving information and evidence of transactions, processes and events

Why these 4 items? … imagine that you are the General Counsel of a publicly traded firm that is the defendant in a major lawsuit.  What if …

  • You can’t find the information you are obligated, under court order, to produce?
  • You can find the information … and it actually exonerates you … but it can’t be trusted as an accurate representation of the facts (spoliation) and can’t / won’t be admitted as evidence.

How can you prove you behaved in a compliant and/or lawful manner if you can’t use your own information to defend yourself because it isn’t trustworthy?

This is just one example that illustrates the 4 necessary steps to enable ECM to participate in information governance initiatives.  Do you agree with all 4 steps?  Next week, we’ll go into more detail starting with the importance of choosing the right ECM repository.

What Happens When We Fail to Govern Content?

In my last posting I discussed the need for Trusted Information. Continuing in that theme … I am going to focus on the need for trust and governance in Enterprise Content Management (ECM).  Industry estimates claim that 80-85% of an organization’s stored information typically is unstructured data (or content).  I’ve always thought of this in simple terms.  Data tells me the who, what, where and when.  Content tells me the how and why. Together they represent the full business context of information.

As an ECM guy, I tend to think about Content first and I recently spent time thinking about what happens when we don’t have trust or governance in ECM.

When content is stored in the wrong place …

A large design firm needs to access legacy design documents that may have been forgotten on file shares, left in abandoned team rooms, still needed long after their authors have left, lost in the shuffle of mergers and acquisitions, or inadvertently deleted during storage clean-up, despite legal and regulatory requirements to control and retain them.  This creates information risk and requires tedious re-creation across teams before design development can continue, among other challenges.

When the wrong content is considered trustworthy and is consumed …

A large distributor has multiple versions of contracts and supplier agreements that are disorganized and the business fails to use a contract addendum that materially changes the terms and conditions.  This may result in customer and partner dissatisfaction, causing disputes and costly litigation, disruption of key deliveries, a flurry of customer complaints, stopped “in process” orders and lower sales, among other challenges.

When content is not governed over its lifetime …

Maintenance records on an interstate pipeline are destroyed or misplaced and are not available to investigators after an accident.  This forces government regulators to suspend operations, causing production delays at refineries, higher costs to consumers and potential shortages for vital industries, among other challenges.

A large chemical manufacturer fails to destroy content and records in accordance with its corporate retention policy and is now burdened with the high cost of managing storage and eDiscovery, with no visibility into what to destroy and when.  This requires it to continually manage and review information it shouldn’t even have, causing higher storage and other operational costs and more legal risk, among other potential challenges.

What scenarios come to mind for you when we fail to govern our information properly?

What is Trusted Information?

I’ve always defined information as the combination of two types, which can be expressed as Data + Content = Information.  Data, also known as structured data, is what can usually be found in the rows and columns of database tables.  Content is defined as unstructured data (or everything not living in databases).  In simple terms, content can be images, media files, documents, spreadsheets, PDFs … you get the idea.  I’ve heard it said before that structured data can tell you the who, what, where and when of what happened, but only unstructured data can tell you the how and why … which is usually the most important part.  Together they represent the full context of any information scenario.  If you are reading this, you probably already subscribe to this line of thinking and also agree the two worlds are colliding as enterprises mature in how they manage and govern both types of information.  Over the next few blog postings, I plan to discuss Information Governance … the how … and the why … of the two worlds coming together, and key strategies to address it.

You’ve probably all seen the statistics … 42% of managers say they inadvertently use the wrong information at least once per week and so on.  In this day and age … how is this possible?  Would you want your doctor making a decision about your life based on the wrong (or an older) version of treatment guidance?  “Oh nurse, where did we put that updated information on how to treat this illness?”  It sounds absurd, but that’s exactly how most enterprises manage their information assets, particularly their content.  What are we missing here?

The answer is trust.  We need trust.  We need to be able to find the correct (or trusted) information, at the right time, when we are making decisions.  Considering that the average information worker spends 14.5 hours each week reading and answering email, 13.3 hours creating documents, 9.6 hours searching for information and 9.5 hours analyzing information … this is a big deal.

So let’s start by defining it … Trusted Information is information that has business value requiring its governance and retention.  It has the properties of:

  • Authority:  it is up-to-date and recognized as the reference copy of the relevant information.
  • Authenticity:  it is what it says it is and can be linked back to its source.
  • Reliability:  it can be trusted as a full and accurate representation of the relevant facts, transaction or business process.
  • Integrity:  it is complete, unaltered and preserves context and chain of custody.
  • Usability:  it is accessible and can be located, retrieved, presented and interpreted.
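Several of these properties can be approximated by automated checks on a record’s metadata. A minimal Python sketch, assuming a hypothetical `InformationRecord` with version, source, approval, checksum, custody and location fields; the record type and `failed_trust_checks` are illustrations, not any product’s API. Reliability in particular cannot be fully verified by code, so an approval flag stands in for it here, and integrity is checked by comparing a SHA-256 digest recorded when the record was declared with the current content.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InformationRecord:
    """Hypothetical metadata for a record whose trustworthiness we check."""
    content: bytes
    recorded_sha256: str            # checksum captured when the record was declared
    version: int
    latest_version: int
    is_reference_copy: bool
    source_system: Optional[str]    # where the record originated
    approved: bool                  # stand-in for "reliability" (an assumption)
    custody_log: List[str] = field(default_factory=list)
    location: Optional[str] = None  # where it can be retrieved from


def failed_trust_checks(rec: InformationRecord) -> List[str]:
    """Return the names of the properties the record fails, in list order."""
    failures = []
    if not (rec.is_reference_copy and rec.version == rec.latest_version):
        failures.append("authority")
    if not rec.source_system:
        failures.append("authenticity")
    if not rec.approved:
        failures.append("reliability")
    unaltered = hashlib.sha256(rec.content).hexdigest() == rec.recorded_sha256
    if not (unaltered and rec.custody_log):
        failures.append("integrity")
    if not rec.location:
        failures.append("usability")
    return failures


original = b"Payment terms: net 30"
contract = InformationRecord(
    content=original,
    recorded_sha256=hashlib.sha256(original).hexdigest(),
    version=3, latest_version=3, is_reference_copy=True,
    source_system="contracts-system", approved=True,
    custody_log=["declared as record"], location="records/contract-42.pdf",
)
print(failed_trust_checks(contract))  # []

contract.content = b"Payment terms: net 90"  # altered after declaration
print(failed_trust_checks(contract))  # ['integrity']
```

A checksum only proves the bytes have not changed since declaration; whether the declared version was the right one in the first place is exactly why the environment it lives in must also be trusted.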

Trusted information must also be governed and lifecycle-managed from trusted environments such as repositories of record.  If the environment itself can’t be trusted, then neither can the information.

I’ll be going into more detail in the coming weeks, starting with storing information in trusted repositories, but in the meantime, do you agree with the above or have a different definition of trusted information?

Impact of the Cloud on ECM … a New Perspective

I spent a fair amount of time with one of our major Enterprise Content Management (ECM) customers this week discussing “the cloud”.  Not a shocker … but this topic has a lot of hype associated with it.  However, this customer has a clear vision for using “the cloud” and sees it as a mechanism to eliminate many of their internal barriers to broader ECM adoption (and greater ROI).  They plan to move most of their ECM projects into the cloud as soon as they possibly can.  What?  Why?  How?

Some background … this customer has invested substantially in ECM technologies over the years and has a variety of vendors deployed: IBM (including FileNet), Documentum, Mobius and a few others.  They have grown by acquisition and, like many acquisitive companies, have a less than ideal spaghetti-like IT infrastructure with too many overlapping ECM vendors and solutions.  They are doing everything from imaging, workflow automation, enterprise report management and document management to records management.  Most notably, they have a top-down mandate to take significant costs out of a large number of business processes and to go paperless.  Due to their acquisition pace and a number of other factors, this customer has been unable to mature their ECM practice and infrastructure to the point of offering a shared services platform, standardized provisioning and packages, internal billing, etc … in short, everything is a one-off.

Enter the cloud … this customer is basically at a fork in the road … they can figure out a shared services model, provisioning, billing and a whole host of other things themselves (sounds like fun), not to mention the capex obstacles they are facing … or … leverage prepackaged private cloud services with all of that built in and ready to go.  In their case … the path forward is clear.  The cloud will simplify and accelerate adoption of ECM while also accelerating the cost reduction and paper elimination they need to achieve in support of the mandate.

What do you think … are they a trend setter and an indicator of things to come or an anomaly?