In recent weeks I ran across several instances of “Content Debt”. What is content debt? Like technical debt, content debt slows our content management systems down and hurts our end users whenever we ignore common best practices and fail to monitor our web sites for outdated content. The most recent example I encountered was on the STPCon web site (I love STPCon, but unfortunately they ran into a very classic issue). www.stpcon.com is hosted on WordPress, which makes a lot of sense for content-rich websites like this one. Unfortunately, its landing page suffers from Content Debt. Check out the following screenshot taken with Chrome DevTools (hit F12 in your browser) and look at the size of the individual elements:

The really nice-looking background image is 8.5 MB in size. That’s one example of “Content Debt”.

I call it Content Debt because it is the debt that content creators (marketing departments and others) pile up when they add more content to their systems without understanding the consequences of their actions. These users typically don’t have, and shouldn’t need, insight into Web Performance Best Practices, even though those practices have been around for years. It should be the systems, such as WordPress, that have built-in sanity checks to prevent these things from happening.

In addition to uploading high-resolution images, other examples include:

  • Broken links and missing content that frustrate users and hurt your SEO when search bots hit them
  • Content that nobody is interested in -> identify this content and delete it
  • Content that nobody finds -> update your content based on the most common search terms
  • Outdated content that is no longer useful -> update or remove content that is no longer valid

Let me give you some tips and tricks on how to proactively address content debt problems:

#1: Oversized Content

Whatever Content Management System you use, there are typically plugins that crop, compress or thumbnail image uploads. So if you administer such a system, check your CMS’s plugin marketplace. This eliminates mistakes like the one highlighted above.

In addition to limiting image file sizes, you should also consider restricting upload sizes, or at least find a plugin that warns users before they upload attachments of any kind (Word documents, PowerPoint presentations, etc.) that are too large.
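If your CMS has no such plugin, the check itself is simple. The following is a minimal Python sketch of a pre-upload size check; the per-type limits, the extension-to-category mapping and the `check_upload` function are hypothetical choices for illustration, not part of WordPress or any particular CMS API. It returns a warning rather than silently accepting the file, so you can surface it to the editor or turn it into a hard block.

```python
from dataclasses import dataclass
from pathlib import Path

MB = 1024 * 1024

# Hypothetical limits (bytes) per content category; tune them for your site.
DEFAULT_LIMITS = {
    "image": 1 * MB,      # web images rarely need more than this
    "document": 10 * MB,  # Word, PowerPoint, PDF attachments
}
FALLBACK_LIMIT = 5 * MB   # anything we cannot classify

# Hypothetical mapping from file extension to content category.
EXTENSION_CATEGORY = {
    ".jpg": "image", ".jpeg": "image", ".png": "image",
    ".gif": "image", ".webp": "image",
    ".doc": "document", ".docx": "document", ".ppt": "document",
    ".pptx": "document", ".pdf": "document",
}


@dataclass
class UploadCheck:
    allowed: bool
    message: str


def check_upload(filename: str, size_bytes: int, limits=DEFAULT_LIMITS) -> UploadCheck:
    """Flag an attachment whose size exceeds the limit for its category."""
    category = EXTENSION_CATEGORY.get(Path(filename).suffix.lower())
    # Unknown extensions (category None) fall back to FALLBACK_LIMIT.
    limit = limits.get(category, FALLBACK_LIMIT)
    if size_bytes <= limit:
        return UploadCheck(True, f"{filename}: OK")
    return UploadCheck(
        False,
        f"{filename}: {size_bytes / MB:.1f} MB exceeds the "
        f"{limit / MB:.0f} MB limit for {category or 'other'} files",
    )
```

For an 8.5 MB background image, `check_upload("background.jpg", int(8.5 * MB))` returns `allowed=False` with a message pointing at the 1 MB image limit, which is exactly the kind of nudge a content editor needs at upload time.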

For existing systems, I suggest you start monitoring your web server logs and keep an eye on response sizes (the Content-Length header or bytes sent). Run regular sanity checks for resource downloads that exceed a certain threshold. If you happen to use Dynatrace App Monitoring, you can either look at the Web Requests Dashlet sorted by Avg Bytes Sent, or set up a Business Transaction that filters on the Web Request Bytes Sent measure using a threshold of, e.g., 5 MB. You can then set up an Incident to get notified when new large content is being downloaded:

Set up monitoring to identify large resources downloaded by your end users. Dynatrace provides different options.
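If you don't have an APM tool in place, a periodic script over the access log gives you a first approximation. The sketch below assumes the Apache/NGINX "combined" log format, where the field after the status code is the number of response body bytes sent (`-` when nothing was sent), and uses that field as a stand-in for Content-Length. The 5 MB threshold mirrors the example above; `find_large_responses` and the sample log lines are hypothetical.

```python
import re

# Matches the start of an Apache/NGINX "combined" log line:
# host ident user [time] "METHOD PATH PROTOCOL" status bytes ...
LOG_PATTERN = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] '
    r'"\S+ (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

FIVE_MB = 5 * 1024 * 1024


def find_large_responses(lines, threshold_bytes=FIVE_MB):
    """Return {path: (hits, largest_size)} for responses above the threshold."""
    large = {}
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match or match.group("size") == "-":
            continue  # unparseable line or no body sent
        size = int(match.group("size"))
        if size > threshold_bytes:
            path = match.group("path")
            hits, largest = large.get(path, (0, 0))
            large[path] = (hits + 1, max(largest, size))
    return large


SAMPLE_LOG = [
    '203.0.113.7 - - [01/Jan/2024:10:01:02 +0000] '
    '"GET /wp-content/uploads/background.jpg HTTP/1.1" 200 8912896 "-" "Mozilla/5.0"',
    '203.0.113.8 - - [01/Jan/2024:10:01:03 +0000] '
    '"GET /index.html HTTP/1.1" 200 48213 "-" "Mozilla/5.0"',
]

# Largest offenders first.
for path, (hits, largest) in sorted(
    find_large_responses(SAMPLE_LOG).items(), key=lambda kv: kv[1][1], reverse=True
):
    print(f"{largest / 1024 / 1024:.1f} MB  {hits} hit(s)  {path}")
```

In practice you would pass an opened log file instead of `SAMPLE_LOG` and run the script from a scheduled job, alerting whenever a path shows up that wasn't in the previous report.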

#2: Broken Links & Missing Content

Keeping links current is challenging, particularly when you distributed links in the past to pages that are no longer valid. The following example comes from an online auction platform that struggles with Google still trying to index pages and images that no longer exist. They asked me to examine the data they shared through my “Share Your PurePath Program”, and here is what we found:

Watch out for HTTP 4xx responses and check where they come from: outdated links on other sites, broken links on your own site, or bots.
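To see where 4xx responses come from, you can group them by the Referer and User-Agent fields of the same combined log format. This is a rough sketch: the bot detection is a crude substring heuristic (real crawlers are better verified, e.g. via reverse DNS), `shop.example.com` stands in for your own host name, and the source labels are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Apache/NGINX "combined" log format, including Referer and User-Agent.
COMBINED = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)
BOT_MARKERS = ("bot", "crawler", "spider")  # crude heuristic


def classify_source(referrer, agent, own_host):
    """Guess why a client requested a URL that no longer exists."""
    if any(marker in agent.lower() for marker in BOT_MARKERS):
        return "bot"
    if referrer in ("", "-"):
        return "direct/unknown"
    if urlparse(referrer).hostname == own_host:
        return "broken link on our site"
    return "outdated link on another site"


def summarize_4xx(lines, own_host):
    """Count 4xx responses per (source, path)."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if not match or not match.group("status").startswith("4"):
            continue
        source = classify_source(match.group("referrer"), match.group("agent"), own_host)
        counts[(source, match.group("path"))] += 1
    return counts


SAMPLE_LOG = [
    '66.249.66.1 - - [01/Jan/2024:00:00:01 +0000] "GET /item/12345 HTTP/1.1" 404 512 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.4 - - [01/Jan/2024:00:00:02 +0000] "GET /img/old.png HTTP/1.1" 404 512 '
    '"https://shop.example.com/category/7" "Mozilla/5.0 (Windows NT 10.0)"',
    '198.51.100.5 - - [01/Jan/2024:00:00:03 +0000] "GET /item/999 HTTP/1.1" 410 0 '
    '"https://forum.example.org/thread/1" "Mozilla/5.0 (X11; Linux x86_64)"',
    '198.51.100.6 - - [01/Jan/2024:00:00:04 +0000] "GET / HTTP/1.1" 200 48213 '
    '"-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

for (source, path), hits in summarize_4xx(SAMPLE_LOG, "shop.example.com").most_common():
    print(f"{hits:5d}  {source:30s}  {path}")
```

Broken links on your own site are the cheapest to fix; outdated external links and bot traffic are usually better handled with redirects or by telling search engines the content is gone.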

Broken links contribute to content debt in multiple negative ways:

  • They frustrate your real end users as they don’t get the content they want
  • They will impact your SEO ranking if the search bots can no longer find your content
  • Bad requests often cost more resources than successful requests, because they trigger extra logging on your servers or special error handling that drives up CPU usage

#3: Outdated or Unused Content

If you have a content management system and you keep adding content but never remove old or irrelevant content, you are consuming more storage space (for the live content plus backups) than necessary. These extra costs keep growing over time. Irrelevant content also slows down other areas of your system: your search engines, for example, must crawl through more content and build larger indices, even though most of that content is irrelevant.

Here are two things I recommend you do to manage content debt:

  1. Monitor access to content. Use web analytics tools to categorize your content into HOT (top X pages), RELEVANT and TOBEREMOVED (pages that are only sporadically accessed or not accessed at all). Disable access to the TOBEREMOVED category and wait to see whether anybody complains or notices. If not, remove that content for good.
  2. Analyze search queries. Web analytics can also help you determine what people are searching for and which content they find. It takes a bit of effort to then follow these users’ click paths to see whether they are happy with that content or keep searching. Offering a “This was useful for me” option on your content pages makes this task easier.
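The HOT / RELEVANT / TOBEREMOVED split in step 1 boils down to ranking page views. Here is a minimal sketch, assuming you can export per-page view counts for a period from your analytics tool and a list of all pages from your CMS (pages nobody visits never show up in analytics, so the inventory matters). `top_n` and `min_views` are hypothetical knobs standing in for "Top X" and "sporadically accessed".

```python
def categorize_content(all_pages, page_views, top_n=10, min_views=5):
    """Split a content inventory into HOT, RELEVANT and TOBEREMOVED.

    all_pages:  every page the CMS serves (the content inventory)
    page_views: {page: views} exported from web analytics for a period
    """
    # Pages that analytics never saw still count, with zero views.
    views = {page: page_views.get(page, 0) for page in all_pages}
    # Most viewed first; ties broken by page name for stable output.
    ranked = sorted(views.items(), key=lambda item: (-item[1], item[0]))

    categories = {"HOT": [], "RELEVANT": [], "TOBEREMOVED": []}
    for rank, (page, count) in enumerate(ranked):
        if count < min_views:
            categories["TOBEREMOVED"].append(page)
        elif rank < top_n:
            categories["HOT"].append(page)
        else:
            categories["RELEVANT"].append(page)
    return categories


inventory = ["/", "/pricing", "/docs/setup", "/blog/launch-recap", "/old-promo"]
analytics_views = {"/": 1200, "/pricing": 300, "/docs/setup": 40, "/blog/launch-recap": 2}
print(categorize_content(inventory, analytics_views, top_n=2))
```

Note that `/old-promo` lands in TOBEREMOVED even though the analytics export never mentions it; that is the content most likely to be forgotten.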

Here are some screenshots from Dynatrace User Experience Management showing how individual users navigate through the site, as well as identifying which pages are frequently accessed:

Understanding what users search for and what content they really want allows you to optimize your offering/content.

Of course it doesn’t make sense to conduct this exercise for every single user on the system; that would be too tedious. That’s why we aggregate the data and look for common click streams. It is also interesting to pick out individual “power/outlier users” and try to understand what they are trying to achieve. With Dynatrace you can even capture the user’s name (if they must log in), which allows you to pick up the phone or send them an email to discuss their experience further.

We also use this capability internally to determine which content on our Confluence installation, which hosts our community portal, is still useful and which should soon be removed!
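Aggregating click streams and spotting outlier users does not require anything fancy once you have session data. The sketch below assumes you can export sessions as ordered lists of page URLs; the `sessions` structure, the three-step window and the "more than twice the median session length" outlier rule are hypothetical choices, and this is not how Dynatrace User Experience Management computes its views.

```python
from collections import Counter
from statistics import median


def common_click_streams(sessions, steps=3, top=5):
    """Most frequent runs of `steps` consecutive page views across sessions."""
    counter = Counter()
    for pages in sessions.values():
        for i in range(len(pages) - steps + 1):
            counter[tuple(pages[i:i + steps])] += 1
    return counter.most_common(top)


def outlier_sessions(sessions, factor=2):
    """Sessions with more than `factor` times the median number of page views."""
    typical = median(len(pages) for pages in sessions.values())
    return [sid for sid, pages in sessions.items() if len(pages) > factor * typical]


sessions = {
    "s1": ["/", "/search", "/docs/setup"],
    "s2": ["/", "/search", "/docs/setup", "/pricing"],
    "s3": ["/", "/pricing"],
    # Someone who keeps searching and opening results: worth a closer look.
    "s4": ["/", "/search", "/docs/a", "/search", "/docs/b", "/search",
           "/docs/c", "/search", "/docs/d", "/search"],
}

for path, hits in common_click_streams(sessions):
    print(hits, " -> ".join(path))
print("outliers:", outlier_sessions(sessions))
```

Here session `s4` is flagged: its user kept alternating between search and result pages, a typical sign that the content they wanted is hard to find or doesn't exist.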

Agree with me? Is Content Debt a problem?

I would be interested in hearing how you deal with the issue I describe as content debt. Any stories supporting or refuting my position are very welcome. Maybe you want to argue that storage is so cheap that we shouldn’t worry about outdated content. Perhaps you think your search engines are smart enough to find the relevant content. Or maybe your content creators and editors always produce the right content that is never out of date! 🙂 Let me know.