A few months ago, a client called us about a content audit. For a site with hundreds of millions of pages. That’s 100,000,000+ pages. Yep, those zeros are correct.
Now, old-timey (2007) content strategy logic says you need to audit all of your content. You need to see it all with your own human eyes. And if you have a small site (fewer than 5,000 pages or pieces of content), you probably still should review every piece.
But manual audits get more and more unrealistic as sites get bigger. Who has the time to review 25,000, 100,000, or 1,000,000 pieces of content? For our client with 100,000,000 pages, it would take 20 people working full-time for 270 years to manually view all of those pages. Obviously, that’s not an option. Unless you could hire an army of robots. Speaking of …
Can’t robots just do it?
It depends on what kind of audit you’re performing. If you’re doing a quantitative audit (simply finding out how much content you have, where it lives, and which keywords are associated with it), then yes, there are technical tools that can help.
But if you’re doing a qualitative audit, where you’re trying to learn about the substance, accuracy, and quality of your content, robots can’t help you out. Well, maybe if you had a robot like C-3PO (fluent in six million forms of communication), or this guy:
Image by Sean Tubridy. All rights reserved.
But you don’t. So, what else can you do?
You can pick a sample—a subset of your content—to review. Although a sample doesn’t replace a total site audit, it does help you reduce uncertainty about your content. Scientific and marketing researchers have been doing sampling for years, and when done correctly, sampling can give you a fairly accurate indication of your overall content situation.
You can choose your sample randomly or base it on various factors, such as user segments, product categories, content purpose, location on the site, etc. It all depends on why you’re doing the audit and what you want to learn.
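To make the two approaches concrete, here is a minimal Python sketch of pulling a random sample and a sample grouped by category. Everything in it is an assumption for illustration: the `inventory` list, its `url` and `category` fields, and the two helper functions are hypothetical stand-ins for whatever content inventory you actually have.

```python
import random
from collections import defaultdict


def simple_random_sample(pages, n, seed=0):
    """Pick n pages at random, with every page equally likely to be chosen."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn
    return rng.sample(pages, min(n, len(pages)))


def stratified_sample(pages, key, n, seed=0):
    """Pick roughly n pages, giving each group a share proportional to its size.

    Every group gets at least one page, so small sections are never skipped.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for page in pages:
        groups[key(page)].append(page)

    sample = []
    for _name, members in sorted(groups.items()):
        quota = max(1, round(n * len(members) / len(pages)))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


# Hypothetical inventory: 1,000 pages spread across three site sections.
inventory = (
    [{"url": f"/products/{i}", "category": "products"} for i in range(700)]
    + [{"url": f"/support/{i}", "category": "support"} for i in range(250)]
    + [{"url": f"/about/{i}", "category": "about"} for i in range(50)]
)

random_pick = simple_random_sample(inventory, 100)
by_category = stratified_sample(inventory, lambda page: page["category"], 100)
```

The random version is the simplest to defend, but it can miss small sections entirely. The grouped version guarantees every category shows up, which matters if the audit is supposed to say something about each one. You could swap `category` for user segment, content purpose, or site location without changing the approach.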
How much is enough?
There’s no rule or benchmark to use. It would seem that the more content you review, the better off you’d be. That is somewhat true, but mostly you just have to look at enough content to see patterns emerge. On a relatively small site (say, 10,000 pieces of content), you might need to look at half of the content before the patterns become obvious.
On a million-page site, you might look at only 1% of the content. That’s still 10,000 pages … so you’re not exactly off the hook. But hopefully you’d recognize some kind of valuable patterns by then. You might not have the same level of certainty as you did with the smaller site audit, but you’ll have some ideas. And you probably aren’t going to learn much more by auditing another 1,000 or 10,000 items: the percentage of items reviewed is still so low that the change in the margin of error is microscopic.
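If you want to sanity-check the diminishing returns yourself, here is a rough Python sketch using the textbook margin-of-error formula for a proportion, with a finite population correction. It rests on assumptions that aren't part of the audit described here: the sample is a simple random one, you are measuring a yes/no trait (for example, "this page is out of date"), and you use a 95% confidence level with the worst-case proportion of 0.5. The function name and parameters are just for illustration.

```python
import math


def margin_of_error(sample_size, population, p=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    p=0.5 is the worst case; z=1.96 corresponds to roughly 95% confidence.
    The second factor is the finite population correction.
    """
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    correction = math.sqrt((population - sample_size) / (population - 1))
    return z * standard_error * correction


population = 1_000_000
for reviewed in (10_000, 11_000, 20_000):
    moe = margin_of_error(reviewed, population)
    print(f"{reviewed:>6} pages reviewed: +/- {moe:.2%}")
```

Under those assumptions, 10,000 pages out of a million gives a margin of just under one percentage point. Reviewing another 1,000 pages barely moves it, and even doubling the sample to 20,000 only brings it down to roughly 0.7 points.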
So where do I begin?
Your sample depends on the size of your site. Here’s a rough table of suggested sample sizes (adapted from market research sampling guidelines):
|Total number of pages/pieces
Sampling doesn’t lead to a perfect picture of your content. But a sample audit can provide useful information to support arguments for funding, make the case for content work, or demonstrate progress.