Content Pruning for SEO: Less is More

As I’m attempting to take my blog and writing more seriously, I decided it was time to do some content pruning for SEO purposes. Going into this, I knew that the SEO benefits of content pruning, like much of the dark arts of SEO, are anecdotal at best. Google once said it was a good thing, and more recently spoke out against it after CNET did a major round of content pruning.

With nearly 900 posts migrated to WordPress, the desire to prune content wasn’t purely for SEO purposes. I’m working towards improving my old content, and it would take me a lifetime to go through and spruce up that many posts. Reducing the overall footprint of my blog would help eliminate the anxiety of not knowing where to even begin my editing efforts.

Gathering Data for Content Pruning

To work through this mountain of content, I had to start somewhere. I’m a big fan of making educated decisions based on the data I have available. The data sources I had on hand were Google Analytics and Google Search Console.

I opted to use Google Search Console exclusively, primarily because Google Analytics for my site was spread across two properties. One property was for the old Universal Analytics, and the other one for Google Analytics 4.

Within Google Search Console, I went to the Performance on Search results page. This report gives you insight into what people are searching for and clicking on to get to your site from Google. I started by looking at the last 6 months of data to help make some decisions around content pruning and SEO.

I went ahead and exported this report to Google Sheets, so I could do my own manipulation of the data, track some notes, and such.

Identifying Zero-Impression Content

I actually started by trying to identify low-performing content, but then I had a realization. What about content that’s so poor that it doesn’t even show up in search at all? Such content wouldn’t even appear in the spreadsheet I was working with.

Because the posts weren’t in the spreadsheet at all, there wasn’t an easy way to identify which content needed pruning for SEO. To help things along, ChatGPT and I wrote a small Node.js script to parse a CSV export of the Pages sheet and the XML export from WordPress:

import fs from 'fs';
import xml2js from 'xml2js';

const parser = new xml2js.Parser();

// Build a lookup of every URL in the Search Console "Pages" export
// (first column). Any URL listed there had at least one impression.
const pagesData = fs.readFileSync('pages.csv', 'utf8');
const lines = pagesData.split(/\r?\n/);
const pages = {};

lines.forEach((line) => {
  const url = line.split(',')[0].trim();

  // Skip blank lines (split() always returns at least one column)
  if (url) {
    pages[url] = true;
  }
});

// Check every post in the WordPress XML export against the lookup
const postsData = fs.readFileSync('posts.xml', 'utf8');
let postsWithoutImpressions = 0;

parser.parseString(postsData, (err, result) => {
  if (err) {
    console.error('Error parsing the XML:', err);
    return;
  }

  const posts = result.rss.channel[0].item;

  posts.forEach((post) => {
    const postUrl = post.link[0];

    if (!pages[postUrl]) {
      console.log(postUrl);
      postsWithoutImpressions += 1;
    }
  });

  // Report inside the callback so the count is printed once parsing has finished
  console.log('Posts without impressions:', postsWithoutImpressions);
});
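The script above only works if the Search Console URL and the WordPress link match character for character. If the two sources disagree on scheme, a `www.` prefix, or a trailing slash, a post would be wrongly reported as having no impressions. Here is a minimal sketch of a more forgiving comparison, assuming those three differences are the only ones worth ignoring; `normalizeUrl` and `findUnseenPosts` are hypothetical helpers, not part of the original script, and the URLs are placeholders.

```javascript
// Hypothetical helper: reduce a URL to host + path so that
// "http://www.example.com/post/" and "https://example.com/post" compare equal.
function normalizeUrl(raw) {
  const url = new URL(raw.trim());
  const host = url.hostname.replace(/^www\./, '');
  const path = url.pathname.replace(/\/+$/, '');
  return `${host}${path}`;
}

// Return the post URLs that never show up in the Search Console export,
// i.e. the set difference "all posts minus posts with impressions".
function findUnseenPosts(postUrls, searchConsoleUrls) {
  const seen = new Set(searchConsoleUrls.map(normalizeUrl));
  return postUrls.filter((postUrl) => !seen.has(normalizeUrl(postUrl)));
}

const unseen = findUnseenPosts(
  ['https://example.com/old-php-post/', 'https://example.com/popular-post/'],
  ['http://www.example.com/popular-post'],
);
console.log(unseen); // [ 'https://example.com/old-php-post/' ]
```

Using a `Set` keeps each lookup constant-time, which matters little at 900 posts but costs nothing either.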

Pareto Principle in Action

Turns out, I had quite a bit of trash to take out. Roughly 20% of my blog posts had received zero impressions on Google Search in the last 6 months. The content was either of little to no interest to the world, or the posts weren’t even indexed at all. The content I was pruning mostly fell into two groups.

The first group were posts that were just old. So old the topic of discussion probably wasn’t even around anymore. Stuff like the 5.x version of PHP. Ubuntu releases that were in the single digits. One post was about issues with Firefox 3.0 on OS X. Non-evergreen content that aged out.

The other group was mostly posts that were in the “Personal” category of my blog. These posts were usually rants and op-ed pieces. Part of me was sad that these were the posts that weren’t doing well, due to my personal attachment to some of the topics. Another part of me, the part that did the content pruning, was cringing a ton.

Second-hand embarrassment towards your younger self is evidently a thing.

Analyzing Low Performers: Content Pruning for SEO Efficiency

Having already pruned a substantial amount of content, I was feeling pretty good about things. The band-aid was ripped off, and the sting of removing content that I had once held so dear was waning.

Before diving into the next round of pruning, I decided to fetch more recent data from Google Search Console. I tightened the date range that I was working with from six months down to three months.

Then, I messed around with trying to come up with some magical scoring system. That way I could grade the remaining posts to figure out which low performing content needed to be pruned.

In the end, my scoring system didn’t reveal anything that the data couldn’t already tell me. Google Search Console provides clicks, impressions, click-through rate (CTR), and position. Naturally, content with more impressions and a higher search position usually received more clicks and had a higher CTR.

To identify low-performing content, I relied heavily on the number of impressions, followed by the number of clicks. Content with low impressions and few to no clicks was now eligible for pruning.
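Sorting by impressions first and clicks second, then applying a cutoff, can be sketched like this. It assumes the Search Console rows have already been parsed into objects with `page`, `clicks`, and `impressions` fields; the cutoffs `maxImpressions` and `maxClicks` are hypothetical, since the post doesn’t say which numbers were used, and the rows are made-up sample data.

```javascript
// Sort by impressions, then clicks (both ascending), so the weakest posts come first.
function rankLowPerformers(rows) {
  return [...rows].sort(
    (a, b) => a.impressions - b.impressions || a.clicks - b.clicks,
  );
}

// Hypothetical cutoffs: anything at or under both thresholds is a pruning candidate.
function pruningCandidates(rows, { maxImpressions = 50, maxClicks = 1 } = {}) {
  return rankLowPerformers(rows).filter(
    (row) => row.impressions <= maxImpressions && row.clicks <= maxClicks,
  );
}

// Made-up sample rows, shaped like a parsed "Pages" export.
const rows = [
  { page: '/popular-post/', clicks: 120, impressions: 4000 },
  { page: '/old-ubuntu-post/', clicks: 0, impressions: 12 },
  { page: '/esoteric-cli-post/', clicks: 1, impressions: 30 },
  { page: '/thin-post/', clicks: 0, impressions: 30 },
];

console.log(pruningCandidates(rows).map((row) => row.page));
// [ '/old-ubuntu-post/', '/thin-post/', '/esoteric-cli-post/' ]
```

The sort copies the array first, so the original rows stay in export order for note-taking.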

Deciding Which Content to Prune

This particular round of content pruning didn’t require any scripts, just sorting data and analyzing what I had in front of me. There were a few outlier posts that had high impressions and a low number of clicks, which I’ve kept so I can try to improve them in the future.
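Those outliers, lots of impressions but few clicks, can be expressed as a low click-through rate (one possible framing, not necessarily how the posts were spotted here). A minimal sketch that flags them for a rewrite instead of pruning, with hypothetical thresholds `minImpressions` and `maxCtr` and made-up sample rows:

```javascript
// CTR as Search Console defines it: clicks divided by impressions.
function ctr(row) {
  return row.impressions === 0 ? 0 : row.clicks / row.impressions;
}

// Hypothetical thresholds: plenty of visibility, but hardly anyone clicks through.
function improvementCandidates(rows, { minImpressions = 1000, maxCtr = 0.005 } = {}) {
  return rows.filter((row) => row.impressions >= minImpressions && ctr(row) <= maxCtr);
}

const outlierRows = [
  { page: '/popular-post/', clicks: 300, impressions: 5000 },
  { page: '/overlooked-post/', clicks: 4, impressions: 2500 },
  { page: '/thin-post/', clicks: 0, impressions: 30 },
];

console.log(improvementCandidates(outlierRows).map((row) => row.page));
// [ '/overlooked-post/' ]
```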

Not for the faint of heart, I ended up pruning another 40% or so of all posts. The themes of the content pruned in this round were posts on the more esoteric command-line topics and posts that were extremely thin.

With the thin posts, I didn’t go as scorched-earth as you may think. I went over the topics and decided whether or not each one could be expanded. Some posts can only be fluffed so far.

Removing Redundancies: Streamlining SEO Through Content Pruning

With over half of my blog content pruned, I was seriously considering just deleting the rest. Starting my blog over from scratch wouldn’t negatively affect SEO, right?

While pruning low performing content, a new commonality was beginning to emerge from the remaining content. I had a lot of duplicate content that appeared to be cannibalizing itself. Running monthly posts like my VPS Showdown series, as well as my regular posts on installing Node.js, contributed pretty heavily to these content redundancies.

While I left some posts alone, I felt the monthly VPS Showdown posts should all be pruned, save the latest one from May of 2022.

This may seem like an extremely aggressive move, but the data actually backed up this decision. Some of those posts, particularly the older ones, had already been pruned for having few or no impressions. The post with the most traffic was the latest one, and the older posts all showed a consistent drop-off.

This exercise actually inspired me to restart my VPS Showdown series. Instead of individual monthly posts, I’m planning to do one post and keep it updated. More to come on that front.
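The series cleanup boils down to one rule: within a series, keep the newest post and point every older one at it. A sketch, assuming each post carries a hypothetical `series` label and an ISO `date` string (which sorts correctly as plain text); the March and April URLs are placeholders.

```javascript
// Group posts by series, keep the newest in each, and map older URLs to it.
function collapseSeries(posts) {
  const newest = new Map();

  for (const post of posts) {
    const current = newest.get(post.series);
    // ISO dates ("YYYY-MM-DD") compare correctly as strings
    if (!current || post.date > current.date) {
      newest.set(post.series, post);
    }
  }

  const redirects = {};
  for (const post of posts) {
    const keeper = newest.get(post.series);
    if (post !== keeper) {
      redirects[post.url] = keeper.url;
    }
  }
  return redirects;
}

const seriesPosts = [
  { series: 'vps-showdown', date: '2022-03-01', url: '/vps-showdown-march-2022/' },
  { series: 'vps-showdown', date: '2022-05-01', url: '/vps-showdown-may-2022/' },
  { series: 'vps-showdown', date: '2022-04-01', url: '/vps-showdown-april-2022/' },
];

console.log(collapseSeries(seriesPosts));
```

Here both the March and April URLs map to the May post, which is left out of the map because it stays published.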

With the final round of content pruning complete, I was down from nearly 900 posts to around 333. Banishing ~63% of my posts had me feeling like Matt Hardy.

Matt Hardy - Delete! Delete! Delete!

I’ve mentioned deleting content throughout this post. When I say “deleting” I actually mean deleting the post and sending it to the trash in WordPress.

I did explore a few other options, primarily as a way to preserve the years of content I had amassed for posterity. Instead of deleting posts, I could have simply marked them as drafts or as private.

Both options would have removed the content from public view, but I would have had to set up a redirect manually for each post. With this volume of posts, that would have been a nightmare. Meanwhile, the premium version of the Yoast SEO plugin makes redirects trivial, but only when you delete a post.

Because of this, deleting posts made the most sense. I’d trash a post, and Yoast SEO would ask whether I wanted to serve up a 410 Gone status code or redirect to another URL.

Not every post had a viable post to redirect to, so for many posts, especially the lower-quality ones, I served up a 410 HTTP status code. For things like my monthly VPS Showdown posts, it made more sense to redirect to the latest one.
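Yoast SEO Premium handles this from the WordPress admin, so no code was involved. For the curious, the two choices boil down to a per-URL decision like the one sketched here. The `redirects` and `gone` tables are hypothetical placeholders, not how Yoast stores its redirects, and a permanent 301 is assumed for the redirect case.

```javascript
// Hypothetical lookup tables: pruned URLs with a sensible replacement get a
// permanent redirect; pruned URLs without one are reported as gone.
const redirects = new Map([
  ['/vps-showdown-april-2022/', '/vps-showdown-may-2022/'],
]);
const gone = new Set(['/firefox-3-on-os-x/', '/old-php-5-post/']);

// Decide which response a request for a pruned URL should receive.
function resolveRemovedUrl(path) {
  if (redirects.has(path)) {
    return { status: 301, location: redirects.get(path) };
  }
  if (gone.has(path)) {
    return { status: 410 };
  }
  return null; // not a pruned URL, so the normal router handles it
}

console.log(resolveRemovedUrl('/vps-showdown-april-2022/'));
// { status: 301, location: '/vps-showdown-may-2022/' }
console.log(resolveRemovedUrl('/firefox-3-on-os-x/'));
// { status: 410 }
```

A 410 tells crawlers the removal is intentional and permanent, while the 301 passes visitors (and, ideally, ranking signals) along to the surviving post.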

Afterthoughts on Pruning Content for SEO

Breaking up with my content was both a terrifying and exciting experience. Obviously, I’m scared that I’ve done more harm than good. At the same time, clearing out the noise and the cringe-worthy content has helped me refocus.

I would never say that writing a blog post every week for a decade was a bad decision. Generating that amount of content has definitely improved my writing, which was the goal I set out to achieve.

Throwing that many darts has allowed me to figure out what content works and what content falls flat. Pruning the content that falls flat streamlines everything, and makes it easier for me to focus on the next phase: improving the content that remains.

Ultimately, I couldn’t live with the decision

While I tried to stay strong, after a few weeks I gave in and restored the content I had removed. At the end of the day, this blog is my personal account of things over the years. Typos, duplicate topics, attempts at being clever, the whole lot of it. I leave you with this quote:

These are not scraps. These are the historic remains of a once great society of hair.

George Costanza

About Josh

Husband. Father. Pug dad. Musician. Founder of Holiday API, Head of Engineering and Emoji Specialist at Mailshake, and author of the best damn Lorem Ipsum Library for PHP.

