A Web Diet: Converting WordPress Sites Over to Static Sites

Over the years, my main course web project, PR Pubs, has become one sprawling beast. For the most part, people know prpubs.us as the homepage for the course, but I haven’t actively used that space for a few semesters. Thus, in May I made it one of my summer goals to rework prpubs.us in a way that both narrates and preserves the history of the course and the space. The story of Pubs is an epic one with many twists and turns. Once upon a time, it started as a blog feed, morphed into a full open course, vacationed for a summer on the Jekyll CMS, and is now more integrated with Canvas, our LMS. Nothing really captures this story well, and for good reason: I’ve tried counting, and I believe it’s existed in eight separate places since 2014. In fact, out of all the spaces, my own personal blog is probably the best representation of its evolution.

I got interested in archiving a bit more while visiting Middlebury College last fall, where they’ve started a project out of their library to preserve student web work at the request of students. I should also mention that Kin Lane has been a major inspiration in helping me see the benefits of static sites. The point is that I’ve known full well that no CMS is in it for the long term. I’m a data pack rat, so I’m always thinking about the long term.

At the heart of every course site has been the blog feed powered by the FeedWordPress plugin. Students write between 250 and 500 blog posts in total per class each semester. I’ve systematized the process of preparing for the next batch of PR Pubsters: every semester, I clone a clean version of my syndication hub, which comes preloaded with the theme, plugins, and custom code I need to make it work. Over the past couple of years, I’ve probably done this a dozen or so times across various courses, and as a result I’ve ended up with a ton of WordPress instances.

Eventually, the semester ends and these 250-500 MB spaces of content go dormant. In the past, closing a course site basically involved unsubscribing from student feeds. Recently, though, I’ve decided that for preservation purposes I would rather have a fully static HTML version of each course site. In a lot of ways, it feels like I’m putting these sites on a diet. “Why consume all of those data-dense databases?! Stick to your macronutrients: HTML, CSS, and JS! Get rid of your addiction to Cigawordpress!”

What are the upsides to doing this?

  1. You no longer need WordPress or any other CMS to be the engine of the site. The biggest benefit is that you are less vulnerable to becoming infected through an out-of-date theme or plugin. If you aren’t actively updating the site, you are leaving yourself open to a lot of mean people on the web.
  2. You can host it on any type of web server.
  3. You can even just keep it locally on your computer and access it via your web browser.
  4. Because of its portability, it’s much easier to share a static site as an open education resource (OER). You could even host it on GitHub, allowing people to fork the site if they so choose.
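To make items 2 and 3 a little more concrete: a static archive needs nothing more than a plain file server. The sketch below uses Python’s standard-library `http.server` to serve an archived folder on your own machine. It is only an illustration and has nothing to do with SiteSucker itself; the helper `make_archive_server` and the folder name `prpubs-archive` are hypothetical.

```python
import functools
import http.server


def make_archive_server(archive_dir, port=8000):
    """Return an HTTP server that serves the static files in archive_dir.

    make_archive_server is a hypothetical helper for this example; it only
    wires the standard-library file handler to a chosen directory.
    """
    # SimpleHTTPRequestHandler accepts a `directory` argument (Python 3.7+),
    # so we bind it here instead of changing the working directory.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=archive_dir
    )
    # Listen on localhost only; no database or CMS is involved anywhere.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


if __name__ == "__main__":
    # "prpubs-archive" is a hypothetical folder holding one archived site.
    server = make_archive_server("prpubs-archive")
    print("Serving the archive at http://127.0.0.1:8000/")
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Because the archive is just a folder of files, that same folder can be copied unchanged to any ordinary web host.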

Jim Groom turned me on to a tool called SiteSucker a few months back because that guy is always thinking a step ahead of me… SiteSucker does exactly what I laid out earlier. And Jim lays out a strong argument:

I don’t pay for that many applications, but this is one that was very much worth the $5 for me. I can see more than a few uses for my own sites, not to mention the many others I help support. And to reinforce that point, right after I finished sucking this site, a faculty member submitted a support ticket asking the best way to archive a specific moment of a site so that they could compare it with future iterations. One option is cloning a site in Installatron on Reclaim Hosting, but that requires a dynamic database for a static copy, why not just suck that site? And while cloning a site using Installatron is cheaper and easier given it’s built into Reclaim offerings, it’s not all that sustainable for us or them. All those database driven sites need to be updated, maintained, and protected from hackers and spam.

Side note: Isn’t it always a letdown when you are trying to write a blog post and you realize that someone has already made your argument, and in a much more succinct fashion, I might add? That Groom! Nevertheless, I’ll continue on in hopes of imparting a little bit more wisdom…

SiteSucker grabs your site’s contents and converts them into HTML, CSS, and JS. You can also set how many levels of links deep you want it to pull content. I wanted to grab all of my students’ blog posts, but I didn’t necessarily want the pages they were referencing in those posts, so I went three levels deep (front page, pages, blog posts).
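SiteSucker’s internals aren’t covered here, but one way to picture the “levels deep” setting is as a depth-limited breadth-first crawl. The sketch below models a site as a hypothetical dictionary mapping each page to the pages it links to (no network access), and counts the front page as level 1, so three levels covers front page, pages, and blog posts while skipping the outside articles those posts cite. The function name and sample URLs are made up for illustration.

```python
from collections import deque


def pages_within_levels(links, start, max_levels):
    """Return every page reachable from `start` within `max_levels` levels.

    `links` maps a page URL to the URLs it links to (a hypothetical site
    model). The start page counts as level 1.
    """
    if max_levels < 1:
        return set()
    seen = {start}
    queue = deque([(start, 1)])  # breadth-first, so each page gets its shallowest level
    while queue:
        page, level = queue.popleft()
        if level >= max_levels:
            continue  # at the depth limit: keep this page, but don't follow its links
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, level + 1))
    return seen


if __name__ == "__main__":
    # Hypothetical course site: front page -> pages -> blog posts -> outside links.
    site = {
        "/": ["/about/", "/syllabus/", "/blog/"],
        "/blog/": ["/blog/post-1/", "/blog/post-2/"],
        "/blog/post-1/": ["https://example.com/cited-article"],
    }
    for url in sorted(pages_within_levels(site, "/", 3)):
        print(url)
```

With three levels, both blog posts are kept but the external article linked from the first post is not; raising the limit to four would pull it in too.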

What are the downsides?

  1. Because it is a static site, it can no longer make dynamic calls. A dynamic call is when a page is assembled on the server at the moment its URL is requested. This includes comments, searches, and other organizational features native to WordPress, like categories and tags. SiteSucker will save a copy of each of these dynamically generated pages as static HTML, but after that they cease to function. None of the content disappears, but it can’t be regenerated, so there are no new comments. This isn’t a big deal for me, considering the sites are completely dormant, but it does sting a bit to lose search functionality.
  2. You need to understand basic HTML and CSS to make any significant edits to the site once it’s in its static state. Remember, you no longer have access to the nifty WordPress WYSIWYG editor. This is where the OER argument gets tricky: yes, the site is more portable, but potentially less editable, depending on the user’s knowledge.

John Stewart was kind enough to test it for me with prpubs.us, and it worked like a charm. I then grabbed static versions of the other course sites and hit that scary “delete” button in Installatron, which made the WordPress instances go away.

Last, I redesigned the prpubs.us front page to better tell the historical narrative of the course. There you can find images of past versions, full information on the technologies that powered each, and links to the archived versions.

Hopefully this is a much more helpful resource for visitors and students alike. Either way, I feel like the health of PR Pubs is at an all-time high. Here’s to surviving.