Mike Slinn

HTTP 301 Redirects with Jekyll and AWS S3

Published 2022-05-01. Last modified 2022-05-03.
Time to read: 5 minutes.

This site is categorized under AWS, Azure, Jekyll.

I just reorganized this website. One of the major changes was to consolidate all the Jekyll-related posts into a dedicated collection.

Transferring the link equity from the old URLs to the new URLs was important to me. For this, I needed to generate HTTP 301 redirects from the old URLs to the new URLs.

The jekyll-redirect-from plugin does a good job of handling redirects without visibly cluttering up the generated site. However, it does not generate HTTP 301 redirects, which are essential for transferring link equity from old URLs to new URLs. For more information, ARCLAB, with whom I have no affiliation, has a good explanation.

As far as I know, there is no way for a site hosted on GitHub Pages to generate HTTP 301 redirects for individual pages, so GitHub had no motivation to add a Jekyll feature that its hosting did not intend to support. Websites are either evolving or dying. This website has almost 300 pages; I move some around every week. My requirements exceed what the jekyll-redirect-from plugin by itself can provide.

Note that GitHub gets the SEO rewards for the sites that they host. If you deploy your Jekyll site to GitHub, and you do not register your own domain, then most of the benefits of SEO will not be available to you. If you deploy to a domain that you control, you can begin to reap the benefits of SEO.

Because the jekyll-redirect-from plugin uses HTML meta refresh, if you use this plugin without taking special precautions on more than 10% of your website's pages, Google will degrade your SEO ranking.

Furthermore, Google’s web crawler assumes malicious intent when there is a high percentage of temporary redirects in a website.

Many Platforms Support HTTP 301 Redirects

Not to worry! The jekyll-redirect-from plugin generates a file called _site/redirects.json, which is a map of old to new URLs.

Most of the major web hosting platforms offer the ability to generate HTTP 301 redirects, including Microsoft Azure, Google Cloud and AWS.

When the website is deployed live, _site/redirects.json can be used to set metadata for files that moved. Here is mine, formatted by jq:

/var/sitesUbuntu/www.mslinn.com
{
  "/blog/2022/02/12/ruby-setup.html": "http://0.0.0.0:4001/jekyll/500-ruby-setup.html",
  "/blog/2017/01/08/setting-up-github-pages.html": "http://0.0.0.0:4001/jekyll/1000-jekyll-setup.html",
  "/blog/2020/08/16/new-jekyll-post.html": "http://0.0.0.0:4001/jekyll/2100-new-jekyll-post.html",
  "/blog/2021/12/20/publishing-drafts.html": "http://0.0.0.0:4001/jekyll/2200-publishing-drafts.html",
  "/blog/2020/10/03/jekyll-plugins.html": "http://0.0.0.0:4001/jekyll/3000-jekyll-plugins.html",
  "/blog/2020/12/28/custom-logging-in-jekyll-plugins.html": "http://0.0.0.0:4001/jekyll/4100-custom-logging-in-jekyll-plugins.html",
  "/blog/2022/03/06/rubocop-install.html": "http://0.0.0.0:4001/jekyll/5000-rubocop-install.html",
  "/blog/2022/02/13/jekyll-gem.html": "http://0.0.0.0:4001/jekyll/6400-jekyll-gem.html",
  "/blog/2022/02/13/jekyll-gem2.html": "http://0.0.0.0:4001/jekyll/6500-jekyll-gem2.html",
  "/blog/2022/02/21/jekyll-debugging.html": "http://0.0.0.0:4001/jekyll/6600-jekyll-debugging.html",
  "/blog/2022/03/27/jekyll-plugin-background.html": "http://0.0.0.0:4001/jekyll/10100-jekyll-plugin-background.html",
  "/blog/2022/03/28/jekyll-plugin-template-collection.html": "http://0.0.0.0:4001/jekyll/10400-jekyll-plugin-template-collection.html"
}
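A few lines of Ruby suffice to load this map and reduce each value to the site-relative path that a redirect should point at. This is a minimal sketch; the helper name load_redirects is mine, not part of the plugin.

```ruby
require 'json'
require 'uri'

# Parse the plugin's redirects.json text and return [old_path, new_path]
# pairs, where new_path is the site-relative path extracted from the URL.
# load_redirects is a hypothetical helper, not part of jekyll-redirect-from.
def load_redirects(json_text)
  JSON.parse(json_text).map do |old_path, new_url|
    # The map values are absolute URLs (here, from the dev server);
    # only the path portion is wanted as the redirect target.
    [old_path, URI(new_url).path]
  end
end

pairs = load_redirects(<<~JSON)
  { "/blog/2022/02/12/ruby-setup.html": "http://0.0.0.0:4001/jekyll/500-ruby-setup.html" }
JSON
puts pairs.inspect
# [["/blog/2022/02/12/ruby-setup.html", "/jekyll/500-ruby-setup.html"]]
```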

The AWS S3 Metadata Dance

I need to be able to set redirect metadata on files stored in an S3 bucket. The purpose of the metadata is to cause HTTP 301 redirects to be issued when a web browser requests those files. AWS S3 only allows 50 redirect rules per bucket.

I also need to be able to set metadata on more than 50 files stored in an S3 bucket, so per-object metadata must be applied, instead of writing redirect rules.

For AWS S3, adding an x-amz-website-redirect-location metadata header to an S3 object causes an HTTP 301 redirect to be generated whenever that object is fetched.

The AWS CLI does not directly support simply changing metadata on previously uploaded files. However, the metadata can be added during certain operations when the --website-redirect option is used.

When this metadata header is set, AWS S3 and CloudFront will issue HTTP 301 redirects, so the stub pages are never actually served, and the HTTP meta-refresh directives are never seen by Google’s web crawler. The CloudFront origin must be properly specified for this to work.

The AWS CLI only supports setting metadata during a copy. This deficiency of the AWS CLI and SDKs is unfortunate, but the REST interface that underlies all of the language bindings apparently does not provide a way to modify metadata in place. I have seen several ways of performing copies, including (re)copying a file on top of itself and setting the metadata during the copy.

Here is one such way, which (re)copies old_file.html (a stub) to an S3 bucket at the old location, and sets the desired metadata (a header that causes an HTTP 301 Redirect to the actual file at a new location):

Shell
$ aws s3 cp \
  old_file.html \
  s3://bucket_name/old_file.html \
  --website-redirect /new_file.html

Awkward, but workable. If CloudFront is employed, a cache invalidation will be required before the 301 redirect is actually sent to the web browser:

Shell
$ aws cloudfront create-invalidation \
  --distribution-id "$AWS_CLOUDFRONT_DIST_ID" \
  --paths "/old_file.html"

No problem there. A little patience for the invalidation to happen, and the redirect eventually works.

Things appear to get harder when using the AWS SDK for Ruby v3. There seems to be no direct analog for the above awkwardness.

Option 1: Redirect From Empty Page

One viable technique using AWS S3 is to create an empty page with redirect metadata associated with it, using the AWS CLI:

Shell
$ aws s3api put-object \
  --bucket bucket_name \
  --key oldpage.html \
  --website-redirect newpage.html

In the above code, a key called oldpage.html is created in the S3 bucket, but no value is provided. This causes an empty page to be created. The --website-redirect option adds an x-amz-website-redirect-location header to the object, which generates the desired HTTP 301 redirect from the empty page to newpage.html.

Option 2: Redirect from Generated Stub Page

The jekyll-redirect-from plugin generates little stub pages for every old URL you specify with the redirect_from Jekyll front matter variable. When uploading those pages to AWS S3, the --website-redirect option, as in Option 1, adds an x-amz-website-redirect-location header to the stub page, which of course generates an HTTP 301 redirect. When using CloudFront, the cache entry for the stubbed old URL must be invalidated, or it will seem like nothing changed.
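One detail worth noting: CloudFront invalidation paths must begin with a slash, so it pays to normalize them before submitting the invalidation (the full script later in this post applies the same normalization). The helper name normalize_paths is mine.

```ruby
# CloudFront invalidation paths must begin with '/'.
# normalize_paths is a hypothetical helper name.
def normalize_paths(files)
  files.map { |file| file.start_with?('/') ? file : "/#{file}" }
end

paths = normalize_paths(['oldpage.html', '/blog/2017/01/08/setting-up-github-pages.html'])
puts paths.inspect
# ["/oldpage.html", "/blog/2017/01/08/setting-up-github-pages.html"]
```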

I tested one page (https://www.mslinn.com/blog/2017/01/08/setting-up-github-pages.html) using the AWS CLI before coding the equivalent Ruby program using the AWS Ruby SDK. Some things are horrible to write in Bash, but a joy to write in Ruby, and the logic around JSON handling is one of them. The complete AWS Ruby SDK consists of 49 gems, but I only needed aws-sdk-s3 and aws-sdk-cloudfront. The AWS docs say: "Alternatively, the aws-sdk gem contains every available AWS service gem. This gem is very large; it is recommended to use it only as a quick way to migrate from V2 or if you depend on many AWS services."

Shell
$ aws s3 cp \
  blog/2017/01/08/setting-up-github-pages.html \
  s3://www.mslinn.com/blog/2017/01/08/setting-up-github-pages.html \
  --website-redirect /670nm.html

$ aws cloudfront create-invalidation \
  --distribution-id "$AWS_CLOUDFRONT_DIST_ID" \
  --paths "/blog/2017/01/08/setting-up-github-pages.html"

I chose this option because it was easier for me to manage.

Equivalent Ruby Code Fragment

Bash is not a good choice for programs that need to manipulate collections. Because the page redirects are a collection, I decided to write a Ruby program to implement HTTP 301 redirects instead of a Bash script.

The AWS docs sometimes explain things in ways that are more complicated than necessary, and leave out essential aspects. Following is all that is required to upload a local file (old_path) to a bucket, and redirect requests for that file to another file (new_path), which was previously uploaded. The code assumes that ~/.aws/config exists. Otherwise, the call to Aws::S3::Client.new will raise an exception.

Ruby
def redirect(bucket, old_path, new_path)
  s3 = Aws::S3::Client.new
  s3.put_object({
    body: IO.read(old_path),
    bucket: bucket,
    key: old_path,
    website_redirect_location: new_path,
  })
end

Completed Ruby Redirect Program

Here is the final script, written in Ruby because Bash gets really painful when working with collections. I use this script as part of this website’s deployment process.

#!/usr/bin/env ruby

# frozen_string_literal: true

require 'aws-sdk-cloudfront'
require 'aws-sdk-s3'
require 'colorator'
require 'fileutils'
require 'git'
require 'json'
require 'pathname'
require 'yaml'

# See http://localhost:4001/jekyll/10500-redirects.html
class Redirector
  def initialize
    config = YAML.load_file('_config.yml')
    aws_config = config['aws']
    @distribution_id = aws_config['cloudfront']['distributionId']
    @bucket = aws_config['bucket']

    @s3 = Aws::S3::Client.new
    @cloudfront = Aws::CloudFront::Client.new
  end

  def process
    redirects = File.read("redirects.json")
    json = JSON.parse(redirects)
    old_paths = []
    json.each do |old_path, new_url|
      redirect old_path, new_url
      old_paths << old_path
    end
    invalidate old_paths
  end

  # Useful for testing and one-offs
  def process_one(old_path, new_url)
    redirect(old_path, new_url)
    invalidate [old_path]
  end

  def invalidate(files) # rubocop:disable Metrics/MethodLength
    files = files.map { |file| file.start_with?("/") ? file : "/#{file}" }
    # See https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/CloudFront/Client.html#create_invalidation-instance_method
    puts "Invalidating:\n  #{files.join("\n  ")}"
    @cloudfront.create_invalidation({
      distribution_id: @distribution_id,
      invalidation_batch: {
        paths: {
          quantity: files.length,
          items: files,
        },
        caller_reference: Time.now.to_i.to_s,
      },
    })
  end

  # Upload a local file (old_path) to @bucket, and redirect requests for that file to another file (new_path),
  # which was previously uploaded.
  # See https://stackoverflow.com/questions/23040651/aws-ruby-sdk-core-upload-files-to-s3
  # See https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/S3/Object.html#copy_from-instance_method
  def redirect(old_path, new_url)
    new_path = URI(new_url).path
    puts "Uploading #{old_path}, which should redirect to https://#{@bucket}#{new_path}"
    old_path = ".#{old_path}" if old_path.start_with? '/'
    @s3.put_object({
      body: IO.read(old_path),
      bucket: @bucket,
      key: old_path,
      website_redirect_location: new_path,
    })
  end
end

# Invoke this code this way:
# $ ruby _bin/redirects
if __FILE__ == $PROGRAM_NAME
  begin
    project_root = Pathname.new(__dir__).parent.to_s
    puts "Executing from #{project_root}".cyan
    redirector = Redirector.new

    site_root = Pathname.new "#{project_root}/_site"
    abort "Error: The _site/ directory does not exist." unless site_root.exist?
    Dir.chdir site_root
    redirector.process
    # redirector.process_one "redirect_test.html", "http://mslinn.com/articles/cadcook/index.html"
  rescue SystemExit, Interrupt
    puts "\nTerminated".cyan
  rescue StandardError => e
    puts e.message.red
  end
end