Cloud-Hosted Sitemap

Sitemap S3 CloudFront Heroku

Check it out!
Sitemap generator

Bypass Heroku's write-access limitations by hosting an automatically generated sitemap using Amazon CloudFront+S3.

Hosting An Automatically-Generated Sitemap Off-Site

Heroku apps run on an ephemeral filesystem. You can write files while a process is running, but they are discarded whenever the dyno restarts or is redeployed, and they are never shared between dynos, so anything you write is necessarily temporary. I wanted to be able to automatically generate an XML sitemap, notify the major search engines of it, and have it available within the URL structure of logangraba.com.

Sitemap_Generator

The gem I chose is Sitemap_Generator: "a framework-agnostic XML Sitemap generator written in Ruby with automatic Rails integration..." On its own it is simple to install and use. If I had persistent write access to my site's hosted filesystem, generating the sitemap would be as simple as running rake sitemap:create.

I added it to my Gemfile and quickly laid out the necessary links for this site in config/sitemap.rb:

# config/sitemap.rb

# Set the host name for URL creation
SitemapGenerator::Sitemap.default_host = "http://www.logangraba.com"

# The link structure (the root path "/" is included by default)
SitemapGenerator::Sitemap.create do
  add '/projects', changefreq: 'weekly'
  # One entry per project, stamped with its last update time
  Project.find_each do |project|
    add project_path(project), lastmod: project.updated_at
  end
  add '/resume', changefreq: 'weekly'
  add '/contacts/new', changefreq: 'weekly'
  add '/about', changefreq: 'weekly'
end
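For readers unfamiliar with the format, each add call ends up as a <url> entry inside a sitemaps.org <urlset> document. The sketch below is a minimal, standard-library-only illustration of that output shape, not how Sitemap_Generator actually builds its files; SitemapEntry, build_sitemap_xml, xml_escape, and the /projects/1 path are hypothetical.

```ruby
require 'time'

# Characters that must be escaped inside XML text content
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&apos;' }.freeze

# Hypothetical stand-in for one `add` call: a path plus optional attributes
SitemapEntry = Struct.new(:path, :changefreq, :lastmod)

def xml_escape(text)
  text.gsub(/[&<>"']/, XML_ESCAPES)
end

# Builds a minimal sitemaps.org <urlset> document for the given host and entries
def build_sitemap_xml(host, entries)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
  entries.each do |entry|
    lines << '  <url>'
    # Absolute URL: the default host joined with the entry's path
    lines << "    <loc>#{xml_escape(host + entry.path)}</loc>"
    # W3C datetime in UTC; getutc returns a copy instead of mutating the Time
    lines << "    <lastmod>#{entry.lastmod.getutc.iso8601}</lastmod>" if entry.lastmod
    lines << "    <changefreq>#{entry.changefreq}</changefreq>" if entry.changefreq
    lines << '  </url>'
  end
  lines << '</urlset>'
  lines.join("\n") + "\n"
end

entries = [
  SitemapEntry.new('/projects', 'weekly', nil),
  SitemapEntry.new('/projects/1', nil, Time.utc(2015, 10, 7, 12)) # hypothetical project path
]
puts build_sitemap_xml('http://www.logangraba.com', entries)
```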

I tested the sitemap generation locally and was pleased.

However, because of Heroku's ephemeral filesystem, I needed another place to host my sitemap. That raised a second problem: search engine webmaster tools such as Google Webmaster Tools only accept sitemaps hosted within the domain of the account's verified site. For me, this meant the generated sitemap had to be reachable at logangraba.com/sitemap.xml.
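A quick way to see the domain problem is to compare the host of the site with the host of the sitemap URL. This is a rough sketch that assumes an exact host match is what matters (individual webmaster tools may apply their own rules); same_host? is a hypothetical helper.

```ruby
require 'uri'

# Hypothetical helper: true when both URLs share the same host (case-insensitive)
def same_host?(site_url, sitemap_url)
  site_host = URI.parse(site_url).host.to_s
  sitemap_host = URI.parse(sitemap_url).host.to_s
  site_host.casecmp?(sitemap_host)
end

site = 'http://www.logangraba.com'
# A sitemap served from the site's own domain
puts same_host?(site, 'http://www.logangraba.com/sitemap.xml.gz')
# The same file served straight from the S3 bucket's domain
puts same_host?(site, 'http://logangraba.s3.amazonaws.com/sitemaps/sitemap.xml.gz')
```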

This leads me to the next gem I used...

Fog

Fog is a Ruby cloud services library. Since I already had an Amazon S3 account and bucket serving this site's image and resume uploads via the Paperclip gem, I decided to upload the generated sitemap to a directory within that bucket. Fog is a large library with all the functionality needed to do this; note that its Amazon Web Services support lives in the separate fog-aws gem.

Initially I tried following these instructions for "Using Fog," and configured all the required environment variables on both my development machine and Heroku.

Unfortunately, Fog alone wouldn't upload the sitemap.xml.gz file to my S3 bucket. I'm still not sure why it didn't work, and I would be interested in any answers. I even made sure the sitemaps/ directory of my bucket was publicly readable with the following JSON bucket policy:

{
    "Version": "2012-10-17",
    "Id": "Policy1444202223626",
    "Statement": [
        {
            "Sid": "Stmt1444202222010",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::logangraba/sitemaps/*"
        }
    ]
}
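The policy's Resource ARN is just the bucket name and key prefix followed by a wildcard. As a sketch (public_read_policy is a hypothetical helper), the same document can be generated in Ruby for any bucket and prefix; the optional Id and Sid fields are left out here.

```ruby
require 'json'

# Hypothetical helper: a bucket policy allowing anonymous s3:GetObject
# on every key under the given prefix, and nothing else
def public_read_policy(bucket, prefix)
  {
    'Version' => '2012-10-17',
    'Statement' => [
      {
        'Effect' => 'Allow',
        'Principal' => '*',
        'Action' => 's3:GetObject',
        'Resource' => "arn:aws:s3:::#{bucket}/#{prefix}*"
      }
    ]
  }
end

puts JSON.pretty_generate(public_read_policy('logangraba', 'sitemaps/'))
```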

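That policy only grants anonymous read access (s3:GetObject) to objects under sitemaps/, which the public sitemap URL needs anyway; uploads are authorized by the AWS credentials instead, so the policy wouldn't have affected the upload. So I moved on and installed the CarrierWave gem.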

CarrierWave

CarrierWave bills itself as the "classier" solution for file uploads, and setting it up really was easy. Here is my final config/sitemap.rb file:

# config/sitemap.rb

# Set the host name for URL creation
SitemapGenerator::Sitemap.default_host = "http://www.logangraba.com"
# Tell the sitemap index where the individual sitemap files live (the S3 bucket)
SitemapGenerator::Sitemap.sitemaps_host = "http://#{ENV['FOG_DIRECTORY']}.s3.amazonaws.com/"
# Write the files somewhere writable (if temporary) on Heroku
SitemapGenerator::Sitemap.public_path = 'tmp/'
SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
# Upload the generated files through CarrierWave (configured in the initializer below)
SitemapGenerator::Sitemap.adapter = SitemapGenerator::WaveAdapter.new

SitemapGenerator::Sitemap.create do
  add '/projects', changefreq: 'weekly'
  # One entry per project, stamped with its last update time
  Project.find_each do |project|
    add project_path(project), lastmod: project.updated_at
  end
  add '/resume', changefreq: 'weekly'
  add '/contacts/new', changefreq: 'weekly'
  add '/about', changefreq: 'weekly'
end
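The sitemaps_host, sitemaps_path, and file name settings have to line up with where the files actually land in S3. The helper below assumes the public URL is a simple concatenation of those three pieces, which is my approximation rather than the gem's own code; public_sitemap_url is hypothetical, and logangraba is the bucket name used elsewhere in this post.

```ruby
# Hypothetical helper: joins host, directory, and file name into one public URL,
# tolerating a missing trailing slash on the host or a leading slash on the path
def public_sitemap_url(sitemaps_host, sitemaps_path, filename)
  host = sitemaps_host.end_with?('/') ? sitemaps_host : "#{sitemaps_host}/"
  path = sitemaps_path.sub(%r{\A/}, '')
  "#{host}#{path}#{filename}"
end

# Falls back to this post's bucket name when FOG_DIRECTORY is unset
bucket = ENV.fetch('FOG_DIRECTORY', 'logangraba')
puts public_sitemap_url("http://#{bucket}.s3.amazonaws.com/", 'sitemaps/', 'sitemap.xml.gz')
```

With the bucket set to logangraba, this yields the same S3 URL the redirect controller points at later, which is a handy sanity check that the configuration and the route agree.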

I then created an initializer for CarrierWave:

# config/initializers/carrierwave.rb
CarrierWave.configure do |config|
  config.cache_dir = "#{Rails.root}/tmp/"
  config.storage = :fog
  config.permissions = 0666
  config.fog_credentials = {
    :provider               => ENV['FOG_PROVIDER'],
    :aws_access_key_id      => ENV['AWS_ACCESS_KEY_ID'],
    :aws_secret_access_key  => ENV['AWS_SECRET_ACCESS_KEY'],
  }
  config.fog_directory  = ENV['S3_BUCKET_NAME']
end

Note the use of environment variables: these are the ones I set on both my local machine and Heroku. The S3 bucket name is the one you choose when creating the bucket, while the access key ID and secret access key come from your AWS security credentials (for example, an IAM user with access to the bucket). FOG_PROVIDER is simply "AWS" for Amazon Web Services.
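Because config/sitemap.rb reads ENV['FOG_DIRECTORY'] while the CarrierWave initializer reads ENV['S3_BUCKET_NAME'], both need to be set and need to name the same bucket. A small pre-flight check like the one below could be run before rake sitemap:refresh to catch a misconfigured environment early; missing_env_vars and bucket_names_consistent? are hypothetical helpers, not part of either gem.

```ruby
# Environment variables referenced by config/sitemap.rb and the CarrierWave initializer
REQUIRED_ENV_VARS = %w[FOG_PROVIDER AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY S3_BUCKET_NAME FOG_DIRECTORY].freeze

# Returns the names of required variables that are unset or blank
def missing_env_vars(env = ENV)
  REQUIRED_ENV_VARS.select { |name| env[name].to_s.strip.empty? }
end

# True when both bucket variables name the same bucket
def bucket_names_consistent?(env = ENV)
  env['S3_BUCKET_NAME'] == env['FOG_DIRECTORY']
end

# Example use (e.g. at the top of a rake task):
#   missing = missing_env_vars
#   abort "Missing environment variables: #{missing.join(', ')}" unless missing.empty?
#   abort 'S3_BUCKET_NAME and FOG_DIRECTORY differ' unless bucket_names_consistent?
```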

I was now able to run rake sitemap:refresh, which generates the sitemap and pings the search engines, and get my temporarily held sitemap.xml.gz file from Heroku to my S3 bucket!
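Before anything is uploaded, the sitemap is written to the local tmp/sitemaps/ directory configured by public_path and sitemaps_path. On Heroku that file only lives as long as the dyno, which is fine because the S3 copy is the one that persists. The sketch below imitates only that local, gzipped half with Ruby's Zlib; write_gzipped_sitemap is a hypothetical helper, and the gem's real file handling may differ.

```ruby
require 'fileutils'
require 'tmpdir'
require 'zlib'

# Hypothetical helper: writes xml gzipped to <public_path>/<sitemaps_path>sitemap.xml.gz,
# mirroring the 'tmp/' and 'sitemaps/' settings, and returns the file's path
def write_gzipped_sitemap(public_path, sitemaps_path, xml)
  dir = File.join(public_path, sitemaps_path)
  FileUtils.mkdir_p(dir)
  path = File.join(dir, 'sitemap.xml.gz')
  Zlib::GzipWriter.open(path) { |gz| gz.write(xml) }
  path
end

# A throwaway directory stands in for Rails' tmp/ so the example cleans up after itself
Dir.mktmpdir do |tmp|
  path = write_gzipped_sitemap(tmp, 'sitemaps/', '<urlset/>')
  Zlib::GzipReader.open(path) { |gz| puts gz.read }
end
```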

Routing

All that was left was some simple routing. I created a sitemap controller with rails g controller sitemap and wrote a show action that redirects HTTP requests to the sitemap's location in my S3 bucket:

class SitemapController < ApplicationController
  def show
    # Redirect to the sitemap stored in the S3 bucket
    redirect_to "http://logangraba.s3.amazonaws.com/sitemaps/sitemap.xml.gz"
  end
end

I added a matching route, get '/sitemap.xml.gz' => 'sitemap#show', to my routes.rb file and then tested everything on my local machine. Everything seems to work so far, and my sitemap has been verified by Google Webmaster Tools. Follow the "Check it out!" link above to see the redirect route in action: you'll automatically download the sitemap.xml.gz file residing in my S3 bucket.
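The controller plus route boil down to a plain HTTP redirect. As a framework-agnostic sketch, here is the same behavior as a bare Rack-style callable; SitemapRedirect and SITEMAP_S3_URL are hypothetical names, and the URL is the one from the controller above.

```ruby
# The S3 location of the uploaded sitemap
SITEMAP_S3_URL = 'http://logangraba.s3.amazonaws.com/sitemaps/sitemap.xml.gz'

# Rack-style app: 302 for GET /sitemap.xml.gz, 404 for everything else.
# Header names are lowercase, as Rack 3 requires.
SitemapRedirect = ->(env) {
  if env['REQUEST_METHOD'] == 'GET' && env['PATH_INFO'] == '/sitemap.xml.gz'
    [302, { 'location' => SITEMAP_S3_URL }, []]
  else
    [404, { 'content-type' => 'text/plain' }, ['Not Found']]
  end
}

request_env = { 'REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/sitemap.xml.gz' }
status, headers, _body = SitemapRedirect.call(request_env)
puts status
puts headers['location']
```

Within Rails itself, the routing DSL can express the same thing without a controller, as get '/sitemap.xml.gz', to: redirect('http://logangraba.s3.amazonaws.com/sitemaps/sitemap.xml.gz'), which responds with a 301 by default. On Rails 7 and later, apps with raise_on_open_redirects enabled (the default for new apps) also need redirect_to ..., allow_other_host: true in the controller version.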

Why is this useful?

I won't belabor the importance of having a sitemap for modern search, but I do want to note this: Heroku is fast, reproducible, and comes with great CLI tools. Keeping your application code separate from generated and uploaded files is sound practice for continuous development, testing, and deployment on Heroku and other cloud services, and it fits easily into your workflow and asset pipeline. For example, I will now be able to automatically generate and upload an updated sitemap each time I post a new project or resume!
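One way to automate that step is to regenerate only when something has changed. This sketch assumes the time of the last generation is tracked somewhere (how is left open) and uses a hypothetical ProjectStub in place of the Project model; a scheduled job, such as one run by the Heroku Scheduler add-on, could then call rake sitemap:refresh whenever sitemap_stale? returns true.

```ruby
# Hypothetical stand-in for the Project model: only the timestamp matters here
ProjectStub = Struct.new(:id, :updated_at)

# True when no sitemap exists yet, or when any project changed after the last generation
def sitemap_stale?(projects, last_generated_at)
  return true if last_generated_at.nil?

  latest = projects.map(&:updated_at).max
  !latest.nil? && latest > last_generated_at
end

projects = [ProjectStub.new(1, Time.utc(2015, 10, 1)), ProjectStub.new(2, Time.utc(2015, 10, 8))]
puts sitemap_stale?(projects, Time.utc(2015, 10, 7)) # a project changed since then
puts sitemap_stale?(projects, Time.utc(2015, 10, 9)) # nothing changed since then
```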
