Dealing With Your Sitemap

Ale Fernandes Antunes

Developer for Neon Roots

For those of you who aren’t familiar with the term, a sitemap is a list of the pages that make up your site, showing its hierarchy and flow.

Try this: google the following:  site:<your_site_url_here>

The results displayed are the landing pages that the search engine finds, and therefore the pages that anyone searching for your site will see. This is an important aspect to consider when you are building an image for your company.

Google indexes your site using crawlers; you can trace the path for those “little spiders”, give each page a priority, and define how often they should check certain entries for updates.

For further reading on sitemaps:
https://support.google.com/webmasters/answer/183668?hl=en

If your site isn’t very complex, you can use online tools or any text editor to create or update your sitemap.xml. It is basically an XML file that uses a small set of tags.
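To make the format concrete, here is a minimal sketch in plain Ruby that renders such a file by hand. The build_sitemap helper and the sample URLs are made up for illustration; the tag names (loc, lastmod, changefreq, priority) come from the sitemap protocol.

```ruby
# Hypothetical helper, not part of any gem: renders a minimal sitemap.xml
# from an array of hashes. Only :loc is mandatory in the sitemap protocol.
def build_sitemap(entries)
  lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  ]
  entries.each do |entry|
    lines << '  <url>'
    # encode(xml: :text) escapes &, < and > so the URL is valid XML text
    lines << "    <loc>#{entry[:loc].encode(xml: :text)}</loc>"
    %i[lastmod changefreq priority].each do |tag|
      lines << "    <#{tag}>#{entry[tag]}</#{tag}>" if entry[tag]
    end
    lines << '  </url>'
  end
  lines << '</urlset>'
  lines.join("\n") + "\n"
end

puts build_sitemap([
  { loc: 'http://www.example.com/', changefreq: 'weekly', priority: 1.0 },
  { loc: 'http://www.example.com/calendar', lastmod: '2015-01-07' }
])
```

This works for a handful of static pages, but someone has to edit the list every time a page is added.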

The site I was working on (a Rails-based project) had a lot of dynamic content. Any admin can add entire pages to the site using the CMS, so I couldn’t predict what was going to be added. I needed some kind of tool that checked the database and generated the sitemap from the data it found.

I found just the right tool for the job, a gem by kjvarga called sitemap_generator (https://github.com/kjvarga/sitemap_generator). It turns out to be a real time saver; you will spend some time configuring it but that will be all.

How I chose to implement the sitemap_generator gem

First of all, I added the gem to the project’s Gemfile and ran bundle install. Next, I used a rake task shipped with the gem to generate a template for a configuration file:

rake sitemap:install

This task created a new file in the config folder called sitemap.rb. Here comes the beauty of this gem: in this file you can configure every aspect of the resulting sitemap, using Ruby syntax to access your project data. From now on we will focus on two particular files: config/routes.rb and config/sitemap.rb.

First, let’s take a look at the config/routes.rb file for the project:

Railsroot::Application.routes.draw do
  root 'site#index'
  get 'case-studies/fakeone' => 'site#fakeone', as: :fakeone
  get 'case-studies/faketwo' => 'site#faketwo', as: :faketwo
  # this route gives us the home_path helper used in the sitemap
  get 'home' => 'site#index'
  resources :case_study
  resources :calendar, only: [:index]
end

Notice how we have some “hardcoded” paths like 'case-studies/fakeone', but also a resource entry for the case_study model, which covers the dynamically created case studies. We cannot predict the routes those will have.

Now let’s create a config/sitemap.rb to map our project’s routes:

# first, the project URL as default_host
SitemapGenerator::Sitemap.default_host = "http://www.example.com"

SitemapGenerator::Sitemap.create do
  # the home path comes first; notice the priority option here
  add home_path, :priority => 1

  # other resources
  add calendar_index_path

  # the hardcoded case studies
  add fakeone_path
  add faketwo_path

  # dealing with the dynamic aspect:
  # we are using :updated_at to set the lastmod value
  CaseStudy.find_each do |case_study|
    add case_study_path(case_study), :lastmod => case_study.updated_at
  end
end
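Conceptually, the find_each loop turns every record into a path plus a last-modified timestamp. The sketch below mimics that in plain Ruby, without Rails or the gem. CaseStudyRecord and sitemap_entries are hypothetical stand-ins, and it assumes the URL is built from a slug, as the generated entries suggest.

```ruby
require 'time'

# Hypothetical stand-in for the CaseStudy model; in the real project the
# records come from the database via CaseStudy.find_each.
CaseStudyRecord = Struct.new(:slug, :updated_at)

# Maps each record to the path and lastmod that end up in a <url> entry.
def sitemap_entries(records)
  records.map do |record|
    { path: "/case_study/#{record.slug}", lastmod: record.updated_at.iso8601 }
  end
end

records = [
  CaseStudyRecord.new('this-is-case-one', Time.new(2015, 8, 17, 11, 2, 3, '-03:00')),
  CaseStudyRecord.new('this-is-case-two', Time.new(2015, 8, 7, 13, 2, 3, '-03:00'))
]

sitemap_entries(records).each do |entry|
  puts "#{entry[:path]} (#{entry[:lastmod]})"
end
```

Because the list is rebuilt from the data on every run, a case study added through the CMS shows up the next time the sitemap is generated.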

Now we are all set. We have created a sitemap.rb file based on our routes.rb; we can now run the following rake task:

rake sitemap:refresh:no_ping

That will produce a gzipped file in our project’s public folder: public/sitemap.xml.gz, with our sitemap inside:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns:video="http://www.google.com/schemas/sitemap-video/1.1" xmlns:geo="http://www.google.com/geo/schemas/sitemap/1.0" xmlns:news="http://www.google.com/schemas/sitemap-news/0.9" xmlns:mobile="http://www.google.com/schemas/sitemap-mobile/1.0"
xmlns:pagemap="http://www.google.com/schemas/sitemap-pagemap/1.0" xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>http://www.example.com</loc>
<lastmod>2015-01-03T10:02:03-03:00</lastmod>
<changefreq>always</changefreq>
<priority>1.0</priority>
</url>
<url>
<loc>http://www.example.com/home</loc>
<lastmod>2015-01-03T10:02:03-03:00</lastmod>
<changefreq>weekly</changefreq>
<priority>1.0</priority>
</url>
<url>
<loc>http://www.example.com/case-studies/fakeone</loc>
<lastmod>2015-03-07T13:12:03-03:00</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>http://www.example.com/case-studies/faketwo</loc>
<lastmod>2015-03-07T14:02:03-03:00</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>http://www.example.com/calendar</loc>
<lastmod>2015-01-07T12:02:03-03:00</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>http://www.example.com/case_study/this-is-case-one</loc>
<lastmod>2015-08-17T11:02:03-03:00</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>http://www.example.com/case_study/this-is-case-two</loc>
<lastmod>2015-08-07T13:02:03-03:00</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
</url>
</urlset>

Each <url> block is self-explanatory, but notice the last two (“this-is-case-one” and “this-is-case-two”): those are case studies that were created by the admin. We just told the gem to iterate over the case study records and add them to the resulting sitemap.xml.
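Before deploying, it can be worth checking that the generated file contains the URLs you expect. A rough sketch using only Ruby’s standard library follows. The sitemap_locations helper is hypothetical, and the regular expression is a shortcut for a quick sanity check rather than real XML parsing.

```ruby
require 'zlib'

# Hypothetical helper: returns every <loc> value found in a gzipped sitemap.
# A regular expression is enough for a quick look; use a real XML parser
# for anything stricter.
def sitemap_locations(path)
  xml = Zlib::GzipReader.open(path) { |gz| gz.read }
  xml.scan(%r{<loc>(.*?)</loc>}).flatten
end

path = 'public/sitemap.xml.gz'
puts sitemap_locations(path) if File.exist?(path)
```

Run it from the project root after the rake task; it prints one URL per line.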

Now we are ready to deploy our project and ask Google to be kind enough to take our brand new sitemap into account, simply by going to google.com/addurl and filling out the form.

Final thoughts

With this basic implementation, you are good to go on a standard project. Of course, later on you can tweak it to better meet your needs. A good start would be adding the sitemap:refresh rake task to your deploy chain, so you generate an updated sitemap.xml every time you deploy.
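As one possible shape for that deploy step, here is a small sketch of a helper a deploy script could call. The method names are hypothetical. It relies on the two rake tasks the gem ships: going by the gem’s README, sitemap:refresh also pings search engines, while sitemap:refresh:no_ping only regenerates the files.

```ruby
# Hypothetical deploy helper: picks which of the gem's rake tasks to run.
def sitemap_task(ping:)
  ping ? 'sitemap:refresh' : 'sitemap:refresh:no_ping'
end

# Runs the task in a subprocess and fails loudly if rake exits with an
# error, so a broken sitemap does not go unnoticed during a deploy.
def refresh_sitemap!(ping: true)
  system('bundle', 'exec', 'rake', sitemap_task(ping: ping)) ||
    raise('sitemap generation failed')
end
```

Calling refresh_sitemap!(ping: false) on a staging server and refresh_sitemap! in production keeps search engines from being notified about test deployments.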
