For those of you who aren’t familiar with the term, a sitemap is a list of all the pages that make up your site, showing its hierarchy and flow.
Try this: google the following: site:<your_site_url_here>
The results displayed are the pages that the search engine has indexed, and therefore the pages that anyone searching for your site will see. This is an important aspect to consider when you are building an image for your company.
Google indexes your site using crawlers; with a sitemap you can trace the path for those “little spiders”, give each page a priority, and define how often they should check for updates on certain entries.
For further reading on sitemaps:
If your site isn’t very complex, you can use online tools or any text editor to create or update your sitemap.xml. It is basically an XML file that uses a handful of predefined tags.
I ran into a problem: the site I was working on (a Rails-based project) had a lot of dynamic content. Any admin can add entire pages to the site through the CMS, so since I couldn’t predict what was going to be added, I needed a tool that would check the database and generate the sitemap from the data it found.
I found just the right tool for the job: a gem by kjvarga called sitemap_generator (https://github.com/kjvarga/sitemap_generator). It turns out to be a real time saver; you will spend some time configuring it, but that will be all.
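To make those tags concrete, here is a minimal sketch in plain Ruby that renders a tiny sitemap by hand. The render_sitemap helper and the sample entry (including its date and changefreq value) are made up for illustration; only the tag names come from the sitemap format itself.

```ruby
require 'cgi'
require 'date'

# Render a list of entries as a minimal sitemap.xml string.
# Each entry is a Hash with :loc and optional :lastmod, :changefreq, :priority.
def render_sitemap(entries)
  urls = entries.map do |entry|
    tags = ["    <loc>#{CGI.escapeHTML(entry[:loc])}</loc>"]
    tags << "    <lastmod>#{entry[:lastmod].iso8601}</lastmod>" if entry[:lastmod]
    tags << "    <changefreq>#{entry[:changefreq]}</changefreq>" if entry[:changefreq]
    tags << "    <priority>#{entry[:priority]}</priority>" if entry[:priority]
    "  <url>\n#{tags.join("\n")}\n  </url>"
  end

  [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    *urls,
    '</urlset>'
  ].join("\n")
end

# Hypothetical sample entry, just to exercise the helper.
xml = render_sitemap([
  { loc: 'http://www.mysite.com/home', lastmod: Date.new(2015, 1, 31),
    changefreq: 'weekly', priority: 1.0 }
])
puts xml
```

Writing this by hand is fine for a few static pages; it stops scaling as soon as pages are created from the database, which is the situation described next.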
First of all I added the gem to the project Gemfile and ran bundle install. Next I used a rake task shipped with the gem to generate a template for a configuration file:
This task created a new file in the config folder called sitemap.rb. Here comes the beauty of this gem: in this file you can configure every aspect of the resulting sitemap using Ruby syntax to access your project data. From now on we will focus on two particular files: config/routes.rb and config/sitemap.rb.
First, let’s take a look at the config/routes.rb file for the project:
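As a rough sketch of those two steps: the Gemfile entry below assumes the default gem source and no version pin, and the task name in the comments is the one listed in the gem’s README (sitemap:install), not something taken from this project.

```ruby
# Gemfile
gem 'sitemap_generator'

# Shell steps, shown as comments to keep this a single Ruby fragment:
#   bundle install
#   rake sitemap:install   # writes the config/sitemap.rb template
```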
get 'case-studies/fakeone' => 'site#fakeone', as: :fakeone
get 'case-studies/faketwo' => 'site#faketwo', as: :faketwo
get '../home' => 'site#index'
resources :calendar, only: [:index]
Notice how we have some “hardcoded” paths like 'case-studies/fakeone', but also a resource entry for the case_study model for the dynamically created case studies. We cannot predict the routes the latter will have.
Now let’s create a config/sitemap.rb to map our project’s routes:
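# First, the project URL as default_host
SitemapGenerator::Sitemap.default_host = "http://www.mysite.com"
SitemapGenerator::Sitemap.create do
  # The home path comes first; notice the priority flag here
  add home_path, :priority => 1
  # The hardcoded case studies (helpers come from the as: names in routes.rb)
  add fakeone_path
  add faketwo_path
  # Dealing with the dynamic aspect
  CaseStudy.find_each do |case_study|
    # We are using :updated_at to set the lastmod flag
    # (case_study_path assumes a resources route for the case_study model)
    add case_study_path(case_study), :lastmod => case_study.updated_at
  end
end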
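The same add call also accepts a :changefreq option, which is how you tell crawlers how often to check a page for updates. A small fragment as a sketch; the 'daily' value is only an illustrative choice, not what this project used.

```ruby
# config/sitemap.rb (fragment, meant to run inside a Rails app with the gem loaded)
SitemapGenerator::Sitemap.default_host = "http://www.mysite.com"
SitemapGenerator::Sitemap.create do
  # Hypothetical value: hint that the home page changes daily
  add home_path, :priority => 1, :changefreq => 'daily'
end
```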
Now we are all set. We have created a sitemap.rb file based on our routes.rb; we can now execute the following rake task:
That will produce a gzipped file in our project’s public folder: public/sitemap.xml.gz, with our sitemap inside:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns:video="http://www.google.com/schemas/sitemap-video/1.1" xmlns:geo="http://www.google.com/geo/schemas/sitemap/1.0" xmlns:news="http://www.google.com/schemas/sitemap-news/0.9" xmlns:mobile="http://www.google.com/schemas/sitemap-mobile/1.0">
Each <url> block is self-explanatory, but notice the last two (“this-is-case-one”, “this-is-case-two”): those are case studies that were created by the admin. We just told the gem to iterate over the case study model’s records and add them to the resulting sitemap.xml.
Now we are ready to deploy our project and ask Google to be kind enough to take our brand new sitemap into account, simply by going to google.com/addurl and filling out the form.
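If you want to double-check what ended up in the gzipped file, a short Ruby script can list its URLs. This is a sketch only: sitemap_locs is a hypothetical helper, and the sample XML (including the case study URL) is invented so the script runs without a Rails project. In a real project you would point it at public/sitemap.xml.gz.

```ruby
require 'zlib'
require 'tmpdir'

# Return every <loc> value found in a gzipped sitemap file.
def sitemap_locs(path)
  xml = Zlib::GzipReader.open(path) { |gz| gz.read }
  xml.scan(%r{<loc>(.*?)</loc>}m).flatten.map(&:strip)
end

# Invented sample content standing in for a generated sitemap.
sample = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url><loc>http://www.mysite.com/home</loc></url>
    <url><loc>http://www.mysite.com/case-studies/this-is-case-one</loc></url>
  </urlset>
XML

# Write the sample to a throwaway gzip file, then read it back.
locs = Dir.mktmpdir do |dir|
  path = File.join(dir, 'sitemap.xml.gz')
  Zlib::GzipWriter.open(path) { |gz| gz.write(sample) }
  sitemap_locs(path)
end
puts locs
```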
With this basic implementation, you are good to go on a standard project. Of course, later on you can tweak it to better meet your needs; a good tweak would be adding the gem’s sitemap:refresh rake task to your deploy chain, so you generate an updated sitemap.xml every time you deploy.
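One way to picture that deploy hook is with plain Rake prerequisites. This is a sketch under assumptions: the deploy:finish task name is hypothetical, and the refresh task is stubbed here so the script runs outside a Rails app; in a real project the gem supplies that task and your deploy tool decides where the hook goes.

```ruby
require 'rake'

# Make the Rake DSL (task, namespace, desc) available in this script.
extend Rake::DSL

invoked = []

# Stand-in for the gem's refresh task, so the sketch is self-contained.
namespace :sitemap do
  task(:refresh) { invoked << 'sitemap:refresh' }
end

# Hypothetical deploy step that regenerates the sitemap as a prerequisite.
namespace :deploy do
  desc 'Finish a deploy, refreshing the sitemap first'
  task :finish => 'sitemap:refresh' do
    invoked << 'deploy:finish'
  end
end

Rake::Task['deploy:finish'].invoke
puts invoked.join(' -> ')
```

Declaring the refresh as a prerequisite means nobody has to remember to run it by hand after each deploy.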