Click here to Skip to main content
15,867,308 members
Articles / Web Development / HTML

Generating sitemap.xml in Jekyll, Without Using Plugins

Rate me:
Please Sign up or sign in to vote.
0.00/5 (No votes)
11 Aug 2015CPOL4 min read 8.9K   2  
Generating sitemap.xml in Jekyll, without using plugins

After my blog was already online for a while, I discovered Google Webmaster Tools and sitemaps while reading about the SEO basics.

According to the link, a sitemap in its simplest form is just an XML file like this, with one <url><loc> element per URL on the site:

XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> 
  <url>
    <loc>http://www.example.com/foo.html</loc> 
  </url>
</urlset>

Using Jekyll and depending on the structure of the site, it's relatively easy to create this in a dynamic way, so it's updated automatically when adding new posts or pages to the site.

Note that if you're fine with using plugins, there's a plugin for generating sitemaps that works on GitHub Pages.
I'm not using GitHub Pages, but I still wanted to find a solution without plugins, because so far I managed to achieve everything I tried in Jekyll without plugins.

The <url> Element

Since the <url><loc>...</loc></url> part needs to be repeated for each link, it will go into an include file:

/_includes/sitemapxml.html

HTML
<url>
    <loc>{{ site.url }}{{ include.url }}</loc> 
</url>

site.url refers to the site's config file. For this blog, it contains the following line:

HTML
url: http://christianspecht.de

The links in the sitemap must contain the full URL (http://christianspecht.de/foo instead of just /foo), so all links need to be prefixed with the base URL.

With this include file, it's already possible to create a simple sitemap, by providing URLs manually:

/sitemap.xml

XML
---
layout: none
---

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> 
{% include sitemapxml.html url="/foo/" %}
</urlset>

The generated XML:

HTML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> 
  <url>
    <loc>http://christianspecht.de/foo/</loc> 
  </url>
</urlset>

Being programmers, we obviously don't want to provide URLs manually, though...so we're going to automate this in the next steps.

Getting All URLs, The Simple Way

The easiest way to get all URLs is to loop site.pages:

XML
---
layout: none
---

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> 
{% for p in site.pages %}{% include sitemapxml.html url=p.url %}
{% endfor %}
</urlset>

Note the line break inside the for loop. There must be exactly one line break after the include, so the line breaks in the resulting XML file looks exactly like in the example in the introduction above.

This approach has one big disadvantage: it only works if you actually want all pages listed in the sitemap.
"All" really means all files which contain YAML front matter...including, for example, the sitemap.xml file itself.

In reality, there are probably some pages besides the sitemap that you don't want to be listed either, so you could explicitly exclude everything in the loop which is named X, Y and Z.

It's possible to do it like this, but most of my Jekyll sites have "special" menus which rely on the important URLs being in a data file anyway, so I'll show two advanced approaches how to make use of this.

When Most of the URLs are Blog Posts or in Data Files

This is what I used for the sitemap of this blog.

All the URLs that exist here belong to one of these three categories:

So creating this sitemap file for my blog was as simple as that:

XML
---
layout: none
---

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> 
{% include sitemapxml.html url="/archive/" %}
{% include sitemapxml.html url="/projects/" %}
{% include sitemapxml.html url="/tags/" %}
{% for project in site.data.sidebarprojects %}{% include sitemapxml.html url=project.url %}
{% endfor %}{% for post in site.posts %}{% include sitemapxml.html url=post.url %}
{% endfor %}</urlset>

The three pages at the beginning are very unlikely to change, and new projects and new blog posts will be updated automatically when I add them to the rest of the site.

When You Have A Nested Data File

I'm running another site which uses the "dynamic" tree menu I described here.

There's a data file with all the URLs anyway, but the URLs are nested, so getting a list with all of them is a bit more complex.
The way to create a sitemap file here is similar to creating the menu: by using a recursive include file.

I'll show just the code here - read the blog post linked above for an in-depth explanation how this works, it's exactly the same approach.

/_includes/sitemap.html

HTML
{% for item in include.map %}{% include sitemapxml.html url=item.url %}
{% if item.subitems %}{% include sitemap.html map=item.subitems %}{% endif %}{% endfor %}

Again, the line breaks must be exactly as shown here in order to avoid unnecessary empty lines in the finished sitemap file.

And here's the actual sitemap file which uses the include, passing the data file with the menu information:

/sitemap.xml

XML
---
layout: none
---

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> 
{% include sitemap.html map=site.data.menu %}</urlset>

I won't show the generated XML here, but it looks exactly like the example in the very beginning of this post.

Telling Google about the Sitemap

Now there's only one thing missing: we need to tell the search engines about our new sitemap.

Another quote from Google's "Build a sitemap" page (from the very bottom):

Once you've made your sitemap, you can then submit it to Google with the Sitemaps page, or by inserting the following line anywhere in your robots.txt file:

Sitemap: http://example.com/sitemap_location.xml

For my sites, I immediately submitted the sitemaps to Google, but still inserted the line into robots.txt.

Apparently the link to the sitemap file needs to contain the full URL as well (not just /sitemap.xml), so created robots.txt with Jekyll as well, so I could reuse site.url:

/robots.txt

---
layout: none
---

Sitemap: {{ site.url }}/sitemap.xml

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer
Germany Germany
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

 
-- There are no messages in this forum --