Automatically Submit Sitemaps to Google During Gatsby Build

Hero image for 'Automatically Submit Sitemaps to Google During Gatsby Build.' Image by Christopher Burns.

Any seasoned website owner, webmaster, or maintainer will be familiar with sitemaps and the value they bring in giving search engines a gentle nudge to look at your published content.

In brief, and in its simplest form, a sitemap is a file (usually in XML format, called sitemap.xml) which lives in the root of your domain and lists the website structure and page URLs in a machine-readable format. Search engines can then use this file to get an overview of your website, its structure, and what URLs it should be sending Googlebot out to index. For a real-life example, this is the XML sitemap for this very site, which is generated during every build using gatsby-plugin-sitemap.
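For illustration, a minimal sitemap.xml follows the sitemaps.org protocol: a <urlset> element wrapping one <url> entry per page. The URLs below are placeholders, not real entries from this site:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
  </url>
  <url>
    <loc>https://example.com/articles/some-post/</loc>
  </url>
</urlset>
```

A plugin like gatsby-plugin-sitemap generates a file in this shape for you from the pages in your build.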

In the past, you could also provide a plain-text equivalent called urllist.txt, which simply lists every URL on the site (see mine here). I've talked about this before, but in my experience it's more useful for general housekeeping than for genuine search engine optimisation.

When managing your site via Google Search Console, you are able to let Google know that a sitemap exists (and where it is). With this information, Googlebot will revisit the sitemap every now and again and use its contents to pick up on new changes to the site. The timings here are deliberately a little vague: nobody really knows the inner workings of Google's indexing, but it is safe to assume that a website which changes frequently and has a high PageRank will be revisited more often than one which falls lower on either count.

From personal experience, if left to its own devices, Googlebot will fetch your sitemap anywhere from twice a week to less than once a month, in a similar pattern to how often the rest of your site is scheduled for a recrawl.

Photograph of the Google logo from the Google homepage by Christian Wiediger on Unsplash.

Notifying Google of Changes

The problem here is that when you make changes to your website, you are left waiting until Google next visits the site before those changes start to (potentially) appear in search results or affect your rankings. As you might expect, Google does offer mechanisms to tell them that you have new content ahead of whenever your next Googlebot visit may be due.

The Manual Way

Google has some great documentation on how to submit URLs to Google. The most straightforward (but most labour-intensive) approach is to log into Search Console and manually resubmit your sitemap.

You can also use the <lastmod> field within your XML sitemap to at least let Googlebot know which links have been updated recently. This is also a valuable practice in showing how frequently your content changes.
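As a sketch, a single <url> entry carrying a <lastmod> date might look like this (the URL and date are placeholders):

```xml
<url>
  <loc>https://example.com/articles/some-post/</loc>
  <!-- Only bump this date when the page's content actually changes -->
  <lastmod>2021-03-02</lastmod>
</url>
```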

The Programmatic Way

Google also supplies an endpoint which you can ping programmatically to submit your sitemap. According to their documentation, it looks like this:

http://www.google.com/ping?sitemap=URL/of/file

To offer more context, this is the endpoint I would ping to resubmit my sitemap when I've made changes:

http://www.google.com/ping?sitemap=https://johnkavanagh.co.uk/sitemap.xml
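If you ever need to build that URL yourself, the sitemap address belongs in a query parameter, so it should be URL-encoded. A minimal sketch in Node.js (buildPingUrl is a hypothetical helper, not part of any library):

```javascript
// Construct the Google ping URL for a given sitemap location.
// encodeURIComponent keeps the sitemap URL intact as a query parameter.
function buildPingUrl(sitemapUrl) {
  return 'http://www.google.com/ping?sitemap=' + encodeURIComponent(sitemapUrl);
}

console.log(buildPingUrl('https://johnkavanagh.co.uk/sitemap.xml'));
```

In practice Google accepts the unencoded form shown above too, but encoding is the safer habit for URLs in query strings.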

Doing It Automatically

We know that with pure static site generators like Gatsby, the content will only change during a new build*, so it would make sense to:

  1. Generate a new sitemap as part of the build, updating the <lastmod> fields so that only URLs whose content has actually changed carry a new date.
  2. Ping this to Google automatically once it has been created.

I've talked about adding additional steps into your Gatsby deployment process in the past. Fundamentally it is as simple as creating an additional Node.js script and adding it to the process.

In this case, Sean Wilson has already done the heavy lifting for us with his excellent Node.js module submit-sitemap, which automates the process of submitting your sitemap to Bing and Google using their programmatic endpoints.

This process may well feel very familiar: all we are going to do is create a new Node.js script using this module and chain it into our deploy script.

Install submit-sitemap

First things first, we need to add submit-sitemap to your project by simply running yarn add submit-sitemap from the project root, or npm install submit-sitemap if you're more npm-inclined. This will install it into your node_modules folder and add it as a dependency in package.json.

Write Your Ping Script

Here you have a choice: if you already have other processes you use during deployment, then you can append this into those, otherwise, you can create a new step. Create a new .js file in the root of your project. I called mine ping.js.

Then, you just include submit-sitemap and pass your sitemap URL to it:

var submitSitemap = require('submit-sitemap').submitSitemap;

submitSitemap('https://johnkavanagh.co.uk/sitemap.xml');

There really is very little more to it than that, although you could also catch errors and output them to the console should you wish (note that I've never once seen an error crop up with this process, but your results may vary):

submitSitemap('https://johnkavanagh.co.uk/sitemap.xml', function (err) {
  if (err) {
    console.error(err);
  }
});

Add Ping to Your Deploy Script

The final step is to tie your new (very simple) ping script into your existing build and deploy process. As is often the case, this will come down to your own personal propensity. In my projects, I will often have a deploy script which my CI can then call via yarn deploy, and which will look something a little like this:

"scripts": {
  "deploy": "gatsby clean && gatsby build && node deploy && node ping"
}

This simply chains together four commands, running each in order:

  • gatsby clean wipes out any existing cache. Generally, I would advise against this because you then need to download and process everything all over again, but for some projects, it's an imperative first step;
  • gatsby build builds the Gatsby project;
  • node deploy runs after the build has completed and can contain any post-build tasks, for example deploying a copy of the site via FTP or prepending back-end functionality to a page;
  • node ping calls our new ping.js script and triggers a ping to Google and Bing to let them know that the sitemap has been updated.

It's important to make sure you get your ordering right here: there's no point attempting to ping your sitemaps before you have built the new version of the site! But really, it is as simple as that. Next time you call yarn deploy, your latest sitemap will be submitted to Google and Bing automatically and you will hopefully see your content appearing within the search results that little tiny bit sooner.


* Obviously there are exceptions to this!


Categories:

  1. Development
  2. FTP
  3. Gatsby
  4. Node.js
  5. Search Engine Optimisation
  6. Sitemaps