
Automatically Generate Text Sitemaps in Gatsby

A few years back, I wrote an article about generating a urllist.txt file from sitemap.xml. It used PHP to deconstruct the XML file and return a simple list of URLs on the fly, which worked well for me at the time because my hosting situation meant I could still use PHP alongside statically-generated sites.
Now that I'm trying to do away with that by moving to a more decentralised, static host, I'm running into the same issue I had back then: gatsby-plugin-sitemap only generates a sitemap in XML format (as sitemap.xml), and not the text-formatted sitemaps, like urllist.txt, that I still want to include.
What are Text Formatted Sitemaps?
A text-based sitemap is a simple, plain text file that lists all the URLs of a website. Each URL sits on its own line, creating a clear and concise map of the site's structure. These sitemaps are typically named sitemap.txt or urllist.txt; you can see mine here: sitemap.txt and urllist.txt. The eagle-eyed amongst you will notice that they are identical: they are simply two file naming conventions from two different eras of the web.
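To make the format concrete, here is a minimal sketch of turning a list of URLs into the body of a text sitemap. The toTextSitemap helper and the example.com URLs are hypothetical and purely illustrative; the only thing that matters is one absolute URL per line.

```javascript
// Hypothetical helper: builds the body of a text sitemap from a list of URLs.
// The format is simply one absolute URL per line, with no other markup.
function toTextSitemap(urls) {
  return urls.join('\n');
}

// Placeholder URLs, not taken from a real site.
const exampleUrls = [
  'https://example.com/',
  'https://example.com/about/',
  'https://example.com/blog/hello-world/',
];

console.log(toTextSitemap(exampleUrls));
```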
Back in the early days, text‑formatted sitemaps were used to submit your website content specifically for the Yahoo search engine. However, when Yahoo also announced support for XML sitemaps, the use of the text‑formatted versions dwindled. Nevertheless, they still play a role in SEO: search engines will still accept them alongside their XML cousins, and due to their simplicity, some argue that search engine crawlers find it easier to discover and index your pages, which could improve your site's visibility and SEO performance.
Generating sitemap.txt and urllist.txt
Stepping away from the idea of using a back‑end technology to generate these sitemaps, I've instead been focusing on what Gatsby can offer.
After all, during the build process, in gatsby-node.js we have access to:
- Node.js, which means we have access to fs to create and manipulate files;
- GraphQL, which means we can run queries;
- allSitePage, which returns a list of all page paths within your site.
So it should be simple, right?
Implementing onPostBuild
It took me a little bit of trial and error, but it turns out the ideal place to generate our text‑formatted sitemaps is during onPostBuild. This runs after the build process is complete, which means we can be sure all the pages have been created before generating the sitemaps.
Get the Data We Need
There are two pieces of data we need:
- A list of all paths, which comes from allSitePage;
- The site URL, which comes from siteMetadata, assuming you've set that up in GatsbyConfig.
My query looks like this:
    {
      allSitePage {
        nodes {
          path
        }
      }
      site {
        siteMetadata {
          siteUrl
        }
      }
    }

Filter Out Error Pages
This step is optional, depending on whether you've set up error pages. I have a 404 page, which is served when a user tries to access a link that doesn't exist, and I don't want it included in the sitemaps. We can filter it out very simply:
    nodes.filter(node => !node.path.includes('404'))

Obviously, your application may have other URLs that you don't want to include in your sitemap too, in which case you can simply extend the filter to remove them.
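If you do need to exclude more than the 404 page, one option is to keep the exclusions in a list. This is only a sketch: isExcluded and the excludedFragments values are hypothetical, and it keeps the same substring matching as the filter above, so a path such as /blog/http-404-explained/ would also be dropped.

```javascript
// Hypothetical list of path fragments to leave out of the sitemap.
// '404' comes from the article; '/drafts/' is an illustrative assumption.
const excludedFragments = ['404', '/drafts/'];

// Returns true when a path contains any excluded fragment (substring match).
function isExcluded(pagePath) {
  return excludedFragments.some((fragment) => pagePath.includes(fragment));
}

// Same shape as the nodes returned by allSitePage: objects with a path.
const nodes = [
  { path: '/' },
  { path: '/404/' },
  { path: '/404.html' },
  { path: '/drafts/unfinished-post/' },
  { path: '/blog/hello-world/' },
];

const included = nodes.filter((node) => !isExcluded(node.path));
console.log(included.map((node) => node.path)); // [ '/', '/blog/hello-world/' ]
```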
Use Node.js fs to create the file
With Gatsby, anything in the public folder at build time is served from the root of the live site. We don't want to write to the static folder at this point: its contents are copied into public earlier in the build, so that copy has already happened by the time onPostBuild runs. For many projects, the static folder is also version-controlled, whilst the public folder should not be.
So, we can use writeFileSync to output our new files. Something like this:
    const filePath = path.join(__dirname, 'public', 'sitemap.txt');
    fs.writeFileSync(filePath, sitemapContent);

The Full Solution
Piecing it all together, what you have is a block of code that sits at the bottom of your gatsby-node.js file, and looks like this:
    // These two imports belong at the top of your gatsby-node file,
    // if they aren't there already.
    const fs = require('fs');
    const path = require('path');

    exports.onPostBuild = async ({ graphql, reporter }) => {
      try {
        const result = await graphql(`
          {
            allSitePage {
              nodes {
                path
              }
            }
            site {
              siteMetadata {
                siteUrl
              }
            }
          }
        `);

        if (result.errors) {
          reporter.panic('Error in the GraphQL query for sitemap: ', result.errors);
          return;
        }

        const sitemapContent = result.data.allSitePage.nodes
          .filter((node) => !node.path.includes('404')) // Filtering out paths that contain '404'
          .map((node) => `${result.data.site.siteMetadata.siteUrl}${node.path}`)
          .join('\n');

        const sitemapPath = path.join(__dirname, 'public', 'sitemap.txt');
        const urllistPath = path.join(__dirname, 'public', 'urllist.txt');

        // Write to sitemap.txt
        fs.writeFileSync(sitemapPath, sitemapContent);
        reporter.info('Successfully created sitemap.txt.');

        // Write to urllist.txt
        fs.writeFileSync(urllistPath, sitemapContent);
        reporter.info('Successfully created urllist.txt.');
      } catch (error) {
        reporter.panic('Failed to create sitemap and urllist files: ', error);
      }
    };

At the end of your build, you will find two new txt files in the root of your site (urllist.txt and sitemap.txt), both containing a complete list of every URL on your site.
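One thing worth checking in your own setup is whether siteUrl ends with a slash. Page paths from allSitePage normally start with /, so a siteUrl of https://example.com/ would produce a double slash when the two are simply concatenated. A small sketch of a guard, assuming a hypothetical joinUrl helper that is not part of the solution above:

```javascript
// Hypothetical helper: joins the site URL and a page path with exactly one slash.
function joinUrl(siteUrl, pagePath) {
  const base = siteUrl.replace(/\/+$/, ''); // drop any trailing slashes
  const rest = pagePath.replace(/^\/+/, ''); // drop any leading slashes
  return `${base}/${rest}`;
}

console.log(joinUrl('https://example.com/', '/about/')); // https://example.com/about/
console.log(joinUrl('https://example.com', '/')); // https://example.com/
```

If your siteUrl never has a trailing slash, the plain template string in the map step is already enough.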
And that's it!