Automatically Generate urllist.txt from sitemap.xml

Hero image for Automatically Generate urllist.txt from sitemap.xml. Image by LSE Library.

Note: times have changed significantly since I wrote this article. If you're using a static site generator like Gatsby, there's simply no need to use a backend technology like PHP to generate sitemap.txt and urllist.txt.

The solution below still works, but if you want to see a more modern approach to generating text-formatted sitemaps for your Gatsby site, take a look at my article here:

Automatically Generate Text Sitemaps in Gatsby


urllist.txt is a simple text file, at the root of your site, which should contain every URL from your site, each one on its own line.

Realistically, this was only ever used by Yahoo, and even they now also read the more ubiquitous sitemap.xml format. With that said, I still find 404 errors looking for urllist.txt in my logs every day or two and have often found it handy to have a simple list of URLs that I can copy and paste (for example, into Copyscape).

If you have access to PHP in your hosting environment and already generate a sitemap.xml, for example using gatsby-plugin-sitemap (if you don't, then I highly recommend that you do), then you can generate urllist.txt from it automatically:

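<?php
// Serve the output as plain text rather than HTML.
header('Content-Type: text/plain; charset=utf-8');

// Read sitemap.xml (in the same directory) into an array of lines.
$sitemap = file('sitemap.xml');
$allMatches = array();

foreach ($sitemap as $line) {
  // Capture the contents of every <loc> element on this line,
  // so that a sitemap written on a single line still works.
  preg_match_all('/<loc>(.*?)<\/loc>/', trim($line), $matches, PREG_SET_ORDER);
  foreach ($matches as $match) {
    if ($match[1] != '') {
      $allMatches[] = $match[1];
    }
  }
}

// Echo each URL on its own line.
foreach ($allMatches as $url) {
  echo $url . "\r\n";
}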

This sets the Content-Type header (because what we want to return is a plain text file), and then loops over the sitemap.xml file, echoing out each URL onto a new line.

I save this in my website root (or rather, in the static folder of my Gatsby project) as urllist.txt, and then edit the .htaccess file so that it processes the file as PHP:

<Files urllist.txt>
  AddType application/x-httpd-ea-php73 .txt
</Files>

...And that's that! In an ideal world, it would be worth finding a way to incorporate this process into your Gatsby build and cache the file, rather than using PHP at all; but as a quick fix to a very minor and irregular problem, this works well too.
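As a halfway step towards that, here is a minimal sketch of a script that writes a static urllist.txt once, after the build, instead of running PHP on every request. It still uses PHP, but only from the command line. The function name extractSitemapUrls, the script name build-urllist.php, and the paths in the usage line are my own assumptions for illustration, not part of Gatsby or gatsby-plugin-sitemap.

```php
<?php
// Hypothetical helper: returns every <loc> value found in a sitemap XML string.
function extractSitemapUrls(string $xml): array
{
    // The "s" modifier lets "." match newlines, in case a <loc> is wrapped.
    preg_match_all('/<loc>(.*?)<\/loc>/s', $xml, $matches);

    $urls = array();
    foreach ($matches[1] as $loc) {
        // Sitemaps escape characters such as & as &amp;, so decode them.
        $url = html_entity_decode(trim($loc), ENT_QUOTES | ENT_XML1, 'UTF-8');
        if ($url !== '') {
            $urls[] = $url;
        }
    }
    return $urls;
}

// Only run when called from the command line with both arguments:
//   argv[1] = path to the sitemap, argv[2] = path of the text file to write.
if (PHP_SAPI === 'cli' && isset($argv[2])) {
    $xml = file_get_contents($argv[1]);
    if ($xml === false) {
        fwrite(STDERR, "Could not read {$argv[1]}\n");
        exit(1);
    }
    file_put_contents($argv[2], implode("\n", extractSitemapUrls($xml)) . "\n");
}
```

Assuming your build writes the sitemap to public/sitemap.xml, you could run it after the build with something like: php build-urllist.php public/sitemap.xml public/urllist.txt. The result is an ordinary text file, so the .htaccess rule above is no longer needed for it.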

You can see mine here.


Categories:

  1. Google
  2. Guides
  3. htaccess
  4. PHP
  5. Search Engine Optimisation
  6. Sitemaps