A Perch XML Sitemap

One of the best things about the Perch CMS is its lack of boilerplate. In fact, Perch creator Rachel Andrew is quite opposed to such bloated, polyfill-ridden framework code. Why include a mass of default code you've yet to decide you even need? The performance hit is just not tolerable.

The trade-off (if you can call 'having tremendous fun authoring your own code' a trade-off) is that a number of features are simply not supported by default. At the time of writing, a dynamic XML sitemap for Perch's blog extension is one such feature. In WordPress, you'd simply install a plugin, click a few buttons in the admin GUI and assume that the 4.5-star community rating means it is dependable. In Perch, you have to be a real developer and do it mostly from scratch. The basic steps are as follows:

  • Create a sitemap.php page
  • Create a sitemap-item.html template
  • Configure perch_blog_custom() in sitemap.php to use the template
  • Rewrite requests for sitemap.xml to sitemap.php in the .htaccess file
  • Refer to the sitemap location in your robots.txt

Create sitemap.php

The sitemap.php page is set up almost exactly like any other Perch-enabled page on your server, except that the Perch runtime include should appear after the content type header that defines the document as XML:

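<?php
// Declare the response as XML before anything else is sent.
header('Content-Type: text/xml; charset=utf-8');
include('perch/runtime.php');
// Echo the XML declaration from PHP so it is never read as a PHP short open tag.
echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
?>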
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">	
  <!-- ENTRIES HERE -->
</urlset>

With that taken care of, the next thing you ought to do is include all the static and category pages. Nothing fancy; just code them in manually using the following format:

<url>
   <loc>http://www.your-site.com/your-url</loc>
   <changefreq>monthly</changefreq>
   <priority>0.2</priority>
</url>

Priority: The priority tag tells search engines how important a page is relative to the other pages on your own site. Values range from "0.0" to "1.0", with "0.5" as the default. "1.0" is the highest and is typically given to the homepage. On a blog that lists excerpts of recent articles on the home page (like HeydonWorks), giving the homepage the top priority seems logical.
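If the list of static pages grows, hand-coding each entry gets tedious. As an optional alternative, here is a minimal sketch that prints the same <url> format from a plain PHP array. The sitemap_url() function, the $pages array and the example URLs are all hypothetical and not part of Perch; the one thing it adds over hand-coding is that it escapes characters such as & in the URL.

```php
<?php
// Hypothetical helper (not part of Perch): builds one <url> entry.
// The defaults mirror the sitemap protocol's own default priority of 0.5.
function sitemap_url(string $loc, string $changefreq = 'monthly', string $priority = '0.5'): string
{
    // Escape &, <, > and quotes so the URL is safe inside XML.
    $loc = htmlspecialchars($loc, ENT_QUOTES | ENT_XML1, 'UTF-8');

    return "<url>\n"
         . "  <loc>{$loc}</loc>\n"
         . "  <changefreq>{$changefreq}</changefreq>\n"
         . "  <priority>{$priority}</priority>\n"
         . "</url>\n";
}

// Placeholder static pages: URL => priority.
$pages = [
    'http://www.your-site.com/'             => '1.0',
    'http://www.your-site.com/about'        => '0.5',
    'http://www.your-site.com/category/css' => '0.2',
];

foreach ($pages as $loc => $priority) {
    echo sitemap_url($loc, 'monthly', $priority);
}
```

Dropped inside the <urlset> element of sitemap.php, the loop would print one entry per page in the same shape as the hand-coded version.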

Create sitemap-item.html

Our next task is to create a template for our post entries so that we can inject them amongst the hand-coded <url>s we devised in the last section. Take a single <url> snippet like the one described above and place it in a new file, saving it in templates/blog as 'sitemap-item.html'. After you've inserted your Perch content template tags, the code should look something like this:

<url>
  <loc>http://www.your-site.com/article/<perch:blog id="postSlug" /></loc>
  <changefreq>monthly</changefreq>
  <priority>0.2</priority>
</url>

Next, we need to output the post URLs through our template in sitemap.php. This is a perfect case for the perch_blog_custom() function, called from inside the <urlset> element and configured as follows:

<?php
// Render every post through the sitemap template, newest first.
$opts = array(
  'template'=>'blog/sitemap-item.html',
  'sort'=>'postDateTime',
  'sort-order'=>'DESC'
);

perch_blog_custom($opts);
?>
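Putting the pieces so far together, the finished sitemap.php might look something like the sketch below. It assumes a single hand-coded homepage entry as a stand-in for your own static pages, and it leaves out the optional xsi schema attributes for brevity. The XML declaration is echoed from PHP so that it cannot be mistaken for a PHP short open tag.

```php
<?php
// Send the XML content type before anything else is output.
header('Content-Type: text/xml; charset=utf-8');
include('perch/runtime.php');

// XML declaration, echoed as a string rather than written inline.
echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Hand-coded static and category pages (placeholder entry) -->
  <url>
    <loc>http://www.your-site.com/</loc>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
  </url>
<?php
  // Blog posts: one <url> per post, rendered through the template.
  perch_blog_custom(array(
    'template'   => 'blog/sitemap-item.html',
    'sort'       => 'postDateTime',
    'sort-order' => 'DESC'
  ));
?>
</urlset>
```

This only runs on a server with Perch and its blog app installed, since perch_blog_custom() comes from the Perch runtime.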

We've already told sitemap.php to render as XML, but the .php extension remains. Google will be looking for 'sitemap.xml', so we'll need a rule in our .htaccess file to serve 'sitemap.php' when 'sitemap.xml' is requested. Add the following to your .htaccess:

RewriteEngine On
RewriteRule ^sitemap\.xml$ sitemap.php [NC,L]

As well as submitting your XML sitemap to Google via Google Webmaster Tools, it's also a good idea to place a reference in robots.txt:

Sitemap: http://www.your-site.com/sitemap.xml
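Before submitting the sitemap, it may be worth checking that the generated output is well-formed XML and contains the URLs you expect. The following is a rough sketch of such a check using PHP's SimpleXML extension; the sitemap_locs() function and the sample string are hypothetical, and in practice you would feed it the response from your own sitemap.xml URL.

```php
<?php
// Hypothetical check (not part of Perch): returns every <loc> value in a
// sitemap string, or throws if the string is not well-formed XML.
function sitemap_locs(string $xml): array
{
    // Collect parse errors quietly instead of emitting PHP warnings.
    libxml_use_internal_errors(true);

    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        throw new RuntimeException('Sitemap is not well-formed XML');
    }

    $locs = [];
    foreach ($doc->url as $url) {
        $locs[] = (string) $url->loc;
    }
    return $locs;
}

// Sample input standing in for the real sitemap output.
$sample = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        . '<url><loc>http://www.your-site.com/</loc></url>'
        . '<url><loc>http://www.your-site.com/article/my-post</loc></url>'
        . '</urlset>';

print_r(sitemap_locs($sample));
```

An unescaped & in a post URL is the kind of problem this would surface, because the parser rejects the document and the function throws.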

That should be everything you need. If not, see the commenting notice below.