Luc Shelton

SilverStripe: Generating Google Sitemaps for DataObjects

Posted 9 months ago. Updated 5 months ago. 8 minutes to read.

Lately I've been trying to expose as much of my website as possible through the Google Sitemap that is generated automatically when you navigate to the /sitemap.xml endpoint of this website. The purpose of this file is to give Google an overall page hierarchy, so that it can crawl the contents of your website more efficiently.

Fortunately, Silverstripe comes with good official support for generating Google Sitemaps and already has a module that is available here. The author has done a good job of making it extensible, which has meant that I've been able to add additional information, photos, and screenshots relating to "events" or "projects" that I have published on here.

Installing this module is as simple as running the following command.

#!/bin/bash
composer require wilr/silverstripe-googlesitemaps

I did, however, encounter a problem: I wanted my non-page DataObjects to appear in the sitemap.xml that was being generated. By default, the extension won't pick up DataObjects that don't derive from the SiteTree type. Instead, you have to register each DataObject manually.

In this instance, I've had to append the following data objects to my website's _config.php file.

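use Wilr\GoogleSitemaps\GoogleSitemap;
use Portfolio\Models\TechnologyTag;
// Assumption: the other two tag models live in the same namespace as TechnologyTag.
use Portfolio\Models\PlatformTag;
use Portfolio\Models\ProgrammingLanguageTag;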

GoogleSitemap::register_dataobject(TechnologyTag::class, 'weekly');
GoogleSitemap::register_dataobject(PlatformTag::class, 'weekly');
GoogleSitemap::register_dataobject(ProgrammingLanguageTag::class, 'weekly');

As you can see, I've manually "registered" three types that derive from DataObject with the GoogleSitemap extension mentioned above. The extension will make a few assumptions about your DataObject types, which are the following:

  1. It will have a Link function defined (or overridden).
  2. It will have an AbsoluteLink function defined (or overridden).
  3. It will have the canView function overridden, and returning a value of "true".

Of course, you will want the returned links to go somewhere useful, but more importantly you will need to have PageControllers defined to serve those links.

Here's a sample snippet from TechnologyTag (my custom DataObject type).

<?php

namespace Portfolio\Models;

use Exception;
use SilverStripe\Assets\Image;
use SilverStripe\Control\Director;
use SilverStripe\ORM\DataObject;
use SilverStripe\ORM\GroupedList;

class TechnologyTag extends DataObject
{
    private static $table_name = "TechnologyTag";

    private static $plural_name = "Technologies";

    private static $singular_name = "Technology";

    ...

    public function updateImagesForSitemap(&$list)
    {
        if (!isset($list)) {
            throw new Exception("The list for updating sitemap images was not defined. Unable to continue.");
        }

        if ($this->Logo && $this->Logo->exists()) {
            $list->push($this->Logo);
        }
    }

    public function Link()
    {
        return 'technology/' . $this->URLSegment;
    }

    public function AbsoluteLink()
    {
        return Director::absoluteURL($this->Link());
    }

    public function getGroupedJobs()
    {
        return GroupedList::create($this->Jobs());
    }
}

For example, someone navigating to /technology/php should be served a rendered template containing information about the matching TechnologyTag DataObject.

To do this, you will need to do two more things:

  1. Define PageControllers for your DataObjects.
  2. Create a YAML configuration file that adds routes for the Silverstripe Director type.

The YAML configuration could look a little something like this:

---
Name: approutes
After: framework/_config/routes#coreroutes
---
SilverStripe\Control\Director:
  rules:
    'technology/$ID!': 'Portfolio\Models\TechnologyTagController'

...

The After directive ensures that these rules are loaded after the core routes that Silverstripe needs in order to function. The exclamation mark after $ID marks the parameter as required, so the rule only matches when a value is supplied in its place.

This parameter is then retrievable from the controller that is defined (which is Portfolio\Models\TechnologyTagController).
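
To make the role of the $ID parameter concrete, here is a rough plain-PHP illustration of what the rule captures from a URL. It is not how Director matches routes internally; the function name extractTechnologySlug and the regular expression are assumptions made purely for this sketch.

```php
<?php

/**
 * Hypothetical helper: returns the slug that would stand in for $ID,
 * or null when the path does not supply one (the "!" makes it required).
 */
function extractTechnologySlug(string $path): ?string
{
    // Ignore leading and trailing slashes, e.g. "/technology/php/".
    $path = trim($path, '/');

    // Match "technology/<slug>" with an optional trailing action such as "/jobs".
    if (preg_match('#^technology/([^/]+)(?:/.*)?$#', $path, $matches) === 1) {
        return $matches[1];
    }

    return null;
}

var_dump(extractTechnologySlug('/technology/php'));  // the slug is captured
var_dump(extractTechnologySlug('/technology/'));     // no slug supplied, so no match
```

In the real controller the framework does this work for you, and the captured value is what you read back from the request as the ID parameter.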

The controller looks a little something like this.

<?php

namespace Portfolio\Models;

use SilverStripe\Control\HTTPRequest;

use Portfolio\Models\TechnologyTag;

use PageController;

class TechnologyTagController extends PageController
{
    private static $allowed_actions = [
        'projects',
        'jobs'
    ];

    private static $url_handlers = [
        'projects' => 'projects',
        'jobs' => 'jobs'
    ];

...

    public function index(HTTPRequest $request)
    {
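        // 'ID' is the route parameter captured by the 'technology/$ID!' rule.
        $tagSlug = $request->param('ID');
        $tag = TechnologyTag::get()->filter('URLSegment', $tagSlug)->first();

        // Respond with a 404 rather than rendering an empty page for unknown slugs.
        if (!$tag) {
            return $this->httpError(404);
        }

        $data = [
            'PageObject' => $tag
        ];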

        return $this->customise([
            'Layout' => $this
                ->customise($data)
                ->renderWith(['Portfolio\Models\Layout\TagController'])
        ])->renderWith(['Page']);
    }
}

The last piece of the puzzle is to then ensure that your DataObject has the URLSegment field defined.

The first way to solve this is to simply add a URLSegment field to your DataObject's database fields. Straightforward stuff.

<?php

namespace Portfolio\Models;

use SilverStripe\ORM\DataObject;

class TechnologyTag extends DataObject
{

...

    private static $db = [
        'URLSegment' => 'Varchar',
    ];

...

}

However, you will probably quickly realise just how cumbersome it is to constantly append this to every DataObject you define afterwards. Don't forget, we've not even defined the logic responsible for generating the URL slug (or URLSegment) which would probably have to get copied around, too.

The better approach could be to create a DataExtension instead. I won't talk about how DataExtension types work here, because there's plenty of documentation already.

Your DataExtension could look like this:

<?php

namespace Portfolio\Extensions;

use SilverStripe\Forms\FieldList;
use SilverStripe\ORM\DataExtension;
use SilverStripe\View\Parsers\URLSegmentFilter;

class PortfolioURLSegmentExtension extends DataExtension
{
    private static $db = [
        'URLSegment' => 'Varchar(255)'
    ];

    public function onBeforeWrite()
    {
        if ($this->owner->hasField('URLSegment')) {
            if (!$this->owner->URLSegment) {
                $this->owner->URLSegment = $this->generateURLSegment($this->owner->Title);
            }

            if (!$this->owner->isInDB() || $this->owner->isChanged('URLSegment')) {
                $this->owner->URLSegment = $this->generateURLSegment($this->owner->URLSegment);
                $this->makeURLSegmentUnique();
            }
        }
    }
    
    public function IsURLSegmentInUse($URLSegment)
    {
        $class = $this->owner;
        $items = $class::get()->filter('URLSegment', $URLSegment);

        if ($this->owner->ID > 0) {
            $items = $items->exclude('ID', $this->owner->ID);
        }

        return $items->exists();
    }

    public function makeURLSegmentUnique()
    {
        $count = 2;
        $currentURLSegment = $this->owner->URLSegment;

        while ($this->IsURLSegmentInUse($currentURLSegment)) {
            $currentURLSegment = preg_replace('/-[0-9]+$/', '', $currentURLSegment) . '-' . $count;
            ++$count;
        }

        $this->owner->URLSegment = $currentURLSegment;
    }

    public function generateURLSegment($title)
    {
        $filter = URLSegmentFilter::create();
        $filteredTitle = $filter->filter($title);

        $ownerClassName = $this->owner->ClassName;
        $ownerClassName = strtolower($ownerClassName);

        if (!$filteredTitle || $filteredTitle == '-' || $filteredTitle == '-1') {
            // Read the ID from the owning DataObject; the extension itself has no ID.
            $filteredTitle = "$ownerClassName-{$this->owner->ID}";
        }

        return $filteredTitle;
    }
}
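
To see the fallback in generateURLSegment in isolation, here is a simplified plain-PHP sketch. The slugify function is a hypothetical stand-in for URLSegmentFilter (the real filter does considerably more), and the class name and ID are passed in as plain arguments rather than read from the owner, so treat it as an illustration of the idea rather than the extension's actual behaviour.

```php
<?php

/**
 * Hypothetical, simplified stand-in for URLSegmentFilter: lowercases the
 * title and collapses anything that is not a-z or 0-9 into single hyphens.
 */
function slugify(string $title): string
{
    $slug = strtolower($title);
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);

    return trim($slug, '-');
}

/**
 * Mirrors the fallback idea: when the title yields nothing usable,
 * build a segment from the class name and the record ID instead.
 */
function generateSegment(string $title, string $className, int $id): string
{
    $segment = slugify($title);

    if ($segment === '') {
        $segment = slugify($className) . '-' . $id;
    }

    return $segment;
}

echo generateSegment('C# & .NET', 'TechnologyTag', 7), PHP_EOL;
echo generateSegment('!!!', 'TechnologyTag', 7), PHP_EOL;
```

The second call is the interesting one: the title contains nothing that survives filtering, so the segment falls back to the class name and ID.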

Don't forget to hook it up with your DataObjects in the relevant extensions.yml configuration file like so.

---
Name: portfolio-extensions-objects
After:
    - '#portfolio-extensions'
---
Portfolio\Models\TechnologyTag:
  extensions:
    - Portfolio\Extensions\PortfolioURLSegmentExtension

Also make sure you run /dev/build?flush=1 so that your DataObjects are rebuilt with the new URLSegment field.

Summary

This DataExtension simply ensures that a URLSegment (a "slug") is present each time the DataObject is saved to the database: if none has been set, one is generated from the Title. It does this in a relatively safe way, by checking whether the generated slug is already used by another record of the same type. If it isn't, the object is written to the database with that slug.

If it is, the extension increments a counter and checks each resulting variation of the slug until it finds one that no other DataObject in the database is using.

This is the part of the code from the snippet above that takes care of this.

        while ($this->IsURLSegmentInUse($currentURLSegment)) {
            $currentURLSegment = preg_replace('/-[0-9]+$/', '', $currentURLSegment) . '-' . $count;
            ++$count;
        }

As an example, if there is a DataObject in the database that already has a URLSegment field defined as my-url-slug-goes-here, it will then append a number and check again (as per the condition of the while loop).
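
The same loop can be tried outside of Silverstripe with a small plain-PHP sketch. Here the function name makeUniqueSegment and the $taken array are assumptions for illustration: the array stands in for the database lookup that IsURLSegmentInUse performs.

```php
<?php

/**
 * Hypothetical helper: $taken stands in for the database lookup
 * that IsURLSegmentInUse performs in the extension.
 */
function makeUniqueSegment(string $segment, array $taken): string
{
    $count = 2;

    while (in_array($segment, $taken, true)) {
        // Strip any previous numeric suffix, then try the next counter value.
        $segment = preg_replace('/-[0-9]+$/', '', $segment) . '-' . $count;
        ++$count;
    }

    return $segment;
}

$taken = ['my-url-slug-goes-here', 'my-url-slug-goes-here-2'];

echo makeUniqueSegment('my-url-slug-goes-here', $taken), PHP_EOL; // my-url-slug-goes-here-3
echo makeUniqueSegment('brand-new-slug', $taken), PHP_EOL;        // brand-new-slug
```

One thing worth knowing about this approach: because the regular expression strips any trailing number, a slug that legitimately ends in one (say php-7) loses that number when it collides, and comes out as php-2.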

Retroactively Populating URL Segments

If you are adding the URLSegment field to a DataObject that already has numerous instances stored in the database, the field won't be populated automatically until you edit and save each record (the logic lives in the onBeforeWrite hook). To fix this by hand, you would have to go through every DataObject you've extended, hit "Publish" in your CMS backend, and then refresh. That's a lot of work, so instead we can automate this with a BuildTask.

Read more information here about Build Tasks.

Create a new build task under "Tasks" in your /app/src directory. Your directory layout should look like this.

./app/src/
│
└── Tasks
    └── UpdateTags.php
 

You can call your file whatever you want, but my BuildTask is named "UpdateTags".

And then you can write code such as this.

<?php

use Portfolio\Models\TechnologyTag;
use SilverStripe\Dev\BuildTask;

class UpdateTags extends BuildTask
{
    public function run($request)
    {
        if ($request == null) {
            throw new Exception("The request object is invalid or null");
        }

        $technologyTags = TechnologyTag::get();
        foreach($technologyTags as $tag) {
            $tag->write();
        }   
    }
}

Save the file and ensure that it has the necessary permissions on disk. Then, finally, you can run the task by navigating to the endpoint /dev/tasks/UpdateTags in your browser.

It should automatically invoke onBeforeWrite, which contains the logic for generating your URLSegments.


Adding Blog Posts to Sitemaps

By default, blog posts should automatically appear in the sitemap, as the type derives from Page, which in turn derives from SiteTree. However, you may also want blog tags and categories to appear separately in another sitemap. In that case, you can add the following to your _config.php:

use Wilr\GoogleSitemaps\GoogleSitemap;

use SilverStripe\Blog\Model\BlogPost;
use SilverStripe\Blog\Model\BlogTag;
use SilverStripe\Blog\Model\BlogCategory;

GoogleSitemap::register_dataobject(BlogPost::class, 'daily');
GoogleSitemap::register_dataobject(BlogTag::class, 'weekly');
GoogleSitemap::register_dataobject(BlogCategory::class, 'weekly');

This should then appear in your generated sitemap.xml as such:

A small screenshot displaying how the sitemap.xml should appear after adding the relevant types.

Which then should look a little something like this upon clicking:

Silverstripe sitemap.xml rendering for blog tags

Thanks for reading! Feel free to leave a comment if you want anything explained further.

