Overview

Skill Level: Any

A sitemap is a structured list of the pages that make up a website. Site crawlers such as Google use it to index a site’s content. When you create a site with Acoustic Content, a set of sitemap files is automatically created, published, and updated as you create and modify your site pages, without any additional effort on your part. However, if you want to augment these programmatically generated sitemap files, for example by adding metadata or a modified URL path, you can do so through the Acoustic Content developer tools or the Acoustic Content APIs. This tutorial shows you how to augment your sitemap files by using both methods.

Prerequisites

To get started, you need the Acoustic Content developer tool known as wchtools, which provides a way to interact with your site from the command line. You can download and install wchtools from GitHub.

Step-by-step

  1. Understanding the programmatically generated sitemap files

    The automatically generated sitemap consists of a small set of files that live in the root path of your site.

    • /robots.txt: The entry point. Web crawlers read this file to determine the rules for crawling your site. For a website hosted in Acoustic Content, robots.txt contains pointers to one or more sitemap index files, sitemap files, or both.
    • /sitemap_index.xml: A sitemap index, which points to one or more sitemap files.
    • /sitemap.xml: A sitemap file. This is the file that actually lists the pages on your site and their URLs.

    Note: The files shown above are generalized. Because multiple websites could be hosted on the same server, your files will have unique prefixes to associate them with your particular site, such as mycustomhost_robots.txt or default_sitemap.xml. For simplicity, we will use the generic names throughout this tutorial.
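
To see which sitemap files a robots.txt file points to, you can scan it for sitemap lines. The following is a minimal sketch, not part of the Acoustic Content tooling: the function name sitemap_urls and the example.com URL are assumptions made for illustration.

```python
def sitemap_urls(robots_text):
    """Return the URLs listed on 'sitemap:' lines of a robots.txt file."""
    urls = []
    for line in robots_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        # Match the field name case-insensitively ("sitemap:" or "Sitemap:").
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


robots = """# My robots.txt
User-agent: *
Disallow:
sitemap: https://example.com/default_sitemap_index.xml
"""
print(sitemap_urls(robots))
```

Each returned URL is either a sitemap index or a sitemap file, as described in the list above.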

  2. Creating new sitemap files

    You might need to update the sitemap files for your site. In Acoustic Content, the sitemap and sitemap_index files are regenerated whenever you create a page on your site. Therefore, any changes you make directly to the generated sitemap files can be lost at any time.

    Saving your augmentations in a new, separate sitemap file and adding this new file to the list in the robots.txt file prevents such overwrites. Additions to the robots.txt file are always preserved.

    1. You can use the existing sitemap as a template to create a new file.
    2. After you create a new file, you must add it as a separate entry in the robots.txt file. For example:

      # My robots.txt
      User-agent: *
      Disallow:
      sitemap: https://my10-stage.digitalexperience.ibm.com/4030f492-e057-4e05-8876-ad2077c9fa78/default_sitemap_index.xml
      sitemap: https://my10-stage.digitalexperience.ibm.com/4030f492-e057-4e05-8876-ad2077c9fa78/alternate_sitemap.xml

    You can update the sitemap files or add new files by using the Acoustic Content developer tools or the Acoustic Content authoring API. The following steps show how to use both methods.
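
One way to use the existing sitemap as a template is to parse it and rewrite the page URLs. The sketch below assumes the generated file follows the standard sitemaps.org schema (a urlset of url/loc elements). The function name with_alternate_prefix and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the standard sitemap protocol.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Serialize with a default namespace instead of an "ns0:" prefix.
ET.register_namespace("", NS)


def with_alternate_prefix(sitemap_xml, old_prefix, new_prefix):
    """Return a copy of the sitemap with old_prefix replaced in every <loc>."""
    root = ET.fromstring(sitemap_xml)
    for loc in root.iter("{%s}loc" % NS):
        url = (loc.text or "").strip()
        if url.startswith(old_prefix):
            loc.text = new_prefix + url[len(old_prefix):]
    return ET.tostring(root, encoding="unicode")


template = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/site/home</loc></url>"
    "</urlset>"
)
alternate = with_alternate_prefix(
    template, "https://example.com/site/", "https://example.com/alt/"
)
print(alternate)
```

Saving the result under a new name, such as alternate_sitemap.xml, keeps it separate from the regenerated files.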

  3. Method 1: Using the developer tools to update or add a new sitemap file

    Using wchtools to update your sitemap file is simple.

    1. Use the wchtools init command to establish a connection to your site’s API URL. For more information, see the Getting Started section in the wchtools readme.
    2. Run wchtools pull -A --dir <path-to-working-directory> to extract all of the site’s artifacts to your local system.

      When the pull command completes, you should see an assets subdirectory in your local working directory with the robots.txt, sitemap.xml, and sitemap_index.xml files.

      Note: These file names will probably have a unique prefix for your website.

      You can edit these files to suit your needs by either updating the existing sitemap.xml files or adding additional files.

    3. After you have modified the files to your satisfaction, run wchtools push -A --dir <path-to-working-directory> to push the modified or newly added files back to your server. Only files that have changed since you downloaded them are pushed back. You can check wchtools-cli.log to verify that the push succeeded.
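
Because the pulled file names carry a site-specific prefix, it can help to locate them by suffix before editing. The following sketch assumes the files sit somewhere under the assets subdirectory of the working directory. The function name find_sitemap_files is hypothetical, and the temporary directory only simulates the result of a pull.

```python
import tempfile
from pathlib import Path


def find_sitemap_files(working_dir):
    """Group pulled robots and sitemap files by kind, ignoring name prefixes."""
    found = {"robots": [], "sitemap_index": [], "sitemap": []}
    for path in sorted((Path(working_dir) / "assets").rglob("*")):
        if not path.is_file():
            continue
        if path.name.endswith("robots.txt"):
            found["robots"].append(path)
        elif path.name.endswith("sitemap_index.xml"):
            found["sitemap_index"].append(path)
        elif path.name.endswith("sitemap.xml"):
            found["sitemap"].append(path)
    return found


# Simulate a working directory as it might look after a pull.
with tempfile.TemporaryDirectory() as work:
    assets = Path(work) / "assets"
    assets.mkdir()
    for name in ("default_robots.txt", "default_sitemap_index.xml",
                 "default_sitemap.xml", "logo.png"):
        (assets / name).write_text("")
    names = {kind: [p.name for p in paths]
             for kind, paths in find_sitemap_files(work).items()}
print(names)
```
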
  4. Method 2: Using the APIs to update or add a new sitemap file

    The sitemap files are maintained as unmanaged assets on your site. You can use the published Acoustic Content Hub APIs for authoring resources and assets to modify or add new sitemap files.

    To create or update one of the sitemap files with the APIs, you need to:

    1. Create a resource of the new or updated file.
    2. Create or update an asset, which identifies a specific resource as a part of the website.

    The following example describes in detail how to use the APIs to provide a new sitemap file that gives alternative URL paths for pages on your site.

    1. Create a new sitemap file on your local system and name it alternate_sitemap.xml.
      Note: You can use your existing sitemap file as a starting point and make modifications to that.
    2. Create a new resource for the new XML file in your site. When the resource is successfully created, the HTTP response to your request includes the resource’s ID.

      Example:

      curl --request POST --url "https://{DomainName}/{path}/authoring/v1/resources?name=alternate_sitemap.xml" --header "accept: application/json" --header "Content-Type: text/xml; charset=utf-8" --header 'x-ibm-client-id: REPLACE_KEY_VALUE' --header 'Content-Length:495' --data-binary @alternate_sitemap.xml

      Response:

      {"id" : "0f01846b-5c3f-4e31-b753-475da2256d2c"}

      Notes:

      • name=alternate_sitemap.xml represents the name the new resource will have on your site.
      • --data-binary @alternate_sitemap.xml points to the local file you created as input.
      • The Content-Length header must match the size of the input file.
      • Content-Type: text/xml; charset=utf-8 is appropriate for a sitemap XML file.
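
Since the Content-Length value must match the input file, it is safer to compute it than to type it by hand. The sketch below builds the header set from the request above. The function name upload_headers is hypothetical, and the temporary file stands in for alternate_sitemap.xml.

```python
import os
import tempfile


def upload_headers(file_path, client_id):
    """Build the resource upload headers, deriving Content-Length from the file."""
    return {
        "accept": "application/json",
        "Content-Type": "text/xml; charset=utf-8",
        "x-ibm-client-id": client_id,
        # Size in bytes of the file that will be sent as the request body.
        "Content-Length": str(os.path.getsize(file_path)),
    }


# Stand-in for the local sitemap file.
with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
    tmp.write(b"<urlset/>")
headers = upload_headers(tmp.name, "REPLACE_KEY_VALUE")
os.remove(tmp.name)
print(headers["Content-Length"])
```
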
    3. Next, create a new asset using the resource you just created. This adds your file as a part of the hosted website.

      Example:

      Create a JSON file, for example alternate_body.json, containing metadata about your asset. The "resource" value is the resource ID returned in step 2.

      Body:

      {
        "resource": "0f01846b-5c3f-4e31-b753-475da2256d2c",
        "path": "https://dw1.s81c.com/customer-engagement/alternate_sitemap.xml",
        "description": "Handcrafted sitemap file",
        "tags": {
          "values": [
            "sitemap"
          ]
        }
      }

      Request:

      curl --request POST --url 'https://{DomainName}/{path}/authoring/v1/assets' --header 'x-ibm-client-id: REPLACE_KEY_VALUE' --header 'x-ibm-client-secret: REPLACE_KEY_VALUE' --data @alternate_body.json

      Response (edited for brevity):

      {"fileName" : "alternate_sitemap.xml", ...,"id":"2422611c-a92a-4ae3-a0f7-741eefe476ea", "rev":"1-4f514b5c66c02dfb44c224f83f8030b2", ..., "status":"ready"}
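
The asset body can also be generated from the resource ID instead of being written by hand, which avoids copy errors. This sketch uses only the fields shown in the body above. The function name asset_body and the "/alternate_sitemap.xml" path value are illustrative assumptions.

```python
import json


def asset_body(resource_id, path, description, tags):
    """Build the asset metadata that links a resource to a path on the site."""
    return {
        "resource": resource_id,
        "path": path,
        "description": description,
        "tags": {"values": list(tags)},
    }


body = asset_body(
    "0f01846b-5c3f-4e31-b753-475da2256d2c",  # resource ID from the create response
    "/alternate_sitemap.xml",                # hypothetical path value
    "Handcrafted sitemap file",
    ["sitemap"],
)
body_json = json.dumps(body, indent=2)
print(body_json)
```

Writing body_json to a file gives you the input for the --data option of the request.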

    4. Update robots.txt to include this new sitemap file to ensure that web crawlers will find it.
      • The first step is to discover the resource ID of robots.txt. Use the get all resources endpoint and search the output for robots.txt.

        Example:

        curl --request GET --url 'https://{DomainName}/{path}/authoring/v1/resources/views/by-created' --header 'x-ibm-client-id: REPLACE_KEY_VALUE' --header 'x-ibm-client-secret: REPLACE_KEY_VALUE'

        Response (edited for brevity):

        {"href":"/authoring/v1/resources/views/by-created","offset":0,"limit":50, "items":[{"rev":"1-1ec7264e18cf18728447e8eed46978f2", "fileName":"default_robots.txt","metadata":{},"fileSize":273, "fileExtension":"txt","created":"2019-07-29T17:09:00.439Z", "creatorId":"d91d25cc-13d9-4ffb-b37d-3d21c981e598","mediaType":"text/xml; charset=UTF-8","binaryUuid":"3b534430-d28f-4955-a9ad-424a5ab00188","id":"bba013be-f647-4983-8627-744350c44d93","classification":"resource","md5":"jqMzMmzQdT/J6VFhfnRmEA=="},...],"next":"/authoring/v1/resources/views/by-created?offset=50"}
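
Searching the response by eye is error-prone once the listing grows, so a small lookup helps. The sketch below assumes the response has been saved as JSON text with the shape shown above. The function name find_id_by_file_name is hypothetical, and it matches by suffix because the file name carries a site prefix.

```python
import json


def find_id_by_file_name(listing, suffix):
    """Return the id of the first item whose fileName ends with suffix."""
    for item in listing.get("items", []):
        if item.get("fileName", "").endswith(suffix):
            return item["id"]
    return None


# Trimmed-down copy of the response shown above.
response_text = """
{"href": "/authoring/v1/resources/views/by-created", "offset": 0, "limit": 50,
 "items": [{"fileName": "default_robots.txt",
            "id": "bba013be-f647-4983-8627-744350c44d93",
            "classification": "resource"}]}
"""
resource_id = find_id_by_file_name(json.loads(response_text), "robots.txt")
print(resource_id)
```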

      • Next, use this resource ID to retrieve the current contents of the robots.txt file.

        Example:

        Request:

        curl --request GET --url 'http://{DomainName}/{path}/authoring/v1/resources/bba013be-f647-4983-8627-744350c44d93?mode=string&bypass-cache=string' --header 'accept: */*' --header 'x-ibm-client-id: REPLACE_KEY_VALUE' --header 'x-ibm-client-secret: REPLACE_KEY_VALUE'

        The response to this request will be the file contents:

        # My robots.txt
        User-agent: *
        Disallow:
        sitemap: https://my10-stage.digitalexperience.ibm.com/4030f492-e057-4e05-8876-ad2077c9fa78/default_sitemap_index.xml

      • Create a local copy of this file containing a reference to the new sitemap asset you created previously.

        # My robots.txt
        User-agent: *
        Disallow:
        sitemap: https://my10-stage.digitalexperience.ibm.com/4030f492-e057-4e05-8876-ad2077c9fa78/default_sitemap_index.xml
        sitemap: https://my10-stage.digitalexperience.ibm.com/4030f492-e057-4e05-8876-ad2077c9fa78/alternate_sitemap.xml
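
The edit to the local copy can be scripted so that the entry is added only once. This is a minimal sketch: the function name add_sitemap_entry and the example.com URLs are assumptions.

```python
def add_sitemap_entry(robots_text, sitemap_url):
    """Append a 'sitemap:' line for sitemap_url unless it is already listed."""
    lines = robots_text.splitlines()
    for line in lines:
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return robots_text  # already present, nothing to change
    lines.append("sitemap: " + sitemap_url)
    return "\n".join(lines) + "\n"


current = """# My robots.txt
User-agent: *
Disallow:
sitemap: https://example.com/default_sitemap_index.xml
"""
updated = add_sitemap_entry(current, "https://example.com/alternate_sitemap.xml")
print(updated)
# Byte size of the modified file, useful for a Content-Length header.
print(len(updated.encode("utf-8")))
```

Running the function a second time with the same URL leaves the text unchanged.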

      • Create a new resource representing this modified file.

        Request:

        curl --request POST --url 'https://{DomainName}/{path}/authoring/v1/resources?name=default_robots.txt' --header 'accept: application/json' --header 'Content-Type: text/xml; charset=utf-8' --header 'x-ibm-client-id: REPLACE_KEY_VALUE' --header 'x-ibm-client-secret: REPLACE_KEY_VALUE' --header 'Content-Length:273' --data-binary @updated_robots.txt

        Response:

        {"id":"bba013be-f647-4983-8627-744350c44d93"}

      • Before you can update the existing robots.txt asset to point to this new resource, you need the asset ID of the robots file. Get all assets and search for the file name in the response.

        Request:

        curl --request GET --url 'https://{DomainName}/{path}/authoring/v1/assets' --header 'x-ibm-client-id: REPLACE_KEY_VALUE' --header 'x-ibm-client-secret: REPLACE_KEY_VALUE'

        Response (edited for brevity):

        {"href":"/authoring/v1/assets","offset":0,"limit":50,"items":[{"fileName":"default_robots.txt","metadata":{},"creatorId":"00000000-0000-0000-0000-000000000009","description":"robots exclusion file","mediaId":"d9d6fb71-9d43-4bdc-8454-fa3db9fb1937","path":"https://dw1.s81c.com/customer-engagement/default_robots.txt","digest":"MPYOaUSb6O24T1XX5ma8rQ==","lastModifierId":"00000000-0000-0000-0000-000000000009","id":"7178c880-a140-4522-9c8a-da56323e91ad","usageRights":{"categories":[]},"systemModified":"2019-07-26T15:15:55.704Z","rev":"1-5fc8e3a68220e5ad36b5a4628a743228","resource":"6534f63c-67c5-48e8-b162-a76ee42fc7d8","created":"2019-07-26T15:15:55.701Z","mediaType":"text/xml; charset=UTF-8","classification":"asset","assetType":"file","tags":{"declined":[],"values":[],"analysis":"none"},"isManaged":false,"categoryIds":[],"cognitive":{"status":"none"},"fileSize":158,"name":"default_robots.txt","lastModified":"2019-07-26T15:15:55.704Z","status":"ready"},...],"next":"/authoring/v1/assets?offset=50"}
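
The listing is paged (note the limit and next fields), so the robots asset might not appear on the first page. The sketch below follows next links until the file is found. It assumes that the last page omits next. The fetch_page parameter is a hypothetical callable that performs the GET request for a path and returns the decoded JSON, and it is replaced here by a dictionary of fake pages.

```python
def iter_items(fetch_page, first_href):
    """Yield items from every page, following 'next' links until none is left."""
    href = first_href
    while href:
        page = fetch_page(href)
        yield from page.get("items", [])
        href = page.get("next")  # assumed to be absent on the last page


def find_asset(fetch_page, file_name, first_href="/authoring/v1/assets"):
    """Return the first asset whose fileName equals file_name, or None."""
    for item in iter_items(fetch_page, first_href):
        if item.get("fileName") == file_name:
            return item
    return None


# Fake pages standing in for real API responses.
fake_pages = {
    "/authoring/v1/assets": {
        "items": [{"fileName": "logo.png", "id": "a1"}],
        "next": "/authoring/v1/assets?offset=50",
    },
    "/authoring/v1/assets?offset=50": {
        "items": [{"fileName": "default_robots.txt",
                   "id": "7178c880-a140-4522-9c8a-da56323e91ad"}],
    },
}
asset = find_asset(fake_pages.__getitem__, "default_robots.txt")
print(asset["id"])
```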

      • Finally, update the existing robots.txt asset to point to the newly updated file resource.

        Body: Use the resource ID you created for this modified file from the previous steps.

        {
          "resource": "bba013be-f647-4983-8627-744350c44d93",
          "path": "https://dw1.s81c.com/customer-engagement/default_robots.txt",
          "description": "Handcrafted robots.txt file",
          "tags": {
            "values": [
              "sitemap"
            ]
          }
        }

        Request: Use the asset ID discovered in the previous step.

        curl --request PUT --url 'https://{DomainName}/{path}/authoring/v1/assets/7178c880-a140-4522-9c8a-da56323e91ad?forceOverride=true' --header 'x-ibm-dx-publish-priority:now' --header 'x-ibm-client-id: REPLACE_KEY_VALUE' --header 'x-ibm-client-secret: REPLACE_KEY_VALUE' --data @updated_robots.json

        The response shows the same asset ID with a new revision number:

        {"fileName":"default_robots.txt","description":"Handcrafted robots.txt file",...,"id":"7178c880-a140-4522-9c8a-da56323e91ad",...,"rev":"3-4602de7f9ccbabfbe35ade3a676ce963",...,"status":"ready"}
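
To confirm that the update took effect, you can compare the revision before and after the PUT request. The sketch below assumes the rev value has the form number, hyphen, hash, as in the responses above, and that the leading number grows with each update. The function name revision_number is hypothetical.

```python
def revision_number(rev):
    """Extract the leading counter from a rev string such as '3-4602de7f...'."""
    return int(rev.split("-", 1)[0])


before = "1-5fc8e3a68220e5ad36b5a4628a743228"  # rev from the asset listing
after = "3-4602de7f9ccbabfbe35ade3a676ce963"   # rev from the PUT response
print(revision_number(after) > revision_number(before))
```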

Expected outcome

Once these steps are complete and all updated assets have been published, web crawlers can find your new sitemap file through robots.txt. The new sitemap file continues to be served until you remove it, and you can update it whenever you want with the authoring APIs.
