Structured data makes your websites easier for search engines and other machines to find and search. Schema.org provides a vocabulary for on-page structured data markup that helps search engines understand the information on web pages and provide richer search results.

The final part of this series describes some of the tools you can use to implement Schema.org on your site. We’ll look at examples of three different types of tools, including tools that:

  • Interactively guide you to add structured data to your plain HTML.
  • Validate the structured data in your pages.
  • Let developers parse the structured data from such pages.

Because Schema.org is open source (Apache license), maintained by the W3C Schema.org Community Group, you can use it without being locked into a proprietary tool. You can also automate the process of keeping up with developments.

Generating structured data

Many popular content management systems, including WordPress, Drupal, and Joomla, have plugins that simplify Schema.org output. If your CMS doesn’t have a handy plugin, or if you are generating pages directly, Google’s Structured Data Markup Helper is a useful tool.

The Structured Data Markup Helper lets you tag content for several key areas of the Schema.org vocabulary, then gives you back an updated version of your HTML with structured data that you can refine and use. The helper also works with HTML-formatted email. You can either point it at HTML you’ve already published, or paste HTML into a provided text area. Figure 1 shows the helper after some HTML for a book club event page has been pasted into it.

Figure 1. Structured Data Markup Helper
Front page of Structured Data Markup Helper

After pasting in the text above, I click Start Tagging and get the workspace shown in Figure 2.

Figure 2. Structured Data Markup Helper workspace
Main workspace of Structured Data Markup Helper

From this workspace, I can select parts of the content in the HTML preview to the left and create Schema.org markup for them, from the subset offered by the helper.

I can also see whether I have the minimum content needed for the desired Schema.org type. For example, the event name and start date (highlighted in yellow on the right) are required fields. Looking back at the code in Figure 1, I can set the name to “Book Discussion Meeting,” but I don’t have a start date. So I go back to the HTML and add a phrase for the meeting date.

Unstructured book club page: Adding a meeting date
<main>
          <h1>Geo Book Club</h1>
          <div>
          Founding member Alice Ng welcomes you!
          </div>

          <div>
            <p>Please join us for our next book discussion meeting on June 1st,
            all about the novel
              <u>Things Fall Apart</u> by
              <a href="http://enwp.org/Chinua_Achebe">
                Chinua Achebe
              </a> (ISBN: 9780393932195)</p>
              <img src="https://upload.wikimedia.org/wikipedia/en/6/65/ThingsFallApart.jpg">
          </div>

          </main>

Do-it-yourself structured data

  1. To follow along from this point, go to the Structured Data Markup Helper, select the Events radio button and the HTML option, and paste in the HTML code above.
  2. In the workspace, click and drag to highlight the phrase book discussion meeting. On the resulting menu, click Name.
  3. Click and drag to highlight the phrase June 1st. On the resulting menu, click Start Date and Date/Time (Autodetect). Your workspace should now look like Figure 3.
    Figure 3. Workspace with required fields
    Workspace of Structured Data Markup Helper, adding required fields
  4. As you can see from the right sidebar, the requirement for “Name” is now satisfied, but there is a warning on the “Start date” field. The helper was able to figure out the month and day, but the year is missing. You can provide the year for the machine-readable data without involving the HTML by clicking the Add missing tags button at the bottom.
  5. From “Select tag type,” click Start date > Advanced > Year.
  6. You can then type 2018 at “Add tag”. When you click Save, the warning should disappear and the HTML should be ready to use.
  7. For added measure, click on the book’s cover image and click Image in the pop-up. The workspace should look like Figure 4.
    Figure 4. Workspace ready for export
    Workspace of Structured Data Markup Helper, ready for export
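
Filling in the missing year in the steps above amounts to producing an ISO 8601 date for the startDate property. Here is a minimal sketch of that conversion in Python. The function name to_iso_date is hypothetical, the parsing only handles English phrases of the form "month day", and nothing here claims to mirror how the helper works internally.

```python
import re
from datetime import date

# Month names mapped to their numbers (January -> 1, ..., December -> 12)
MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

def to_iso_date(phrase, year):
    """Turn a phrase such as 'June 1st' plus an explicit year into 'YYYY-MM-DD'."""
    match = re.fullmatch(r"\s*([A-Za-z]+)\s+(\d{1,2})(?:st|nd|rd|th)?\s*", phrase)
    if match is None:
        raise ValueError(f"Unrecognized date phrase: {phrase!r}")
    month_name = match.group(1).capitalize()
    if month_name not in MONTHS:
        raise ValueError(f"Unknown month name: {month_name!r}")
    # date() itself rejects impossible days such as June 31st
    return date(year, MONTHS[month_name], int(match.group(2))).isoformat()

print(to_iso_date("June 1st", 2018))  # prints 2018-06-01
```

The visible text on the page can stay as "June 1st" while the machine-readable value carries the full date.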

Although the Structured Data Markup Helper covers only a small subset of Schema.org and doesn’t yet support RDFa output, it is a great way to get started. By working with it, you will find that writing your own structured data will begin to seem less of a mystery.

Getting your hands on the result

To see the fruits of your efforts, click CREATE HTML. The result should look like Figure 5.

Figure 5. HTML export page
HTML export page of Structured Data Markup Helper

On the right-hand side is your starting HTML, with highlighted bits added by the helper (in Microdata format by default). You can download the entire HTML output into your development tools or use the highlighting to write code to generate the structured data output. For this example, the output is as follows.

Structured Data Markup Helper output
<!-- Microdata markup added by Google Structured Data Markup Helper. -->
          <html><head></head><body><main>
          <h1>Geo Book Club</h1>
          <div>
          Founding member Alice Ng welcomes you!
          </div>

          <div itemscope itemtype="http://schema.org/Event">
            <p>Please join us for our next 
          <span itemprop="name">book discussion meeting</span> on 
          <span itemprop="startDate" content="2018-06-01">June 1st</span>,
            all about the novel
              <u>Things Fall Apart</u> by
              <a href="http://enwp.org/Chinua_Achebe">
                Chinua Achebe
              </a> (ISBN: 9780393932195)</p>
              <img itemprop="image" src="https://upload.wikimedia.org/wikipedia/en/6/65/ThingsFallApart.jpg"/>
          </div>

          </main>
          </body></html>
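
If you want to check by program what the added attributes say, the Python standard library is enough for a quick look. The following is a rough sketch, not a full Microdata processor: the class name ItempropCollector is made up for this example, it ignores nested items, and it treats a content, src, or href attribute as the property value because that matches the markup shown above.

```python
from html.parser import HTMLParser

class ItempropCollector(HTMLParser):
    """Collect (property, value) pairs from elements carrying itemprop."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # (tag, property, text chunks) while inside an element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop is None:
            return
        # Prefer a machine-readable attribute over the visible text
        for key in ("content", "src", "href"):
            if attrs.get(key) is not None:
                self.pairs.append((prop, attrs[key]))
                return
        self._pending = (tag, prop, [])

    def handle_data(self, data):
        if self._pending is not None:
            self._pending[2].append(data)

    def handle_endtag(self, tag):
        if self._pending is not None and tag == self._pending[0]:
            _, prop, chunks = self._pending
            # Collapse runs of whitespace in the visible text
            self.pairs.append((prop, " ".join("".join(chunks).split())))
            self._pending = None

# A shortened fragment of the helper output shown above
SAMPLE = """
<div itemscope itemtype="http://schema.org/Event">
  <p>Please join us for our next
  <span itemprop="name">book discussion meeting</span> on
  <span itemprop="startDate" content="2018-06-01">June 1st</span></p>
  <img itemprop="image" src="https://upload.wikimedia.org/wikipedia/en/6/65/ThingsFallApart.jpg"/>
</div>
"""

collector = ItempropCollector()
collector.feed(SAMPLE)
collector.close()
for prop, value in collector.pairs:
    print(prop, "=", value)
```

For the date, the collector reports the content attribute rather than the human-readable "June 1st", which is the value a consumer of the structured data would see.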

You can also get JSON-LD output from the drop-down menu that begins with “Microdata”. The resulting JSON-LD follows.

JSON-LD markup generated by Google Structured Data Markup Helper

Go ahead and click Back to tagging and play around with adding other fields.
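
If you generate pages from code, you can also build equivalent JSON-LD yourself. The sketch below is a hand-written approximation of the same three properties tagged in the walkthrough, not the helper's verbatim output, and the function name to_jsonld_script is hypothetical.

```python
import json

# The same three facts tagged in the walkthrough, as a JSON-LD object
event = {
    "@context": "http://schema.org",
    "@type": "Event",
    "name": "book discussion meeting",
    "startDate": "2018-06-01",
    "image": "https://upload.wikimedia.org/wikipedia/en/6/65/ThingsFallApart.jpg",
}

def to_jsonld_script(data):
    """Wrap a JSON-LD object in the script element used to embed it in HTML."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'

print(to_jsonld_script(event))
```

Unlike Microdata, the JSON-LD block sits apart from the visible HTML, so you can add it to a page template without touching the existing markup.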

Validating structured data

No matter how long you’ve been writing structured data, you still need to be able to validate the data and make sure there are no errors.

Problems with Schema.org can be hard to catch because their effects are subtle and show up somewhere you don’t control. For example, it might take a long time to notice that rich snippets for products you’re selling are not showing up correctly, because the search engine’s results pages are a third-party site rather than your own.

Luckily, there are tools to help you process structured data embedded in HTML. These tools give you a clear picture of what your structured data is saying, and help to ensure you’ve made no errors. Let’s look at the Yandex Structured Data Validator.

At the Validator page, I opt to “enter HTML code fragment here.” That reveals a text area where I paste in the full book club RDFa from Part 2 of this series. See Figure 6.

Figure 6. Yandex Structured Data Validator
Yandex structured data validator with no errors

There are no listed warnings or errors. However, if I change property="name" on the second line to property="title", I get a warning in the results section. Figure 7 shows this warning.

Figure 7. Schema.org warning
Yandex structured data validator with Schema.org warnings

The Yandex validator incorporates the expectations documented for Schema.org, and recognizes that title is not a valid property short name for an Organization. Similarly, if you used the British spelling Organisation in the HTML, it would give you a warning that http://schema.org/Organisation is unknown in Schema.org. This is exactly the sort of problem a human reviewer can overlook, because the word is correctly spelled by British convention. Schema.org simply uses the American spelling.
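
The kind of check described here can be pictured as a lookup against the vocabulary. The following sketch assumes a tiny hand-picked subset of Schema.org and a hypothetical function name, check_markup. A real validator works from the full published vocabulary and does much more.

```python
# Tiny hand-picked subset of Schema.org, for illustration only
KNOWN_PROPERTIES = {
    "Organization": {"name", "url", "description", "member", "founder"},
    "Person": {"name", "url", "memberOf"},
}

def check_markup(type_name, property_names):
    """Return a list of warning strings; an empty list means no complaints."""
    if type_name not in KNOWN_PROPERTIES:
        return [f"http://schema.org/{type_name} is not in the known subset"]
    allowed = KNOWN_PROPERTIES[type_name]
    return [
        f"{prop} is not a known property of {type_name}"
        for prop in property_names
        if prop not in allowed
    ]

print(check_markup("Organization", ["name", "member"]))   # no warnings
print(check_markup("Organization", ["title", "member"]))  # flags title
print(check_markup("Organisation", ["name"]))             # flags the type itself
```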

If you have a problem in the syntax that is sufficiently fundamental, say misspelling the vocab attribute or missing its equal sign, the tool pretty much gives up with an error such as this:
Microformats not detected =(

The Yandex validator does have some limitations, however. For example, I tried to validate the HTML from Part 3 of this series, where a book is also marked up as a product. The first line is as follows.

<div vocab="http://schema.org/" typeof="Product Book">
          

This confused the Yandex validator, which treated the resource strictly as a Product, giving the warnings shown in Figure 8.

Figure 8. Yandex Structured Data Validator with spurious errors
Yandex structured data validator with spurious errors
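
In RDFa, typeof holds a whitespace-separated list of types, so a resource marked Product Book is both at once. A checker that honors this would accept a property if any of the listed types allows it. The sketch below illustrates the idea with a small hand-picked subset of properties. The names SUBSET, expand_types, and allowed_properties are invented for this example, and it makes no claim about how either validator is implemented.

```python
VOCAB = "http://schema.org/"

# Hand-picked subset of properties per type, for illustration only
SUBSET = {
    "Product": {"name", "offers", "brand"},
    "Book": {"name", "isbn", "author"},
}

def expand_types(typeof_value, vocab=VOCAB):
    """Split a typeof attribute into one full type IRI per listed term."""
    return [vocab + term for term in typeof_value.split()]

def allowed_properties(typeof_value):
    """Union of the properties permitted by every type in the typeof list."""
    allowed = set()
    for term in typeof_value.split():
        allowed |= SUBSET.get(term, set())
    return allowed

print(expand_types("Product Book"))
print("isbn" in allowed_properties("Product Book"))  # True: Book allows it
print("isbn" in allowed_properties("Product"))       # False: Product alone does not
```

Treating the resource strictly as a Product, as in Figure 8, corresponds to the last line: properties that only Book defines look like errors.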

The Yandex tool supports all three Schema.org formats: Microdata, RDFa, and JSON-LD. There is also an API available, which you can use to automate validation of your sites.

I should mention that Google’s Structured Data Testing Tool (SDTT), mentioned earlier, had no problem validating the Product/Book example that confused the Yandex validator. Nevertheless, it’s important to use any search engine’s validator with your eyes wide open. The main objective of these tools is to confirm the markup their own search engine will recognize for rich snippets. While passing any given Schema.org validator is an important indicator that your site metadata is correct, you might find some areas where validators diverge.

Reading Schema.org from the web with Versa

Now that you’ve published Schema.org structured data on your web sites, and confirmed that it is valid, how can you go about actually using it?

There are many tools available to parse structured data from web pages, but I’ll zero in on one that I developed myself. Versa is an open source (Apache license) library for working with RDF-like data, and one of its modules supports RDFa. It requires Python 3, and if you have this set up, you can just install Versa with the following command.

pip install versa
          

You can use Versa to parse sites that use Schema.org in RDFa form. The following listing is an example that extracts all names of things from a specified page rich in structured data. The page in question is a description of all the books and other materials held by the Denver Public Library related to the author Chinua Achebe. (My regular job is publishing such pages of structured data on the web to show the many cool things libraries have to offer the public.)

Using Versa to parse sites that use Schema.org
#Import the needed code
import urllib.request
from versa.reader import rdfalite
#Set the web page to be parsed
site = 'http://link.denverlibrary.org/resource/FRqlF2zfz4A/'
#List to store the parsed data
triples = []
#Open the web page for reading over the network
fp = urllib.request.urlopen(site)
#Run the parser
rdfalite.totriples(fp.read(), triples, site)
#Empty set where results will be added
names_of_things = set()
#Loop over all data for properties that are Schema.org name
for resource, property, value in triples:
    if property == 'http://schema.org/name':
        names_of_things.add(value)

#Print the set of results
for name in names_of_things:
    print(name)

The comments should make it easy enough to follow, if you already know Python. Even if you don’t, you’ll probably be able to get the gist of it. The output — the set of resources explicitly named using Schema.org — is as follows.

Versa output
How the leopard got his claws
Achebe, Chinua
The short century : independence and liberation movements in Africa, 1945-1994
Arrow of God
Civil peace
Achebe, Chinua -- Interviews
Morning yet on creation day : essays
Vengeful creditor
Achebe, Chinua -- Criticism and interpretation
Home and exile
No longer at ease
Anthills of the Savannah
There was a country : a personal history of Biafra
Things fall apart
Hopes and impediments : selected essays
Another Africa
The education of a British-protected child : essays
Conversations with Chinua Achebe
Arrow of god
Girls at war and other stories
A man of the people
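
Once the data is a plain list of (resource, property, value) triples, ordinary Python is enough to reshape it. As a small sketch, the following groups names by the resource they belong to instead of pooling them in one set. The sample triples use invented example.org identifiers so that the code runs without a network connection or Versa installed, and the function name names_by_resource is hypothetical.

```python
from collections import defaultdict

SCHEMA_NAME = "http://schema.org/name"

def names_by_resource(triples):
    """Map each resource to the set of Schema.org names asserted for it."""
    grouped = defaultdict(set)
    for resource, prop, value in triples:
        if prop == SCHEMA_NAME:
            grouped[resource].add(value)
    return dict(grouped)

# Invented sample data in the same three-part shape the loop above iterates over
sample_triples = [
    ("http://example.org/work/1", SCHEMA_NAME, "Things fall apart"),
    ("http://example.org/work/1", "http://schema.org/author", "http://example.org/person/1"),
    ("http://example.org/person/1", SCHEMA_NAME, "Achebe, Chinua"),
]

for resource, names in sorted(names_by_resource(sample_triples).items()):
    print(resource, sorted(names))
```

Keeping the resource alongside each name makes it possible to tell which names belong to works and which to people, something the flat set of names cannot show.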

Conclusion

In the four parts of this series, I introduced you to the importance of structured, machine-readable data for modern web sites. In particular, I described the Schema.org data model and explained how you express it correctly in HTML. I’ve given you a sense of how Schema.org vocabularies are arranged and documented, and I’ve shown you tools for generating structured data, validating it, and parsing it from web sites.

You now have what you need to take advantage of the latest features of search engines, intelligent agents, and many other innovations on the web. Don’t be shy: feel free to experiment, because as with all technologies, getting your hands dirty in your own problem space is the best path to mastery.