In the previous parts of this series, I introduced Schema.org, described its abstract model for machine-readable information on the web, and identified the three alternative syntaxes for expressing such information within HTML — RDFa, Microdata, and JSON-LD.

Using one of these syntaxes, you can set a vocabulary based on the topic of your page. Almost all Schema.org vocabularies use the URL stem http://schema.org/. These shared conventions and contributions from communities of interest allow web publishers to use structured data more widely on the web. They also make it easier for developers to aggregate that data.

Many of the most commonly used Schema.org vocabularies build on older formats, such as the microformats I mentioned in Part 1. Others are based on the “friend-of-a-friend” vocabulary, which is a staple of RDF applications. The various vocabularies are under constant review and improvement.

Let’s take a look at the Schema.org vocabulary, starting with some commonly used terms.

Describe items for sale with Schema.org

Because e-commerce is so widespread, the areas of the Schema.org vocabulary that describe products and offers are among the most heavily used. These vocabularies add machine-readable data to product or consumer websites, and search engines like Google, Bing, and Yandex use that data to add inline annotations to search results.

For e-commerce sites, there are two primary classes:

  • http://schema.org/Product
  • http://schema.org/Offer

Figure 1 shows a screenshot from the documentation page for http://schema.org/Product. Because the page is so long, it has been cropped and shows fewer than a quarter of the properties actually defined for the Product type. The properties range from the basic (names, descriptions, associated brand names, and so on) to the specialized (Global Trade Item Number (GTIN), the set of standards underlying Universal Product Codes (UPC), and bar codes).

Figure 1. The product class
Portion of Schema.org Product class documentation page

You’ll remember from the previous article that you express such properties through the HTML element hierarchy, or in JSON-LD sections. For each property, Schema.org specifies the types of acceptable values. In many cases, the value is just plain text, and it’s up to you to figure out what makes sense based on what you’re trying to communicate. Other properties expect simple data types, such as Number or DateTime, and in some cases the values are derived from the Thing class, in which case the values must be URL references. You can often choose from multiple types of resources, or literal values, when specifying properties.

Occasionally, Schema.org vocabularies can overlap and be confusing. In the above image, the values of the category property can be PhysicalActivityCategory or Text or Thing. However, PhysicalActivityCategory derives from a health and life sciences extension to Schema.org, and it is a fairly arbitrary and limited choice for categorizing items for sale.

This discrepancy is because descriptions in Schema.org are kept as unified as possible, while classes and properties are sometimes reused to refer to specific areas of interest. Because of this, you will sometimes find unusual concepts pulled into the mainstream vocabulary description. Schema.org is meant to be pretty informal, so trust your instincts to guide you to the right usage of classes and properties.

Sometimes you’ll have a choice between using a structured framework of values or just using informal text strings. Having plain text categories is much like tagging in social media. We’ll call this collection of informal tags a “folksonomy,” to distinguish it from a more formal taxonomy.
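
As a minimal sketch of that choice, the following RDFa fragment gives the category property once as an informal text tag and once as a URL reference. The tag text and the example.com taxonomy URL are hypothetical placeholders that I made up for illustration. In practice you would probably pick one style or the other.

```html
<div vocab="http://schema.org/" typeof="Product">
  <span property="name">Things Fall Apart</span>
  <!-- Folksonomy style: the category is a plain text tag -->
  <span property="category">African literature</span>
  <!-- Taxonomy style: the category is a URL reference.
       This URL is a hypothetical placeholder, not a real taxonomy entry. -->
  <link property="category" href="https://example.com/taxonomy/fiction/african-literature">
</div>
```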

If you click on one of the properties in the left column, you get more detail about it. For example, click on aggregateRating. You get the following page:

Figure 2. The aggregateRating property
Portion of Schema.org aggregateRating property documentation page

At the bottom of the page, you can see examples to help you get started. Almost all Schema.org class and property definitions show these examples, and you can select whether you want to see a Microdata, RDFa, or JSON-LD example.
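
To give a rough idea of what such an example looks like, here is a small RDFa sketch of an aggregateRating on a product. It is not copied from the Schema.org page, and the rating numbers are invented for illustration.

```html
<div vocab="http://schema.org/" typeof="Product">
  <span property="name">Things Fall Apart</span>
  <!-- aggregateRating takes a nested AggregateRating resource;
       the numbers below are made-up sample values -->
  <div property="aggregateRating" typeof="AggregateRating">
    Rated <span property="ratingValue">4.5</span> out of
    <span property="bestRating">5</span> by
    <span property="reviewCount">27</span> reviewers
  </div>
</div>
```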

Book sale example

Listing 1 shows the sample description of a book, Things Fall Apart, that’s for sale on the web.

Listing 1. Book product/offer
<div vocab="http://schema.org/" typeof="Product">
  <img property="image" alt="book cover"
       src="https://images.betterworldbooks.com/039/Things-Fall-Apart-Achebe-Chinua-9780393932195.jpg" />
  <a property="url" href="https://www.betterworldbooks.com/product/detail/Things-Fall-Apart-9780393932195">
    <span property="name">Things Fall Apart</span>
  </a>
  <div property="offers" typeof="Offer">
    <span property="priceCurrency" content="USD">$</span>
    <span property="price" content="8.48">8.48</span>
    (<span property="itemCondition" href="http://schema.org/UsedCondition">used</span>,
    <span property="offerCount">2</span> available)
  </div>
</div>

The listing shows a basic description of the product, a name and an image, and a collection of offers for that product. The offer is described by:

  • The price
  • The price currency
  • The fact that the item is used
  • The fact that two of the items are available
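
For comparison, the same core facts can be written as JSON-LD. This is a sketch of one possible equivalent rather than an official rendering of Listing 1. It covers the name, image, URL, price, currency, and condition, and it assumes the script element is placed somewhere in the same page.

```html
<script type="application/ld+json">
{
  "@context": "http://schema.org/",
  "@type": "Product",
  "name": "Things Fall Apart",
  "image": "https://images.betterworldbooks.com/039/Things-Fall-Apart-Achebe-Chinua-9780393932195.jpg",
  "url": "https://www.betterworldbooks.com/product/detail/Things-Fall-Apart-9780393932195",
  "offers": {
    "@type": "Offer",
    "priceCurrency": "USD",
    "price": "8.48",
    "itemCondition": "http://schema.org/UsedCondition"
  }
}
</script>
```

Unlike the RDFa version, the JSON-LD block is separate from the visible page content, so you have to keep the two in sync yourself.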

Enumerations

The condition of the offered book is a property, itemCondition, whose value has several specific, recognized values in Schema.org. Called an enumeration, this is defined as a particular sort of class, OfferItemCondition. The four members of this particular enumeration are:

  • DamagedCondition
  • NewCondition
  • RefurbishedCondition
  • UsedCondition

Another property for an Offer with enumerated values is availability. The enumeration class is ItemAvailability, and expected values are:

  • Discontinued
  • InStock
  • InStoreOnly
  • LimitedAvailability
  • OnlineOnly
  • OutOfStock
  • PreOrder
  • PreSale
  • SoldOut

The property itemCondition can be used on either an Offer or a Product, but availability is expected only on an Offer.
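
The following sketch shows one way to attach availability to the offer from the book example. The choice of InStock is an assumption for illustration, since the original listing does not state availability.

```html
<div vocab="http://schema.org/" typeof="Product">
  <span property="name">Things Fall Apart</span>
  <div property="offers" typeof="Offer">
    <span property="priceCurrency" content="USD">$</span>
    <span property="price" content="8.48">8.48</span>
    <!-- The enumeration member is given by its full URL;
         the visible text next to it is for human readers -->
    <link property="availability" href="http://schema.org/InStock">in stock
  </div>
</div>
```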

Mixing in additional vocabularies

Sometimes you’ll find bits of other vocabularies mixed in. For instance, the commercial description sections of Schema.org originated in another vocabulary project called GoodRelations. There are still several areas where Schema.org specifies GoodRelations terms. For example, an Offer resource can have a property called availableDeliveryMethod. This is an enumeration whose values are all still GoodRelations terms, for example:

  • http://purl.org/goodrelations/v1#DeliveryModeDirectDownload
  • http://purl.org/goodrelations/v1#DeliveryModeMail
  • http://purl.org/goodrelations/v1#DeliveryModePickUp
  • http://purl.org/goodrelations/v1#FederalExpress

The following snippet from the book sale description is modified to illustrate the most direct way you can express this in RDFa.

<div property="offers" typeof="Offer">
  <span property="priceCurrency" content="USD">$</span>
  <span property="price" content="8.48">8.48</span>
  (<span property="itemCondition" href="http://schema.org/UsedCondition">used</span>,
  <span property="offerCount">2</span> available)
  <link property="availableDeliveryMethod" href="http://purl.org/goodrelations/v1#DeliveryModeMail">
</div>

The added line is the link element at the end of the snippet. It sets the value of availableDeliveryMethod to the full GoodRelations-based URL.

Note the use of the link element to provide the property’s value from an enumeration. This approach should be used to specify enumeration values, or any other precise reference to a URL value from the Schema.org specification. Since there is no anchor text, the link won’t actually result in any display to the user; it is there for a machine to read only. Place it next to the content it relates to.

There is another way to express a vocabulary URL whose stem differs from http://schema.org/. It involves an RDFa attribute I haven’t yet covered: prefix.

<div vocab="http://schema.org/" prefix="gr: http://purl.org/goodrelations/v1#" typeof="Product">
  …
  <div property="offers" typeof="Offer">
    <span property="priceCurrency" content="USD">$</span>
    <span property="price" content="8.48">8.48</span>
    (<span property="itemCondition" href="http://schema.org/UsedCondition">used</span>,
    <span property="offerCount">2</span> available for shipping by post
    <link property="availableDeliveryMethod" resource="gr:DeliveryModeMail">)
  </div>
</div>

The prefix attribute associates the abbreviation gr: with the stem of the GoodRelations URL. You can then write an abbreviated form by adding the tail of the URL, that is, gr:DeliveryModeMail. Note that all such abbreviations must use a colon as a separator, and that the abbreviated form goes in the resource attribute rather than href, because RDFa reads href as a plain URL and does not expand prefixes there. This trick will come in handy when you are mixing many different varieties of URLs and perhaps if you are mixing your own vocabularies in with Schema.org.

Combining classes

In this example, the item for sale is really at least two things: a product and a book. The parties involved in selling and buying the book think of its price and shipping details. The reader thinks of its title, author, and page count. These roles overlap, of course. You might search an online bookstore for the most affordable book by an author you just learned of.

This common scenario highlights how things can belong to multiple classes, and Schema.org provides ready support for such cases. The following version of the book-for-sale HTML shows the combination of classes.

Listing 2. Book as product, using both classes
<div vocab="http://schema.org/" typeof="Product Book">
  <img property="image" alt="book cover"
       src="https://images.betterworldbooks.com/039/Things-Fall-Apart-Achebe-Chinua-9780393932195.jpg" />
  <a property="url" href="https://www.betterworldbooks.com/product/detail/Things-Fall-Apart-9780393932195">
    <span property="name">Things Fall Apart</span>
  </a>
  <dl>
    <dt>Author</dt><dd property="author" typeof="Person"><span property="name">Chinua Achebe</span></dd>
    <dt>ISBN</dt><dd property="isbn">9780393932195</dd>
  </dl>
  <div property="offers" typeof="Offer">
    <span property="priceCurrency" content="USD">$</span>
    <span property="price" content="8.48">8.48</span>
    (<span property="itemCondition" href="http://schema.org/UsedCondition">used</span>,
    <span property="offerCount">2</span> available)
  </div>
</div>

The attribute typeof="Product Book" specifies the resource as being of two types at the same time, separated by a space. When you specify two types, you can then use properties associated with both types. In Listing 2, the author and isbn properties come from the book side, while offers comes from the product side.
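
If you prefer JSON-LD, a comparable sketch gives @type an array of type names instead of a space-separated string. This is my own illustrative rendering of the same book, trimmed to a few properties.

```html
<script type="application/ld+json">
{
  "@context": "http://schema.org/",
  "@type": ["Product", "Book"],
  "name": "Things Fall Apart",
  "author": {
    "@type": "Person",
    "name": "Chinua Achebe"
  },
  "isbn": "9780393932195",
  "offers": {
    "@type": "Offer",
    "priceCurrency": "USD",
    "price": "8.48"
  }
}
</script>
```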

Inherited properties

As I pointed out in Part 2, the Schema.org Book class derives from the CreativeWork class. This means that by virtue of inheritance, any Book instance can have the properties defined for its base class, CreativeWork, and also those defined for Thing. As a convenience, the Schema.org documentation for any given class includes the properties for the base classes. Figure 3 shows a screenshot from the Book class page, which illustrates this point. You can see that the first six properties are specific to Book. Following them is a section clearly marked for properties from CreativeWork. Further down on the actual page is another section marked for properties from Thing.

Figure 3. The Book class
Portion of Schema.org Book class documentation page
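
Here is a short sketch of inheritance in practice: a single Book resource that uses one property from each level of the hierarchy. The language code is an assumption added for illustration.

```html
<div vocab="http://schema.org/" typeof="Book">
  <!-- name is defined on Thing -->
  <span property="name">Things Fall Apart</span>
  <!-- author and inLanguage are defined on CreativeWork -->
  by <span property="author" typeof="Person">
    <span property="name">Chinua Achebe</span>
  </span>
  <meta property="inLanguage" content="en">
  <!-- isbn is defined on Book itself -->
  (ISBN <span property="isbn">9780393932195</span>)
</div>
```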

So far I’ve dealt mostly with simple, descriptive text and enumerations. As you can imagine, however, some things in Schema.org have to be expressed in formalized ways. Let’s have a look at how you do that.

Data typing

Having data values in regular formats is an important part of making content machine-readable. Data comes in as a string from the HTML, but shared conventions of formatting are key to rich data types. For example, when the Schema.org documentation of a property’s value type says Integer, you would not want to set 1.5 as a value because the number has a fractional part and is not a valid integer. The documentation is not always very clear about data type details. For the most part, however, you can expect conventions similar to those of your favorite programming language.
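
As a hedged illustration, the fragment below supplies an Integer and a Date in the forms a consumer is likely to expect. The page count is a placeholder value, not a checked fact about this edition.

```html
<div vocab="http://schema.org/" typeof="Book">
  <span property="name">Things Fall Apart</span>
  <!-- Integer: digits only, with no fractional part or thousands separator.
       209 is a placeholder page count. -->
  <span property="numberOfPages" content="209">209 pages</span>
  <!-- Date: ISO 8601 form; a year on its own is a reduced-precision date -->
  first published in <span property="datePublished" content="1958">1958</span>
</div>
```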

Overlaying human-readable and machine-readable data

The rigorous formats needed for machine-readable versions of data are not always so friendly for humans. Your web pages are, after all, still meant for humans. Many times with Schema.org, you will have literal element text for people, with tagging to provide the machine-readable version as metadata.

Here’s a modified snippet from the book sale example.

<div property="offers" typeof="Offer">
  <span property="priceCurrency" content="USD">$</span>
  <span property="price" content="10">ten</span>
</div>

In this example, the machine-readable currency is presented as a three-letter code from the ISO 4217 standard, while the human-readable currency is presented with the familiar dollar sign, $. The price amount is specified in numerical terms, but the page presents the number as English text.

So far, I’ve presented price details as direct properties on the offer, but you can also bundle them into a PriceSpecification resource. This can be useful for reflecting discount periods. Consider a holiday sale.

<div property="offers" typeof="Offer">
  <div property="priceSpecification" typeof="PriceSpecification">
    <span property="priceCurrency" content="USD">$</span>
    <span property="price" content="12.5">12.50</span>
    <meta property="validFrom" content="2018-12-25">
  </div>
  <div property="priceSpecification" typeof="PriceSpecification">
    <strong>
    Or just
    <span property="price" content="10">ten</span>
    <span property="priceCurrency" content="USD">dollars</span>
    <span property="validThrough" content="2018-12-24T23:59:59">until midnight Christmas Eve!</span>
    </strong>
  </div>
</div>

Here you have a sale price for a limited time, with the normal price marked to take effect afterward.

The validThrough date for the discounted price again shows the overlay of human- and machine-readable data. The element’s body uses the English expression “until midnight Christmas Eve,” while the content attribute uses the ISO 8601 standard format for that precise time and date.

The validFrom date has no human-readable content, but has the ISO 8601 data in the attribute. Because it has no human-readable content, I use a meta tag, which is the preferred approach in Schema.org. If you have machine-readable data, but either the content has no obvious place to overlay it, or the HTML used for that content doesn’t provide a natural syntax for doing so, you use a link or meta tag. This tag should be placed as close as possible to the relevant context. I discussed using link above when the property’s value is an enumeration. If it’s not an enumeration, you use meta, as in this last listing.

Sometimes you will want to provide detailed metadata about material that is not presented in text, for example images or embedded scripts and media objects. In this case, the actual details shown in the browser are loaded from another file, so there is no way to provide the sort of inline Schema.org markup that’s possible for simple textual content. In such cases, you’ll use link or meta tags.
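
A minimal sketch of that situation, using the book cover image: the image itself comes from another file, so its descriptive details go into meta tags placed right next to it. The caption text and media type are assumptions for illustration.

```html
<div vocab="http://schema.org/" typeof="Product">
  <span property="name">Things Fall Apart</span>
  <div property="image" typeof="ImageObject">
    <!-- The src attribute supplies the URL value of contentUrl -->
    <img property="contentUrl" alt="book cover"
         src="https://images.betterworldbooks.com/039/Things-Fall-Apart-Achebe-Chinua-9780393932195.jpg">
    <!-- Nothing visible to overlay these on, so they go in meta tags.
         Both values are illustrative assumptions. -->
    <meta property="caption" content="Front cover of Things Fall Apart">
    <meta property="encodingFormat" content="image/jpeg">
  </div>
</div>
```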

Conclusion

This article discusses some of the important aspects of Schema.org vocabularies, as well as how to use its documentation. Now you are familiar with the syntax for expressing things as machine-readable data as well as the most common conventions for doing so within a particular area of interest.

So, how can you be sure that the Schema.org syntax you carefully code into your web pages is correct, both in syntax and vocabulary? And what sorts of tools are available to help you effectively use Schema.org? In the next and final part of this series, I show you how to validate pages using Schema.org, and discuss other practical considerations to keep in mind as you use Schema.org on your web pages.