This post is part of a series of posts created by the two newest members of our Developer Advocate team here at IBM Cloud Data Services. In honour of the book Seven Databases in Seven Weeks by Eric Redmond and Jim R. Wilson, we challenged Lorna and Matt to take a new database from our portfolio every day, get it set up and working, and write a blog post about their experiences. Each post reflects the story of their day with a new database. We’ll update our seven-days GitHub repo with example code as the series progresses. —The Editors

“Edge, shoulders, knees and nodes”: it’s IBM Graph!
  • Database type: Graph database
  • Best tool for: Representing and querying relationships that involve connections between people, places and things

Overview

IBM Graph is a cloud solution based on Apache TinkerPop™. It offers some great functionality for storing complex, related data and for querying that data in a powerful and performant way. This article will give an overview of how to begin using IBM Graph.

One thing that can be strange for graph newcomers is the nomenclature. To get you started, here’s a quick guide:

  • Graph: A set of vertices and edges. A database can contain many graphs.
  • Schema: Definition of the types of data that will be represented. This helps the database determine what should be indexed.
  • Vertex: (plural: vertices) A node in a graph. A vertex has a label and often some other properties.
  • Edge: A link that joins two vertices. An edge has direction and can also have its own type and properties that relate to the relationship between the two vertices.

Getting Started

Start by adding an instance of IBM Graph to your Bluemix account. Once provisioned, you will see that your dashboard includes three key pieces of information that you will need to connect to the database:

  • apiURL
  • username
  • password

There are two main ways of authenticating against IBM Graph. The simplest way is to use your username and password and access the service using HTTP Basic Authentication; however, this approach is rate-limited. The recommended method is to acquire an access token and use that, as it does not have the same limitations.

To acquire an access token, we make a call to the /_session endpoint using HTTP Basic Authentication. Note that the URL does not include the /g/ segment, which refers to the graph that the service created for us by default. With curl, the request looks like this:

curl --user username:password https://ibmgraph-alpha.eu-gb.bluemix.net/7415c5d6-a80a-4ffe-896a-1a1dc7a81d21/_session

The result contains a gds-token which will be our access token for all the other curl requests we make. We will pass it as an Authorization header with the format:

Authorization: gds-token [token]
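
As a rough sketch of the same flow outside of curl, the Python below builds both kinds of header using only the standard library. The function names are ours, and the assumption that the /_session response is a JSON object with a gds-token key comes from the description above rather than from the API reference.

```python
import base64
import json


def basic_auth_header(username, password):
    """Build the HTTP Basic Authentication header used to call /_session."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def token_header(session_body):
    """Turn the JSON text returned by /_session into a gds-token header.

    Assumption: the body is a JSON object with a "gds-token" key.
    """
    token = json.loads(session_body)["gds-token"]
    return {"Authorization": f"gds-token {token}"}


# Example with a made-up response body; a real token is much longer.
print(token_header('{"gds-token": "abc123"}'))
```

The second header is the one we reuse on every later request, so it is worth building once and keeping around.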

Now that we’re authenticated, we can create a graph to work with and a schema to describe the data we’ll be storing.

For the sake of clarity, we use IBM Graph’s simplified API in the examples that follow. For more complex queries or for creating a large graph with bulk operations, check out Graph’s Gremlin API and bulk input API. Graph’s simplified API is not the recommended way to use the service for anything beyond trying out basic querying and database CRUD. The /gremlin endpoint is what developers should use in their projects.

Create a Graph

To create a new graph, make a POST request to the /_graphs endpoint. You can specify an ID for your graph, but if you don’t, the service will generate a unique identifier. The curl request looks like this:

curl -H "Content-Length: 0" -X POST -H "Authorization: gds-token MGEzYTgyYmUtNjlhMi00OTljLWIxNTAtNmNhYmY3M2ZjOGJmOjE0NjU4OTU4ODkyNzI6aWtTU1B1Njg3VEk5cjFkb3RWR3RhOXM4ajMrcE5aZU9VYmh2eE5tTk1JMD0=" -v https://ibmgraph-alpha.eu-gb.bluemix.net/7415c5d6-a80a-4ffe-896a-1a1dc7a81d21/_graphs

Note the use of Content-Length: 0 as an additional header here. By default cURL won’t set this header since we’re not actually sending any body data. Without this header, you’ll see a 411 Length Required response.

The response contains both a graphId and a dbUrl, which identify the new graph that has been created. We’ll use the dbUrl in creating the schema for our graph and in adding vertices and edges.
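
If you prefer a script to a one-off curl command, the sketch below prepares the same request with Python's urllib without sending it. The helper name and the example.invalid URL are placeholders of ours; swap in your own apiURL and token.

```python
import urllib.request


def create_graph_request(api_url, token):
    """Prepare (but do not send) the POST that creates a new graph."""
    return urllib.request.Request(
        url=api_url.rstrip("/") + "/_graphs",
        data=b"",  # empty body, as in the curl example
        method="POST",
        headers={
            "Authorization": f"gds-token {token}",
            "Content-Length": "0",  # explicit, to avoid 411 Length Required
        },
    )


# Placeholder URL and token, for illustration only.
req = create_graph_request("https://example.invalid/service-id", "TOKEN")
print(req.get_method(), req.full_url)
```

Passing the prepared request to urllib.request.urlopen(req) sends it; the response body should then contain the graphId and dbUrl described above.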

Design a Schema

Before adding vertices and edges to the graph, it is good practice to design the schema and apply it. This step requires a bit of thinking and possibly the use of a whiteboard, as we consider the types of data and relationships that our system will represent.

As an example, we’re going to represent a group of friends (or rather, random humans where some of them know one another), and the various interests/hobbies of those people. Once we’ve collected this data and added it, we’ll be able to answer questions such as:

  • Which new human might I like to be friends with?
  • What hobbies are my friends enjoying that I should try?

To represent this data, we’ll need two types of vertices: one is people, and the other is interests. And two types of edges: one showing that someone is friends with someone else, and another showing that a person is interested in a particular hobby. Therefore, we design a schema that reflects these relationships.

Here’s our schema, which we’ll store in a file called schema.json:

{
  "propertyKeys": [
    {"name": "personName", "dataType": "String", "cardinality": "SINGLE"},
    {"name": "interestName", "dataType": "String", "cardinality": "SINGLE"}
  ],
  "vertexLabels": [
    {"name": "person"},
    {"name": "interest"}
  ],
  "edgeLabels": [
    { "name": "likes" },
    { "name": "friendsWith" }
  ],
  "vertexIndexes": [
    {"name": "vByPersonName", "propertyKeys": ["personName"], "composite": true, "unique": false},
    {"name": "vByInterestName", "propertyKeys": ["interestName"], "composite": true, "unique": false}
  ]
}
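
Before posting a schema, it can help to run a quick local sanity check. The sketch below reports any index that refers to a property key missing from propertyKeys. This is a convenience check of our own, not part of IBM Graph, and the inline schema is a trimmed copy of the one above.

```python
import json


def undeclared_index_keys(schema):
    """Return {index name: [property keys not listed in propertyKeys]}."""
    declared = {key["name"] for key in schema.get("propertyKeys", [])}
    problems = {}
    for index in schema.get("vertexIndexes", []):
        missing = [k for k in index["propertyKeys"] if k not in declared]
        if missing:
            problems[index["name"]] = missing
    return problems


# Trimmed copy of schema.json: only the parts this check looks at.
schema = json.loads("""
{
  "propertyKeys": [
    {"name": "personName", "dataType": "String", "cardinality": "SINGLE"},
    {"name": "interestName", "dataType": "String", "cardinality": "SINGLE"}
  ],
  "vertexIndexes": [
    {"name": "vByPersonName", "propertyKeys": ["personName"], "composite": true, "unique": false},
    {"name": "vByInterestName", "propertyKeys": ["interestName"], "composite": true, "unique": false}
  ]
}
""")

# An empty dict means every indexed key is declared.
print(undeclared_index_keys(schema))
```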

We can apply this schema to our graph by using the dbUrl given when we created a new graph, and POSTing this data to the /schema endpoint. The cURL command looks like this:

curl -H "Authorization: gds-token [token]" -H 'Content-Type: application/json' https://ibmgraph-alpha.eu-gb.bluemix.net/7415c5d6-a80a-4ffe-896a-1a1dc7a81d21/19ff504c-3d33-438b-a275-f7994c9c471f/schema --data @schema.json

This posts a new schema to the graph. We can then inspect the result by making a GET request to the same endpoint:

curl -H "Authorization: gds-token [token]" -sS https://ibmgraph-alpha.eu-gb.bluemix.net/7415c5d6-a80a-4ffe-896a-1a1dc7a81d21/19ff504c-3d33-438b-a275-f7994c9c471f/schema

With the schema set up, we can start to add people and their interests.

Shaping the Graph

Here’s a representation of the data we collected and the schema our graph will represent. It’s a tiny example, for clarity’s sake. In practice, the best applications of IBM Graph and similar tools are on very large (“big”?!) data sets, where the number of connections makes the relationships impossible to work out by hand and slow or heavy to handle with other storage approaches such as a traditional RDBMS.

Graph content

Our example is a few acquainted people and their various interests. Again, we have the two types of vertices: a person and an interest. We also have two types of edges: a paler “friendsWith” edge that links two people together and another orange-with-an-arrow “likes” edge that links people to their interests. You’ll see this distinction as we move on to adding these elements to our graph using curl commands.

Add People Vertices

First off, we’ll add our people vertices. To do this, we need to POST to the /vertices API endpoint, supplying the label we want (person) and the properties (personName) that we want to apply to this new vertex, as shown below:

curl -X POST -H "Authorization: gds-token [token]" -H 'Content-Type: application/json' https://ibmgraph-alpha.eu-gb.bluemix.net/7415c5d6-a80a-4ffe-896a-1a1dc7a81d21/19ff504c-3d33-438b-a275-f7994c9c471f/vertices -d '{"label": "person", "properties": {"personName": "Dave"}}'

You will get some JSON back in the response that, amongst other things, will tell you the unique ID of this vertex. In this instance, our ID is 4160:

{  
  "requestId": "7bf1c9dd-2da5-4616-ba93-f28d1425c9d7",
  "status": {  
    "message": "",
    "code": 200,
    "attributes": {

    }
  },
  "result": {  
    "data": [
      {  
        "id": 4160,
        "label": "person",
        "type": "vertex",
        "properties": {
          "personName": [  
            {  
              "id": "16w-37k-27t1",
              "value": "Dave"
            }
          ]
        }
      }
    ],
    "meta": {

    }
  }
}

To recap what we just did — we added a new vertex with a label field set to person. Our new vertex has a property called personName, which we have set to Dave. When this vertex was created, we received a unique ID in return, which is 4160.
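
Since we need to keep track of these IDs, here is a small sketch that pulls them out of a response shaped like the one above. The function name is ours, and the sample is a trimmed copy of that response.

```python
import json


def vertex_ids(response_body):
    """Collect the IDs of all vertices listed under result.data."""
    body = json.loads(response_body)
    return [
        item["id"]
        for item in body["result"]["data"]
        if item.get("type") == "vertex"
    ]


# Trimmed copy of the response shown above.
sample = """
{
  "requestId": "7bf1c9dd-2da5-4616-ba93-f28d1425c9d7",
  "status": {"message": "", "code": 200, "attributes": {}},
  "result": {
    "data": [
      {
        "id": 4160,
        "label": "person",
        "type": "vertex",
        "properties": {"personName": [{"id": "16w-37k-27t1", "value": "Dave"}]}
      }
    ],
    "meta": {}
  }
}
"""

print(vertex_ids(sample))
```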

Now we need to do this again, adding person vertices for a few more people. We did this for Colin, Emma, Jenny and Craig. Make sure to note the ID for each of these vertices as you add them.

Add Interest Vertices

Now that we have some people in our graph, we’ll add a few interest vertices. This process is identical to adding a person, with the exception of the values for the label and properties fields.

curl -X POST -H "Authorization: gds-token [token]" -H 'Content-Type: application/json' https://ibmgraph-alpha.eu-gb.bluemix.net/7415c5d6-a80a-4ffe-896a-1a1dc7a81d21/19ff504c-3d33-438b-a275-f7994c9c471f/vertices -d '{"label": "interest", "properties": {"interestName": "books"}}'

Here we are saying that the label is of type interest, and we want to set a property of interestName to the value books. Add more interest vertices in the same way for music, tennis and films. Remember to record the unique ID of each of these vertices!
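
Typing nine near-identical request bodies by hand invites typos, so one option is to generate them. The sketch below prints the JSON body for each person and interest in our example. The helper name is ours, and each printed line is meant to be used as the -d argument of the curl command above.

```python
import json

PEOPLE = ["Dave", "Colin", "Emma", "Jenny", "Craig"]
INTERESTS = ["books", "music", "tennis", "films"]


def vertex_payload(label, property_name, value):
    """Build the JSON body for a POST to the /vertices endpoint."""
    return json.dumps({"label": label, "properties": {property_name: value}})


payloads = [vertex_payload("person", "personName", name) for name in PEOPLE]
payloads += [vertex_payload("interest", "interestName", name) for name in INTERESTS]

for payload in payloads:
    print(payload)
```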

Finding Our Vertices

To prove that we have successfully added our vertices, we can retrieve them from the graph using the GET /vertices/<id> endpoint. For any of the IDs you have stashed away from the previous steps, you can do the following:

curl -H "Authorization: gds-token [token]" -sS https://ibmgraph-alpha.eu-gb.bluemix.net/7415c5d6-a80a-4ffe-896a-1a1dc7a81d21/19ff504c-3d33-438b-a275-f7994c9c471f/vertices/4160

This simple query will return the JSON of the requested vertex. Try it on a few of your IDs to prove that they exist.

Connecting the Dots

We have a graph that contains a number of people and a number of interests, but at the moment there is not much going on. We need to add relationships, or edges, between our vertices.

Edges are the connections between vertices, and they are directional. This means that when we create an edge, we have to specify where it came from and where it is going to — this matters! We also want to describe what this relationship is.

In the curl example below, we are creating an edge between Dave (ID: 4160) and Colin (ID: 4240). We want it to be known that Dave is friends with Colin, so we have applied the friendsWith label using the /edges endpoint.

curl -X POST -H "Authorization: gds-token [token]" -H 'Content-Type: application/json' https://ibmgraph-alpha.eu-gb.bluemix.net/7415c5d6-a80a-4ffe-896a-1a1dc7a81d21/19ff504c-3d33-438b-a275-f7994c9c471f/edges -d '{ "outV": 4160, "label": "friendsWith", "inV": 4240 }'

Friendship is a two-way street, but currently this relationship is directional, meaning that Dave is friends with Colin but as yet, Colin is not friends with Dave — poor Dave! To remedy this, we need to add another edge between Dave and Colin, but reverse the IDs so that we create the same relationship, but in the other direction.

You can also create edges between people and interests, using the likes label. Try creating the following edges using the IDs from earlier:

  • Dave friendsWith Colin
  • Colin friendsWith Dave
  • Dave friendsWith Jenny
  • Jenny friendsWith Dave
  • Jenny friendsWith Emma
  • Emma friendsWith Jenny
  • Emma friendsWith Craig
  • Craig friendsWith Emma
  • Dave likes tennis
  • Dave likes films
  • Emma likes films
  • Craig likes tennis
  • Craig likes music

Once this is done, our graph will match the image from above! Now we can start to query it.

Here, we have encountered the graph theory notion of “directed” vs. “undirected” graphs. In an undirected graph, edges are bidirectional. Friendships, in our example, are bidirectional. In a directed graph, edges run in one direction, like the interests in our example. Since we are mixing the two ideas together, we need to represent each friendship by creating two complementary directed edges. Think Like (a) Git provides a concise explanation with images.
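
The "two complementary directed edges" idea can be sketched in a few lines of Python. The helper names are ours; the dictionaries have the same shape as the body we posted to the /edges endpoint, and the IDs are Dave's and Colin's from earlier.

```python
def edge_payload(out_v, label, in_v):
    """Body for one directed edge, running from out_v to in_v."""
    return {"outV": out_v, "label": label, "inV": in_v}


def friendship_payloads(a, b):
    """Model a two-way friendship as a pair of directed edges."""
    return [
        edge_payload(a, "friendsWith", b),
        edge_payload(b, "friendsWith", a),
    ]


# Dave (4160) and Colin (4240)
for payload in friendship_payloads(4160, 4240):
    print(payload)
```

A likes edge needs only a single call to edge_payload, because an interest does not like a person back.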

Does Anyone Want To Be Friends?

There are lots of ways that we can use the power of IBM Graph to query our data, but we’re going to ease in slowly with a simple example. We met Dave earlier. Dave likes tennis and films, and would like to make more friends who like the same things.

Using the POST /gremlin endpoint, we can traverse the graph to find other people who share the same interests as Dave. We’ll provide the curl example in a minute, but first let’s take a look at the query part in isolation:

graph.traversal().V().has("personName", "Dave").out("likes").in("likes").has("personName", without("Dave"))

If we were to break down this query, we can see what is going on:

// Traverse the graph
// Find each Vertex that has a personName property set to 'Dave'
graph.traversal().V().has("personName", "Dave")

// Find all vertices that are connected via an outward 'likes' edge (i.e., Dave's interests)
.out("likes")

// Find all vertices that are connected to these interests via an inward 'likes' edge (i.e., everyone who likes Dave's interests, Dave included)
.in("likes")

// Of these vertices, find all that have a personName (i.e., are a person), but where that name is not Dave
.has("personName", without("Dave"))

If we put that all together, we can hit the POST /gremlin endpoint like so:

curl -X POST -H "Authorization: gds-token [token]" -H 'Content-Type: application/json' https://ibmgraph-alpha.eu-gb.bluemix.net/7415c5d6-a80a-4ffe-896a-1a1dc7a81d21/19ff504c-3d33-438b-a275-f7994c9c471f/gremlin -d '{"gremlin": "graph.traversal().V().has(\"personName\", \"Dave\").out(\"likes\").in(\"likes\").has(\"personName\", without(\"Dave\"))"}'

Which will return results like the JSON below:

[
  {
    "id": 4272,
    "label": "person",
    "type": "vertex",
    "properties": {
      "personName": [
        {
          "id": "17a-3ao-sl",
          "value": "Emma"
        }
      ]
    }
  },
  {
    "id": 8376,
    "label": "person",
    "type": "vertex",
    "properties": {
      "personName": [
        {
          "id": "2dz-6go-sl",
          "value": "Craig"
        }
      ]
    }
  }
]

This query shows us that both Emma (films) and Craig (tennis) share similar interests to Dave — pretty cool! This example can be extended and altered to find friends-of-friends, amongst other things. So why not have a look at some of the IBM Graph documentation and examples, and have a go yourself? If you use Node.js, there’s a library our colleagues @ukmadlz and @ptitzler maintain that’s worth a look: nodejs-graph.
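
To see why the traversal lands on Emma and Craig, here is a plain-Python walk over the likes edges from our example. It mimics the three steps of the query on an in-memory list. It is only an illustration of the logic and does not talk to IBM Graph, and the function name is ours.

```python
# The 'likes' edges from our example, as (person, interest) pairs.
LIKES = [
    ("Dave", "tennis"),
    ("Dave", "films"),
    ("Emma", "films"),
    ("Craig", "tennis"),
    ("Craig", "music"),
]


def shared_interest_people(person, likes):
    """Mimic out("likes").in("likes") followed by the without() filter."""
    # out("likes"): the interests this person points to
    interests = [interest for who, interest in likes if who == person]
    # in("likes"): everyone pointing at those interests, the person included
    fans = [who for interest in interests
            for who, liked in likes if liked == interest]
    # has("personName", without(...)): drop the starting person
    return [who for who in fans if who != person]


print(shared_interest_people("Dave", LIKES))
```

The order may differ from what the service returns. As in the Gremlin query, a person who shares two interests with Dave would appear twice; adding Gremlin's dedup() step to the traversal is one way to collapse such repeats.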

Conclusion

Graph databases bring a new dimension to working with data. They open up the possibilities of querying relationships on a scale that can be difficult, clunky and slow when elements are linked in a traditional RDBMS. In particular, graph databases make it easy to visualize relationships and can handle queries with a variable number of “hops” or edges between the starting vertex and the eventual results. The query language definitely has a learning curve, but understanding even just the basics of graph theory can take you a long way. (And look how far you’ve come already!)

Based on Apache TinkerPop, IBM Graph is an easy way to start working with this technology, with the option to scale up your deployment as your data grows. If you haven’t already, head over to the IBM Bluemix catalog so you can start your trial and test our example code. Until next time!
