
Introduction to using the JanusGraph database

This tutorial introduces you to the open source JanusGraph database and shows you how to use it with flight data from the Bee Travels project.

JanusGraph is a scalable, transactional, open source graph database that supports multiple storage backends for graph data and multiple index backends.

In JanusGraph, a storage backend holds the graph data, either in memory or in a separate database. Index backends index the graph data so that queries can be answered faster. JanusGraph integrates natively with the Apache TinkerPop graph stack, so you can interact with a JanusGraph database by using the Gremlin query language. Gremlin is a graph traversal language that is used to retrieve and modify data in a graph.

JanusGraph is best suited for data where the relationships between data points matter more than the individual data points themselves. Modeling data this way allows for more flexibility in the graph structure. In addition, the scalability of JanusGraph, together with the way it stores relationships and indexes, allows for high performance.

Prerequisites

Setup

In this tutorial, you use the JanusGraph Docker image for your database and the TinkerPop Gremlin console v3.4.10 to interact with the database. The default Docker image configuration uses the Oracle Berkeley DB Java Edition storage backend and the Apache Lucene indexing backend.

  1. To set up the JanusGraph database, run:

     $ docker run -it -p 8182:8182 janusgraph/janusgraph janusgraph
    
  2. In your TinkerPop Gremlin console installation, change the serializer entry in the file apache-tinkerpop-gremlin-console-3.4.10/conf/remote.yaml to the following:

     serializer: { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
    
  3. Once the database container is running, open up another terminal window and cd into the directory of the Gremlin console you downloaded. From the directory, run:

     $ ./bin/gremlin.sh start
    
  4. Once the Gremlin console is running, run the following commands to connect to your JanusGraph database:

     gremlin> :remote connect tinkerpop.server conf/remote.yaml session
     gremlin> :remote console
    
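Before you define a schema, it can help to confirm that the console is really talking to the server. The following is a minimal check, assuming the default Docker image configuration exposes the traversal source under the name g (as the rest of this tutorial does). On a freshly started container with no data loaded, the count should be 0.

```groovy
// Run inside the Gremlin console after ":remote console".
// Assumption: the server binds the graph traversal source to the variable "g".
g.V().count()
```

If the command returns an error instead of a number, recheck the port mapping in the docker run command and the serializer setting in conf/remote.yaml.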

Create your schema and index

Once your database is set up and you know what type of data to include in the database and what the structure of the graph will look like, you can define the schema for vertices, edges, and properties. If no schema is defined, JanusGraph uses a DefaultSchemaMaker that defines one implicitly.

This tutorial uses the Bee Travels flight service data. The flight data consists of flights and airports, which serve as our two vertex types. Flights depart from one airport and arrive at another airport. Therefore, departing and arriving serve as the edges that connect the vertices. Flights and airports each have their own associated data, which is stored as properties.

Flight vertices contain the following properties:

  • id
  • source_airport_id
  • destination_airport_id
  • flight_time
  • flight_duration
  • cost
  • airlines

Airport vertices contain the following properties:

  • id
  • name
  • is_hub
  • is_destination
  • type
  • country
  • city
  • latitude
  • longitude
  • gps_code
  • iata_code

After you define the schema, you can create the indexes. Indexes improve query speed, especially for larger graphs. Create indexes that match the queries you plan to use to retrieve data, so that those queries actually hit an index.

You can use a Groovy script to define your schema and create your indexes. The following Groovy script defines the schema and creates the indexes for the Bee Travels flight service.

  1. Create a janus-schema.groovy file that contains the following script. You can also find the file in the Bee Travels GitHub repo: janus-schema.groovy.

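     // Defines the Bee Travels flight schema: property keys, labels, and indexes.
     // Each contains...() check keeps an existing property key or label from being created twice.
     def defineSchema(graph) {
         // Open a management transaction; schema changes take effect on commit().
         mgmt = graph.openManagement()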
    
         if (! mgmt.containsPropertyKey("object_type")) mgmt.makePropertyKey("object_type").dataType(String.class).cardinality(Cardinality.single).make()
         if (! mgmt.containsPropertyKey("id")) mgmt.makePropertyKey("id").dataType(String.class).cardinality(Cardinality.single).make()
         if (! mgmt.containsPropertyKey("name")) mgmt.makePropertyKey("name").dataType(String.class).cardinality(Cardinality.single).make()
         if (! mgmt.containsPropertyKey("is_hub")) mgmt.makePropertyKey("is_hub").dataType(Boolean.class).cardinality(Cardinality.single).make()
         if (! mgmt.containsPropertyKey("is_destination")) mgmt.makePropertyKey("is_destination").dataType(Boolean.class).cardinality(Cardinality.single).make()
         if (! mgmt.containsPropertyKey("type")) mgmt.makePropertyKey("type").dataType(String.class).cardinality(Cardinality.single).make()
         if (! mgmt.containsPropertyKey("city")) mgmt.makePropertyKey("city").dataType(String.class).cardinality(Cardinality.single).make()
         if (! mgmt.containsPropertyKey("country")) mgmt.makePropertyKey("country").dataType(String.class).cardinality(Cardinality.single).make()
         if (! mgmt.containsPropertyKey("latitude")) mgmt.makePropertyKey("latitude").dataType(Float.class).cardinality(Cardinality.single).make()
         if (! mgmt.containsPropertyKey("longitude")) mgmt.makePropertyKey("longitude").dataType(Float.class).cardinality(Cardinality.single).make()
         if (! mgmt.containsPropertyKey("gps_code")) mgmt.makePropertyKey("gps_code").dataType(String.class).cardinality(Cardinality.single).make()
         if (! mgmt.containsPropertyKey("iata_code")) mgmt.makePropertyKey("iata_code").dataType(String.class).cardinality(Cardinality.single).make()
         if (! mgmt.containsPropertyKey("source_airport_id")) mgmt.makePropertyKey("source_airport_id").dataType(String.class).cardinality(Cardinality.single).make()
         if (! mgmt.containsPropertyKey("destination_airport_id")) mgmt.makePropertyKey("destination_airport_id").dataType(String.class).cardinality(Cardinality.single).make()
         if (! mgmt.containsPropertyKey("flight_time")) mgmt.makePropertyKey("flight_time").dataType(Integer.class).cardinality(Cardinality.single).make()
         if (! mgmt.containsPropertyKey("flight_duration")) mgmt.makePropertyKey("flight_duration").dataType(Float.class).cardinality(Cardinality.single).make()
         if (! mgmt.containsPropertyKey("cost")) mgmt.makePropertyKey("cost").dataType(Float.class).cardinality(Cardinality.single).make()
         if (! mgmt.containsPropertyKey("airlines")) mgmt.makePropertyKey("airlines").dataType(String.class).cardinality(Cardinality.single).make()
    
         // Vertex labels: the two vertex types in the graph.
         if (!mgmt.containsVertexLabel("airport")) mgmt.makeVertexLabel("airport").make()
         if (!mgmt.containsVertexLabel("flight")) mgmt.makeVertexLabel("flight").make()
    
         // Edge labels: MULTI allows any number of edges with the same label between a pair of vertices.
         if (!mgmt.containsEdgeLabel("departing")) mgmt.makeEdgeLabel("departing").multiplicity(MULTI).make()
         if (!mgmt.containsEdgeLabel("arriving")) mgmt.makeEdgeLabel("arriving").multiplicity(MULTI).make()
    
         id_ = mgmt.getPropertyKey("id")
         city = mgmt.getPropertyKey("city")
         country = mgmt.getPropertyKey("country")
         iata_code = mgmt.getPropertyKey("iata_code")
         object_type = mgmt.getPropertyKey("object_type")
    
         airport = mgmt.getVertexLabel("airport")
         flight = mgmt.getVertexLabel("flight")
    
         // Composite indexes serve equality lookups on all of their keys.
         // indexOnly(airport) restricts an index to vertices with the airport label.
         mgmt.buildIndex("vertex_by_object_type", Vertex.class).addKey(object_type).buildCompositeIndex()
         mgmt.buildIndex("airport_by_id", Vertex.class).addKey(id_).indexOnly(airport).buildCompositeIndex()
         mgmt.buildIndex("airport_by_city_country_code", Vertex.class).addKey(city).addKey(country).addKey(iata_code).indexOnly(airport).buildCompositeIndex()
         mgmt.buildIndex("airport_by_city_country", Vertex.class).addKey(city).addKey(country).indexOnly(airport).buildCompositeIndex()
         mgmt.buildIndex("airport_by_city_code", Vertex.class).addKey(city).addKey(iata_code).indexOnly(airport).buildCompositeIndex()
         mgmt.buildIndex("airport_by_country_code", Vertex.class).addKey(country).addKey(iata_code).indexOnly(airport).buildCompositeIndex()
         mgmt.buildIndex("airport_by_city", Vertex.class).addKey(city).indexOnly(airport).buildCompositeIndex()
         mgmt.buildIndex("airport_by_country", Vertex.class).addKey(country).indexOnly(airport).buildCompositeIndex()
         mgmt.buildIndex("airport_by_code", Vertex.class).addKey(iata_code).indexOnly(airport).buildCompositeIndex()
    
         // Commit the management transaction to persist the schema and index definitions.
         mgmt.commit()
     }
    
  2. After your janus-schema.groovy file is created, load it into the Gremlin console using the following command:

     gremlin> :load <PATH>/janus-schema.groovy
     gremlin> defineSchema(graph)
    
  3. You can confirm your schema and the indexes that were created for your graph by running the following command:

     gremlin> mgmt = graph.openManagement()
     gremlin> mgmt.printSchema()
     gremlin> mgmt.commit()
    

    The expected output should look like the following:

     gremlin> mgmt.printSchema()
     ==>------------------------------------------------------------------------------------------------
     Vertex Label Name              | Partitioned | Static                                             |
     ---------------------------------------------------------------------------------------------------
     airport                        | false       | false                                              |
     flight                         | false       | false                                              |
     ---------------------------------------------------------------------------------------------------
     Edge Label Name                | Directed    | Unidirected | Multiplicity                         |
     ---------------------------------------------------------------------------------------------------
     departing                      | true        | false       | MULTI                                |
     arriving                       | true        | false       | MULTI                                |
     ---------------------------------------------------------------------------------------------------
     Property Key Name              | Cardinality | Data Type                                          |
     ---------------------------------------------------------------------------------------------------
     flight_duration                | SINGLE      | class java.lang.Float                              |
     cost                           | SINGLE      | class java.lang.Float                              |
     airlines                       | SINGLE      | class java.lang.String                             |
     object_type                    | SINGLE      | class java.lang.String                             |
     id                             | SINGLE      | class java.lang.String                             |
     name                           | SINGLE      | class java.lang.String                             |
     is_hub                         | SINGLE      | class java.lang.Boolean                            |
     is_destination                 | SINGLE      | class java.lang.Boolean                            |
     type                           | SINGLE      | class java.lang.String                             |
     city                           | SINGLE      | class java.lang.String                             |
     country                        | SINGLE      | class java.lang.String                             |
     latitude                       | SINGLE      | class java.lang.Float                              |
     longitude                      | SINGLE      | class java.lang.Float                              |
     gps_code                       | SINGLE      | class java.lang.String                             |
     iata_code                      | SINGLE      | class java.lang.String                             |
     source_airport_id              | SINGLE      | class java.lang.String                             |
     destination_airport_id         | SINGLE      | class java.lang.String                             |
     flight_time                    | SINGLE      | class java.lang.Integer                            |
     ---------------------------------------------------------------------------------------------------
     Vertex Index Name              | Type        | Unique    | Backing        | Key:           Status |
     ---------------------------------------------------------------------------------------------------
     vertex_by_object_type          | Composite   | false     | internalindex  | object_type:    ENABLED |
     airport_by_id                  | Composite   | false     | internalindex  | id:           ENABLED |
     airport_by_city_country_code   | Composite   | false     | internalindex  | city:         ENABLED |
                                    |             |           |                | country:      ENABLED |
                                    |             |           |                | iata_code:    ENABLED |
     airport_by_city_country        | Composite   | false     | internalindex  | city:         ENABLED |
                                    |             |           |                | country:      ENABLED |
     airport_by_city_code           | Composite   | false     | internalindex  | city:         ENABLED |
                                    |             |           |                | iata_code:    ENABLED |
     airport_by_country_code        | Composite   | false     | internalindex  | country:      ENABLED |
                                    |             |           |                | iata_code:    ENABLED |
     airport_by_city                | Composite   | false     | internalindex  | city:         ENABLED |
     airport_by_country             | Composite   | false     | internalindex  | country:      ENABLED |
     airport_by_code                | Composite   | false     | internalindex  | iata_code:    ENABLED |
     ---------------------------------------------------------------------------------------------------
     Edge Index (VCI) Name          | Type        | Unique    | Backing        | Key:           Status |
     ---------------------------------------------------------------------------------------------------
     ---------------------------------------------------------------------------------------------------
     Relation Index                 | Type        | Direction | Sort Key       | Order    |     Status |
     ---------------------------------------------------------------------------------------------------
    
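The defineSchema script guards property keys and labels with contains...() checks, but its buildIndex calls are unguarded, so running the script a second time is likely to fail because the index names already exist. The following sketch shows one way to guard index creation and to read back an index's status from the console. The helper name ensureAirportIndex is hypothetical and not part of the Bee Travels repo; it assumes the graph variable and the schema defined above.

```groovy
// Hypothetical helper (not part of janus-schema.groovy): create a composite
// index on airport vertices only if no graph index with that name exists yet.
// Returns true when a new index was defined, false when it was already there.
def ensureAirportIndex(mgmt, String indexName, List<String> keyNames) {
    if (mgmt.containsGraphIndex(indexName)) return false
    def builder = mgmt.buildIndex(indexName, Vertex.class)
    keyNames.each { builder.addKey(mgmt.getPropertyKey(it)) }
    builder.indexOnly(mgmt.getVertexLabel("airport")).buildCompositeIndex()
    return true
}

mgmt = graph.openManagement()

// With the schema above already in place, this index exists, so nothing is created.
created = ensureAirportIndex(mgmt, "airport_by_code", ["iata_code"])

// Read the status of the index for one of its keys (ENABLED in the printSchema() output above).
status = mgmt.getGraphIndex("airport_by_code").getIndexStatus(mgmt.getPropertyKey("iata_code"))

// Persist only if something was actually defined; otherwise discard the transaction.
if (created) mgmt.commit() else mgmt.rollback()
```

The same pattern could replace each unguarded buildIndex line in defineSchema if you expect to rerun the script against an existing database.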

Load data into the database

After your schema and indexes are confirmed, you can now load data into the database to populate the graph. There are different ways to populate the graph, including:

  • Manually through the Gremlin Console
  • Using a Groovy script
  • In code using a Gremlin driver for Java, Python, JavaScript, and other languages

For the purpose of this tutorial, I show you how to load the data using the Gremlin console. If you are interested in loading a larger data set using a Python driver, run this script.
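As a rough illustration of the Groovy script option listed above, a small helper can turn a map of properties into a vertex, which is convenient when the data comes from a file or a list of records. This is only a sketch: the helper name addVertexFromMap and the variable laxProps are assumptions made for illustration, and the property values are the ones this tutorial uses for the LAX airport.

```groovy
// Hypothetical helper (not from the Bee Travels repo): adds one vertex with the
// given label and sets every map entry as a property on it.
def addVertexFromMap(g, String label, Map props) {
    def t = g.addV(label)
    props.each { key, value -> t = t.property(key, value) }
    return t.next()
}

// Property values taken from the LAX airport used in this tutorial.
laxProps = [
    id: '9600276f-608f-4325-a037-f185848f2e28',
    name: 'Los Angeles International Airport',
    is_hub: true,
    is_destination: true,
    type: 'large_airport',
    country: 'United States',
    city: 'Los Angeles',
    latitude: 33.94250107,
    longitude: -118.4079971,
    gps_code: 'KLAX',
    iata_code: 'LAX'
]

// Left commented out so that this sketch does not create a second LAX vertex
// in addition to the one added by hand in this tutorial:
// lax = addVertexFromMap(g, 'airport', laxProps)
```

A script built this way would call the helper once per record, then add the departing and arriving edges between the returned vertices.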

Let’s say a flight departs from LAX airport and arrives at JFK airport. You can add this data to the graph by running the following:

gremlin> lax = g.addV('airport').property('id', '9600276f-608f-4325-a037-f185848f2e28').property('name', 'Los Angeles International Airport').property('is_hub', true).property('is_destination', true).property('type', 'large_airport').property('country', 'United States').property('city', 'Los Angeles').property('latitude', 33.94250107).property('longitude', -118.4079971).property('gps_code', 'KLAX').property('iata_code', 'LAX').next()
gremlin> jfk = g.addV('airport').property('id', 'ebc645cd-ea42-40dc-b940-69456b64d2dd').property('name', 'John F. Kennedy International Airport').property('is_hub', true).property('is_destination', true).property('type', 'large_airport').property('country', 'United States').property('city', 'New York').property('latitude', 40.63980103).property('longitude', -73.77890015).property('gps_code', 'KJFK').property('iata_code', 'JFK').next()
gremlin> f1 = g.addV('flight').property('id', 'e7c3d85d-c523-4634-93ef-a84f55aeb1e5').property('source_airport_id', '9600276f-608f-4325-a037-f185848f2e28').property('destination_airport_id', 'ebc645cd-ea42-40dc-b940-69456b64d2dd').property('flight_time', 345).property('flight_duration', 324.384941546607).property('cost', 584.833849847819).property('airlines', 'MilkyWay Airlanes').next()
gremlin> g.addE('departing').from(f1).to(lax)
gremlin> g.addE('arriving').from(f1).to(jfk)

This creates a flight vertex that connects to:

  • an airport vertex for LAX by a departing edge
  • an airport vertex for JFK by an arriving edge
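To check that the edges point where you expect, you can walk from the flight vertex to its airports. The sketch below assumes the five commands above ran without errors in the same console session. The variable name flightId is introduced here only for readability, and the commit at the end is a precaution based on the assumption that your connection does not commit automatically.

```groovy
// The id property value of the flight vertex created above.
flightId = 'e7c3d85d-c523-4634-93ef-a84f55aeb1e5'

// Follow the outgoing departing edge from the flight to its source airport.
g.V().has('flight', 'id', flightId).out('departing').values('iata_code')   // expected: LAX

// Follow the outgoing arriving edge from the flight to its destination airport.
g.V().has('flight', 'id', flightId).out('arriving').values('iata_code')    // expected: JFK

// Assumption: depending on how transactions are managed for your connection,
// an explicit commit may be needed before other clients can see the new data.
graph.tx().commit()
```

No index in this tutorial covers the id property of flight vertices, so these lookups scan the graph. That is fine for three vertices but would warrant its own index on a larger data set.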

Queries

A JanusGraph query is a graph traversal: a series of steps that are chained together to find and retrieve the desired data.

Let’s say we want to get the data associated with the LAX airport. Run the following query to get the data:

gremlin> g.V().and(hasLabel('airport'), has('iata_code','LAX')).valueMap()

The query specifies that you are looking for a vertex that has both a label of airport and a property iata_code with the value LAX. valueMap() returns a map representation of the properties of the vertex.
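Steps can also follow edges, which is where a graph model pays off. As a sketch that assumes the LAX, JFK, and flight data loaded earlier, the first query below is a more compact form of the same airport lookup, and the second one walks from LAX to the flights that depart from it and keeps only those that arrive at JFK.

```groovy
// Equivalent, more compact form of the lookup above: has(label, key, value).
g.V().has('airport', 'iata_code', 'LAX').valueMap()

// Flights that depart from LAX and arrive at JFK.
// in('departing') moves from the airport back to the flights whose departing edge points at it;
// where(...) keeps a flight only if its arriving edge leads to the JFK airport.
g.V().has('airport', 'iata_code', 'LAX').in('departing').
  where(out('arriving').has('airport', 'iata_code', 'JFK')).
  valueMap('cost', 'flight_duration', 'airlines')
```

Starting the traversal at the airport lets the first step use an equality lookup on iata_code for airport vertices, which is the shape of query the airport indexes were defined for.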

To verify that your query is hitting the index that you created previously, append .profile() to the query. The expected output is as follows:

gremlin> g.V().and(hasLabel('airport'), has('iata_code','LAX')).valueMap().profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[~label.eq(airport), iata_cod...                     1           1           1.376    76.75
    \_condition=(~label = airport AND iata_code = LAX)
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=multiKSQ[1]@4005
    \_index=airport_by_code
  optimization                                                                                 0.032
  optimization                                                                                 0.315
PropertyMapStep(value)                                                 1           1           0.417    23.25
                                            >TOTAL                     -           -           1.793        -

Notice the line that has \_index=airport_by_code. This confirms that your query is hitting one of the indexes that were created earlier.
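For contrast, you can profile queries that constrain different properties and compare which index, if any, is reported. The queries below are a sketch against the data loaded in this tutorial. The comments describe what you would expect from how the indexes were defined, not output captured from a real run, so treat them as assumptions to verify in your own console.

```groovy
// Same lookup without the label. The airport_by_code index is restricted to
// airport vertices (indexOnly), so it is not expected to apply to this query.
g.V().has('iata_code', 'LAX').valueMap().profile()

// Equality on both city and country for airport vertices. One of the composite
// indexes that covers these keys should be usable here.
g.V().hasLabel('airport').has('city', 'Los Angeles').has('country', 'United States').valueMap().profile()

// 'type' is not a key of any index defined in this tutorial, so no \_index= line
// is expected and JanusGraph has to scan the vertices instead.
g.V().hasLabel('airport').has('type', 'large_airport').valueMap().profile()
```

Because a composite index only helps when all of its keys are matched by equality, checking the profile of each query you plan to run is a practical way to decide which indexes the schema needs.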

Congratulations

You have successfully set up a JanusGraph database, defined the schema and indexes, loaded data into the database, and queried the database for specific data.