Introduction

There were a lot of notes I wanted to add to my “Getting started with JanusGraph” series (Part 1, Part 2 and Part 3) that were ultimately cut to keep the series a reasonable length. As a result, I created two articles with all the off-topic snippets. Without further ado, here’s a collection of my productivity tips, as well as some of the tricks I wish I had known when I first got started.

The :help section is actually helpful

I strongly recommend typing :help in a Gremlin console and carefully reading through all the entries. They can also be found in the Gremlin console docs. There are a lot of great features in there that come from the Groovy shell. I’ll highlight a couple below.

Clearing out the shell

If the shell gets into a bad state because of a bad command or mismatched quote and starts showing ......1> instead of gremlin>, you can reset it with :clear

gremlin> g.V().has('code', "SFO')
......1> :clear
gremlin>

Bash-like history

If you type :history, you can see a numbered list of all the commands you’ve typed in the console. You can also recall commands like you can with ! in bash, but be aware that if your history exceeds 500 entries there’s a bug in Groovy that causes the index to be off by one. An example of this is shown below, where :history recall 514 tries to run the g.V() command instead of displaying the help menu. You can resolve this by running :history clear, but then you lose all your history, so I tend to avoid using recall.

gremlin> :history

...

513 g.V()
514 :help
515 :history
gremlin> :history recall 514
No such property: g for class: groovysh_evaluate
Type ':help' or ':h' for help.
Display stack trace? [yN]

Have JanusGraph Server bind the graph and traversal for you

If you’re running a JanusGraphFactory graph on your Gremlin Server, you can streamline the user experience by having graph.traversal() bound to g automatically.

The steps can be found in the JanusGraph Docs.

To accomplish this, create a file named scripts/janusgraph.groovy, and populate it with the contents below.

def globals = [:]
globals << [g : graph.traversal()]

Once the file has been created, it needs to be added to the gremlin-server.yaml file. It should be placed within the scripts[] array under gremlin-groovy in the scriptEngines section.

host: localhost
port: 8182
graphs: {
  graph: conf/janusgraph.properties}
plugins:
  - janusgraph.imports
scriptEngines: {
  gremlin-groovy: {
    scripts: [scripts/janusgraph.groovy]}}
serializers:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
metrics: {
  slf4jReporter: {enabled: true, interval: 180000}}

If you’re using ConfiguredGraphFactory and JanusGraph 0.3.1 or greater, your graph traversals will be automatically bound for you to <graph_name>_traversal. So the airroutes graph I created in the getting started series would be available as airroutes_traversal once I connected to the Gremlin Server with my console or application.

Bind Traversals on the Gremlin Console instead

Binding graphs and traversals can be automated on the console side as well by passing a Groovy script to gremlin.sh using the -i flag. Groovy scripts can also be run from an open Gremlin console using :load, or :. which is an alias for :load.

Please note the -i flag will execute the script in the background before dropping you into the console. If you want to see the Gremlin output, you need to use :load from within the console instead.

While not helpful for binding traversals, I would like to note that gremlin.sh also supports an -e flag that executes the Groovy script and then exits.

Here is an example I’m choosing to name quickstart.groovy for a local JanusGraphFactory graph:

graph = JanusGraphFactory.open("conf/janusgraph-cassandra-es.properties")
g = graph.traversal()

Here’s the remote version of quickstart.groovy for JanusGraphFactory:

:remote connect tinkerpop.server conf/remote.yaml session
:remote console
g = graph.traversal()

For ConfiguredGraphFactory, if you don’t want to use the automatic traversal binding, I would suggest the following:

:remote connect tinkerpop.server conf/remote.yaml session
:remote console
graph = ConfiguredGraphFactory.open("airroutes")
g = graph.traversal()

Once the file has been created, start the console with ./bin/gremlin.sh -i quickstart.groovy or run :load quickstart.groovy within the Gremlin console. These commands assume quickstart.groovy is located in the directory you started the Gremlin console from.

Running scripts with either :load or -e is helpful when managing multiple graphs, and by association multiple yaml files. I find running scripts within the console using :load is generally better.

That being said, I still pass scripts in with gremlin.sh -e fairly often since that’s the way I learned first. Using descriptive names containing the cluster name or general purpose helps keep things organized. This is especially important for me since I tend to throw all my scripts in the root of the JanusGraph distribution folder rather than the conf folder where they probably belong. That last part is a confession of my own laziness and not a recommendation.

Use Groovy scripts to create your graphs and import data

When trying something new, it helps to fail fast. Try to save all your Gremlin commands into Groovy files so you can quickly drop() and repopulate your graphs when experimenting. I’m including this very basic and contrived example of a CSV import. It generically shows how I codify my schema creation and data imports so they can be kept in source control.

Contents of example.csv:

user1, user2, relationship, start_date, platform
@chupman, @pluradj, following, 1528149066, twitter
...

The next code block is a schema creation and import script named socialLoad.groovy that takes parameters. It expects the properties file for the JanusGraphFactory graph as well as the example.csv data file. You would run it with the following command:

$JANUSGRAPH_HOME/bin/gremlin.sh -e $PWD/socialLoad.groovy $PWD/example.csv $PWD/janusgraph.properties

In case it wasn’t clear, $PWD is the shell variable that holds the current working directory, so the arguments above are the full paths of the files being passed in. Paths can be relative or fully qualified.

Reminder: If you run the command as shown, it’ll exit once it’s done. If you run it with -i instead of -e, it’ll drop you into the Gremlin console afterward so you can look around.

Contents of socialLoad.groovy:

FILENAME = args[0]
PROPERTIES = args[1]
graph = JanusGraphFactory.open(PROPERTIES)

// Create graph schema and indexes, if they haven't already been created
mgmt = graph.openManagement()
if (mgmt.getPropertyKey('user') == null) {
    USER = mgmt.makePropertyKey('user').dataType(String.class).cardinality(Cardinality.SINGLE).make();
    START_DATE = mgmt.makePropertyKey('start_date').dataType(Date.class).cardinality(Cardinality.SINGLE).make();
    PLATFORM = mgmt.makePropertyKey('platform').dataType(String.class).cardinality(Cardinality.SINGLE).make();
    RELATIONSHIP = mgmt.makePropertyKey('relationship').dataType(String.class).cardinality(Cardinality.SINGLE).make();
    mgmt.makeEdgeLabel('following').multiplicity(MULTI).make();
    mgmt.buildIndex('byUserComposite', Vertex.class).addKey(USER).buildCompositeIndex();
    println 'created schema'
}
mgmt.commit()


// load the data
g = graph.traversal()
batchSize = 10000
// Groovy syntax to open file and iterate through each line
new File(FILENAME).eachLine {
    // by default the lines will be bound to 'it', but you can set them to a
    // different variable name and provide a second variable name for a line count
    line, count ->
    // Skip blank lines and the CSV header row (File.eachLine starts counting at 1)
    if (count > 1 && line != null && line.trim().length() > 0) {
        // Split the line into fields and trim the space that follows each comma
        field = line.split(",")*.trim();
        // Get or create users. hasNext() is a true/false check for an existing vertex, used in the ternary.
        // If it exists set it to v1, otherwise create a new vertex and set it to v1
        t_v1 = g.V().has('user',field[0]); v1 = t_v1.hasNext() ? t_v1.next() : graph.addVertex('user',field[0]);
        t_v2 = g.V().has('user',field[1]); v2 = t_v2.hasNext() ? t_v2.next() : graph.addVertex('user',field[1]);
        // start_date is declared as a Date in the schema; this assumes the CSV value is epoch seconds
        startDate = new Date(field[3].toLong() * 1000L);
        // Create an edge between variables v1, aliased to x, and v2, aliased to y, which were created (or retrieved) above.
        // We get the edge label from the relationship field and set start_date and platform as properties of the edge.
        g.V(v1).as('x').V(v2).as('y').addE(field[2]).from('x').to('y').property('start_date', startDate).property('platform', field[4]).iterate();
// In case a huge file is loaded break the commits into batches of 10,000 lines at a time.
        if (count % batchSize == 0) {
            graph.tx().commit()
            println count
        }
    }
}
// Commit any remaining entries and close the graph
graph.tx().commit();
graph.close();
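
Since the script reads args[0] and args[1] directly, a swapped or missing argument only shows up later as a confusing error. Here's a minimal sketch of a guard that could sit at the top of a script like this. The checkArgs helper, its messages and the .properties check are my own illustration and are not part of JanusGraph or the Gremlin console.

```groovy
// Hypothetical helper: validate the two positional arguments before touching the graph.
// Returns a map so the rest of the script can use named values.
def checkArgs(String[] argv) {
    if (argv == null || argv.length < 2) {
        throw new IllegalArgumentException('usage: socialLoad.groovy <data.csv> <graph.properties>')
    }
    // The order matters: data file first, properties file second
    if (!argv[1].endsWith('.properties')) {
        throw new IllegalArgumentException('expected a .properties file as the second argument, got: ' + argv[1])
    }
    return [filename: argv[0], properties: argv[1]]
}

// Example with made-up relative paths
def parsed = checkArgs(['example.csv', 'janusgraph.properties'] as String[])
println parsed
```

Failing early like this is mostly useful when the script is run unattended with -e, where there's no console left open to poke around in afterward.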

To close out, I’ll give some additional explanation of the script above to complement the existing comments.

In the data loading section, we pass a closure to eachLine. According to the Groovy docs, the -> characters separate the arguments from the closure body. In slightly less technical terms, the variables holding the line contents and line count are named before -> and can be used afterward within the code block, that is, between the braces.
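
To see the two closure parameters in isolation, here's a small sketch that runs in plain Groovy without a graph. The temp file and its two lines are made up for illustration.

```groovy
// Write a throwaway two-line file so the example is self-contained
def tmp = File.createTempFile('closure-demo', '.csv')
tmp.deleteOnExit()
tmp.text = '@chupman, @pluradj\n@pluradj, @chupman\n'

def seen = []
// Everything before -> names the arguments; everything after is the closure body
tmp.eachLine { line, count ->
    // With File.eachLine the count starts at 1
    seen << [count, line.split(',')*.trim()]
}
println seen
```

If the closure declared no arguments at all, each line would be available as 'it' and there would be no line count.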

If you want to load a large quantity of data, it’s a good idea to split up your commits. In this example, I’m checking to see if there’s a remainder when the current line number is divided by the batch size using the modulus operator. If the remainder equals 0, we perform a commit and print the current line number to stdout to provide some feedback to the user during the import. If I wanted to run this with -i instead of -e, I would remove the graph.close() line to have the Gremlin console available to me afterward.
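
The batching logic can be tried on its own, too. In this sketch a closure named fakeCommit stands in for graph.tx().commit(); that name, the batch size of 3 and the ten-line range are assumptions chosen to keep the output short.

```groovy
def batchSize = 3          // the import script uses 10000
def committedAt = []       // records when each simulated commit happened
def pending = 0            // lines added since the last commit

// Stand-in for graph.tx().commit()
def fakeCommit = { label -> committedAt << label; pending = 0 }

(1..10).each { count ->
    pending++
    // A remainder of 0 means we've just finished a full batch
    if (count % batchSize == 0) {
        fakeCommit(count)
    }
}
// Commit whatever is left over once the loop ends
if (pending > 0) {
    fakeCommit('final')
}
println committedAt
```

The trailing commit matters: without it, the last partial batch (line 10 in this sketch) would never be written.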

Another reason I tend to pass scripts to gremlin.sh with -e is that a bash script wrapper can be placed around multiple scripts. This makes it easy to run sequential operations while getting coffee or sitting through meetings.

Whenever you need help with traversals it’s always a great idea to search the gremlin-users and janusgraph-users Google groups. For the get or create commands I originally found some great tips in this thread.

Conclusion

We covered some tips on using built-in features in the Gremlin console, automating traversal bindings to save on typing, and using .groovy scripts for imports and schema creation. In Tips and tricks – Part 2, we’re going to cover troubleshooting indexes, and exporting a subgraph to both GraphML and GraphSON. In addition, Part 2 has a section “Feature Preview printSchema” which isn’t really a feature preview anymore now that it’s generally available in the 0.2.3 release, although it’s still coming for the 0.3.x and 0.4.x releases.