An expanded partnership between IBM and Hortonworks has combined Hortonworks Data Platform (HDP) with IBM Big SQL into a new integrated solution. HDP does not cover all of the services that were available on IBM Open Platform with Apache Spark and Apache Hadoop (IOP). This blog post shows you how to deploy and configure IOP Titan, a transactional distributed graph database that can support thousands of concurrent users, on HDP.
- Prepare the HDP environment. Ensure that your HDP deployment is ready for IOP Titan. Both Spark and HBase are required for Titan.
- Install IOP Titan on HDP.
- Download the Titan rpm from our website.
For RHEL7: http://birepo-build.svl.ibm.com/repos/IOP/RHEL7/x86_64/4.2.5.0/4.2.5.0-GM/titan/noarch/titan_4_2_5_0-1.0.0_IBM-000000.el7.noarch.rpm
For RHEL6: http://birepo-build.svl.ibm.com/repos/IOP/RHEL6/x86_64/4.2.5.0/4.2.5.0-GM/titan/noarch/titan_4_2_5_0-1.0.0_IBM-000000.el6.noarch.rpm
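If you script the download, the OS-specific file name can be derived from the RHEL major version. This is a small sketch assuming the repository layout shown above; the `titan_rpm_url` helper name is ours, and the commented detection/download lines are illustrative:

```shell
# Build the Titan rpm URL for a given RHEL major version (6 or 7),
# following the repository layout shown above.
titan_rpm_url() {
  major="$1"
  echo "http://birepo-build.svl.ibm.com/repos/IOP/RHEL${major}/x86_64/4.2.5.0/4.2.5.0-GM/titan/noarch/titan_4_2_5_0-1.0.0_IBM-000000.el${major}.noarch.rpm"
}

# On the target host you could detect the version and fetch, for example:
# major=$(grep -o 'release [0-9]*' /etc/redhat-release | awk '{print $2}')
# wget "$(titan_rpm_url "$major")"
titan_rpm_url 7
```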
- Install Titan.
yum install titan_4_2_5_0-1.0.0_IBM-000000.el7.noarch.rpm
Titan is installed in /usr/iop/4.2.5.0-0000/titan.
- Configure the Titan client by completing the following steps:
- Modify the /usr/iop/4.2.5.0-0000/titan/conf/titan-hbase-solr.properties file:
gremlin.graph=com.thinkaurelius.titan.core.TitanFactory
storage.backend=hbase
storage.hostname=hbase_hostname1,hbase_hostname2,hbase_hostname3
storage.hbase.table=titan
storage.hbase.ext.zookeeper.znode.parent=/hbase-unsecure
cache.db-cache=true
cache.db-cache-clean-wait=20
cache.db-cache-time=180000
cache.db-cache-size=0.5
- Create the ‘titan’ user and add it to the ‘hadoop’ group by running the following command on each node:
useradd titan -g hadoop
- Copy the titan-env.sh file into /usr/iop/4.2.5.0-0000/titan/conf.
- Run the following commands:
chown -R titan:hadoop /usr/iop/4.2.5.0-0000/titan/conf/titan-env.sh
chown -R titan:hadoop /usr/iop/4.2.5.0-0000/titan/ext/plugins.txt
chown -R titan:hadoop /usr/iop/4.2.5.0-0000/titan/conf/titan-hbase-solr.properties
- Run a test. Under the /usr/iop/4.2.5.0-0000/titan/bin directory, run the following command:
./gremlin.distro
In the Gremlin console, run the following code:
graph = TitanFactory.open("/usr/iop/4.2.5.0-0000/titan/conf/titan-hbase-solr.properties")
mgmt = graph.openManagement()
pkeyName = mgmt.makePropertyKey("name").dataType(String.class).cardinality(Cardinality.SINGLE).make()
pkeyAge = mgmt.makePropertyKey("age").dataType(Integer.class).cardinality(Cardinality.SINGLE).make()
mgmt.commit()
hercules = graph.addVertex("name", "hercules", "age", 30, "type", "demigod")
alcmene = graph.addVertex("name", "alcmene", "age", 45, "type", "human")
jupiter = graph.addVertex("name", "jupiter", "age", 5000, "type", "god")
pluto = graph.addVertex("name", "pluto", "age", 4000, "type", "god")
neptune = graph.addVertex("name", "neptune", "age", 4500, "type", "god")
saturn = graph.addVertex("name", "saturn", "age", 10000, "type", "titan")
saturn1 = graph.addVertex("name", "saturn1", "age", 10200, "type", "titan")
hercules.addEdge("father", jupiter)
hercules.addEdge("mother", alcmene)
jupiter.addEdge("father", saturn)
jupiter.addEdge("brother", neptune)
jupiter.addEdge("brother", pluto)
neptune.addEdge("brother", jupiter)
neptune.addEdge("brother", pluto)
pluto.addEdge("brother", jupiter)
pluto.addEdge("brother", neptune)
saturn.addEdge("brother", saturn1)
saturn1.addEdge("brother", saturn)
graph.tx().commit()
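After the commit, you can sanity-check the data with a quick traversal in the same Gremlin console. This is an optional check using the vertex names created above; the expected values assume the inserts committed successfully:

```
g = graph.traversal()
g.V().has('name', 'hercules').out('father').values('name')  // expect: jupiter
g.V().has('type', 'god').count()                            // expect: 3
```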
- Go to the HBase shell and check whether the table ‘titan’ was created in HBase by running the following command:
scan 'hbase:meta'
- Check other information that pertains to the table ‘titan’ by running the following command:
scan 'titan'
Your Titan client is now working with HBase as the backend storage.
- Configure the Titan server by completing the following steps:
- Run the following command:
chown -R titan:hadoop /usr/iop/4.2.5.0-0000/titan/conf/gremlin-server/gremlin-server.yaml
- Modify the /usr/iop/4.2.5.0-0000/titan/conf/gremlin-server/gremlin-server.yaml file:
host: titan_server_host
port: 8182
channelizer: org.apache.tinkerpop.gremlin.server.channel.HttpChannelizer
graphs: {
  graph: /usr/iop/4.2.5.0-0000/titan/conf/titan-hbase-solr.properties,
  graphSpark: /usr/iop/4.2.5.0-0000/titan/conf/hadoop-graph/hadoop-gryo.properties}
plugins:
  - aurelius.titan
  - tinkerpop.hadoop
  - tinkerpop.tinkergraph
- Under the /usr/iop/4.2.5.0-0000/titan/bin directory, run the following command:
./gremlin-server.sh
- Run a test.
curl -XPOST -Hcontent-type:application/json -d '{"gremlin":"100-1"}' http://titan_server_host:8182 | grep 99
The Titan server is now serving requests over its REST (HTTP) API.
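The HTTP endpoint wraps results in a JSON envelope, so piping the response through `grep 99` is a fairly crude check. A more robust way is to parse the envelope; this sketch uses a canned string that mimics a response to the `100-1` query (the field names follow the stock TinkerPop response shape, and in practice you would pipe the real curl output into the same one-liner):

```shell
# Canned stand-in for the curl output; the real response carries the
# result under result.data as a JSON array.
response='{"requestId":"x","status":{"code":200},"result":{"data":[99],"meta":{}}}'
# Extract the first result value.
echo "$response" | python3 -c 'import sys, json; print(json.load(sys.stdin)["result"]["data"][0])'
```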
- Configure the Titan server log. Run the following commands:
mkdir /var/log/titan/
chown -R titan:hadoop /var/log/titan/
- Configure the Titan server to enable the WebSocket mode. Open /usr/iop/4.2.5.0-0000/titan/conf/gremlin-server/gremlin-server.yaml and change HttpChannelizer to WebSocketChannelizer.
- Run the following command:
chown -R titan:hadoop /usr/iop/4.2.5.0-0000/titan/conf/remote.yaml
- Modify the /usr/iop/4.2.5.0-0000/titan/conf/remote.yaml file:
hosts: [titan_server_host]  # the Gremlin server host
port: 8182
serializer: {
  className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0,
  config: { serializeResultToString: true }}
- Start the Titan server by running the following command under /usr/iop/4.2.5.0-0000/titan/bin:
./gremlin-server.sh
- Run a test. Open another SSH client, and under the /usr/iop/4.2.5.0-0000/titan/bin directory, run the following command:
./gremlin.distro
In the Gremlin console, run the following code:
:remote connect tinkerpop.server /usr/iop/4.2.5.0-0000/titan/conf/remote.yaml session
:remote console
- Configure Titan SparkGraphComputer by completing the following steps:
- Run the following commands:
sudo -u spark hadoop fs -mkdir -p /user/spark/share/lib/spark
sudo -u spark hadoop fs -put -f /usr/hdp/current/spark2-client/jars/* /user/spark/share/lib/spark
sudo -u spark hadoop fs -rm -r /user/spark/share/lib/spark/guava*.jar
sudo -u spark hadoop fs -put -f /usr/iop/4.2.5.0-0000/titan/lib/guava*.jar /user/spark/share/lib/spark
sudo -u hdfs hadoop fs -mkdir /user/titan
sudo -u hdfs hadoop fs -chown -R titan:hdfs /user/titan
- Copy the hadoop-gryo.properties file to the /usr/iop/4.2.5.0-0000/titan/conf/hadoop-graph directory.
chown -R titan:hadoop /usr/iop/4.2.5.0-0000/titan/conf/hadoop-graph/hadoop-gryo.properties
- Modify the ‘spark.yarn.jars’ property in the /usr/iop/4.2.5.0-0000/titan/conf/hadoop-graph/hadoop-gryo.properties file so that it points to your NameNode host. For example:
spark.yarn.jars=hdfs://hostname:8020/user/spark/share/lib/spark/*.jar
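If you are unsure of the NameNode address, you can read the cluster's default filesystem with `hdfs getconf -confKey fs.defaultFS` and compose the property value from it. A sketch (the `fs_default` example value stands in for the real `hdfs getconf` output):

```shell
# On the cluster you would obtain this with:
#   fs_default=$(hdfs getconf -confKey fs.defaultFS)
fs_default="hdfs://hostname:8020"   # example value

# Compose the line to put into hadoop-gryo.properties.
echo "spark.yarn.jars=${fs_default}/user/spark/share/lib/spark/*.jar"
```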
- Test SparkGraphComputer: Under the /usr/iop/4.2.5.0-0000/titan/bin directory, run the following command:
./gremlin.distro
gremlin> hdfs.copyFromLocal('/usr/iop/4.2.5.0-0000/titan/data/tinkerpop-modern.kryo','/user/ambari-qa/data/tinkerpop-modern.kryo')
==>null
gremlin> graph = GraphFactory.open('/usr/iop/4.2.5.0-0000/titan/conf/hadoop-graph/hadoop-gryo.properties')
==>hadoopgraph[gryoinputformat->gryooutputformat]
gremlin> g = graph.traversal(computer(SparkGraphComputer))
==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer]
gremlin> g.V().valueMap()
- Enable security for Titan by completing the following steps. SSL provides the standard encryption technology for establishing a secure connection between a Titan client and the Titan server.
- Go to the /tmp directory and create a self-signed server certificate.
openssl req -newkey rsa:2048 -nodes -keyout server.key -x509 -days 365 -out server.crt
- Convert the private key to PKCS #8 format and set the password to ‘changeit’.
openssl pkcs8 -topk8 -inform pem -in server.key -outform pem -out server.pk8
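You can rehearse these two openssl steps in a scratch directory before touching the server. This sketch runs them non-interactively (`-subj` supplies a throwaway DN, and `-passout pass:changeit` supplies the password instead of the interactive prompt; the file names match the examples above) and confirms that the converted key really is an encrypted PKCS #8 key:

```shell
# Recreate the certificate + PKCS#8 key flow in a temp directory.
tmp=$(mktemp -d)
cd "$tmp"

# Self-signed certificate and key (non-interactive).
openssl req -newkey rsa:2048 -nodes -keyout server.key -x509 -days 365 \
  -out server.crt -subj "/CN=titan-test"

# Convert the key to PKCS#8, encrypted with the password 'changeit'.
openssl pkcs8 -topk8 -inform pem -in server.key -outform pem \
  -out server.pk8 -passout pass:changeit

# An encrypted PKCS#8 key starts with this header:
head -1 server.pk8
```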
- Go to the /usr/iop/4.2.5.0-0000/titan/conf/gremlin-server directory, open the gremlin-server.yaml file, change WebSocketChannelizer to HttpChannelizer, and then append the following configuration code to the end of this file:
ssl: { enabled: true, keyFile: /tmp/server.pk8, keyPassword: changeit, keyCertChainFile: /tmp/server.crt}
- Restart the Titan server and run the following command to determine whether SSL is working:
curl --cacert /tmp/server.crt -XPOST -Hcontent-type:application/json -d '{"gremlin":"100-1"}' https://hhdp1.fyre.ibm.com:8182 | grep 99
Knox provides a single point of authentication and access to the Titan server in a cluster.
- From the Knox page in the Ambari user interface, add the following code to the advanced topology. If the Titan server is running without SSL, you can use ‘http’ as the URL.
<service>
    <role>GREMLIN</role>
    <version>1.0.0</version>
    <url>https://hhdp1.fyre.ibm.com:8182</url>
</service>
- Restart Knox and then start a demo LDAP server.
- Add the certificate file to the JRE truststore by running the following command:
keytool -import -file hive.crt -keystore /usr/jdk64/jdk1.8.0_112/jre/lib/security/cacerts -storepass changeit -alias titan
- Run the following command to check whether Titan works with Knox:
curl -Hcontent-type:application/json -u guest:guest-password -k https://hhdp1.fyre.ibm.com:8443/gateway/default/gremlin -d '{"gremlin": "100-1"}'
For a Kerberized cluster, Kerberos provides strong authentication for Titan client/server applications by using secret-key cryptography. From your Kerberos key distribution center (KDC) server, run the following commands to create a Titan principal and generate a keytab:
kadmin.local
addprinc titan/[email protected]
xst -norandkey -k titan.service.keytab titan/[email protected]
scp titan.service.keytab hostname:/etc/security/keytabs
chown -R titan:hadoop /etc/security/keytabs/titan.service.keytab
sudo -u titan /usr/bin/kinit -kt /etc/security/keytabs/titan.service.keytab titan/[email protected]
The ‘hostname’ identifies your Titan server.
For HBase access control, start the HBase shell and then grant the following privileges to the ‘titan’ user:
grant 'titan', 'RWXCA'
Simple Authentication and Security Layer (SASL) and Knox have similar functions and cannot be enabled at the same time. SASL should be disabled when Knox is enabled. To enable SASL, complete the following steps:
- Copy the tinkergraph-empty.properties file to the /usr/iop/4.2.5.0-0000/titan/conf directory.
- Run the following command:
chown -R titan:hadoop /usr/iop/4.2.5.0-0000/titan/conf/tinkergraph-empty.properties
- Create an empty file named credentials.kryo under the /usr/iop/4.2.5.0-0000/titan/data directory.
- Add the following content to the /usr/iop/4.2.5.0-0000/titan/conf/gremlin-server/gremlin-server.yaml file:
authentication: { className: org.apache.tinkerpop.gremlin.server.auth.SimpleAuthenticator, config: { credentialsDb: /usr/iop/4.2.5.0-0000/titan/conf/tinkergraph-empty.properties, credentialsDbLocation: /usr/iop/4.2.5.0-0000/titan/data/credentials.kryo}}
- In the Gremlin console, run the following command to create a SASL user and password:
gremlin> :plugin use tinkerpop.credentials
==>tinkerpop.credentials activated
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> graph.createIndex("username", Vertex.class)
gremlin> credentials = credentials(graph)
==>CredentialGraph{graph=tinkergraph[vertices:0 edges:0]}
gremlin> credentials.createUser("stephen","password")  //to create user
==>v[0]
gremlin> graph.io(IoCore.gryo()).writeGraph("/usr/iop/4.2.5.0-0000/titan/data/credentials.kryo")  //to save the credentials database
- Restart the Titan server.
- To access the Titan server, run curl commands that are similar to the following examples:
Titan SASL + SSL:
curl --cacert /tmp/server.crt -X POST -u stephen:password -d "{\"gremlin\":\"100-1\"}" https://hostname:8182
Titan SASL only:
curl -X POST -u stephen:password -d "{\"gremlin\":\"100-1\"}" http://hostname:8182
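The escaped quotes in those `-d` arguments are easy to get wrong. Building the payload with `printf` keeps the shell quoting simple; this sketch uses the same `100-1` example query, and the commented curl line shows where the payload would go:

```shell
# Build the Gremlin JSON payload without hand-escaping the quotes.
payload=$(printf '{"gremlin":"%s"}' '100-1')
echo "$payload"

# Then, for example:
# curl -X POST -u stephen:password -d "$payload" http://hostname:8182
```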