Overview

Skill Level: Beginner

This article describes, by example, how to configure the Cassandra Connector to connect to DataStax Enterprise (DSE) and extract data.

Ingredients

IBM InfoSphere Information Server (IIS) 11.7 or later, installed

Step-by-step

  1. Set up the DataStax Enterprise environment

    Set up a DataStax Enterprise instance to serve as the source of test data by following the steps at https://docs.datastax.com/en/install/6.0/install/installTOC.html . In this example, we installed DSE 6.0 under the directory /data and verified, from the installation directory, that DataStax Enterprise was running normally.
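One quick way to confirm that the node is accepting client connections is to probe its CQL native transport port. The sketch below assumes the default port 9042 and uses only the Python standard library. The helper name `is_port_open` is hypothetical, and a successful TCP connect shows only that something is listening on the port, not that DSE is fully healthy.

```python
import socket


def is_port_open(host: str, port: int = 9042, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    9042 is the default CQL native transport port; change it if your
    cluster is configured differently (assumption for this sketch).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call (address taken from the contact point used in this article):
#   is_port_open("9.30.215.49")
```

Running this from the IIS server also confirms that the network path between the connector and the DSE node is open.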


     

     

  2. Create a keyspace and table on the Cassandra data source

    We created a keyspace named stresscql on the data source, then created a table named perftest1 for the Cassandra Connector to read from, and inserted sample rows into the table.
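The article does not show the statements that were used, so the following is only a sketch of what they could look like. It builds CQL strings that can be pasted into cqlsh. The column layout (`id`, `name`, `value`), the `SimpleStrategy` replication settings, and the helper name `build_statements` are all assumptions made for illustration; only the keyspace and table names come from this example.

```python
def build_statements(keyspace="stresscql", table="perftest1", sample_rows=3):
    """Return CQL statements that create a keyspace, a table, and sample rows.

    The schema and replication settings here are hypothetical placeholders,
    not the ones used in the original environment.
    """
    statements = [
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};",
        f"CREATE TABLE IF NOT EXISTS {keyspace}.{table} "
        "(id int PRIMARY KEY, name text, value double);",
    ]
    for i in range(1, sample_rows + 1):
        statements.append(
            f"INSERT INTO {keyspace}.{table} (id, name, value) "
            f"VALUES ({i}, 'row-{i}', {i * 1.5});"
        )
    return statements


if __name__ == "__main__":
    # Print the statements so they can be copied into cqlsh.
    print("\n".join(build_statements()))
```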


  3. Download and configure the DataStax Enterprise Java Driver

    We created the directory /home/dsadm/cassandra-client60 on the IIS server, then downloaded the following DataStax Enterprise Java Driver files from the official DataStax website and copied them into that directory:

    dse-java-driver-core-1.7.0.jar

    dse-java-driver-extras-1.7.0.jar

  4. Configure related jar files

     We copied all the jar files from the following DataStax installation directories to /home/dsadm/cassandra-client60 on the IIS server. Always use Cassandra client libraries that are compatible with the version of the target database.

      /data/dse-6.0.6/resources/cassandra/lib

      /data/dse-6.0.6/resources/dse/lib
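The copy in steps 3 and 4 can be scripted. The sketch below is a minimal example that assumes the source directories are readable from the machine where it runs (for instance, after the DSE lib directories have been transferred or mounted on the IIS server). The helper name `collect_jars` is hypothetical. If two directories contain a jar with the same file name, the later one overwrites the earlier one.

```python
import shutil
from pathlib import Path


def collect_jars(source_dirs, target_dir):
    """Copy every *.jar found directly in source_dirs into target_dir.

    Returns the list of copied file names in copy order.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in source_dirs:
        for jar in sorted(Path(src).glob("*.jar")):
            shutil.copy2(jar, target / jar.name)
            copied.append(jar.name)
    return copied


# Example call using the paths from this article:
#   collect_jars(
#       ["/data/dse-6.0.6/resources/cassandra/lib",
#        "/data/dse-6.0.6/resources/dse/lib"],
#       "/home/dsadm/cassandra-client60",
#   )
```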

  5. Create a DataStage job with the Cassandra Connector

    We created a DataStage job that uses the Cassandra Connector to read data from the Cassandra database (DataStax Enterprise) and write it to a sequential file. You can also create a job that uses the Cassandra Connector to write data to a Cassandra table.


  6. Configure Connection properties for the Cassandra Connector

    Double-click the Cassandra Connector stage icon to open the stage editor, click the Properties page, and specify the following values for the Connection properties:

    ·  Cluster contact points – the list of cluster seed nodes. It should contain the IP addresses or hostnames of Cassandra cluster nodes, optionally with ports if they differ from the default Cassandra port. Separate the contact points with semicolons. In this example, we used “9.30.215.49”, the address of the source Cassandra database set up in step 1.

    ·  Protocol version – choose the protocol version that you want to use to connect to the cluster. In this example, we selected “V5”.

    ·  Cassandra client jars – the list of folders and/or jar files that contain the Cassandra client jars and, optionally, some custom jars (with custom type codecs). The list uses a semicolon as the separator. In this example, we used “/home/dsadm/cassandra-client60”, the directory created in step 3.
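Both list-valued properties above are semicolon-separated strings. The sketch below shows one way to assemble them when a cluster has several nodes. The helper names are hypothetical, the `host:port` notation for a non-default port is an assumption (check the connector documentation for the exact syntax), and 9042 is taken to be the default Cassandra port.

```python
DEFAULT_CQL_PORT = 9042  # default CQL native transport port


def format_contact_points(nodes):
    """Join nodes into a semicolon-separated contact point string.

    Each node is either a host string or a (host, port) tuple. The port is
    appended as host:port only when it differs from the default (assumed
    notation for this sketch).
    """
    parts = []
    for node in nodes:
        if isinstance(node, str):
            parts.append(node)
        else:
            host, port = node
            parts.append(host if port == DEFAULT_CQL_PORT else f"{host}:{port}")
    return ";".join(parts)


def format_client_jars(paths):
    """Join folders and/or jar file paths with semicolons."""
    return ";".join(str(p) for p in paths)


if __name__ == "__main__":
    print(format_contact_points(["9.30.215.49"]))
    print(format_client_jars(["/home/dsadm/cassandra-client60"]))
```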

     


  7. Configure Usage properties for the Cassandra Connector

     In the Usage properties, specify the Keyspace name “stresscql” and the Table name “perftest1”, both of which were created in step 2.


  8. Specify the values on the Columns page for the Cassandra Connector

     Click the Output tab, specify the values for all the column fields of the table “perftest1” on the Columns page, and click OK to save all the settings.


  9. Configure the target Sequential File stage

    Click the Sequential File stage to specify the Target file name and location.


  10. Compile and run the job

     We compiled and ran the job. The data in the Cassandra table “perftest1” was extracted to the sequential file successfully.
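To double-check the result outside DataStage, the number of records in the output file can be compared with the number of rows in the source table. The sketch below assumes that the Sequential File stage wrote a comma-delimited file without a header row. The helper name `count_rows` and the example path are hypothetical.

```python
import csv


def count_rows(path, delimiter=","):
    """Count non-empty records in a delimited text file."""
    with open(path, newline="") as handle:
        return sum(1 for row in csv.reader(handle, delimiter=delimiter) if row)


# Example call (hypothetical output path):
#   count_rows("/tmp/perftest1.txt")
```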

