Skill Level: Any Skill Level

This recipe provides a guide for configuring IBM InfoSphere Information Server DataStage to connect to Hive, as a source or target within a job design, using the bundled DataDirect JDBC driver.


IBM InfoSphere Information Server 9.1, 11.3, 11.5, or 11.7 and later (11.5 or later preferred)

Hive Server

DataDirect JDBC Driver (bundled with IBM InfoSphere Information Server)

Other third-party JDBC drivers may be used, but configuration steps for the JDBC URL and related settings may vary.  See the driver vendor's documentation for details.


  1. Hive JDBC Driver

    It is important to obtain the latest DataDirect Hive JDBC driver to benefit from known fixes and performance improvements.

    To get the current version of your Hive driver:

    java -cp <location_of_IShive.jar> com.ibm.isf.jdbc.hive.HiveDriver

    IShive.jar is typically located in: /opt/IBM/InformationServer/ASBNode/lib/java so the full command would be:

    java -cp /opt/IBM/InformationServer/ASBNode/lib/java/IShive.jar com.ibm.isf.jdbc.hive.HiveDriver

    JR56156 is a more recent version of the driver and can be obtained by installing 11.5 Fix Pack 2 or contacting IBM Support.


    If you require HTTP transport mode support, which might be needed when connecting through a service such as Apache Knox, you must use JR58858 or a later version of the DataDirect JDBC driver.

  2. IBM JDK

    The JDBC Connector, Hive Connector, and File Connector (Hive Table create option) stages all rely on the IBM JDK for the JVM that is used to establish the JDBC connection.

    It is important to obtain the latest version of the IBM JDK to benefit from known fixes and patches for security vulnerabilities.

    To upgrade the IBM JDK that is used with IBM InfoSphere Information Server, see one of the IBM security bulletins.

    Note that IBM Fix Central will display the latest IBM JDK version that is certified by IBM InfoSphere Information Server, which might supersede the fix mentioned in the security bulletin.

    To upgrade the IBM JDK used with Information Server, follow these steps:

    1. Extract the downloaded JDK:
      tar xvf ibm-java-sdk* --directory /opt/IBM/InformationServer
    2. Backup the original IBM JDK:
      mv /opt/IBM/InformationServer/jdk /opt/IBM/InformationServer/jdk-`date +"%Y%m%d%H%M"`
    3. Move the new JDK in place:
      mv /opt/IBM/InformationServer/ibm-java* /opt/IBM/InformationServer/jdk
    4. [Optional] If you are using Kerberos, or had previously applied the unrestricted export policy files, please reapply them now.  Note that any IBM JDK downloaded after January 2018 provides both unrestricted export policy files and restricted export files as part of the IBM JDK package and the unrestricted export policy files are put in place by default. You do not need to download or install unrestricted JCE policy files.


    Please consult the README packaged with the IBM JDK for additional information as well as instructions for upgrading the WebSphere JDK.

  3. isjdbc.config

    The isjdbc.config file controls the CLASSPATH and CLASS_NAMES entries that are loaded by all JDBC-related stages.

    Sample contents for the isjdbc.config:
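    A minimal example, assuming the default installation path and the bundled IShive.jar (adjust the path and class name for your driver and environment):

```
CLASSPATH=/opt/IBM/InformationServer/ASBNode/lib/java/IShive.jar
CLASS_NAMES=com.ibm.isf.jdbc.hive.HiveDriver
```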


    Additional entries can be added to both the CLASSPATH and CLASS_NAMES variables in the isjdbc.config file, with multiple values separated by a semicolon (;) delimiter.

  4. JAAS Configuration File (Kerberos)

    If your JDBC connection does not require Kerberos, you can skip this step and proceed to step #6.

    The JDBCDriverLogin.conf file is a configuration file that specifies which Java Authentication and Authorization Service (JAAS) login module to use for Kerberos authentication.

    The JAAS configuration file (commonly JDBCDriverLogin.conf) is loaded by setting the JVM option:

      -Djava.security.auth.login.config=/path/to/JDBCDriverLogin.conf

    Most commonly this file is placed in /opt/IBM/InformationServer/ASBNode/lib/java, but -Djava.security.auth.login.config must still be provided/set even if the JAAS configuration file is in this location.

    By default, the krb5.conf (Kerberos configuration file) will be read from /etc/krb5.conf.  This location can be overridden by setting the JVM option:

      -Djava.security.krb5.conf=/path/to/krb5.conf
    These JVM options can be set by setting the environment variable:

      CC_JVM_OPTIONS
    Or in the case of the JDBC Connector or Hive Connector, can be set directly in the stage under Java Properties.


    Sample JDBCDriverLogin.conf file:

    com.ibm.security.auth.module.Krb5LoginModule required

    There are different options available for the JDBCDriverLogin.conf file, for instance instead of a Kerberos Keytab, you can use a Kerberos Credential Cache:

    com.ibm.security.auth.module.Krb5LoginModule required

    Where /tmp/krb5cc_13251 is the location of the Kerberos Credential Cache as seen by the output of a command such as klist:

    Ticket cache: FILE:/tmp/krb5cc_13251
    Default principal: dsadm@IBM.COM

    Valid starting     Expires            Service principal
    07/06/17 01:30:02  07/07/17 01:30:02  krbtgt/IBM.COM@IBM.COM
            renew until 07/06/17 01:30:02

    In either case, the principal provided should be a user principal, i.e. the user you are obtaining a Kerberos ticket with.  The Hive Service Principal can be provided in the JDBC URL and will be discussed in step #5.
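    Putting these pieces together, a complete JDBCDriverLogin.conf with both a keytab entry and a credential cache entry might look like the following sketch.  The entry names (JDBC_DRIVER_keytab, JDBC_DRIVER_ccache) and the keytab path are hypothetical, and the option names (credsType, useKeytab, useCcache) assume the IBM JDK's Krb5LoginModule; verify them against the security documentation for your IBM JDK version.

```
JDBC_DRIVER_keytab {
  com.ibm.security.auth.module.Krb5LoginModule required
    credsType=initiator
    principal="dsadm@IBM.COM"
    useKeytab="file:///path/to/dsadm.keytab";
};

JDBC_DRIVER_ccache {
  com.ibm.security.auth.module.Krb5LoginModule required
    credsType=initiator
    useCcache="file:///tmp/krb5cc_13251";
};
```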

    Multiple entries can be defined in the JDBCDriverLogin.conf file.  If multiple entries are defined, the JDBC URL option loginConfigName can be set to select which entry is used.  For example, to use the entry that has a keytab, the JDBC URL might look like:
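    A hedged example, assuming the branded jdbc:ibm:hive URL format (covered in step #6), a hypothetical host name, and a hypothetical JAAS entry named JDBC_DRIVER_keytab:

```
jdbc:ibm:hive://hiveserver.example.com:10000;DatabaseName=default;AuthenticationMethod=kerberos;loginConfigName=JDBC_DRIVER_keytab
```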


    JDBC URL syntax will be covered at length in step #6 below.


    For importing metadata via IMAM (InfoSphere® Metadata Asset Manager), the ASBNode Agent is used for the connection to Hive.  Because of this, the -Djava.security.auth.login.config property should be set for the ASBNode Agent so that it can use the proper JAAS configuration file for a Kerberized JDBC connection.  One way to ensure the ASBNode Agent uses this JVM option is to modify the eval exec section in the Agent.sh script in /opt/IBM/InformationServer/ASBNode/bin

    Original eval exec in Agent.sh
    eval exec '"${JAVA_HOME}/bin/java"' '$PLATFORM_OPTIONS' '-Xbootclasspath/a:conf:eclipse/plugins/com.ibm.iis.client' -Xss2M -Xmso2M '$LANGUAGE_OPTIONS' '-Djava.ext.dirs=$JAVA_HOME/lib/ext:lib/java:eclipse/plugins:eclipse/plugins/com.ibm.iis.client' '-Djava.util.logging.config.file=${NODE_DIR}/conf/asbagent-logging.properties' -classpath 'conf:eclipse/plugins/com.ibm.iis.client' ${J2EE_OPTS} com.ibm.iis.isf.agent.impl.AgentImpl '"$@"'

    New eval exec including the -Djava.security.auth.login.config JVM Option in Agent.sh:
    eval exec '"${JAVA_HOME}/bin/java"' '$PLATFORM_OPTIONS' '-Xbootclasspath/a:conf:eclipse/plugins/com.ibm.iis.client' -Xss2M -Xmso2M '$LANGUAGE_OPTIONS' '-Djava.ext.dirs=$JAVA_HOME/lib/ext:lib/java:eclipse/plugins:eclipse/plugins/com.ibm.iis.client' '-Djava.util.logging.config.file=${NODE_DIR}/conf/asbagent-logging.properties' '-Djava.security.auth.login.config=/path/to/JDBCDriverLogin.conf' -classpath 'conf:eclipse/plugins/com.ibm.iis.client' ${J2EE_OPTS} com.ibm.iis.isf.agent.impl.AgentImpl '"$@"'

  5. Hive Service Principal (Kerberos)

    If your JDBC connection does not require Kerberos, you can skip this step and proceed to step #6.

    The Hive Service Principal is typically needed for the ServicePrincipalName JDBC URL option. 

    The service principal name is typically:

      hive/[hive-server]@REALM.COM

    hive is your service name.

    [hive-server] is the fully qualified domain name of the Hive Server.

    REALM.COM is your Kerberos realm.

    Otherwise, you should be able to find your service principal by looking at the value of the hive.server2.authentication.kerberos.principal property in the hive-site.xml file.

  6. Hive JDBC Driver URL Syntax

    Full JDBC options for the DataDirect driver can be found in its documentation.

    The most important aspect to remember is that because the DataDirect JDBC drivers are embedded in and provided with IBM InfoSphere Information Server, they are branded.  This means the beginning of the JDBC URL will be:

      jdbc:ibm:hive://
    Some basic JDBC URL examples are included below:

    Sample URL (Non-Kerberos)
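    A hedged example, using a hypothetical host name and the default HiveServer2 port:

```
jdbc:ibm:hive://hiveserver.example.com:10000;DatabaseName=default
```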


    Sample URL (Kerberos)
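    A hedged example; the host name and realm are hypothetical, and ServicePrincipalName carries the Hive service principal from step #5:

```
jdbc:ibm:hive://hiveserver.example.com:10000;DatabaseName=default;AuthenticationMethod=kerberos;ServicePrincipalName=hive/hiveserver.example.com@REALM.COM
```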




    Hive allows two settings for hive.server2.transport.mode.  With the latest version of the DataDirect drivers, both HTTP and binary transport modes are supported.  You can specify the JDBC URL option TransportMode=binary or TransportMode=HTTP to toggle the protocol used for the Thrift RPC requests.
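    For example, to use HTTP transport (the host name is hypothetical, and HTTP transport commonly listens on a different port, such as 10001):

```
jdbc:ibm:hive://hiveserver.example.com:10001;DatabaseName=default;TransportMode=HTTP
```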


    The DataDirect driver also supports using Apache ZooKeeper to facilitate the connection to Hive.  When using ZooKeeper, the driver will retrieve and override connection properties such as the ServerName, PortNumber, Kerberos settings, encryption settings, and transport mode settings.

    Sample URL (ZooKeeper with Kerberos)
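    A hedged sketch; the ZooKeeper quorum host names are hypothetical, and the option names (ZooKeeperDiscovery, ZooKeeperNamespace) are assumptions based on common DataDirect Hive driver releases, so verify them against your driver's documentation:

```
jdbc:ibm:hive://zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181;ZooKeeperDiscovery=true;ZooKeeperNamespace=hiveserver2;AuthenticationMethod=kerberos
```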



    This concludes the JDBC configuration and setup.  We are now ready to proceed to configuring and using the Hive Connector, the File Connector (Hive Table create option), or the JDBC Connector stage, as detailed in the remaining steps.

  7. Hive Connector

    The Hive Connector is the preferred method for connecting to Hive via JDBC.  The Hive Connector was initially released in version 11.5.

    One of the most powerful features of the Hive Connector is its ability to perform partitioned reads and partitioned writes, which allows you to extract and load data in parallel.

    The Hive Connector also includes support for creating tables with different Row formats and data storage formats:


    The default stage properties for a source Hive Connector stage are shown below:


  8. File Connector (Hive Table create option)

    The File Connector has an option to create an external Hive table on top of the HDFS data that it loads.  This is one of the fastest ways to load data into Hive.  The Hive Table create option leverages a JDBC connection; some sample properties from the File Connector stage are shown below.  All the previous steps in this recipe are relevant to establishing connectivity for the Hive Table create option within the File Connector.

    The File Connector Hive Table create option does not have a "visible" JDBC URL.  However, additional JDBC options can be passed to the constructed URL by setting the "Additional driver attributes" option within the stage properties.


  9. JDBC Connector

    The JDBC Connector can be used with the same URL we constructed in step #6.  The main difference with this stage is that, in some cases, queries may need to be "user-defined".

    Some default stage properties for a source JDBC Connector stage are shown below:


  10. Debugging

    When debugging Kerberos configuration issues, setting the following JVM options helps obtain a Kerberos trace:

    -Dcom.ibm.security.jgss.debug=all -Dcom.ibm.security.krb5.Krb5Debug=all

    The easiest way to set these for the JVM is to use the CC_JVM_OPTIONS environment variable.
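    For example, the trace options above can be set via CC_JVM_OPTIONS before running the job (in practice you would typically set this in dsenv or as a project-level or job-level environment variable rather than in an interactive shell):

```shell
# Enable a Kerberos trace for the JVM used by the JDBC-capable connector stages.
export CC_JVM_OPTIONS="-Dcom.ibm.security.jgss.debug=all -Dcom.ibm.security.krb5.Krb5Debug=all"

# Confirm the value that the connector stages will pick up.
echo "$CC_JVM_OPTIONS"
```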

    When debugging generic Java issues or SSL issues, the following JVM option can be helpful:


    All three stages (Hive Connector, JDBC Connector, and File Connector) support the CC_MSG_LEVEL environment variable, which will write additional debugging information to the job log.
