Overview

Skill Level: Any Skill Level

This recipe introduces NoSQL Databases and then provides a detailed guide for configuring JDBC Connector in IBM InfoSphere Information Server DataStage to connect to MongoDB using a bundled Progress DataDirect JDBC driver for MongoDB

Ingredients

IBM InfoSphere Information Server 11.7 or 11.7.1 and later.

MongoDB Server 3.0 or above

DataDirect JDBC Driver for MongoDB (ISmongodb.jar), bundled with IBM InfoSphere Information Server

Note:

This recipe primarily uses the Datadirect JDBC Driver for MongoDB, however, with the JDBC Connector, other third party JDBC drivers may also be used as long as the driver conforms to the JDBC 4.0 standard or above. But the configuration steps for the JDBC URL, etc may vary. Please see the driver vendor's documentation for details.

Step-by-step

  1. What is a NoSQL database ? How does it compare to SQL or traditional RDBMS and the types of NoSQL Databases

    NoSQL, which stands for “not only SQL,” is an approach to database design that provides flexible schemas for the storage and retrieval of data beyond the traditional table structures found in relational databases. While NoSQL databases have existed for many years, NoSQL databases have only recently become more popular in the era of cloud, big data and high-volume web and mobile applications. They are chosen today for their attributes around scale, performance and ease of use.

    It’s important to emphasize that the “No” in “NoSQL” is an abbreviation for “not only” and not the actual word “No.” This distinction is important not only because many NoSQL databases support SQL like queries, but because in a world of microservices and polyglot persistence, NoSQL and relational databases are now commonly used together in a single application.

    NoSQL vs SQL

    NoSQL databases do not follow all the rules of a relational database —specifically, it does use a traditional row/column/table database design and does not use structured query language (SQL) to query data.

    Relational databases and SQL work well for large servers and storage mediums. But as larger sets of frequently evolving, disparate data became more common for things like e-commerce applications, programmers needed something more flexible than SQL. NoSQL is that alternative.

    NoSQL databases are built for specific data models and have flexible schemas that allow programmers to create and manage modern applications. NoSQL is also more agile because it’s not built on the concept of tables and does not use SQL to manipulate or analyze data (although some NoSQL databases may have SQL-inspired query language).

    NoSQL encompasses structured data (code in a specific format, written in such a way that search engines understand it), semi-structured data (data that contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data), unstructured data (information that either does not have a pre-defined data model or is not organized in a pre-defined manner), and polymorphic data (data that can be transformed to any distinct data type as required).

    NoSQL enables you to be more agile, more flexible, and to iterate more quickly. NoSQL database enables simpler design, better control over availability and improved scalability.

    Types and examples

    A NoSQL database can manage information using any of four primary data models, i.e Document-based, Key-value store, Column-based and Graph-based.

    Document-based

    A document is an object and keys (strings) that have values of recognizable types, including numbers, Booleans, and strings, as well as nested arrays and dictionaries. Document databases are designed for flexibility. They aren’t typically forced to have a schema and are therefore easy to modify. If an application requires the ability to store varying attributes along with large amounts of data, document databases are a good option. MongoDB and Apache CouchDB are examples of popular document-based databases.

    We shall learn more about the other types of the NoSQL data stores while dealing with the respective types in future.

  2. JDBC Driver Configuration

    Before you can use any of the JDBC Drivers with the JDBC Connector in the IBM Information Server, you need to ensure that the CLASSPATH and CLASS_NAMES are configured in the JDBC Driver configuration file. The driver configuration file isjdbc.config is typically located under the Information Server installation, under the Engine directory.

    Unix platforms :
    <ISHOME>/Server/DSEngine

    Windows platform:
    <ISHOME>\Server\Engine

    Note :

    1. From the release IS 11.7.1 onwards, Information Server ships a default configuration file isjdbc.config.default with the driver information pre-configured, with the details of the drivers that are shipped along with the release. If the user wants to configure any other third party drivers, then they should configure the entries for the new drivers in the driver configuration file. This can be done either by copying the contents of the default driver configuration file isjdbc.config.default to a new file isjdbc.config file or by editing the default driver configuration file directly.

    Data Source and Driver Classes

    The driver class is, com.ibm.isf.jdbc.mongodb.MongoDBDriver

    The driver is typically installed under <ISHOME>/ASBNode/lib/java on unix platforms and under <ISHOME>\ASBNode\lib\java on Windows.

  3. Construct the JDBC Connection URL using the Progress Datadirect JDBC Driver for MongoDB

    This section shows the correct format for specifying a connection URL

    jdbc:ibm:mongodb://<host>:<port>;DatabaseName=<database>;SchemaMap=<schema>;[property=value[;…]]

    The host and port are the MongoDB server connection details.

    DatabaseName specifies the name of the database to which you want to connect. This value is used as the default qualifier for unqualified table names in SQL queries.
    This is Required for User/ID password authentication.

    Important: This value is case-insensitive if you have access privileges to query the list of databases on the server. If you do not have access, this value is case-sensitive.

    SchemaMap specifies the fully qualified path of the configuration file where the relational map of native data is written. The driver looks for this file when connecting to a MongoDB server. If the file does not exist, the driver creates one. See SchemaMap for further details.

    For more information on connection properties, see Using Connection Properties from the Datadirect JDBC Driver for MongoDB documentation.

    This example shows how to establish a connection to a MongoDB data store with user ID/password authentication:

    Example Connection URL:

    jdbc:ibm:mongodb://MyServer:27017;DatabaseName=Test;SchemaMap=/home/dsadm/MongoDB/MyServer/Test.config;UserName=admin;Password=adminpass;

  4. Creating and Customizing Schemas Using the Schema Tool

    The Schema Tool allows you to map your NoSQL data model to a relational model that exposes your data to relational, SQL-based applications. Using the Schema Tool’s Table Wizard, you decide whether to extract your native data into flattened, normalized, or customized relational views. Once you select a relational model, you can further customize your schema by defining column data types and the order in which columns appear. The schema maps created by the Schema Tool are stored in configuration files that allow them to be shared across servers.

    Note:

    1. The schema tool (schematool.jar) is shipped along with the driver and is available under <ISHOME>/branded_odbc/tools on unix platforms and under <ISHOME>\ODBCDrivers\tools on Windows.

    2. You must launch the schematool utility with the command provided below (choose the one matching your platform).

    UNIX platform: <ISHOME>/jdk/jre/bin/java -jar $DSHOME/../branded_odbc/tools/schematool.jar

    Windows: <ISHOME>\jdk\jre\bin\java -jar <ISHOME>\ODBCDrivers\tools\schematool.jar

    While executing the command from the Unix platforms, this command will launch a Xwindows GUI for you, and hence an XServer should be installed and running on a Windows machine and set the DISPLAY environment variable to that server so that the GUI can be seen on the respective Windows machine to operate.

    3. If you want to let the driver create a schema map, then you can skip this step and just provide the absolute path to an empty schema file to the SchemaMap connection attribute in the JDBC Connection URL and the driver will create the Schema Map file with the default mapping. However, if you want to customize the mapping of your collections from the MongoDB, it is recommended that you invoke the Schema map tool and generate the configuration file and use it in the Connection URL.

    Please refer to the section(s) `Customizing Your Schema` from the Progress Datadirect JDBC Driver for MongoDB documentation for additional information on customizing the Schema.

  5. Configure the Connector to read the data from a table in the MongoDB (Source Context)

    In this step, we shall configure the JDBC Connector in the source context to read the data from a table in the MongoDB.

    1. Design the job as shown below with the JDBC Connector acting as source and writing the data into the downstream stage (Peek in this example)

    JDBCConnector_SourceContext_MongoDB
    2. Configure the Connection URL following the steps outlined in the Step 3. The JDBC Connection URL should appear as shown in the image below.

    JDBCConnector_ConnectionURL_MongoDB-1

    3. Under the Usage section in the Connector properties, select either ‘Generate SQL at runtime’ to Yes, if you would like the Connector to generate the SQL Statement and accordingly provide a value for the Tablename property or set the ‘Generate SQL at runtime’ to No, if you would want to provide the user-defined SQL.
    4. Define the schema of the table under the Columns tab
    5. Save the job and execute it to read the data from the table.

    Note:
    1. If the table you are using in the job has been created after the Schema Map has been created or configured in the job, then you need to Refresh the schema map. Otherwise, the table would not be visible and the job could possibly fail with the following error

    JDBCConnector_Error

    2. In order to refresh the Schema Map, define the statement ‘REFRESH MAP’ in the Before SQL property under the Before / After SQL category

    JDBCConnector_BeforeAfterSQL

    3. The REFRESH MAP statement adds newly discovered objects to your relational view of native data. It also incorporates any configuration changes made to your relational view by reloading the schema map and associated files. (Please refer to the Datadirect JDBC Driver documentation for more details on the Connection Property)

  6. Configure the Connector to write the data to a table in the MongoDB (Target Context)

    In this step, we shall configure the JDBC Connector in the target context to write the data to a table in the MongoDB.

    1. Design the job as shown below with the JDBC Connector acting as target and reading the data from an upstream stage (DB2 Connector in this example)

    JDBCConnector_TargetContext

    2. Configure the Connection URL following the steps outlined in the section

    JDBCConnector_ConnectionURL_MongoDB
    3. Under the Usage section in the Connector properties, set the ‘Write mode’ to ‘Insert’ and the Table Action to ‘Append’ and either set ‘Generate SQL at runtime’ to Yes, if you would like the Connector to generate the SQL Statement and accordingly provide a value for the Tablename property or set the ‘Generate SQL at runtime’ to No, if you would want to provide the user-defined SQL.

    JDBCConnector_UserDefined
    4. Define the schema of the table under the Columns tab
    5. Save the job and execute it to read the data from the table.

    Note:
    1. The Note in the Step 5 is applicable in the Target context as well.

  7. Data Encryption and Kerberos authentication

    Data Encryption

    The Datadirect JDBC Driver for MongoDB supports ‘Data Encryption‘. Typically, SSL works by allowing the client and server to send each other encrypted data that only they can decrypt. SSL negotiates the terms of the encryption in a sequence of events known as the SSL handshake. When SSL is enabled on the server side, the parameter EncryptionMethod=SSL should be specified in the JDBC Connection URL.

    The handshake involves two types of authentication:

    • SSL server authentication requires the server to authenticate itself to the client.
    • SSL client authentication is optional and requires the client to authenticate itself to the server after the server has authenticated itself to the client.

    When SSL is enabled, the following properties also apply and need to be configured depending on the type of the authentication enabled.

    The following connection properties are needed to be configured in the JDBC Connection URL if the SSL server authentication is enabled on the MongoDB Server side.

    TrustStore: The directory of the truststore file to be used when SSL is enabled and server authentication is used. The truststore file contains a list of the SSL certificates the client trusts.

    TrustStorePassword: Specifies the password that is used to access the truststore file when SSL is enabled and server authentication is used.

    ValidateServerCertificate: Determines whether the driver validates the certificate that is sent by the database server when SSL encryption is enabled. When using SSL server authentication, any certificate that is sent by the server must be trusted and validated.

    HostNameInCertificate: This parameter specifies a host name for certificate validation when SSL encryption is enabled and validation is enabled (ValidateServerCertificate=true)

    SSL client authentication is optional and requires the client to authenticate itself to the server after the server has authenticated itself to the client. The connection URL should include KeyStore and KeyStorePassword parameters if the SSL client authentication is enabled.

    Kerberos Authentication

    If the MongoDB Server is configured with Kerberos authentication, additional Kerberos related configuration steps are required on the driver side before you could connect to the MongoDB Server. Please follow the steps outlined below to configure the Kerberos environment.

    1. Configure Java Authentication and Authorization Service (JAAS) configuration file.

    Typically, JAAS configuration would be named JDBCDriverLogin.conf and should be placed alongside the JDBC Driver under <ISHOME>/ASBNode/lib/java. (If the user does not have permissions to write files under <ISHOME>/ASBNode/lib/java, then the file can be placed under any location and the absolute path to the file needs to be passed to the JVM through the JVM option).  The JAAS configuration file specifies the login module to be used for the Keberos authentication and will be passed to the JVM by setting th relevant JVM option (details provided below).

    -Djava.security.auth.login.config=<absolute_path_to_config_file>/JDBCDriverLogin.conf

    Sample entries from the JDBCDriverLogin.conf file:

    $ cat JDBCDriverLogin.conf

    JDBC_DRIVER_user1_keytab{
    com.ibm.security.auth.module.Krb5LoginModule required
    credsType=both
    principal=”user1@IBM.COM”
    useKeytab=”FILE:/home/user1/user1.keytab”;
    };

    JDBC_DRIVER_user2_keytab{
    com.ibm.security.auth.module.Krb5LoginModule required
    credsType=both
    principal=”use2@IBM.COM”
    useKeytab=”FILE:/home/user2/user2.keytab”;
    };

    <EOF>

    As you can observe, we can provide multiple entries in the file as independent stanzas for each user. We can as well use the credentials cache. However, using keytab for authentication is recommended approach. When multiple entries are configured in the configuration file or if the stanza name is changed from the default name of JDBC_DRIVER_01, the connection parameter loginConfigName should be added to the JDBC Connection URL.

    2. Obtain the Service Principal name

    Every MongoDB mongod and mongos instance must have an associated service principal. Typically , service principal names have the form:

    <service>/<fully qualified domain name>@<KERBEROS REALM>

    For MongoDB, the <service> defaults to mongodb. However, the service name and the Service Principal Name can be customized by the mongodb administrator  and should be obtained from the administrator.

    3. Define the additional authentication properties in the JDBC Connection URL.

    Additional connection parameters should be supplied through the JDBC URL when Kerberos authentication is enabled. The URL should look like the one shown below.

    jdbc:ibm:mongodb://<host>:<port>;AuthenticationMethod=kerberos;loginConfigName=JDBC_DRIVER_user1_keytab;ServicePrincipalName=mongodb/<FQDN>@<REALM>

    AuthenticationDatabase : Specifies the database on the MongoDB server where user principal names are stored for Kerberos authentication. The default is $external. If the database name is different on the MongoDB server, then this option should be added to the Connection URL.

    Note :

    1. Refer to the Datadirect JDBC Driver documentation for additional information on the Data Encryption and Kerberos Authentication options supported by the driver.

    2. Typically, Kerberos configuration file (krb5.conf) would located under /etc/. However, the user can configure the JVM option to overwrite the location of the kerberos configuration file.

    -Djava.security.krb5.conf=<absolute_path_to>/krb5.conf

    3. The JVM options can be set through the environment variable CC_JVM_OPTIONS or through the Java Properties under Usage section in the JDBC Connector.

  8. Troubleshooting

    Driver logging can be enabled to troubleshoot any of the connectivity problems you may encounter while connecting to the MongoDB using the Datadirect JDBC Driver.

    The driver has capability to capture the communication between the application client and the driver (Spy logging) and the communication between the driver and the database server (Packet logging or Snoop).

    In order to enable Spy and Snoop tracing the following parameters needs to be added to the JDBC Connection URL:

    Steps to enable Snoop tracing:

    Add the following parameters to the JDBC Connection URL to enable the Snoop tracing

    ddtdbg.ProtocolTraceEnable=true;ddtdbg.ProtocolTraceMaxline=16;ddtdbg.ProtocolTraceLocation=/home/dsadm/mongodb/logs/Snoop;ddtdbg.ProtocolTraceShowTime=true;

    Add the following parameters to the JDBC Connection URL to enable the Spy tracing

    SpyAttributes=(log=(file)/home/dsadm/mongodb/logs/Spy.log;timestamp=yes);

    This will create log files SnoopX.out and Spy.log in the log directory mentioned in the connection parameters. There could be more than 1 SnoopX.log files.

    Common connection issues:

    1. Most often this error would be seen when SSL is enabled on the MongoDB Server side and the application (i.e JDBC Connector) does not have the SSL options (i.e EncryptionMethod=SSL) enabled.

    Fatal Error: The connector failed to connect to the data source. The reported error is: java.sql.SQLException: [IBM][MongoDB JDBC Driver][MongoDB]No more data available to read.

  9. Limitations

    The following limitations can be encountered while using the JDBC Connector with the MongoDB. The limitations are primarily either from the Data source side or the Driver.

    • Complex data types aren’t supported and cannot be represented in the DataStage Schema. The schema with complex types or the nested collections, must either be flattened or normalized using the schematool GUI utility and use proper queries.
    • Create, Replace or Truncate Write modes are not supported
    • Insert with primary key violation, Insert News Only and Insert then Update write modes doesn’t work as expected (database limitation)
  10. References

    1. https://www.ibm.com/cloud/learn/nosql-databases

    2. https://documentation.progress.com/output/DataDirect/jdbcmongohelp/index.html

    3. https://documentation.progress.com/output/DataDirect/jdbcmongohelp/#page/mongohelp%2Fcreating-and-customizing-schemas-using-the-datad.html%23

    4. https://docs.mongodb.com/manual/core/kerberos/

Join The Discussion