Connect to JDBC enabled databases from SPL and Python

Mon August 10, 2020 03:11 PM

NATASHA D'SILVA

The Streams JDBC toolkit allows Streams applications to interact with databases via JDBC. It supports databases from popular vendors like IBM, Oracle, Microsoft and Teradata.

This post will cover how to use the toolkit to create Streams applications using Python or SPL (Streams Processing Language).

Toolkit overview

To connect to a database from a Streams SPL application, use the JDBCRun operator in the toolkit. For Python applications, the streamsx.database Python package is available. Before you get started, you need to have the following information:

A JDBC compliant database.
JDBC Connection information:
- JDBC driver library for your database. Download the JDBC JCC driver for DB2
- JDBC Driver class name and connection URL for your database. Depending on the type of database source you are connecting to, the format of the JDBC URL and driver information will be different. Consult your database vendor’s documentation to find the correct driver library, driver class name and URL. For DB2, the URL is of the form:
```
jdbc:db2://hostname:port/database_name
```
  and the classname is:
```
com.ibm.db2.jcc.DB2Driver
```

Using the toolkit from SPL

You execute standard SQL statements using the JDBCRun operator, which is the only operator in this toolkit.
You simply specify the JDBC connection information mentioned earlier, and the SQL statement and its parameters using the statement and statementParamAttrs parameters. When a tuple is received on the required input port, the operator runs the SQL statement.

For example, you can Insert a row in a table:

stream <insertSchema> insert = JDBCRun(create){
    param
        jdbcDriverLib: $jdbcDriverLib;
        jdbcClassName: $jdbcClassName;
        jdbcUrl: $jdbcUrl;
        jdbcUser: $jdbcUser;
        jdbcPassword: $jdbcPassword; 

     statement : "INSERT INTO JDBCRUN_SAMPLE (ID, FNAME, LNAME, AGE, GENDER, SCORE, TOTAL) VALUES (?, ?, ?, ?, ?, ?, ?)";        
statementParamAttrs : "ID, FNAME, LNAME, AGE, GENDER, SCORE, TOTAL"; 
}

As you can see, all that is needed are the connection information for the database, and the SQL statement you want to execute.

The operator expects that the input stream will contain attributes that match the column names in the statement, and that any columns in the result set of the statement will have corresponding attributes on the operator’s output stream.

One powerful feature of the JDBCRun operator is that the SQL statement to be executed does not have to be specified at compile time, but it can be bound to an attribute on the operator’s input stream.

In the snippet below, the operator receives a generated SQL statement and executes it:

  stream<selectSchema >genSelect = Functor(insert){
    output
       genSelect: sql="SELECT ID, FNAME, LNAME, AGE, GENDER, SCORE, TOTAL FROM JDBCRUN_SAMPLE";
}
        
  stream<rsSchema> select = JDBCRun(genSelect){
    param
        jdbcDriverLib: $jdbcDriverLib;
        jdbcClassName: $jdbcClassName;
        jdbcUrl: $jdbcUrl;
        jdbcUser: $jdbcUser;
        jdbcPassword: $jdbcPassword;      
     statementAttr: sql;
}

Detailed SPL examples

For detailed examples for connecting to Microsoft, Teradata, Oracle, Postgres, Db2 and Db2 Warehouse, please see the toolkit homepage.

Python example: Connect to Db2 Warehouse from a notebook

This video shows how to use the streamsx.database package to perform SQL queries in an application running in IBM Streams.

The Streams application is created in a Python notebook running in IBM Cloud Pak for Data, but this API can be used on a local Streams installation or the Streaming Analytics service.

Try it

The sample notebook is available for download from GitHub.

Links

#CloudPakforDataGroup