Learn how to create a connection to Db2 Warehouse on Cloud (formerly dashDB for Analytics) in IBM Watson Studio and load the data into a Scala notebook.


Try the tutorial

Learn how to use Spark SQL to load and analyze Db2 Warehouse on Cloud data using a Scala notebook in IBM Watson Studio.

Before you begin

Complete these tasks before starting this tutorial:

  • Provision an instance of the IBM Db2 Warehouse on Cloud service.
  • Download this SALES table, and import it into your Db2 Warehouse on Cloud service instance.
    Make sure you don’t use any dashes (-) in the table name. For example, do not use SALES-TABLE, but instead use SALES.
  • After you load the data into the table, go to your tables, verify the table definition, and browse the data. The table should contain 104 rows. If the table is empty, reload the data into the existing table.

Procedure 1: Set up the connection to your Db2 Warehouse on Cloud instance

  1. Log in to IBM Watson Studio at http://dataplatform.cloud.ibm.com.
  2. Open an existing project, or create a new project.
  3. From the Add to project menu, select Connection.
  4. Select your Db2 Warehouse on Cloud service instance.
  5. Click Create.

Procedure 2: Create a Scala notebook to analyze the Db2 Warehouse on Cloud data

  1. From the Add to project menu, select Notebook.
  2. Specify a notebook name and description.
  3. Select a runtime environment with Spark and Scala, and then select the Spark and Scala versions.
  4. Click Create Notebook.

Procedure 3: Analyze the data in a notebook

  1. Click the Find and Add Data icon.
  2. Click the Connections tab.
  3. Under your Db2 Warehouse on Cloud connection, click Insert to code, and then select Insert SparkSession DataFrame.
  4. Select the Schema, then the table, and click Select.
  5. In the inserted code, change the name of the DataFrame variable to dashdata. (A sketch of the typical generated code appears after this procedure.)
  6. With the cell that contains the inserted code selected, click the Run icon.
  7. Paste the following statement into the next cell, and then click Run. This command registers the dashdata DataFrame as a temporary view named salesdata so that you can query it with SQL.
    dashdata.createOrReplaceTempView("salesdata")
  8. Paste the following statement into the next cell, and then click Run. This command prints the schema of the dashdata DataFrame.
    dashdata.printSchema
  9. Paste the following statement into the next cell, and then click Run. The dashdata.collect command returns all of the rows in the DataFrame so that you can inspect the sales data.
    dashdata.collect
  10. Paste the following statement into the next cell, and then click Run. Because the data is registered as a view, you can use SQL to process it. This query returns the SALESMAN_ID column from the salesdata view.
    val results = spark.sql("SELECT SALESMAN_ID from salesdata")
    results.collect
  11. Paste the following statement into the next cell, and then click Run. This query retrieves the record for a specific salesperson by ID.
    val salesman = spark.sql("select * from salesdata where SALESMAN_ID='NC100'")
    salesman.collect
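
For reference, the exact code that Insert SparkSession DataFrame generates depends on your connection, but it typically resembles the sketch below. The hostname, credentials, schema, and driver option shown here are placeholders and assumptions, not values from this tutorial; Watson Studio fills in the real connection details when it inserts the code for you.

    import org.apache.spark.sql.SparkSession

    // Reuse the notebook's SparkSession (or create one if none exists).
    val spark = SparkSession.builder().getOrCreate()

    // Read the SALES table from Db2 Warehouse on Cloud over JDBC.
    // <hostname>, <username>, <password>, and <SCHEMA> are placeholders.
    val dashdata = spark.read
      .format("jdbc")
      .option("driver", "com.ibm.db2.jcc.DB2Driver")
      .option("url", "jdbc:db2://<hostname>:50000/BLUDB")
      .option("user", "<username>")
      .option("password", "<password>")
      .option("dbtable", "<SCHEMA>.SALES")
      .load()

    // Sanity check: the imported SALES table should contain 104 rows.
    println(dashdata.count())

If the count is not 104, reload the SALES data into your Db2 Warehouse on Cloud table, as described in the prerequisites, before you continue with the analysis steps.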
