Learn how to create a connection to Db2 Warehouse on Cloud (formerly dashDB for Analytics) data in IBM Analytics for Apache Spark, and load data in a Scala notebook.
If you’re using Db2 Warehouse Private (formerly dashDB Local), see Use Db2 Warehouse Private with Spark and Notebooks for more videos.
Try the tutorial
Before you begin
Complete these tasks before starting this tutorial:
- If you do not have the Db2 Warehouse on Cloud service provisioned, refer to the Get started with Db2 Warehouse on Cloud (formerly dashDB) on IBM Cloud video and tutorial to learn how to provision the service.
- Download this SALES table, and import it into your Db2 Warehouse on Cloud service instance.
Make sure you don’t use any dashes (-) in the table name. For example, do not use SALES-TABLE, but instead use SALES.
- After you load the data into the table, go to your tables and verify the table definition and browse the data. There should be 104 rows of data on the table found. If the table is empty, then reload the data into the existing table.
- Watch the Getting Started on IBM Cloud video to create a IBM Cloud account and add the IBM Analytics for Apache Spark service.
Procedure 1: Set up the connection to your Db2 Warehouse on Cloud instance
- Log in to Data Science experience at http://datascience.ibm.com.
- From the menu, click Connections.
- Click Create a Connection.
- Type a name and description for the connection.
- Select the service type which is Data Service in this case.
- From the Instance drop-down box, select the Db2 Warehouse on Cloud instance.
- Select the database which is BLUDB in this case.
- Click Create Connection.
Procedure 2: Create a Scala notebook to analyze the Db2 Warehouse on Cloud data
- Open an existing project, or create a new project.
- From the Files slide out panel, click Connections. Now select the Db2 Warehouse on Cloud connection you created earlier, and click Apply to add the connection to the list of data assets.
- Create a new notebook, specifying a name, description, Spark service to use, Scala, and Spark 2.0.
- Paste the following SQL statement into the first cell in the notebook, and then click the Run icon on the toolbar. This first command contains SQLContext which is the entry point into all functionality in Spark SQL and is necessary to execute SQL queries.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
- Paste the following SQL statement into the second cell, and then click Run. Replace the JDBCURL with the JDBC URL for your Db2 Warehouse on Cloud instance, such as, jdbc:db2://bluemix05.bluforcloud.com:50000/BLUDB. Also replace account and password with the username and password for your Db2 Warehouse on Cloud instance. If you don’t know your credentials, click Insert to code to see your credentials. This second command loads the sales table from the Db2 Warehouse on Cloud account and assigns it to the dashdata variable.
val dashdata = sqlContext.load("jdbc", Map( "url" -> "<JDBCURL>:user=<account>;password=<password>;", "dbtable" -> "<account>.SALES"))
val dashdata = sqlContext.load("jdbc", Map( "url" -> "jdbc:db2://awh-yp-small02.services.dal.bluemix.net:50000/BLUDB:user=dash######;password=######;","dbtable" -> "dash######.SALES"))
- Paste the following SQL statement into the third cell, and then click Run. This third command takes the dashdata and registers it as a table called salesdata.
- Paste the following SQL statement into the fourth cell, and then click Run. This next command lets you take a look at that schema.
- Paste the following SQL statement into the fifth cell, and then click Run. The dashdata.collect command shows the data collected for each sales associate.
- Paste the following SQL statement into the sixth cell, and then click Run. Once the data is registered as a table, you can use SQL to process the data. This next line returns the SALESMAN_ID data from the salesdata table.
val results = sqlContext.sql("SELECT SALESMAN_ID from salesdata")
- Paste the following SQL statement into the seventh cell, and then click Run. This command retrieves the information for a specific sales person by ID.
val salesman = sqlContext.sql("select * from salesdata where SALESMAN_ID='NC100'")