Introduction

Many customers need to store and query JSON data in a column of a DB2 table so that they can continue to use the SQL tuning and monitoring optimizations available within DB2. The JSON APIs and the JSON access method introduced in DB2 do not work with existing relational tables, so the R scripting capability of dashDB for Analytics comes in handy here: it enables users to interact with JSON data in many ways.

This tutorial provides step-by-step instructions on creating, developing, and deploying R scripts that return the results of SQL join queries in JSON format, and shows how to work with tables and SQL query execution. It also explains how to remotely run an R script that is already deployed on the dashDB for Analytics service on IBM Bluemix.

Table creation and data loading using the dashDB for Analytics service

Our intention here is not to do heavy analysis or generate big reports with dashDB for Analytics. In this service, tables are column-organized by default, and column-organized tables have limited support for certain data types such as LOB and XML. If you need a regular row-organized table in dashDB for Analytics, create the table with the ORGANIZE BY ROW clause.

Depending on your account type, your Bluemix menu will contain the hamburger icon and additional links. If you are an existing user with a Bluemix account, you can use the hamburger menu to switch between the Apps and Services dashboards. From there, you can launch the dashDB console.

There are two methods for creating a new table. With the first, you browse for an Excel or CSV file to define the table; dashDB generates the CREATE TABLE statement based on the content of the file. Here are the steps you need to follow.

    a) Click Go to your tables.

    b) Complete the following steps:

    1. Click the Schema drop-down box to see a list of the default tables.
    2. Click Add Table.
    3. Click Browse.
    4. Locate the CSV file, and click Open.
    5. Highlight the auto-generated DDL statement.

The method described above is a simple way to create a table. Alternatively, you can write your own DDL statements, such as the CREATE statements for the LIST and COURSE tables shown below, and you can include multiple tables in one run. To do so, after performing steps b(1) and b(2) above, execute the following steps.

  1. Highlight the pasted DDL statements.
  2. Click Run DDL.

Another way to create tables is to use the Run SQL tool, which lets you run DDL and SQL statements; this is the tool this article uses to execute SQL queries. Here are the steps (a scripted alternative in R appears after them):

  1. Click Run SQL.
  2. Select all of the statements, and press Run All.
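If you prefer to script this step instead of using the console, the same DDL can be submitted from R. The following is a minimal sketch that reuses the ibmdbR connection approach shown later in this article; the database name, user ID, and password are placeholders taken from your Connection Information page:

library(ibmdbR)   # dashDB in-database analytics; provides idaConnect()/idaInit()
library(RODBC)    # ODBC interface; provides sqlQuery() for arbitrary SQL/DDL

# Connect to the BLUDB database (replace the user ID and password)
con <- idaConnect("BLUDB", "dash5976", "PASSWORD")
idaInit(con)

# Run a DDL statement just as the Run SQL tool would
sqlQuery(con, 'CREATE TABLE "DASH5976"."LIST" (
  "ID" VARCHAR(4) NOT NULL,
  "TXT" VARCHAR(40) NOT NULL,
  "IS_ACTIVE_YN" SMALLINT NOT NULL) ORGANIZE BY ROW')

idaClose(con)   # close the connection when done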

In this solution, the original tables are created as row-organized tables, like this:


CREATE TABLE "DASH5976"."LIST" (
"ID" VARCHAR(4) NOT NULL ,
"TXT" VARCHAR(40) NOT NULL ,
"IS_ACTIVE_YN" SMALLINT NOT NULL )
ORGANIZE BY ROW;

ALTER TABLE "DASH5976"."LIST"
ADD CONSTRAINT "LIST_PK" PRIMARY KEY
("ID",
"TXT");


CREATE TABLE "DASH5976"."COURSE" (
"COURSEPK_ID" BIGINT NOT NULL GENERATED BY DEFAULT AS IDENTITY,
"ID" VARCHAR(4) NOT NULL,
"COR_ID" VARCHAR(20) NOT NULL,
"TITLE" VARCHAR(60) NOT NULL,
"DESC" VARCHAR(60) NOT NULL,
"PREREQ" VARCHAR(20),
"POSTREQ" VARCHAR(20),
"IS_AP_YN" SMALLINT,
"IS_IB_YN" SMALLINT,
"IS_HONORS_YN" SMALLINT,
"IS_REQUIRED_YN" SMALLINT,
"IS_ELECTIVE_YN" SMALLINT,
"CREDIT" VARCHAR(20),
"SUBJECT" VARCHAR(60),
"LAST_UPDATE_DT" TIMESTAMP NOT NULL,
"LAST_UPDATE_TX_ID" BIGINT,
"LAST_UPDATE_USER" VARCHAR(20)
)
DATA CAPTURE NONE
ORGANIZE BY ROW;

ALTER TABLE "DASH5976"."COURSE"
ADD PRIMARY KEY (
COURSEPK_ID
);


INSERT INTO "DASH5976"."LIST" VALUES ('D000','ABHSIHE',1);
INSERT INTO "DASH5976"."COURSE" VALUES(1,'D000','C6892223','ALGEBRA-1','NA','NA','NA',NULL,NULL,1,1,1,1,'Math','2017-04-27 21:10:14',NULL,'P');
INSERT INTO "DASH5976"."COURSE" VALUES(2,'D000','C6892224','POETRY','NA','NA','NA',NULL,NULL,1,1,1,1,'Math','2017-04-27 21:10:14',NULL,'P');
INSERT INTO "DASH5976"."COURSE" VALUES(3,'D000','C6892225','TRIGNOMETRY','NA','NA','NA',NULL,NULL,1,1,1,1,'Math','2017-04-27 21:10:14',NULL,'P');
INSERT INTO "DASH5976"."COURSE" VALUES(4,'D000','C6892226','HISTORY','NA','NA','NA',NULL,NULL,1,1,1,1,'Social Studies','2017-04-27 21:10:14',NULL,'P');
INSERT INTO "DASH5976"."COURSE" VALUES(5,'D000','C6892227','ALGEBRA-2','NA','NA','NA',NULL,NULL,1,1,1,1,'Math','2017-04-27 21:10:14',NULL,'P');
INSERT INTO "DASH5976"."COURSE" VALUES(6,'D000','6892251','TRIGNOMETRY-1','NA','NA','NA',NULL,NULL,1,1,1,1,'Math','2017-04-27 21:10:14',NULL,'P');
INSERT INTO "DASH5976"."COURSE" VALUES(7,'D000','6892252','TRIGNOMETRY-2','NA','NA','NA',NULL,NULL,1,1,1,1,'Math','2017-04-27 21:10:14',NULL,'P');
INSERT INTO "DASH5976"."COURSE" VALUES(8,'D000','6892253','GEOMETRY-1','NA','NA','NA',NULL,NULL,1,1,1,1,'Math','2017-04-27 21:10:14',NULL,'P');
INSERT INTO "DASH5976"."COURSE" VALUES(9,'D000','6892254','GEOMETRY-2','NA','NA','NA',NULL,NULL,1,1,1,1,'Math','2017-04-27 21:10:14',NULL,'P');
INSERT INTO "DASH5976"."COURSE" VALUES(10,'D000','6892255','ALGEBRA-3','NA','NA','NA',NULL,NULL,1,1,1,1,'Math','2017-04-27 21:10:14',NULL,'P');
INSERT INTO "DASH5976"."COURSE" VALUES(11,'D000','6892256','TRIGNOMETRY-3','NA','NA','NA',NULL,NULL,1,1,1,1,'ENGLISH','2017-04-27 21:10:14',NULL,'P');
INSERT INTO "DASH5976"."COURSE" VALUES(12,'D000','6892257','GEOMETRY-3','NA','NA','NA',NULL,NULL,1,1,1,1,'ENGLISH','2017-04-27 21:10:14',NULL,'P');

The next step is to select the target for loading your data. You can select an existing table, if you've created one yourself, or you can let dashDB create one for you based on the contents of your file.

For convenience, the INSERT statements above were also executed using this tool. If, instead, you want to populate a table in your IBM dashDB database from a local file on your machine, the procedure depends on the type of load you'd like to perform. Because your data is on the same machine as your browser, select Load From Desktop. The supported file types are CSV, XLS, and XLSX. Please follow the steps outlined below; an R-based alternative is sketched after the list:

  1. Hover over Load in the left menu pane to surface the menu.
  2. Click Load from Desktop.
  3. The Load screen is displayed.
  4. Select a source.
  5. Click Browse, navigate to the appropriate file, and select it.
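If your data is already readable from R, a programmatic alternative to the Load From Desktop wizard is to push a data frame into dashDB with the ibmdbR package. Here is a minimal sketch; the file name mydata.csv and the table name LIST_FROM_CSV are placeholders for your own file and table:

library(ibmdbR)

con <- idaConnect("BLUDB", "dash5976", "PASSWORD")
idaInit(con)

# Read the local file into an R data frame
local_df <- read.csv("mydata.csv", stringsAsFactors = FALSE)

# Upload the data frame as a dashDB table, replacing it if it exists
uploaded <- as.ida.data.frame(local_df, table = "LIST_FROM_CSV",
                              clear.existing = TRUE)
head(uploaded)   # verify that a few rows landed in the table

idaClose(con)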

Developing and running R scripts

Use the RStudio integrated development environment (IDE) available with IBM dashDB for Analytics in Bluemix to develop and run R scripts in the cloud. You can also use the IBM dashDB REST API to issue, from any location, REST calls that deploy and run R scripts within dashDB.

To access the integrated RStudio development environment, follow these steps to log in:

  1. Open the Connection Information page by clicking Connect > Connection Information on the dashDB start page. The Individual database settings section of that page displays your dashDB user ID and password.

  2. Open RStudio (on the dashDB home page, click Analytics > R Scripts). To sign in, copy your user ID and password from the Connection Information page, paste them into the corresponding fields on the RStudio sign-in page, and click Sign In. Develop your script and save it with the extension ".R".

In our scenario, we remotely run an R script, developed in the RStudio environment, that is already deployed on dashDB. To do so, we submit an HTTP POST request that calls the /rscript/{filename} endpoint, where filename is the path of an R script file located in the home directory, such as the sample dashDB R scripts or scripts generated by the integrated RStudio. For example, to run the file MyScript.R, issue the following cURL command (replace the user ID, password, and host name):

curl --user " userid:password" -X POST -H "Content-Type: application/json" "https://[hostname]:[port]/dashdb-api/rscript/ MyScript.R"

To find generated output files (if any), submit an HTTP GET request that calls the /home endpoint, and use the GET /home/{filename} endpoint to retrieve a specific file:

curl --user "userid:password" -X GET -H "Content-Type: application/json" "https://[hostname]:[port]/dashdb-api/home/MyScript.R"

The above cURL command returns the contents of the R script (in this case, MyScript.R), which is the same script developed in the RStudio IDE above. The result looks like this:


library(ibmdbR)     # dashDB in-database analytics; provides idaConnect()/idaInit()
library(RODBC)      # ODBC interface; provides sqlQuery()
library(jsonlite)   # JSON parser/generator

# Connect to the BLUDB database and initialize the analytics package
mycon <- idaConnect("BLUDB", "dash5976", "PASSWORD")
idaInit(mycon)

# Join the COURSE and LIST tables and fetch the result as a data frame
pundat <- sqlQuery(mycon, "SELECT * FROM COURSE A, LIST B WHERE A.ID = B.ID")

b <- toJSON(pundat)              # data frame -> JSON
a <- fromJSON(b)                 # JSON -> R object
print(a)
write_json(a, "abkdemo2.json")   # save the result as a JSON file

idaClose(mycon)                  # close the connection

Note that the connection object is created with the idaConnect() method; idaInit() initializes the analytics package, and the sqlQuery() method submits a query to the database and returns the results. The jsonlite package is a JSON parser/generator; its toJSON() and fromJSON() functions convert between JSON data and R objects. The write_json() method saves the content of a data frame as a JSON file, and finally the script uses the idaClose() method to close the connection.
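To see the jsonlite round trip in isolation, here is a small standalone example that mirrors what the script does with the query result:

library(jsonlite)

df <- data.frame(ID = "D000", TITLE = "ALGEBRA-1", IS_ACTIVE_YN = 1,
                 stringsAsFactors = FALSE)

j <- toJSON(df)      # data frame -> JSON array of row objects
print(j)             # [{"ID":"D000","TITLE":"ALGEBRA-1","IS_ACTIVE_YN":1}]

back <- fromJSON(j)  # JSON text -> data frame again
identical(names(back), names(df))  # TRUE: the columns survive the round trip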

Once we have the JSON result of the SQL join query, we can expose it to client applications as a JSON REST API. A tool such as Postman lets you construct requests quickly, save them for later use, and analyze the responses sent by the API. Use the GET endpoint https://[hostname]:[port]/dashdb-api/home/abkdemo2.json to retrieve the JSON results.
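If you prefer R over Postman, the same endpoint can be called with the httr package. This is a minimal sketch; replace the placeholders with the values from your Connection Information page:

library(httr)      # simple HTTP client
library(jsonlite)

# Same /home/{filename} endpoint as the cURL call above
resp <- GET("https://[hostname]:[port]/dashdb-api/home/abkdemo2.json",
            authenticate("userid", "password"))
stop_for_status(resp)    # stop with an error on any non-2xx response

courses <- fromJSON(content(resp, as = "text"))
head(courses)            # the joined COURSE/LIST rows as a data frame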



Summary

This tutorial showed how to leverage R scripts with IBM dashDB for Analytics on IBM Bluemix to expose the results of complex table joins as a JSON REST API, keeping in mind the choice between row-organized and column-organized tables depending on your environment. With only a few configuration steps and a few lines of source code, you can see how easy it is to generate and expose JSON results from R scripts in the dashDB for Analytics service. That's the magic of Bluemix. Enjoy it.

References

  1. Getting started with the IBM dashDB for Analytics API: https://developer.ibm.com/static/site-id/85/api/dashdb-analytics/
  2. Developing applications that interact with the dashDB service: https://developer.ibm.com/clouddataservices/docs/dashdb/
  3. Technical issues related to dashDB and their resolutions: http://stackoverflow.com/questions/tagged/dashdb
