
Improve performance for your data virtualization data sources with remote connectors

Enterprise data is often spread across various data stores, such as data marts, data warehouses, and data lakes. Companies often seek to break down these silos by copying all the data into a central store for analysis. This duplication can cause issues such as stale data and the additional cost of managing the central data store.

Data virtualization provides a simplified approach to handle such data silos by querying multiple data sources without copying and replicating the data. This, in turn, reduces costs, simplifies analytics, and ensures that each user is accessing the latest data since the data is being accessed directly from the source and not from a copy.

If the data to be used for data virtualization is in a remote file system or within a database on a private server, you must install a remote connector in order to connect to the data source. In addition to facilitating access to such remote data, remote connectors also help improve performance by enhancing parallel processing and filtering data at the source when dealing with large data sets.

In this tutorial, you will learn how to use remote connectors to improve the performance of your data source connections in data virtualization on IBM Cloud Pak for Data. You will learn how to set up a remote connector in order to connect to a MongoDB instance that is in a different data center from the IBM Cloud Pak for Data instance. Then, using data virtualization, you will make queries across the MongoDB instance and a Db2 Warehouse instance that is in the same data center as the IBM Cloud Pak for Data instance.

Architecture diagram

Learning objectives

In this tutorial, you will learn how to:

  • Add a data source for data virtualization
  • Create a remote connector
  • Add a data source for data virtualization using the remote connector
  • Virtualize the data from both data sources and create a joined view
  • Assign virtualized data to a project

Prerequisites

Estimated time

This tutorial will take 45-60 minutes to complete.

Step 1. Download the data

You will be working with a synthetic credit risk training data set based on the UCI German Credit dataset. The data set contains information about loan applicants with 20 attributes for each applicant.

Download the three data files:

  1. applicant_financial_data.csv
  2. applicant_loan_data.csv
  3. applicant_personal_data.csv

All three files have one common column between them: CUSTOMERID.
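As a quick sanity check on that shared key, the way the files line up on CUSTOMERID can be sketched with the standard `join` utility. The sample rows below are invented for illustration; the real files spread 20 attributes across the three CSVs.

```shell
# Two tiny stand-ins for the real CSVs (CUSTOMERID values are made up).
cat > financial_sample.csv <<'EOF'
CUSTOMERID,CHECKINGSTATUS
1000,no_checking
1001,less_0
EOF
cat > loan_sample.csv <<'EOF'
CUSTOMERID,LOANAMOUNT
1000,250
1001,1000
EOF
# join matches rows on the first comma-separated field (CUSTOMERID);
# tail -n +2 skips the header rows, since join expects sorted data lines.
join -t, <(tail -n +2 financial_sample.csv) <(tail -n +2 loan_sample.csv)
# → 1000,no_checking,250
# → 1001,less_0,1000
```

This is the same keyed relationship you will later reproduce inside Data Virtualization by joining the virtual tables on CUSTOMERID.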

Step 2. Set up the Db2 Warehouse on Cloud instance in the same location as IBM Cloud Pak for Data

Create the Db2 Warehouse on Cloud instance

  • Start by creating a Db2 Warehouse on Cloud instance in the same location as your IBM Cloud Pak for Data instance. Assuming your IBM Cloud Pak for Data instance is in us-south, you will need to create the Db2 Warehouse on Cloud instance also in us-south.

  • Using a browser, go to your IBM Cloud account and create a Db2 Warehouse on Cloud instance. Select the Dallas (us-south) location and select a pricing plan of your choice. Cloud - Db2WH - create instance - location

  • Scroll to the bottom of the page and provide a name for the Db2 Warehouse on Cloud service instance. Optionally, update the resource group and provide any tags. Select the Datacenter Location as Dallas and click Create to create the service instance. Cloud - Db2WH - create instance - name

  • The service instance will take some time to be provisioned. Once the service status is updated to Active in your Resource List, click on the name of the service instance. Cloud - Db2WH - go to instance

Obtain service credentials for the Db2 Warehouse instance

  • Go to the Service credentials tab and click on New credential +. In the pop-up, click Add to create the credentials. Cloud - Db2WH - create service creds

  • Once the credentials are created, click on the Copy icon to copy the credentials. Store the credentials for later because you will be using them to connect to this Db2 Warehouse instance from IBM Cloud Pak for Data. Cloud - Db2WH - copy credentials

Obtain the SSL certificate for the Db2 Warehouse instance

  • Go to the Manage tab of the Db2 Warehouse instance and click on Open Console to open the Db2 Warehouse console. Cloud - Db2WH - open console

  • On the console, go to the Administration tab using the icon on the left side. Click on Connections to go to the Connections tab, then click the Download SSL Certificate button. Db2WH - Download SSL cert

  • You will need to convert the SSL certificate from .crt to a .pem file using OpenSSL. Run the following command in a terminal on your local machine:

    openssl x509 -in DigiCertGlobalRootCA.crt -out DigiCertGlobalRootCA.pem -outform PEM -inform DER
    
  • Save this file for later use.
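If you want to rehearse the conversion before working with the real certificate, the same openssl invocation can be exercised against a throwaway self-signed certificate generated in DER form. The demo-ca subject and file names below are placeholders, not part of the tutorial's real setup:

```shell
# Create a disposable self-signed certificate in DER encoding to stand in
# for the downloaded DigiCertGlobalRootCA.crt.
openssl req -x509 -newkey rsa:2048 -nodes -keyout demo.key \
  -subj "/CN=demo-ca" -days 1 -outform DER -out demo.crt
# The conversion step from the tutorial: DER in, PEM out.
openssl x509 -in demo.crt -inform DER -out demo.pem -outform PEM
# A valid PEM certificate is base64 text wrapped in BEGIN/END markers.
head -1 demo.pem   # -----BEGIN CERTIFICATE-----
```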

Load data into the Db2 Warehouse instance

  • Back on the Db2 Warehouse console, go to the Data tab using the icon on the left side. Click on Load Data to go to the Load Data tab. Click browse files and select the applicant_financial_data.csv you downloaded earlier. Once the file is selected, click Next. Db2WH - select file

  • Click on New schema +. Provide a name for the new schema (CREDITRISK) and click Create. Db2WH - create schema

  • The newly created schema is selected. Click on New table + to create a new table within the schema. Provide a name for the new table (APPLICANT_FINANCIAL_DATA) and click Create. Once the table is created, click Next. Db2WH - create table

  • Click Next. Db2WH - column definitions

  • Click Begin Load. Db2WH - begin load

Step 3. Set up a MongoDB instance

  • You will need to create a MongoDB instance in a different location from your IBM Cloud Pak for Data and Db2 Warehouse on Cloud instances. It was previously assumed that both of those instances are in the us-south location, so the MongoDB instance can be created in the us-east location.

  • Using a browser, go to your IBM Cloud account to create a Databases for MongoDB instance. Provide a name for the service instance and set the location as Washington DC. Optionally, update the resource group and provide any tags. Select the resources you want to allocate for the MongoDB instance. For this tutorial, you can go to the Custom tab and type in the values for RAM, Disk Usage, and Dedicated Cores as 1, 10, and 0, respectively. Click Create to create the service instance. Cloud - MongoDB - create instance

  • The service instance will take some time to be provisioned. Once the service status is updated to Active in your Resource List, click on the name of the service instance. Cloud - MongoDB - go to instance

Set the Admin password for MongoDB

  • Go to Settings and look for the Change Database Admin Password section. Type in a password you want to use for the admin user and click Change Password. Cloud - MongoDB - admin password

Obtain service credentials for the MongoDB instance

  • Go to the Overview tab and look for the Endpoints section. Click on the MongoDB tab. Cloud - MongoDB - Endpoints 1

  • Scroll to the bottom of the page. Copy any one pair of hostname and port, as well as the value for Replica set. You will need these to connect to the MongoDB instance. Click on the Download button to download the SSL certificate. Cloud - MongoDB - Endpoints 2

Connect to MongoDB using MongoDB Compass

MongoDB Compass is a graphical user interface to view and work with MongoDB data.

  • Download and install MongoDB Compass from MongoDB.

  • Open MongoDB Compass, then click on Fill in connection fields individually. MongoDB - new connection

  • On the Hostname tab, provide the hostname and port you copied earlier. Select the Authentication as Username / Password. Type in the Username admin and provide the admin password you set earlier as the password for the admin user. MongoDB - hostname tab

  • Go to the More Options tab. Provide the Replica Set name you copied earlier. Select the value for SSL as Server Validation. Click on Select files… and select the certificate you downloaded earlier. Click Connect. MongoDB - more options tab

Once the connection is successfully added, you should see a list of the databases available in your MongoDB instance.

MongoDB - connected

Load data into the MongoDB instance

  • Click on CREATE DATABASE and provide a name for your database (CREDIT_RISK). You also need to provide a collection name when creating a database. Let’s start with APPLICANT_LOAN_DATA. You can create additional collections once the database has been created. Click on Create Database. MongoDB - create database

  • Go to the newly created database by clicking on its name, CREDIT_RISK. MongoDB - go to CREDIT_RISK

  • Click on the collection name, APPLICANT_LOAN_DATA. MongoDB - Ap_Loan_Data - go to collection

  • Click on ADD DATA > Import File. Alternatively, you can click Import Data in the center of the screen. MongoDB - Ap_Loan_Data - import file

  • In the pop-up, click BROWSE and select the applicant_loan_data.csv file you downloaded earlier. Once the file is selected, you will see the first 10 records from the file listed on the screen. Click IMPORT to import the contents of the file into the collection. MongoDB - Ap_Loan_Data - import

  • A progress bar at the bottom of the pop-up will show the progress of the import process. Once the import process is complete, click DONE. MongoDB - Ap_Loan_Data - import successful

  • The data is now successfully loaded into the collection. Click on CREDIT_RISK in the breadcrumbs at the top of the screen to go back to the database. MongoDB - Ap_Loan_Data - view data

  • Click on CREATE COLLECTION. In the pop-up, provide APPLICANT_PERSONAL_DATA as the collection name and click Create Collection. MongoDB - Ap_Personal_Data - create collection

  • Click on the collection name, APPLICANT_PERSONAL_DATA. MongoDB - Ap_Personal_Data - go to collection

  • Click on ADD DATA > Import File. Alternatively, you can click Import Data in the center of the screen. MongoDB - Ap_Personal_Data - import file

  • In the pop-up, click BROWSE and select the applicant_personal_data.csv file you downloaded earlier. Once the file is selected, you will see the first 10 records from the file listed on the screen. Click IMPORT to import the contents of the file into the collection. MongoDB - Ap_Personal_Data - import

  • A progress bar at the bottom of the pop-up will show the progress of the import process. Once the import process is complete, click DONE. MongoDB - Ap_Personal_Data - import successful

The data is now successfully loaded into the collection.

MongoDB - Ap_Personal_Data - view data

Step 4. Set up a Virtual Server in the same location as the MongoDB instance

Locate or create an SSH key

  • Before setting up a virtual server, you need to locate or create your SSH key. The SSH key must be an RSA key with a key size of either 2048 bits or 4096 bits.

  • If you have already created an SSH key, it will be present as a file called id_rsa.pub within the .ssh folder in your home directory. For example, on a Mac it should be at /Users/<username>/.ssh/id_rsa.pub, on Linux at /home/<username>/.ssh/id_rsa.pub, and on Windows at C:\Users\<username>\.ssh\id_rsa.pub, where <username> is your user name on that machine.

  • If you don’t have an SSH key, you can create it by running the following command in your terminal/command prompt (NOTE: For Windows systems, you can use PuTTYgen to generate an SSH key):

ssh-keygen -t rsa -b 4096 -C "user_ID"

user_ID is your IBM Cloud Pak for Data user ID.
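To confirm that an existing or newly generated key meets the 2048/4096-bit RSA requirement, `ssh-keygen -l` prints the key length as its first field. A repeatable sketch, written against a throwaway key file (./demo_rsa) so it does not touch your real ~/.ssh directory:

```shell
# Generate a disposable 4096-bit RSA key pair with an empty passphrase
# (-N "") and an explicit file path (-f) so no prompts are needed.
ssh-keygen -t rsa -b 4096 -C "demo@example.com" -N "" -f ./demo_rsa
# -l prints "<bits> <fingerprint> <comment> (RSA)"; the first field
# should read 2048 or 4096 for a key that meets the requirement above.
ssh-keygen -lf ./demo_rsa.pub
```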

Provision a virtual server

Next, you will need to set up a Virtual Server in the same location as the MongoDB instance.

  • Using a browser, go to your IBM Cloud account to create a Virtual Server instance. Select the type of virtual server as Public and give your virtual server a name. Select the location as NA East. Cloud - VM - name

  • Scroll down to the Profile section. The default profile Balanced | B1.2x4, which has 2 vCPU and 4 GB RAM, will suffice for this tutorial. If you want to choose a larger profile, you can do so by clicking on View all profiles. Next, you need to provide your SSH key. Click on Add key +. Cloud - VM - profile

  • In the pop-up, provide a name and an optional description for your SSH key and paste the contents of the public key (id_rsa.pub) you located/generated earlier. Click Add. Cloud -  VM - SSH

  • In the Operating system section, you can choose the operating system for the virtual server. For this tutorial, we will use CentOS, so you can leave the default values. Cloud - VM - OS

  • Scroll down to the Attached storage disks section. Here you can specify the boot disk size for this virtual machine. For this tutorial, the default 25GB size should suffice. The Private VLAN value is auto-assigned by default. If you wish, you can select a specific private VLAN after clicking on -Auto assigned- to open the list of available private VLANs. Finally, click Create to create the virtual server. Cloud - VM - create

  • If the private VLAN value was set to auto-assigned, and if there were more than one private VLANs available, you will see a pop-up that says a VLAN will be assigned automatically and you will not be able to change the value after the device is provisioned. Click Accept to continue provisioning the device. Cloud - VM - VLAN accept

  • You will be brought to the Classic Infrastructure Device List of your IBM Cloud account, where you can see a new entry for the virtual server you just created. Click on the arrow icon next to your virtual server to expand the entry. It can take up to 20 minutes for the virtual server to be provisioned. Once ready, the server will show up as “Powered On” and “Connected.” The username and password needed to access the virtual server are shown. You can click on the eye icon to view the password. Click on the name of your virtual server. Cloud - VM - created

Install required packages on the virtual server

  • On the Overview tab, look for the public IP address. You will need this value to SSH into the virtual server. (NOTE: You only need the IP address, that is, the portion before the /.) Cloud - VM - get IP address

  • Open a terminal and SSH into the virtual server using the following command. (NOTE: If you have a Windows machine, use PuTTY to SSH into the virtual server.)

ssh root@<IP-address>

<IP-address> is the public IP address for your virtual server.

  • Once you have connected to the virtual server, you will be using yum to install the required packages. Start by updating yum using the following command:
yum update
  • When you get a prompt to confirm installation, such as the one below, type y and press the Return key.
Is this ok [y/N]:
  • Install curl and tar using the following commands. As before, when you get a prompt to confirm the installation, type y and press the Return key.
yum install curl
yum install tar
  • Next, you need to download and install IBM Java 8. Download the right version of the installer for your virtual server.

  • Use scp to copy the installer from your local machine to the tmp folder of the virtual server by running the following command on a terminal in your local machine:

scp <downloads-folder>/<installer> root@<IP-address>:/tmp

<downloads-folder> is the location where the installer was downloaded to, <installer> is the filename for the downloaded installer, and <IP-address> is the public IP address for your virtual server.

  • Back on your virtual server, use a terminal to ensure that the installer is executable, then run the installer:
chmod +x <installer>
./<installer>

Follow the instructions and prompts to install IBM Java 8. After the installation completes successfully, the installer will display the location where Java was installed. Note down the location for later use.
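The remote connector setup in a later step will ask for the jre directory inside this install location. One quick way to confirm the exact path is to search one level under the install root. The tree below is simulated so the commands are repeatable here; on the real server, set JAVA_ROOT to the location the installer printed (for example, something like /opt/ibm/java-x86_64-80, though your path may differ):

```shell
# Simulate an IBM Java 8 install tree (hypothetical layout).
JAVA_ROOT=./java-x86_64-80
mkdir -p "$JAVA_ROOT/jre/bin"
# Look for the jre directory directly under the install root; the path
# that find prints is the value you will give the remote connector setup.
find "$JAVA_ROOT" -maxdepth 1 -type d -name jre
# → ./java-x86_64-80/jre
```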

Create required user and folder structure on the virtual server

  • Use the command below to create a new user called dv-user on the virtual server:
adduser dv-user
  • Create a dvendpoint directory within the dv-user home directory using the command below:
mkdir /home/dv-user/dvendpoint
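The setup script will be copied into this directory and run in a later step. Since adduser and mkdir above are run as root, one reasonable extra step (an assumption on my part, not a step from the tutorial) is to hand ownership of the directory to dv-user so processes running as that user can write there:

```shell
# Assumption: give dv-user ownership of its endpoint directory.
# On the real server, as root, this would be:
#   chown -R dv-user:dv-user /home/dv-user/dvendpoint
# Demonstrated here against a simulated tree so it runs without root.
mkdir -p ./home/dv-user/dvendpoint
ls -d ./home/dv-user/dvendpoint
```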

Step 5. Provision Data Virtualization on IBM Cloud Pak for Data

  • Log into your IBM Cloud Pak for Data instance as the admin user. CPD - login

  • Go to the hamburger (☰) menu, expand Services and click Services catalog. CPD - services catalog

  • Expand Category and choose the Data sources category on the left-hand side. Click on the tile for Data Virtualization. CPD - service - DV

  • Click the Provision instance button. CPD - deploy data virtualization

Follow the instructions to provision Data Virtualization.

NOTE: For deployment using Managed OpenShift, you must do the following:

  1. Decide whether to check the Update the kernel semaphore parameter checkbox.
  2. Do NOT choose the defaults for storage. If you use Portworx storage, select portworx-db2-rwx-sc as the storage class for IBM Cloud Pak for Data v4.0 (portworx-dv-shared-gp3 for IBM Cloud Pak for Data v3.5). Otherwise choose ibmc-file-gold-gid as the storage class.

Step 6. Add the Db2 Warehouse instance as a data source to Data Virtualization

Next, you will add the Db2 Warehouse instance as a Platform Connection and add the platform connection as a data source for Data Virtualization.

Add the Db2 Warehouse instance as a Platform Connection to IBM Cloud Pak for Data

  • Go to the hamburger (☰) menu, expand Data and click Platform connections. CPD - go to platform connections

  • Click on New connection +. CPD - platform connection - new

  • Click on Db2 Warehouse to select it as the type of data source you want to create, then click Select. CPD - platform connection - Db2WH

  • Provide a name (Db2 Warehouse for DV-RC) and an optional description for the connection. CPD - platform connection - conn 1

  • Scroll down to the Connection details section. Provide the database name, hostname, and port for the Db2 Warehouse instance from the connection details you obtained earlier. CPD - platform connection - conn 2

  • Scroll down to the Credentials section. Provide either the API key or the username and password for the Db2 Warehouse instance. CPD - platform connection - conn 3

  • Scroll down to the Certificates section. Check the box for Port is SSL-enabled, then provide the PEM SSL certificate for the Db2 Warehouse instance. Click Test connection to test your connection. If the test was successful, you will see a notification for the same. Finally, click Create to create the connection. CPD - platform connection - conn 4

You will be brought back to the Platform Connections page, where you can now see an entry for the new Db2 Warehouse connection.


Add the existing Db2 Warehouse connection as a data source to Data Virtualization

  • Go to the hamburger (☰) menu, expand Data, and click Data virtualization. CPD - go to data virtualization

  • Click on Add connection +, then click on Existing connection. CPD - DV - add existing connection

  • Click on the radio button for the Db2 Warehouse platform connection and click Add. CPD - DV - add Db2WH

  • You will not be adding a remote connector to this connection, so click Skip. CPD - DV - skip RC

The data connection will be added as a data source for Data Virtualization, and you should see the data connection listed on the Data sources screen.

CPD - DV - add Db2WH - success

Step 7. Add MongoDB as a data source to Data Virtualization using remote connectors

You will now add the MongoDB instance as a data source to Data Virtualization. You will start by setting up a remote connector on the Virtual Server you created, then add the MongoDB instance as a data source to Data Virtualization using the remote connector.

Set up a remote connector on Data Virtualization

  • On the Data sources page of Data Virtualization, click on Set up remote connector. CPD - DV - set up remote connector

  • Provide a name (DV_RC_for_MongoDB) and an optional description for the remote connector. Select the data source OS as Linux and specify where Java is installed on your virtual server. You need to provide the path to the jre directory within the Java installation directory. You had obtained this path at the end of the IBM Java 8 installation. It may be something like /opt/ibm/java-x86_64-80/jre/. Provide /home/dv-user/dvendpoint as the location where you want to install the remote connector. Click Generate script. CPD - DV - RC script - 1

  • Scroll down to find the generated script. Click on the download button to download the script to your local machine. CPD - DV - RC script - 2

  • Copy this script to the /home/dv-user/dvendpoint of the virtual server by running the following command in a terminal on your local machine:

scp <downloads-folder>/dv_endpoint.sh root@<IP-address>:/home/dv-user/dvendpoint

<downloads-folder> is the location where the dv_endpoint.sh script was downloaded to, and <IP-address> is the public IP address for your virtual server.

  • SSH into the virtual server:
ssh root@<IP-address>
  • Go to the dvendpoint directory and make sure that the dv_endpoint.sh script is executable:
cd /home/dv-user/dvendpoint
chmod +x dv_endpoint.sh
  • Run the dv_endpoint.sh script:
./dv_endpoint.sh
  • Verify that the process is running on the port mentioned in the output of the script:
netstat -na | grep "<port-number>"

<port-number> is the port number on which the remote connector is running, as shown in the script's output.
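What a healthy listener looks like in that netstat output can be sketched against a simulated line. The port shown, 6414, is the remote connector's default (it also appears in the Troubleshooting section); your script's output may report a different port:

```shell
# Simulated netstat line (hypothetical) showing the remote connector
# listening on its default port, 6414.
cat > netstat_sample.txt <<'EOF'
tcp6       0      0 :::6414                 :::*                    LISTEN
EOF
# The same grep as in the verification step, with the port filled in;
# a matching LISTEN line means the connector is accepting connections.
grep ":6414" netstat_sample.txt
```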

  • Back on your IBM Cloud Pak for Data instance, click Done. CPD - DV - RC script - 3

Add MongoDB as a data source to Data Virtualization using the remote connector

The next step is to add the MongoDB instance as a source to Data Virtualization using the remote connector.

  • On the Data Sources page, click on Add connection > Remote data source. CPD - DV - add remote data source

  • You should see an entry for the remote connector you set up on the virtual server. Click on the 3 vertical dots on the far right of that entry to open the overflow menu and click on Search data sources via another host. CPD - DV - search data sources via another host

NOTE: If you don’t see your remote connector listed here, you may have to manually configure the connection from Data Virtualization to the remote connector using the instructions in the Troubleshooting section.

  • In the pop-up, provide the hostname of your MongoDB server that you obtained earlier and click Search. CPD - DV - search data sources VAH - search

  • You should now see an entry for the MongoDB host under the Remote connector. Click on the radio button to choose this host/port. Use the drop-down under Type and select Mongo DB, then click Configure. CPD - DV - search data sources - VAH - configure

  • Click on MongoDB to select the connection type and click Select. CPD - DV - search data sources VAH - conn 1

  • Type in a name for the connection (MongoDB for DV-RC) and an optional description. CPD - DV - search data sources VAH - conn 2

  • Scroll down to the Connection details section. Provide the Database as admin and provide the Hostname and Port for the MongoDB instance. CPD - DV - search data sources VAH - conn 3

  • Scroll down to the Credentials section. Provide the Authentication database as admin, the Username as admin, and provide the Password for the admin user in MongoDB. CPD - DV - search data sources VAH - conn 4

  • Scroll down to the Certificates section. Check the box for Port is SSL-enabled and provide the SSL certificate for the MongoDB instance. Provide the MongoDB Hostname and check the Validate the SSL certificate checkbox, then click Create. CPD - DV - search data sources VAH - conn 5

  • Once the connection is added, you should see an entry for the connection with the Status as “Configured and added”. Click Cancel to go back. CPD - DV - search data sources VAH - completed

  • The data connection will be added as a data source for Data Virtualization, and you should see both the connections listed on the Data sources screen. CPD - DV - data sources - completed

  • If you click on Constellation view, you will be able to see how the data sources are connected to Data Virtualization; the Db2 Warehouse data source is connected directly and the MongoDB data source is connected via the Remote Connector. CPD - DV - constellation view

Step 8. Create virtual tables using data virtualization

Virtualize the data

Now that the data sources are available in Data Virtualization, you can virtualize the data within the data sources.

  • Click on the Data sources drop-down. Expand Virtualization and click Virtualize. CPD - DV - menu - virtualize

  • Several tables will appear. Find and select the APPLICANT_LOAN_DATA and the APPLICANT_PERSONAL_DATA tables contained in the MongoDB connection, as well as the APPLICANT_FINANCIAL_DATA table contained in the Db2 Warehouse connection. You can also search for the tables using the search bar. Once selected, click Add to cart and then View cart. CPD - DV - add to cart

  • The next screen prompts you to choose whether you want to assign the virtualized data to a data request, a project, or to your virtualized data. Choose My virtualized data and click Virtualize. If you see a pop-up screen to “Confirm virtualization”, click Continue. CPD - DV - virtualize

  • You will see a notification that the virtual tables have been created. To see the newly virtualized data, click View my virtualized data. CPD - DV - virtualize completed

Join the virtualized data

The next step is to join the virtual tables that have been created in order to create a merged view of the data.

  • Select the APPLICANT_FINANCIAL_DATA and APPLICANT_LOAN_DATA tables, and click on the Join button. CPD - DV - view virtualized data

  • The columns of both the tables will be shown on the screen. To join the tables, you need to pick a key that is common to both tables. In this case, the CUSTOMERID column is common between the two tables, and you can mark this column as the key by clicking on the CUSTOMERID column in one table and dragging it to the CUSTOMERID column in the other table. When you see a line or curve connecting the columns in the two tables, click Next. CPD - DV - select join key

  • You can edit the column names for the joined view. But for now, you can leave the column names as they are and simply click Next.

  • On the next screen, provide a name for the joined view (to be consistent with SQL standards, pick an all-uppercase name, such as APPLICANTFINANCIALLOANDATA). Under Assign to, choose My virtualized data and click Create view. CPD - DV - create joined view

  • You will be notified that the join has succeeded. Click on View my virtualized data to view your virtualized data. CPD - DV - join success

  • Repeat the above joining process, but this time choose to join the view you just created (APPLICANTFINANCIALLOANDATA) and the last remaining virtualized table APPLICANT_PERSONAL_DATA, to create a new joined view that has data from all three tables.

  • As before, use the CUSTOMERID column as the join key.

  • Pick an all-uppercase name, such as APPLICANTFINANCIALLOANPERSONALDATA, for the new view, assign this new view to My virtualized data, and click Create view.

Once the view is successfully created, you should be able to see all three virtualized tables and the two joined views on the My virtualized data page. Do not go to the next section until you have all these tables.

CPD - DV - all virtualized views

Step 9. Grant access to the virtualized data

Other users cannot see the data you just virtualized until you grant them access. Follow these steps to make your virtualized data visible to them.

NOTE: It is assumed that the users have already been created in IBM Cloud Pak for Data.

Assign the Engineer role to the users

IBM Cloud Pak for Data users that need to use data virtualization functions must be added as users of Data Virtualization and need to be assigned specific roles based on their job descriptions. These roles are Admin, Engineer, User, and Steward. You will assign the Engineer role to some IBM Cloud Pak for Data users.

  • Open the Data Virtualization menu and click on User management. CPD - DV - user management

  • Click on Add users +. In the pop-up, select the users you wish to add to Data Virtualization, update the role of the users to Engineer, and click Add. CPD - DV - update user roles

Grant virtualized data access to data virtualization users

  • On the My virtualized data page, click on the three vertical dots to the right of the virtualized APPLICANTFINANCIALLOANPERSONALDATA view, and choose Manage access. CPD - DV - manage access to virt data

  • Click the Specific users radio button, then Grant access +. In the pop-up, select the users you wish to grant access to and click Add users. CPD - DV - grant access

You can repeat the above steps for any other tables and views that you want the other users to access.

Step 10. Assign virtualized data to a project

The users that have been granted access to the virtualized data can assign the data to their projects as an asset.

NOTE: It is assumed that users already have analytics projects to which they want to assign the virtualized data.

  • Log in as one such (non-admin) user. Go to the hamburger (☰) menu, expand Data, and click Data virtualization. CPD - DV - user - go to DV

  • Open the Data virtualization menu, expand Virtualization, and click My virtualized data. You should see the data you have virtualized or that you have been given access to (or that the administrator has assigned to you). CPD - DV - user - my virt data

  • Select the checkbox next to the datasets you want to use in your project and click the Assign button to start importing the data sets to your project. CPD - DV - select views to assign

  • On the Assign virtual objects screen, choose your analytics project from the drop-down list, then click the Assign button to add the data to your project. CPD - DV - assign

  • You will see a pop-up with confirmation that the objects have been assigned to your project. Click the Go to project button. CPD - DV - go to project

  • On the project page, clicking on the Assets tab will show all the virtualized data assets that are now in your project, along with other assets that may be in the project. CPD - project assets

Troubleshooting

The remote connector is essentially a gateway to your remote data sources. A remote connector host machine with network access to the IBM Cloud Pak for Data cluster should be able to automatically contact and connect to the cluster. However, this may not happen if the automated discovery port is not exposed, if there are firewall rules that prevent access, or if the physical network configuration allows only one-way communication from IBM Cloud Pak for Data to the remote connector. As a result, the remote connector does not appear in the list when you “Search for data sources using remote connectors.” In such a scenario, you will need to establish the connection between the remote connector and the IBM Cloud Pak for Data cluster manually.

  • On your IBM Cloud Pak for Data instance, go to the hamburger (☰) menu, expand Data, and click Data virtualization. CPD - go to data virtualization

  • Open the Data virtualization menu and click on Run SQL. CPD - DV - Run SQL

  • Click Create new +. CPD - DV - Run SQL - create new

  • A new empty script is loaded on the screen. Type in the following SQL command and click Run all. (NOTE: Replace <host-name> with the host name for the virtual machine where you set up the remote connector; 6414 is the default port that the remote connector uses.)

CALL DVSYS.DEFINEGATEWAYS('<host-name>:6414')

CPD - DV - Run SQL - run

The manual connection between the remote connector and the IBM Cloud Pak for Data cluster has been added, and you should be able to connect to remote data sources using the remote connector.

Summary and next steps

In this tutorial, you learned how to improve the performance of your data source connections for Data Virtualization using remote connectors. You saw how data from different databases and servers, even remote and private ones, can be virtualized and joined to create a single comprehensive virtual view of the data.

Now that the data has been virtualized, you can use this data to perform different actions such as visualizing the data using Data Refinery, preparing and understanding the data using Watson Knowledge Catalog, and building analytical models using Jupyter notebooks, AutoAI, or IBM SPSS Modeler.
