This workshop series gives you an overview of the Big Data & Analytics reference architecture, shows how Bluemix supports that architecture, and walks you through building your own native Big Data application on Bluemix. In the hands-on portion, you will create the System of Record (SoR) database in a virtual machine on the Bluemix VM service (simulating an on-premise SoR), deploy and configure a Secure Gateway to connect the SoR to Bluemix, deploy a Big Data sample application to Bluemix, and configure the application to connect to the on-premise SoR through the Secure Gateway connection. This first workshop is a prerequisite for all the other workshops in the Big Data series, because it provides the application components that later workshops build on. To get started with the Big Data reference architecture and your application, complete the steps in Task 1 through Task 8.

Why deploy your Big Data application on Bluemix

A common theme in IT today is analytics--efforts to gain insights resulting from the systematic analysis of data. IT today provides a multitude of options for developing and deploying applications commonly referred to as Big Data. These applications quickly perform complex data analytics on huge data sets, from multiple perspectives, while requiring much less hardware and overall investment than previous workloads. IBM Bluemix enables you to host your analytics applications in the cloud and provides a unique combination of additional benefits: a selection of database technologies for managing the data, storage for large sets of data, a variety of analytics tools for implementing the analysis logic, and plenty of computational capacity to run that logic at scale. Bluemix frees you from the details of managing data and running the analysis, so you can focus on specifying the data and devising customized analysis processes that provide unique advantage to your business.

Bluemix handles traditional IT tasks so that you can focus on the business logic and data that differentiates your application. It enables you to begin your implementation immediately and avoid roadblocks of the traditional development scenarios such as provisioning hardware, network, storage, and middleware. Your organization can take advantage of automated provisioning and integration of the required components, as well as integrated security and monitoring capabilities built into the platform.

Cloud architecture overview for Big Data & Analytics solution hosting

Fig 1. Big Data & Analytics on Cloud Reference Architecture
Big data and analytics require a new view on business intelligence, data management, governance, and delivery. Cloud computing is a perfect vehicle for hosting big data and analytics workloads. The Big Data and Analytics on Cloud Reference Architecture functional view (shown above in Figure 1 and documented at the Cloud Standards Customer Council Resource Hub) depicts a set of capabilities that a business must consider as they enter the big data and analytics space. It includes capabilities around data integration, management, security, and analytics. Through this workshop series, you will build out a subset of the key Big Data & Analytics on Cloud reference architecture capabilities. The workshop tasks cover the Integrated Data Warehouse, Archive Repository, Data Load, and Analytics Application components.

Bluemix services used in this workshop

Since the Reference Architecture is vendor agnostic, we selected from a myriad of different technology choices for implementation in this workshop series.

Integrated Data Warehouse

Trusted data is stored in the traditional enterprise data warehouse. Data for this repository is modeled to support interactive business intelligence activities. Warehouse data is normalized, matched, cleansed, and validated. This repository typically requires high availability and disaster recovery. It is also the most expensive repository. The Enterprise Warehouse repository keeps detailed data for the most current month(s) and aggregates yearly data as opposed to maintaining years of detailed data in raw form. It has the following characteristics:
  • Structured
  • Validated (Trusted)
  • Consolidated
  • Aggregated
  • Historical
Considering that the majority of data warehousing has traditionally been deployed on-premise, we have purposely chosen an on-premise data warehouse in this workshop series to reflect where most of our clients are today. For that purpose, the Secure Gateway service was chosen. It enables applications that are running in Bluemix to access remote systems and databases reliably and securely. An enterprise can configure a gateway that connects to an existing data warehouse inside their data center, providing controlled access to the data. With a gateway, an application running in Bluemix can access data in the databases that are already housing it and make it available to the application for analytics and other uses.

Archive Repository

Archiving “cold” data from data warehousing environments is necessary as a way of reducing warehouse costs and improving performance. While this “cold” data may be of no interest for operational reporting or business intelligence, it is increasingly of relevance to users performing exploratory or deep analytics. Taking advantage of a cost effective Hadoop infrastructure within the Archive Zone to store the “cold” historic data provides data for deeper analytics. The Hadoop component for the archive repository in this workshop series is the IBM Analytics for Hadoop service. Analytics for Hadoop enables users to perform complex analysis of large data sets, built on the open-source Hadoop technology. Based on an enterprise-grade Hadoop offering, this service leverages Hadoop's distributed processing capabilities to provide easy access to large data sets with fast and efficient visualization of those data sets. Users can analyze and visualize Big Data on a single-node Hadoop cluster through a flexible pay-as-you-go payment model. Previously, similar jobs would have taken days or weeks, often run serially on hardware and technology of the past. Hadoop takes advantage of parallel processing capabilities to run these jobs in mere minutes, on commodity hardware, for a much smaller overall cost.

Data Load

This component focuses on the process of loading or inserting data into a target repository (or analytical source) and making it available for use in a Big Data application. The data load process might be scheduled in batch intervals or in near real-time (“just-in-time”) intervals, depending on the nature of the data and the business purpose for its use.

The IBM DataWorks service enables data management for three separate roles: developers, data stewards, and business analysts. DataWorks enables developers to quickly build high-quality applications, with data easily accessible, allowing them to focus on writing business logic, not data access logic. DataWorks provides cloud-based data refinement to move data across various cloud-based and on-premise data sources, making data available to the applications that need it, when they need it, and in the form they expect. For data stewards, DataWorks easily enables self-service data access, instills confidence in the data among end users, and maintains data governance and security controls. For business analysts, DataWorks accelerates the finding and using of refined data for their high-value analytic needs. For all of these roles, DataWorks enables IT teams to better meet business demands and facilitate rapid, self-service data access.

Analytics Application

IBM Bluemix allows developers to focus on developing applications and provides the necessary services to get the job done. Upon the completion of your application development, you can deploy your application to the cloud, bind a service to your application, and automatically generate access credentials for your application to connect to the new service. Your application can then fetch the credentials through an environment variable named VCAP_SERVICES and parse it to get the specific connection information. This binding allows your application to be independent of the environment or service instance, since it parses the information dynamically.
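
As a minimal sketch of that lookup, the shell function below pulls one credential field out of VCAP_SERVICES. It assumes that the jq JSON tool is installed. The function name vcap_credential, the service label example-db, and the sample JSON are hypothetical placeholders for illustration, not the actual structure published by any particular Bluemix service; only the general shape (service label, then a list of instances, each with a credentials object) follows the Cloud Foundry convention.

```shell
# Print one credential field for the first bound instance of a service.
# $1 = service label (a top-level key in VCAP_SERVICES)
# $2 = name of the field inside that instance's "credentials" object
vcap_credential() {
    printf '%s' "$VCAP_SERVICES" |
        jq -r --arg svc "$1" --arg key "$2" '.[$svc][0].credentials[$key]'
}

# Hypothetical sample value for trying the function locally.
# On Bluemix, the platform sets VCAP_SERVICES for the running application.
VCAP_SERVICES='{"example-db":[{"name":"my-db","credentials":{"host":"db.example.com","port":3306}}]}'
export VCAP_SERVICES
```

With the sample value above, vcap_credential example-db host prints the host field of the first example-db instance. A field that does not exist is printed as null, so a real script would want to check for that case.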

Introduction to Stock Volatility, our sample Big Data application

Fig 2. Solution Architecture for Data Warehouse Augmentation
Our sample analytics application, called Stock Volatility, runs Hive queries against the data and displays the output in a bar chart. The goal of this application is to calculate the volatility of a selected stock during the following recession years: 2000, 2001, 2007, 2008, and 2009. The application UI allows you to select a stock and analyze the volatility. The developer's job is to write the queries that are needed to fetch the data. This application itself is simple and can easily be created using a SQL database, but if you are dealing with terabytes of complex data, you can leverage the power of Hadoop to analyze the data quickly on Bluemix and use tools like D3 libraries to visualize it. Our Big Data workshop series integrates Big Data and data warehouse augmentation capabilities to increase operational efficiency. Figure 2 above illustrates the solution architecture used in this workshop series to augment a hybrid data warehouse. The goal of warehouse augmentation is to help organizations get more value from an existing data warehouse investment while reducing overall costs, for example:
  • Optimizing storage by providing a queryable archive
  • Rationalizing data for greater simplicity and reduced expense
  • Speeding data queries to enable more complex analytical applications
  • Improving the ability to scale predictive analytics and business intelligence operations
We use the IBM DataWorks cloud service to offload aged data from the on-premise MySQL data warehouse to the cloud-based Hadoop repository, IBM Analytics for Hadoop, as part of a continuous delivery workflow. Our Stock Volatility application takes advantage of SQL-based analytics using Virtual Report Marts over Hive. It accesses and augments both the current data in the MySQL data warehouse and the historical data in IBM Analytics for Hadoop to analyze the stock volatility.
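
The workshop does not spell out how the application computes volatility. As an illustration only, the sketch below uses one common definition, the standard deviation of day-over-day returns, computed with awk over a list of closing prices (one per line, oldest first). The function name volatility and the input format are assumptions made for this example; the sample application itself obtains its results through Hive queries.

```shell
# Read closing prices (one per line, oldest first) from standard input and
# print the population standard deviation of the day-over-day returns.
volatility() {
    awk '
        NR > 1 { r = $1 / prev - 1; n++; sum += r; sumsq += r * r }
        { prev = $1 }
        END {
            if (n == 0) { print "need at least two prices" > "/dev/stderr"; exit 1 }
            mean = sum / n
            var = sumsq / n - mean * mean
            if (var < 0) var = 0   # guard against tiny negative rounding error
            printf "%.4f\n", sqrt(var)
        }
    '
}
```

For example, the prices 100, 110, and 99 give returns of +10% and -10%, so printf '100\n110\n99\n' | volatility prints 0.1000. A larger number means the price moved more from day to day over the period.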

Big Data Workshop Series Overview

The Big Data workshop series is broken into two workshops, as follows:
  1. Actionable Architecture: Secure Hybrid Data Warehouse on Bluemix
  2. Actionable Architecture: Data Warehouse Augmentation on Bluemix

Workshop tasks

This workshop shows how to deploy a Big Data application and connect it to a secured data warehouse. It consists of eight tasks that include getting your Bluemix account, setting up a sample System of Record (SoR), deploying a Secure Gateway connection from Bluemix to the SoR, and deploying a Big Data application connecting to the SoR through the secured connection. For this workshop, the SoR is a SQL database on a virtual machine in Bluemix, providing a simplified simulation of what would be done in a real-world environment.

Task 1. Set up your Bluemix account

IBM Bluemix is an open-standards, cloud-based platform for building, managing, and running all types of applications: mobile, smart devices, web, and big data. The Bluemix capabilities include Java™, mobile back-end development, application monitoring, and features from ecosystem partners and open source, all through an as-a-service model in the cloud.

Before you can use the Bluemix capabilities, you must register for an account. You can sign up for one at no charge for a 30-day free trial. After the trial period, you will need to provide a credit card to pay as you go for your resource usage. After you sign up, you can find helpful information in the overview section of the Bluemix Docs.

Tip: If you are using a free or Trial account, you have a limit of 4 service instances. During subsequent workshops, you will create a number of service instances for use with the application. If you’ve already created other services during previous Bluemix activities, you may need to delete some unused or unnecessary service instances to proceed through these workshops. To delete a service, from the Bluemix Dashboard, highlight the settings icon in the top right of the service panel and select Delete. If you are asked to restage your application, click Restage and wait for your application to be redeployed before proceeding.

  1. If you do not already have a Bluemix account, sign up for one at no charge.
  2. Log in to Bluemix. The dashboard opens as shown:
Fig 3. Bluemix Dashboard
The dashboard shows an overview of the active Bluemix space for your organization. By default, the space is dev and the organization is the project creator’s user name. For example, if bob@example.com logs in to Bluemix for the first time, the active space is dev and the organization is bob@example.com. If you create more organizations or spaces in Bluemix, be sure to use the same organization and space as you follow the tutorials.

Task 2. Create an SSH Keypair

For this workshop, we’re going to use a virtual machine (VM) to simulate the computer hosting the system of record (SoR). The Secure Gateway service’s client needs two network connections, to the SoR and to the Bluemix data center, and so it must run in a region of the data center network that has access to both. For this example, we'll give the VM a public Internet IP address so that it will be able to use the Internet to connect to Bluemix. This VM could be hosted by any cloud provider; we'll create it in the Bluemix VM service so that you can create the VM using your Bluemix account and capacity.

To log into the Bluemix VM, you will need an RSA keypair. Since we will need to specify the keypair when we create the VM, we will create the keypair first. A keypair is more secure than a password, preventing anyone without your key from hijacking your VM. Bluemix provides two ways to specify a key: You can import one or create one. With the create option, Bluemix generates a keypair and gives you the private key. Here, we’ll use the import option, where we generate our own keypair and import the public key into Bluemix.

Generate the keypair by using tools on your computer. We’ll call our keys bigdatakey. This generation will result in a pair of files, a private key (bigdatakey) and a public key (bigdatakey.pub).
  • In Unix/Linux: Run ssh-keygen -t rsa -f bigdatakey
  • In Windows: Use PuTTYgen, the key generator that is distributed with PuTTY.
For more information, see ssh-keygen (Wikipedia) and ssh-keygen (Ubuntu Manpages).
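
A non-interactive variant of the Unix/Linux command can be sketched as a small shell function. The function name make_keypair and the refusal to overwrite an existing key are choices made for this example, not requirements of Bluemix.

```shell
# Generate an RSA keypair at the given path unless one already exists.
# $1 = private key path (the public key is written to "$1.pub")
# $2 = passphrase (may be empty; ssh-keygen rejects very short non-empty ones)
make_keypair() {
    if [ -e "$1" ] || [ -e "$1.pub" ]; then
        echo "refusing to overwrite existing key: $1" >&2
        return 1
    fi
    # -q: quiet, -t: key type, -N: passphrase, -f: output file
    ssh-keygen -q -t rsa -N "$2" -f "$1"
}
```

Running make_keypair bigdatakey "a long passphrase" in the current directory produces the same two files as the command above, bigdatakey and bigdatakey.pub, without prompting.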

Task 3. Create your Virtual Machine on Bluemix

Now that you have a keypair, go to the Bluemix Dashboard and follow these steps to create a virtual machine:
  1. Select CREATE VIRTUAL MACHINES to start creating your new VM.
  2. On the Create a Virtual Machine properties page, below the Security Key field, press the + Add Key button.
  3. In the Add Key dialog, name your key bigdatakey. Copy the contents of your bigdatakey.pub file and paste those contents into the Public Key to import field. Press OK to close the window and import the public key.
  4. On the Create a Virtual Machine page, name your VM group Big_Data. To make sure your VM group name is unique, add your initials or a timestamp to the end of the name. Ensure the other settings are as shown below, then press Create.
    • VM Cloud: IBM Cloud Public
    • Initial instances: 1
    • Image: Ubuntu 14.04
    • IBM image default user ID: ibmcloud
    • VM group: Big_Data_XX
    • VM size: m1.small
    • Security Key: bigdatakey
    • Network: private
  5. The Dashboard page for your VM, Big_Data_XX, shows the details about your VM, which initially are the same as those you set in the create properties. When Virtual Machines Health status panel says Your VMs are running, it also displays the IPs for your VM. The first address, 129.xxx.xxx.xxx, is public and is used for Internet clients to address the VM. The second address, 192.168.xxx.xxx, is private and is used for other VMs hosted by Bluemix to address this VM. For this workshop, the public address is referred to as <public-IP-address>.
    Fig 4. Bluemix Virtual Machine Dashboard Panel
  6. Now that your VM is created, you can log into it using Secure Shell (SSH). The command is:
    $ ssh -i <private-key> -l <username> <public-IP-address>
    Where:
    • <private-key> is the path and filename of your bigdatakey private key file
    • <username> is ibmcloud for this VM created from one of the virtual images supplied by Bluemix
    • <public-IP-address> is your VM’s public IP address
When you log in, you don’t need to specify a password because you’re already authenticated by the private key. If you can log into your VM successfully by using SSH, then you have successfully created a VM and it is running and ready for use.
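
If the login is rejected with a warning that the private key file is unprotected, the usual cause is that the key file is readable by other users, which the OpenSSH client does not accept. The sketch below checks for that before connecting. The function name key_is_private is hypothetical, and the check relies on the mode string printed by ls -l.

```shell
# Succeed only if the key file grants no permissions to group or others,
# which is what the OpenSSH client requires of a private key file.
key_is_private() {
    # Characters 5-10 of the "ls -l" mode string are the group and other bits.
    [ "$(ls -ld "$1" | cut -c5-10)" = "------" ]
}
```

A typical use is key_is_private bigdatakey || chmod 600 bigdatakey, followed by the ssh command shown in step 6.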

Task 4. Install Docker on the virtual machine

The Secure Gateway service in Bluemix requires a client that runs in the same data center as the system of record (SoR) the gateway will connect to. There are a couple of options for the Gateway client; the easiest one to configure, useful for development purposes, is one that runs in a Docker container. Since the Gateway client runs in a Docker container, install Docker on the VM that’s simulating a computer in a data center.

Docker documents the process for installing Docker on Ubuntu in Installing Docker on Ubuntu.

Tip: Notice that Docker’s instructions use wget to download from https://get.docker.com/, which installs the latest version of Docker. Do not follow any instructions that use apt-get to install docker.io; that approach typically installs an older version of Docker.

Log into your VM using SSH, as described above, and perform the following commands:
  1. Before installing any software, refresh the package index so that apt-get knows about the latest available package versions. Run this command:
    $ sudo apt-get update
  2. Install the Docker package
    $ wget -qO- https://get.docker.com/ | sh
  3. Verify that Docker is installed correctly.
    $ sudo docker run hello-world
  4. When hello-world runs correctly, part of the output should say:
    Hello from Docker.
    This message shows that your installation appears to be working correctly.
    
When you can run hello-world successfully, your VM has Docker installed and running correctly.

Task 5. Create a sample data warehouse on your virtual machine

To simulate the SoR, we will use a MySQL database running in a Docker container. Although a data warehouse would not typically be hosted in MySQL, for the purposes of this workshop, MySQL is a free SQL database that is already available in a Docker container. The VM already has Docker installed to run the gateway client’s container, so Docker can also run a container with MySQL. Because MySQL is already installed in the container, you won’t have to install MySQL; you’ll just need to download the MySQL container and run it.

Download the schema and sample data files

When you install the image for the MySQL Docker container, the container simply runs the database server. For this workshop, we not only need the database server, but we also need it to contain a database with some particular sample data. To create that database of sample data, we’ve provided a couple of files for you to download. We’ll put these files in the ibmcloud user’s home directory under bigdata-nasdaq.
  1. Create the directory to store the downloads in. The scripts to initialize the database will look for the files in this directory.
    $ mkdir ~/bigdata-nasdaq
    $ cd ~/bigdata-nasdaq
  2. Download the files for creating the schema and for creating the data.
    $ wget https://hub.jazz.net/git/osowski%2Fbigdata-volatility/contents/master/data/bigdata-nasdaq-create.sql
    $ wget https://hub.jazz.net/git/osowski%2Fbigdata-volatility/contents/master/data/bigdata-nasdaq-data.sql
For more information, see Wget (Ubuntu Manpages).
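
The database container reads these two files when it first starts, so it can be worth confirming that both downloads succeeded before moving on. The following sketch does that; the function name check_downloads is hypothetical, and the check only confirms that each file exists and is not empty, not that its contents are valid SQL.

```shell
# Verify that both SQL files exist and are non-empty in the given directory.
# $1 = directory that holds the downloads
check_downloads() {
    for f in bigdata-nasdaq-create.sql bigdata-nasdaq-data.sql; do
        if [ ! -s "$1/$f" ]; then
            echo "missing or empty: $1/$f" >&2
            return 1
        fi
    done
}
```

For example, check_downloads ~/bigdata-nasdaq returns success only when both files are present.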

Install the database’s Docker container

Now that we have the files for initializing the database, we’ll start a Docker container from an image that has MySQL installed and has a mechanism to enable us to execute the initialization scripts. For this workshop, the Docker image you will use is tutum/mysql. For more information about this Docker image, see the GitHub project: tutumcloud/tutum-docker-mysql.
  1. Create the MySQL container instance and load the sample data from the two downloaded files with the command below:
    $ sudo docker run -d --name mysql-tutum -p 3306:3306 -v /home/ibmcloud:/home/ibmcloud -e ON_CREATE_DB="nasdaq" -e MYSQL_PASS=passw0rd -e STARTUP_SQL="/home/ibmcloud/bigdata-nasdaq/bigdata-nasdaq-create.sql /home/ibmcloud/bigdata-nasdaq/bigdata-nasdaq-data.sql" tutum/mysql
    
    Where:
    • -d runs the container in the background, not interactively
    • mysql-tutum is the name to give the container created from the image
    • 3306:3306 forwards the MySQL port to make it accessible from the host OS’s IP address
    • /home/ibmcloud:/home/ibmcloud binds the directory to make the directory in the host OS available within the container
    • ON_CREATE_DB instructs the container to create the "nasdaq" database when the container first starts
    • MYSQL_PASS sets the password of the database’s main user, in this example to passw0rd
    • STARTUP_SQL tells the container to run the SQL files in the order specified via the space-separated list
    • tutum/mysql is the name of the Docker image to create the container from
You now have a running Docker container named mysql-tutum, created from the tutum/mysql image. That container has a MySQL database server running in it. The database server contains a database named nasdaq, which contains a table named rawData that holds the sample Nasdaq quotes.
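
To confirm that state from the VM's command line, something like the following sketch can be used. It assumes that your Docker version supports docker exec, that the mysql command-line client is available inside the container, and that the database user is admin (the user name used later in Task 8). The function names are hypothetical.

```shell
# Succeed if the named container exists and is currently running.
# $1 = container name
container_running() {
    [ "$(sudo docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}

# Print the number of rows in nasdaq.rawData, using the mysql client
# inside the container.
# $1 = container name, $2 = the password that was passed as MYSQL_PASS
count_quotes() {
    sudo docker exec "$1" mysql -uadmin -p"$2" -N -e 'SELECT COUNT(*) FROM nasdaq.rawData'
}
```

For example, container_running mysql-tutum && count_quotes mysql-tutum passw0rd should print a row count greater than zero once the startup scripts have finished loading the sample data.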

Task 6. Deploy the Stock Volatility sample application

You will now deploy an application that will connect to the simulated data warehouse through the Secure Gateway. This application is already available in an IBM Bluemix DevOps Services project that you will fork and have a copy of your own. You will then configure the project's pipeline to deploy the application to Bluemix and to automatically push future changes in application updates.

Fork the bigdata-volatility sample project

  1. Access the bigdata-volatility sample application in IBM Bluemix DevOps Services.
  2. Click Fork Project. You may be prompted to log in or create a short name to log in with.
  3. Create a new name for your project. You are not required to change the name, since this project will be created in your account.
  4. Click Create. Your project is created and you are redirected to the new project page.
    Fig 5. Forked bigdata-volatility project

Configure the Build Pipeline

  1. Click Build & Deploy in the upper right corner of your new project.
  2. Click ADD STAGE
  3. Click MyStage and rename this stage to Build. All other defaults are acceptable.
    Fig 6. Build Pipeline setup - Build stage
  4. Click the JOBS tab at the top, click ADD JOB in the new tab, and select Build.
  5. For the Builder Type, select Ant. Our project uses a simple Ant script to build a WebSphere Liberty application on Bluemix. You can integrate your own build scripts for your application.
    Fig 7. Build Pipeline setup - Build stage, part 2
  6. The Build stage is now complete. Click SAVE.
  7. Click ADD STAGE again. Click MyStage again and rename to Deploy. Set the Input Type to Build Artifacts. All other defaults are acceptable here.
    Fig 8. Build Pipeline setup - Deploy stage
  8. Click the JOBS tab at the top, click ADD JOB in the new tab, and select Deploy.
  9. Most defaults are acceptable in this panel. However, ensure that your Application Name is specific enough to be unique across all of Bluemix, as this name will become the application's hostname. You can configure these values separately, but for now we will use them as one and the same. Add your initials or a time stamp to the end of your Application Name and click SAVE.
    Fig 8. Build Pipeline setup - Deploy stage, part 2

Deploy your application to Bluemix

In the Pipeline: All Stages view:
  1. In the Build stage, click the Run Stage button. Hint: It looks like a play button.
    Fig 9. Build Pipeline Setup Complete
    The project builds and automatically deploys the application to Bluemix.
    Fig 10. Build Pipeline Deploy Complete
After the application is deployed, you will be able to see this new application in your Bluemix Dashboard. Now you will securely connect to your data warehouse and configure the deployed application to connect to it with the provided credentials.

Task 7. Connect to your data warehouse with the Secure Gateway service

The Secure Gateway service in Bluemix supports the development of hybrid cloud and hybrid IT applications—ones with parts running in multiple cloud and non-cloud environments. It provides secure connectivity from Bluemix to other applications and data sources—commonly called systems of record (SoR)—running on-premise or in other clouds. The service includes a remote client which enables secure connectivity. Most of the steps to set up the Secure Gateway must be performed in Bluemix. The step to set up the Secure Gateway Client must be performed on the remote system.

Create a Secure Gateway

Like any service instance, an instance of the Secure Gateway service is bound to a particular application. You can then use that Secure Gateway instance to connect that application to as many systems of record (SoRs) as you like.

Follow these steps to add a Secure Gateway to an application:
  1. In the Bluemix Dashboard, click ADD A SERVICE OR API.
  2. Search for Secure Gateway by typing the name in the search field.
    Fig 11. Search for Secure Gateway service in the Catalog
  3. Click the Secure Gateway service to open the details.
    Fig 12. Secure Gateway service description
  4. Make sure that under App you have your Java Liberty application selected. Leave the other default values and click CREATE.
  5. Click RESTAGE.

    Why restage?: Because you added a new service to a running application, you are prompted to restage the application to update it with the new service. Bluemix is trying to make sure that the application code is up to date with any changes that were applied.
  6. The Secure Gateway service is now created and bound to your application.

Add a gateway and client

For a Secure Gateway to connect its application to a particular resource, you must define a gateway in the Secure Gateway and install a gateway client on that resource. The client only connects to that gateway and the gateway can only connect to one client. The client needs to have a network connection to the gateway, such as an Internet connection between the private data center and the Secure Gateway service instance in Bluemix. The client does not have to be installed on the same computer as the resource it will connect to, but the client does need to have a network connection to each resource. In this way, the client is a gateway connecting the Secure Gateway service instance to the resource.

Follow these steps to add a gateway and its client to the Secure Gateway:
  1. Go back to the Bluemix Dashboard. You should see the new Secure Gateway service that you created.
  2. Click the Secure Gateway service tile to open the Secure Gateway Dashboard.
  3. Click Add Gateway. The Add Gateway page is displayed.
  4. Provide a name for your gateway, such as Trading Systems, and click Connect it.
    Fig 13. Add Secure Gateway details panel
  5. The Connect it page is displayed and the bullet item for Name it is marked as complete. By default, the Docker option is automatically selected and shows the Docker command that you must run to create the gateway client.
    Fig 14. Add Secure Gateway details - Connect It panel
  6. Copy the Docker command that is provided and run it on your virtual machine. For example:
    docker run -it bluemix/secure-gateway-client <configuration_id>
    where:
    <configuration_id> is the configuration ID for the gateway this client will connect to. Note: Remember to add sudo if necessary at the beginning of the Docker command above.
  7. The Gateway is connected. The client logs a status that it is connected, and the Gateway page shows that it is connected.

    Note: The docker run command is provided when you create your Secure Gateway configuration. Each configuration has a different command-line parameter that specifically identifies that client. Unlike the docker command that created the MySQL database, this Docker container runs in the foreground and takes control of the terminal while running. It's best to open a new terminal window for use while connected.

Add a destination

The Gateway creates a connection between the Secure Gateway in Bluemix and the client running in the private data center, enabling the Bluemix application access to resources within the data center. Each system the application wants to connect to is represented by a destination. The system must have an IP address or hostname and a port that an IP client can use to connect to the system. The destination binds the system’s address to an Internet address that the application can use to access the system, and uses the client to access the system.

Follow these steps to add a destination to the gateway:
  1. In the Bluemix UI, click Add destinations. The Create Destinations page is displayed and the bullet item for Connect it is marked as complete.

    Note: A destination is a Secure Gateway connection to a specific on-premises resource. The host name and port number provide direct access to that resource from the cloud side.
    Fig 15. Add Secure Gateway details - Add Destinations panel
  2. Complete the following fields:
    • Destination name, such as Stock Quote Warehouse
    • Host name or IP address of your VM, the VM’s <public-IP-address>
    • The port for accessing the resource, which for the MySQL container is 3306
    • For the drop-down list that asks how you want to secure access to the destination, leave the default option of TCP. This means there will be no TLS (Transport Layer Security). Your application can communicate directly with the gateway without requiring any certificates.
  3. Click the + icon. The destination is added to the collection of destinations below the graph.
    Fig 16. Add Secure Gateway details - Setup complete
  4. Click I’m done to complete your configuration. The Secure Gateway Dashboard page is displayed.

Retrieve your secured Destination URL

  1. Click the connection card associated with the Trading Systems Gateway you just set up.
  2. Click the Info icon, which is associated with the Stock Quote Warehouse Destination card.
  3. Copy & paste the value under the Cloud Host : Port label to a temporary text file or another location you can easily reference in the next Task. This value will look something along the lines of the following:
    cap-sg-prd-X.integration.ibm.cloud.com:XXXXX
    This value will be needed to configure our connection to our data warehouse from our application in the next Task.
    Fig 17. Secure Gateway Destinations panel
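
The next task needs the host and the port as two separate values. If you prefer to split the copied string on the command line, the following sketch uses plain shell parameter expansion. The function names cloud_host and cloud_port are hypothetical.

```shell
# Split a "host:port" string, as shown under Cloud Host : Port, into its parts.
cloud_host() { printf '%s\n' "${1%:*}"; }    # everything before the last colon
cloud_port() { printf '%s\n' "${1##*:}"; }   # everything after the last colon
```

With a made-up value such as cap-sg-prd-1.integration.ibm.cloud.com:15000, cloud_host prints cap-sg-prd-1.integration.ibm.cloud.com and cloud_port prints 15000.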

Task 8. Connect the Stock Volatility sample application to your secured data warehouse

Now we'll use the Secure Gateway to connect the sample application to the data warehouse.
  1. In the Bluemix Dashboard, click your newly deployed application. In Task 6, bigdata-volatility-walkthrough was used as the application name.
  2. Click Environment Variables in the left side-menu and then click on USER-DEFINED.
  3. You will now create some user-defined variables for the application to connect to your data warehouse. Click ADD and create a variable for each row below, omitting the colon in the variable name:
    • DATAWAREHOUSE_HOST: The host portion (before the colon) of the Cloud Host : Port value that you copied from the Secure Gateway Dashboard, for example cap-sg-prd-X.integration.ibm.cloud.com
    • DATAWAREHOUSE_PORT: The port portion (after the colon) of the same Cloud Host : Port value.
    • DATAWAREHOUSE_DB: nasdaq
    • DATAWAREHOUSE_USERNAME: admin
    • DATAWAREHOUSE_PASSWORD: The value of MYSQL_PASS from Task 5, for example passw0rd
    Fig 18. Bluemix Application Environment Variables panel
  4. Click SAVE and your application will be restarted. Click Overview in the left side-menu and wait for your application to restart.
  5. After your application is restarted, click the Routes link in the top of the page to access your running application.
  6. Hover over the left arrow and the menu will slide out. Select Recession Analysis.
    Fig 19. Stock Volatility app - Recession Analysis menu
  7. Select a stock from the drop down list and click Analyze.
  8. You are presented with a historical view of stock prices for the selected stock during years with recession-marked periods.
    Fig 20. Stock Volatility app - AMZN stock prices
When you see historical views of stocks, you know your application is working and is retrieving data from the data warehouse.
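
The same five variables can also be set from a terminal instead of the Bluemix Dashboard. The sketch below assumes that the Cloud Foundry command-line client (cf) is installed and already logged in to your Bluemix organization and space. The function name set_warehouse_env and its argument order are illustrative. It sets each variable with cf set-env and then runs cf restage so that the application picks up the new values.

```shell
# Set the five user-defined variables from Task 8, then restage the application.
# $1 = application name
# $2 = the Cloud Host : Port value, in the form host:port
# $3 = the MySQL password (the value of MYSQL_PASS)
set_warehouse_env() {
    cf set-env "$1" DATAWAREHOUSE_HOST "${2%:*}" &&
    cf set-env "$1" DATAWAREHOUSE_PORT "${2##*:}" &&
    cf set-env "$1" DATAWAREHOUSE_DB nasdaq &&
    cf set-env "$1" DATAWAREHOUSE_USERNAME admin &&
    cf set-env "$1" DATAWAREHOUSE_PASSWORD "$3" &&
    cf restage "$1"
}
```

The commands are chained with && so that the function stops at the first failure instead of restaging an application with a partial configuration.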

Conclusion

You have now completed Actionable Architecture: Secure Hybrid Data Warehouse on Bluemix (Big Data Workshop 1). In this workshop, you:
  • Created a Virtual Machine by using the Bluemix Virtual Machine service
  • Set up and configured a simulated on-premise data warehouse using containers
  • Securely connected a data warehouse to the cloud for secured access from Bluemix applications
  • Deployed a stock volatility application, running on Bluemix, that utilizes data from the on-premise data warehouse via the Secure Gateway connection.
In general, you can use this process any time that you want to connect an application that is running in Bluemix to a database or some other system of record (SoR) that's running outside of Bluemix.

Acknowledgements

Many individuals contributed time and effort to the creation of this workshop series, from initial use case discussion to planning to hands-on development to documentation. I'd like to acknowledge and thank the core team of contributors to this Big Data workshop series, specifically:
  • Bobby Woolf
  • Katerina Goulioutkina
  • Manav Gupta
  • Rajeev Sikka
  • Ruth Willenborg
  • Shahir Daya
  • Xiaomei Wang
