In this hands-on lab, you’ll learn how to work with key Big SQL, a component of IBMâ€™s big data platform based on Apache Hadoop. Â Big SQL is included with several IBM BigInsights offerings.
Big SQL enables IT professionals to create tables and query data in BigInsights using familiar SQL statements. To do so, programmers use standard SQL syntax and, in some cases, SQL extensions created by IBM to make it easy to exploit certain Hadoop-based technologies. Big SQL shares query compiler technology with DB2 (a relational DBMS) and offers a wide breadth of SQL capabilities.
Organizations interested in Big SQL often have considerable SQL skills in-house, as well as a suite of SQL-based business intelligence applications and query/reporting tools. The idea of being able to leverage existing skills and tools â€” and perhaps reuse portions of existing applications â€” can be quite appealing to organizations new to Hadoop. Indeed, some companies with large data warehouses built on relational DBMS systems are looking to Hadoop-based platforms as a potential target for offloading “cold” or infrequently used data in a manner that still allows for query access. In other cases, organizations turn to Hadoop to analyze and filter non-traditional data (such as logs, sensor data, social media posts, etc.), ultimately feeding subsets or aggregations of this information to their relational warehouses to extend their view of products, customers, or services.
After completing all exercises in this lab guide, you’ll know how to
- Inspect the status of your Big SQL service through Apache Ambari, a Web-based management tool included with the IBM Open Platform for Apache Hadoop.
- Create a connection to your Big SQL server from a command line environment (JSqsh).
- Execute Big SQL statements and commands.
- Create Big SQL tables stored in the Hive warehouse and in user-specified directories of your Hadoop Distributed File System (HDFS).
- Load data into Big SQL tables.
- Query big data using Big SQL projections, restrictions, joins, and other operations.
- Store complex data types in Big SQL tables.
- Gather statistics about your tables and explore data access plans for your queries.
- Create and execute SQL-based scalar and table user-defined functions.
- Work with Big SQL web tooling to explore database metrics and perform other tasks.
- Work with Big SQL using Squirrel SQL Client, an open source SQL tool, to explore your database, execute queries, and chart query results.
Allow 6 â€“ 7 hours to complete all sections of this lab. Separate labs are available on using Big SQL with HBase and using Spark to access Big SQL data.
Special thanks to Uttam Jain, Carlos Renteria, and Raanon Reutlinger for their contributions to earlier versions of this lab.Â Thanks also to Nailah Bissoon and Daniel Kikuchi for their reviews and to Kevin Hom for his assistance with publication.
This lab requires a BigInsights 126.96.36.199 environment in which Big SQL is installed and running.Â Big SQL is part of BigInsights Quick Start Edition, BigInsights Data Analyst, and BigInsights Data Scientist.
Examples in this lab use are based on a sample multi-node cluster with the configuration shown in the tables below.Â If your environment is different, modify the sample code and instructions as needed to match your configuration.
|Big SQL Administrator||bigsql||bigsql|
|Knox Gateway account||guest||guest-password|
|Ambari port number||8080|
|Big SQL database name||bigsql|
|Big SQL port number||51000|
|HBase installation directory||/usr/iop/188.8.131.52/hbase|
|Big SQL installation directory||/usr/ibmpacks/bigsql/4.0|
|JSqsh installation directory||/usr/ibmpacks/common-utils/jsqsh/2.14|
|Big SQL samples directory||/usr/ibmpacks/bigsql/4.0/bigsql/samples/data|
To get started with the lab exercises, you need to access to a working Big SQL environment and a secure shell (command window).Â A free Quick Start Edition is available for download from Hadoop Dev at https://developer.ibm.com/hadoop/try-it/.
This lab was tested against a native BigInsights installation on a multi-node cluster.Â Although the IBM Analytics for Hadoop cloud service on Bluemix includes Big SQL, it does not support SSH access.Â As a result, certain examples in this lab — such as those involving JSqsh (the Big SQL command line interface) and DFS commands — cannot be executed using that Bluemix service. Therefore, if you want to follow all examples in this lab, you should install and configure BigInsights on your own cluster.Â Consult the instructions in the product’s Knowledge Center (http://www-01.ibm.com/support/knowledgecenter/SSPT3X_4.0.0/com.ibm.swg.im.infosphere.biginsights.welcome.doc/doc/welcome.html).
Before continuing with this lab, verify that Big SQL and all its pre-requisite services are running.
|Â||If have any questions or need help getting your environment up and running, visit Hadoop Dev (https://developer.ibm.com/hadoop/) and review the product documentation or post a message to the forum.Â You cannot proceed with subsequent lab exercises without access to a working environment.|