Digital Developer Conference on Data and AI: Essential data science, machine learning, and AI skills and certification Register for free

Store and query unstructured JSON data from CockroachDB hosted on Red Hat Marketplace

This tutorial shows you how to get unstructured JSON data from an API, store it in CockroachDB hosted on Red Hat Marketplace, and query the unstructured JSON data from the table using Python and Jupyter notebooks.

About CockroachDB Operator

CockroachDB is an elastic SQL database that easily scales transactions for your apps and services. Its cloud-native architecture simplifies scale and also guarantees consistent transactions across multiple regions and multiple clouds. Learn more.

Learning objectives

When you have completed this tutorial, you will understand how to:

  • Install CockroachDB Operator from Red Hat Marketplace on an OpenShift Cluster
  • Create a CockroachDB cluster instance
  • Create a user and database in CockroachDB
  • Store and query unstructured JSON data from a third-party API in CockroachDB
  • Manage the CockroachDB cluster from the Cluster Overview page

Estimated time

Completing this tutorial should take about 30 minutes.

Prerequisites

  1. Red Hat Marketplace account.
  2. Red Hat OpenShift Cluster.
  3. OpenShift container & kubectl CLI.
  4. Access to a Jupyter Notebook. You can install a Jupyter Notebook from python-pip or use a tool such as Anaconda to open the Jupyter Notebook.

Steps

Step 1: Install the CockroachDB Operator from Red Hat Marketplace on OpenShift Cluster

Follow the steps in this tutorial to deploy a CockroachDB Operator from Red Hat Marketplace on an OpenShift Cluster:

Once you have successfully set up a CockroachDB Operator on an OpenShift Cluster you can create a database.

Step 2: Create a user and database in CockroachDB

Now, let’s create a user and a database.

  1. Run the following command to spin up a CockroachDB client:

     $ kubectl run -it --rm cockroach-client \
     --image=cockroachdb/cockroach \
     --restart=Never \
     --command -- \
     ./cockroach sql --insecure --host=example-cockroachdb-public.cockroachdb-test
    

    This should run the CockroachDB client and take you to a SQL Command Prompt as shown. If you don’t see a command prompt, try pressing Enter.

     root@example-cockroachdb-public.cockroachdb-test:26257/defaultdb>
    
  2. From the CockroachDB client, run the following commands:

    • Create a user maxroach as follows:

      root@example-cockroachdb-public.cockroachdb-test:26257/defaultdb> CREATE USER IF NOT EXISTS maxroach;
      CREATE USER 1
      
      Time: 9.580878ms
      
    • Create a database employees as follows:

      root@example-cockroachdb-public.cockroachdb-test:26257/defaultdb> CREATE DATABASE employees;
      CREATE DATABASE
      
      Time: 14.449525ms
      
    • Give the user, maxroach, permission to update our database, employees, as follows:

      root@example-cockroachdb-public.cockroachdb-test:26257/defaultdb> GRANT ALL ON DATABASE employees TO maxroach;
      GRANT
      
      Time: 9.308095ms
      

      At this point, you should have a user and a database.

  3. Type \q to quit the client console as shown:

    root@example-cockroachdb-public.cockroachdb-test:26257/defaultdb> \q
     pod "cockroach-client" deleted
     

Step 3: Access the admin console on your localhost

The following steps show you how to view the results of the commands you ran in the earlier steps via the admin console. You can access the console at localhost with port forwarding.

Note: You need to be logged in to your OpenShift Cluster with the OC login that you accessed in Step 2: Connect to the OpenShift cluster in your CLI.

  1. Run the following command to port forward 8080:

     $ kubectl port-forward example-cockroachdb-0 8080
    
     Forwarding from 127.0.0.1:8080 -> 8080
     Forwarding from [::1]:8080 -> 8080
    
  2. Visit http://localhost:8080 on your browser as shown. The page should load the cluster overview.

    cockroach-clusteroverview

  3. Click on databases to view the employees database that you created earlier.

    cockroach-databases

Step 4: Store and query unstructured JSON data using Python and Jupyter Notebooks

Once the CockroachDB is running and you created the user and database, you can now explore the JSON support for CockroachDB in a python runtime using Jupyter Notebook.

  1. In your terminal, run the following command to port forward the 26257 port from the CockroachDB database instance. This port is used in your Jupyter Notebook to establish a connection with the CockroachDB database instance.

     $ kubectl port-forward example-cockroachdb-0 26257
    
     Forwarding from 127.0.0.1:26257 -> 26257
     Forwarding from [::1]:26257 -> 26257
    
  2. Download and open the notebook Store-and-query-unstructured-Json-CockroachDB.ipynb in your local machine.

    We are using the http://dummy.restapiexample.com/api/v1/employees API for demonstration purpose.

    nb-api

  3. Click on the Cell tab and select Run All.

    nb-run

    You can now follow the notebook instructions for more details on what is happening in each cell.

    After you have executed the notebook, verify the table in the CockroachDB instance through the CockroachDB client from terminal.

  4. In your terminal, run the following command to spin up a CockroachDB client:

     $ kubectl run -it --rm cockroach-client \
     --image=cockroachdb/cockroach \
     --restart=Never \
     --command -- \
     ./cockroach sql --insecure --host=example-cockroachdb-public.cockroachdb-test
    

    This should run the CockroachDB client and take you to a SQL Command Prompt as shown. If you don’t see a command prompt, try pressing Enter.

     root@example-cockroachdb-public.cockroachdb-test:26257/defaultdb>
    
  5. From the CockroachDB client, run the following commands to view the user, database, and table which the Jupyter Notebook created:

    • View users by running the SHOW users; command:

      root@example-cockroachdb-public.cockroachdb-test:26257/defaultdb> SHOW users;
       user_name
      `-------------`
        cpuser
        maxroach
        root
        (3 rows)
      
      Time: 3.037641ms
      
    • View databases by running the SHOW databases; command:

      root@example-cockroachdb-public.cockroachdb-test:26257/defaultdb> SHOW databases;
      database_name
      `-----------------`
      bank
      defaultdb
      employees
      postgres
      system
      (5 rows)
      
      Time: 2.890031ms
    • To view the tables present in the Employees database, run the USE bank; command to switch to bank database, and run \d command to view the tables:

      root@example-cockroachdb-public.cockroachdb-test:26257/defaultdb> USE employees;
      SET
      
      Time: 11.83841ms
      
      root@example-cockroachdb-public.cockroachdb-test:26257/employees> \d
      table_name
      `----------------------`
      jsontbl
      (1 row)
      
      Time: 3.684617ms
    1. Finally, to view the unstructured JSON from the table, run the SELECT command as follows:

      root@example-cockroachdb-public.cockroachdb-test:26257/employees> SELECT * from jsontbl;
      

List of rows of people and their salaries

Summary

In this tutorial, we showed you how to get unstructured JSON data from a third-party REST API, created a table in CockroachDB and store the unstructured JSON data into the table. You also learned how to query the unstructured data from CockroachDB table into a pandas dataframe which can be further used for data engineering and data science tasks.

Reference

Refer to the following documentation from Cockroach labs to learn more about the operator and its features.