This tutorial shows you how to get unstructured JSON data from an API, store it in CockroachDB hosted on Red Hat Marketplace, and query the unstructured JSON data from the table using Python and Jupyter notebooks.
About CockroachDB Operator
CockroachDB is an elastic SQL database that easily scales transactions for your apps and services. Its cloud-native architecture simplifies scale and also guarantees consistent transactions across multiple regions and multiple clouds. Learn more.
Learning objectives
When you have completed this tutorial, you will understand how to:
- Install CockroachDB Operator from Red Hat Marketplace on an OpenShift Cluster
- Create a CockroachDB cluster instance
- Create a user and database in CockroachDB
- Store and query unstructured JSON data from a third-party API in CockroachDB
- Manage the CockroachDB cluster from the Cluster Overview page
Estimated time
Completing this tutorial should take about 30 minutes.
Prerequisites
- Red Hat Marketplace account.
- Red Hat OpenShift Cluster.
- OpenShift container & kubectl CLI.
- Access to a Jupyter Notebook. You can install a Jupyter Notebook from python-pip or use a tool such as Anaconda to open the Jupyter Notebook.
Steps
Step 1: Install the CockroachDB Operator from Red Hat Marketplace on OpenShift Cluster
Follow the steps in this tutorial to deploy a CockroachDB Operator from Red Hat Marketplace on an OpenShift Cluster:
Once you have successfully set up a CockroachDB Operator on an OpenShift Cluster you can create a database.
Step 2: Create a user and database in CockroachDB
Now, let’s create a user
and a database
.
Run the following command to spin up a CockroachDB client:
$ kubectl run -it --rm cockroach-client \ --image=cockroachdb/cockroach \ --restart=Never \ --command -- \ ./cockroach sql --insecure --host=example-cockroachdb-public.cockroachdb-test
This should run the CockroachDB client and take you to a
SQL Command Prompt
as shown. If you don’t see a command prompt, try pressing Enter.root@example-cockroachdb-public.cockroachdb-test:26257/defaultdb>
From the CockroachDB client, run the following commands:
Create a user
maxroach
as follows:root@example-cockroachdb-public.cockroachdb-test:26257/defaultdb> CREATE USER IF NOT EXISTS maxroach; CREATE USER 1 Time: 9.580878ms
Create a database
employees
as follows:root@example-cockroachdb-public.cockroachdb-test:26257/defaultdb> CREATE DATABASE employees; CREATE DATABASE Time: 14.449525ms
Give the user,
maxroach
, permission to update our database,employees
, as follows:root@example-cockroachdb-public.cockroachdb-test:26257/defaultdb> GRANT ALL ON DATABASE employees TO maxroach; GRANT Time: 9.308095ms
At this point, you should have a
user
and adatabase
.
Type
\q
to quit the client console as shown:root@example-cockroachdb-public.cockroachdb-test:26257/defaultdb> \q pod "cockroach-client" deleted
Step 3: Access the admin console on your localhost
The following steps show you how to view the results of the commands you ran in the earlier steps via the admin console
. You can access the console at localhost with port forwarding.
Note: You need to be logged in to your OpenShift Cluster with the OC login that you accessed in Step 2: Connect to the OpenShift cluster in your CLI.
Run the following command to port forward
8080
:$ kubectl port-forward example-cockroachdb-0 8080
Forwarding from 127.0.0.1:8080 -> 8080 Forwarding from [::1]:8080 -> 8080
Visit http://localhost:8080 on your browser as shown. The page should load the cluster overview.
Click on
databases
to view theemployees
database that you created earlier.
Step 4: Store and query unstructured JSON data using Python and Jupyter Notebooks
Once the CockroachDB is running and you created the user and database, you can now explore the JSON support for CockroachDB in a python runtime using Jupyter Notebook.
In your terminal, run the following command to port forward the
26257
port from the CockroachDB database instance. This port is used in your Jupyter Notebook to establish a connection with the CockroachDB database instance.$ kubectl port-forward example-cockroachdb-0 26257
Forwarding from 127.0.0.1:26257 -> 26257 Forwarding from [::1]:26257 -> 26257
Download and open the notebook Store-and-query-unstructured-Json-CockroachDB.ipynb in your local machine.
We are using the http://dummy.restapiexample.com/api/v1/employees API for demonstration purpose.
Click on the Cell tab and select Run All.
You can now follow the notebook instructions for more details on what is happening in each cell.
After you have executed the notebook, verify the table in the CockroachDB instance through the CockroachDB client from terminal.
In your terminal, run the following command to spin up a CockroachDB client:
$ kubectl run -it --rm cockroach-client \ --image=cockroachdb/cockroach \ --restart=Never \ --command -- \ ./cockroach sql --insecure --host=example-cockroachdb-public.cockroachdb-test
This should run the CockroachDB client and take you to a
SQL Command Prompt
as shown. If you don’t see a command prompt, try pressing Enter.root@example-cockroachdb-public.cockroachdb-test:26257/defaultdb>
From the CockroachDB client, run the following commands to view the
user
,database
, andtable
which the Jupyter Notebook created:View
users
by running theSHOW users;
command:root@example-cockroachdb-public.cockroachdb-test:26257/defaultdb> SHOW users; user_name `-------------` cpuser maxroach root (3 rows) Time: 3.037641ms
View
databases
by running theSHOW databases;
command:root@example-cockroachdb-public.cockroachdb-test:26257/defaultdb> SHOW databases; database_name `-----------------` bank defaultdb employees postgres system (5 rows) Time: 2.890031ms
To view the tables present in the Employees database, run the
USE bank;
command to switch tobank
database, and run\d
command to view thetables
:root@example-cockroachdb-public.cockroachdb-test:26257/defaultdb> USE employees; SET Time: 11.83841ms root@example-cockroachdb-public.cockroachdb-test:26257/employees> \d table_name `----------------------` jsontbl (1 row) Time: 3.684617ms
Finally, to view the unstructured JSON from the table, run the
SELECT
command as follows:root@example-cockroachdb-public.cockroachdb-test:26257/employees> SELECT * from jsontbl;
Summary
In this tutorial, we showed you how to get unstructured JSON data from a third-party REST API, created a table in CockroachDB and store the unstructured JSON data into the table. You also learned how to query the unstructured data from CockroachDB table into a pandas dataframe which can be further used for data engineering and data science tasks.
Reference
Refer to the following documentation from Cockroach labs to learn more about the operator and its features.