Store, graph, and derive insights from interconnected data

Summary

Every day, digital users generate massive amounts of unstructured, interconnected data from social media, online portals, internal business processes, and other sources. Graph databases are particularly well suited for storing and deriving insights from these types of interconnections. This code pattern shows you how to use IBM Watson Studio to store interconnected data and run queries to gain insights using the OrientDB database. You will learn how to cleanse a data set, extract entities and relations, populate the OrientDB database, and execute queries.

Description

Today’s digital world is more interconnected than ever before, and those interconnections hold a wealth of data. Organizations need to unlock the value in that data to better understand their users, the marketplace, and their own expansion potential. They can also derive insight from online searches, product recommendation engines, fraud detection, and more.

The most recognizable example of interconnection, social media, makes it possible to connect with virtually anyone around the globe. Each of these connection points opens up the possibility of doing business across geographical boundaries, which makes it even more important for organizations to extract value from the resulting data.

Graph databases are particularly well suited for unlocking value from interconnected data. They enable you to store and run queries on the interconnections, making it possible to gather insight on various relationships and interactions. OrientDB is a multi-model NoSQL database with a graph database engine that manages relationships using direct connections between records. A graph database such as OrientDB is also useful for working with business data in disciplines that involve complex relationships and dynamic schemas. You can see an example in action each time you encounter a recommendation like “Customers who bought this product also looked at…” in a retail portal. A graph database provides flexibility, enabling you to represent your data in ways that are understandable to your audience while simultaneously tracking the complex interactions underlying that data.
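
To make the recommendation example concrete, here is a minimal sketch in plain Python of the kind of traversal behind a query like “Customers who bought this product also looked at…”. The sample edges, the action names (bought, viewed), and the function names are hypothetical and not part of the pattern. A graph database such as OrientDB stores and traverses these relationships itself; the dictionaries below only illustrate the idea.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: each tuple is an edge (customer, action, product).
EDGES = [
    ("alice", "bought", "camera"),
    ("alice", "viewed", "tripod"),
    ("bob", "bought", "camera"),
    ("bob", "viewed", "tripod"),
    ("bob", "viewed", "lens"),
    ("carol", "bought", "laptop"),
    ("carol", "viewed", "mouse"),
]


def build_index(edges):
    """Index the edges in both directions so each hop is a dictionary lookup."""
    customers_by_product = defaultdict(set)   # (action, product) -> customers
    products_by_customer = defaultdict(set)   # (customer, action) -> products
    for customer, action, product in edges:
        customers_by_product[(action, product)].add(customer)
        products_by_customer[(customer, action)].add(product)
    return customers_by_product, products_by_customer


def also_viewed(edges, product):
    """Two-hop traversal: product <-bought- customer -viewed-> other product."""
    customers_by_product, products_by_customer = build_index(edges)
    counts = Counter()
    for customer in customers_by_product[("bought", product)]:
        for other in products_by_customer[(customer, "viewed")]:
            if other != product:
                counts[other] += 1
    # Most frequently viewed first; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(also_viewed(EDGES, "camera"))  # [('tripod', 2), ('lens', 1)]
```

The two nested loops correspond to following two relationships in sequence, which is the pattern a graph query expresses directly.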

In this pattern, you’ll use the PyOrient module, an OrientDB driver for Python, to operate on data and to gather insights from OrientDB. IBM Watson Studio will help you analyze the data using Jupyter notebooks. You’ll learn the end-to-end flow, from downloading and cleansing the data set, to extracting entities and relations, to creating a new OrientDB database. You’ll populate the database with node classes, edge classes, vertices, and relations, and then execute queries to gather insight.

You’ll learn how to:

  • Set up a Jupyter notebook on Watson Studio and connect it to OrientDB using PyOrient
  • Perform CRUD operations on an OrientDB database and extract insights from it
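
As a rough sketch of what these two steps can look like, the snippet below opens a database with PyOrient and runs one statement for each CRUD operation. The class names (Person, Follows), the property names, the sample values, and the helper functions are hypothetical, and the notebook in the pattern may organize this differently. The PyOrient calls used are `OrientDB(host, port)`, `connect`, `db_open`, `command`, and `query`.

```python
# Sketch only: class names, properties, and helper names are hypothetical.

def connect(host, db_name, user, password, port=2424):
    """Open an existing database over OrientDB's binary protocol (port 2424)."""
    import pyorient  # imported lazily so the helpers below work without the driver

    client = pyorient.OrientDB(host, port)
    client.connect(user, password)            # authenticate against the server
    client.db_open(db_name, user, password)   # select the database to work on
    return client


def crud_statements():
    """Return OrientDB SQL statements that create, read, update, and delete."""
    return [
        "CREATE CLASS Person EXTENDS V",      # node (vertex) class
        "CREATE CLASS Follows EXTENDS E",     # edge class
        "CREATE VERTEX Person SET name = 'alice'",
        "CREATE VERTEX Person SET name = 'bob'",
        "CREATE EDGE Follows FROM (SELECT FROM Person WHERE name = 'alice') "
        "TO (SELECT FROM Person WHERE name = 'bob')",
        "SELECT expand(out('Follows')) FROM Person WHERE name = 'alice'",
        "UPDATE Person SET city = 'Paris' WHERE name = 'bob'",
        "DELETE VERTEX Person WHERE name = 'bob'",
    ]


def run(client, statements):
    """Send reads through client.query and everything else through client.command."""
    results = []
    for sql in statements:
        if sql.lstrip().upper().startswith("SELECT"):
            results.append(client.query(sql))
        else:
            results.append(client.command(sql))
    return results
```

With a reachable server, something like `run(connect(host, db_name, user, password), crud_statements())` would execute the statements in order; the connection values are placeholders you would supply yourself.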

Social media and other complex interconnections contain a virtually unlimited amount of data that can be mined, analyzed, and leveraged. If you’re a developer who needs to help your organization unlock the value and potential of massive amounts of social and interaction data, your journey starts here.

Flow

  1. The user sets up a Kubernetes cluster using the Kubernetes service on IBM Cloud.
  2. The user deploys an OrientDB instance with a persistent volume on the Kubernetes cluster created in the first step, exposing the ports used by OrientDB (2424 and 2480) on IBM Cloud.
  3. The user creates a Jupyter notebook, powered by Spark, in IBM Watson Studio. An object storage instance is attached to the notebook to store the data that the notebook uses.
  4. The user uploads the configuration file (config.json) and the dataset (graph-insights.csv) to the object storage.
  5. The user updates the object storage credentials in the notebook, and the notebook loads the files from object storage to create the graph in OrientDB.
  6. The notebook communicates with OrientDB through the PyOrient driver. Functions written in the Jupyter notebook perform various operations on OrientDB.
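
The data-loading part of this flow (cleansing the data set, extracting entities and relations, and preparing the graph) can be sketched as follows. The column names (source, relation, target), the sample rows, the Entity vertex class, and the function names are assumptions made for illustration; the actual layout of graph-insights.csv and config.json is described in the README and may differ.

```python
import csv
import io

# Hypothetical stand-in for graph-insights.csv; real column names may differ.
SAMPLE_CSV = """source,relation,target
Alice,WorksAt,Acme
Bob,WorksAt,Acme
Alice,Knows,Bob
Alice,WorksAt,Acme
,Knows,Bob
"""


def extract_graph(csv_text):
    """Cleanse the rows and return (sorted entities, sorted unique relations)."""
    entities, relations = set(), set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        source = (row.get("source") or "").strip()
        relation = (row.get("relation") or "").strip()
        target = (row.get("target") or "").strip()
        if not (source and relation and target):
            continue  # drop incomplete rows
        entities.update((source, target))
        relations.add((source, relation, target))  # the set removes duplicates
    return sorted(entities), sorted(relations)


def to_statements(entities, relations, vertex_class="Entity"):
    """Turn entities and relations into OrientDB SQL statements.

    The sample values contain no quotes; real data should be escaped first.
    """
    statements = [f"CREATE CLASS {vertex_class} EXTENDS V"]
    for edge_class in sorted({relation for _, relation, _ in relations}):
        statements.append(f"CREATE CLASS {edge_class} EXTENDS E")
    for name in entities:
        statements.append(f"CREATE VERTEX {vertex_class} SET name = '{name}'")
    for source, relation, target in relations:
        statements.append(
            f"CREATE EDGE {relation} "
            f"FROM (SELECT FROM {vertex_class} WHERE name = '{source}') "
            f"TO (SELECT FROM {vertex_class} WHERE name = '{target}')"
        )
    return statements


entities, relations = extract_graph(SAMPLE_CSV)
for statement in to_statements(entities, relations):
    print(statement)
```

Each printed statement could then be passed to the PyOrient client's `command` method to create the corresponding class, vertex, or edge.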

Instructions

Ready to put this code pattern to use? Complete details on how to get started running and using this application are in the README.

Vishal Chahal
Neha Setia