Analyze historical shopping data with Spark and PixieDust in a Jupyter notebook

Summary

Jupyter Notebook is a tool that many data scientists use to wrangle and clean data, visualize it, build and test machine learning models, and even write talks. Because text, code, figures, and tables can be combined in a single document, it is easy to keep an analysis structured. This code pattern shows how you can use Jupyter notebooks in IBM Watson Studio, together with the open source projects Apache Spark and PixieDust from Python, to quickly analyze historical shopping data and produce charts and maps.

Description

Shopping data can tell you a lot about your customers and products, including what customers are looking for. In practice, though, it is often difficult to pull together and analyze the data that you need. Instead of relying on spreadsheets, this code pattern explains how you can analyze historical shopping data in a Jupyter notebook with Apache Spark and PixieDust.

Python offers many visualization packages, which can be overwhelming when you are starting out. PixieDust gives you a simpler way to explore data: it creates charts on top of existing visualization packages, including matplotlib, Bokeh, seaborn, and Brunel. In this code pattern, historical shopping data is loaded, cleaned, and then analyzed by creating various charts and maps with Spark and PixieDust. The Jupyter notebooks run in IBM Watson Studio.

When you have completed this code pattern, you should understand how to:

  • Use Jupyter Notebooks in IBM Watson Studio
  • Load data with PixieDust and clean data with Spark
  • Create charts and maps with PixieDust
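
As a rough illustration of the loading goal above, the following minimal sketch wraps PixieDust's `sampleData` call, which accepts the URL of a CSV file. The function name `load_shopping_data` and the example URL are hypothetical and are not taken from the code pattern's notebook. The type of DataFrame you get back depends on whether Spark is available in your environment.

```python
def load_shopping_data(url):
    """Download a CSV file and return it as a DataFrame through PixieDust."""
    # Imported lazily because PixieDust is designed to run inside a
    # Jupyter notebook kernel rather than a plain Python script.
    import pixiedust

    return pixiedust.sampleData(url)


# Typical use in notebook cells (the URL is a placeholder, not the real data set):
#   import pixiedust
#   raw_df = load_shopping_data("https://example.com/orders.csv")
#   display(raw_df)  # opens PixieDust's interactive table and chart view
```

Calling `display` at the top level of its own notebook cell is assumed here, because that is where the chart and map options are normally chosen.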

Flow

  1. Log in to Watson Studio.
  2. Load the provided notebook into Watson Studio.
  3. Load the customer data in the notebook.
  4. Transform the data with Apache Spark.
  5. Create charts and maps with PixieDust.
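
The transformation in step 4 might look roughly like the sketch below. It is not the notebook's actual code: the column name `Age`, the age bounds, and the helper names `snake_case` and `clean_orders` are assumptions made for illustration. It expects a Spark DataFrame as input.

```python
import re


def snake_case(name):
    """Turn a header such as 'Customer ID' into 'customer_id'."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()


def clean_orders(df, min_age=18, max_age=100):
    """Return a cleaned copy of a Spark DataFrame of orders.

    Assumes (hypothetically) that the data contains an 'Age' column.
    """
    # Imported here so that snake_case can be used without Spark installed.
    from pyspark.sql import functions as F

    # Give every column a lowercase, underscore-separated name.
    for old_name in df.columns:
        df = df.withColumnRenamed(old_name, snake_case(old_name))

    return (
        df.dropDuplicates()          # remove exact duplicate rows
        .dropna(subset=["age"])      # drop rows that have no age
        .filter(F.col("age").between(min_age, max_age))  # keep plausible ages
    )
```

Normalizing the column names first keeps later steps simpler, because chart fields and filters can then refer to one consistent naming style.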

Instructions

See the README for detailed instructions. The steps there explain how to:

  1. Sign up for Watson Studio.
  2. Create a project.
  3. Create a notebook.
  4. Load customer data in the notebook.
  5. Transform the data with Apache Spark.
  6. Create charts and maps with PixieDust.