With PixieDust, you can use the power of Python and Jupyter Notebooks whether you:
- Have never coded before
- Are an experienced data analyst or data scientist
- Are a developer with little Python experience who wants to explore some data quickly
Jupyter Notebooks is a tool that many data scientists use to wrangle and clean data, visualize data, build and test machine learning models, and even write talks. Its appeal is that a single notebook combines text, code, figures, and tables, so you can keep your code structured and annotate it with comments and explanations of your thought process and decisions.
Python offers many packages for visualizing data. When you are starting out, the choice can be overwhelming. Even when you are experienced, creating charts still takes time, because each package has a slightly different syntax. It is especially easy to spend a lot of time tweaking your code to get the perfect chart. I must admit I tend to do this because it is so much fun, but it is definitely not always necessary.
With PixieDust, you can explore data more simply and spend your time on the data itself instead of going down the rabbit hole of tweaking code to change colors, fonts, line styles, axes, and every other setting you can adjust by hand.
The main command to create charts from Spark or pandas DataFrames is `display(df)`. When you run this command in a notebook cell, the data is displayed in a table. From there you can scroll through the data, filter it, or create a chart from a menu, all with a few clicks.
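As a rough sketch of that workflow, assuming PixieDust is installed (for example with `pip install pixiedust`) and the cell runs in a Jupyter kernel, the snippet below builds a small pandas DataFrame and hands it to `display`. The column names and values are made up for illustration. The `try`/`except` fallback is only there so the cell still runs as a plain Python script, where `display` is not defined.

```python
import pandas as pd

# Hypothetical sample data: a handful of made-up shopping transactions.
df = pd.DataFrame(
    {
        "category": ["Books", "Toys", "Books", "Garden", "Toys"],
        "amount": [12.50, 30.00, 8.25, 45.00, 19.99],
    }
)

try:
    # In a Jupyter kernel, importing PixieDust makes its display() available.
    import pixiedust  # noqa: F401  (imported for its side effect)

    # Opens PixieDust's interactive table; charts are chosen from its menu.
    display(df)
except (ImportError, NameError):
    # Without PixieDust, or outside a notebook, just print the data as text.
    print(df)
```

In a notebook, the same two lines (`import pixiedust` and `display(df)`) work with a Spark DataFrame as well.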
PixieDust uses other visualization libraries to render the charts, including matplotlib, Bokeh, seaborn, and Brunel. You can think of it as a clever wrapper around these libraries that saves you time while exploring data.
To explore PixieDust, you can work through this code pattern, in which historical shopping data is analyzed with Spark and PixieDust. The data is loaded, cleaned, and then analyzed by creating various charts and maps. The notebooks run in IBM Watson Studio, and the code pattern steps you through setting up your IBM Cloud account, creating the notebook, and running it.
If you want to jump straight to the code, the GitHub repository contains the notebook, which you can run either in the cloud or locally.
To learn more about PixieDust and Jupyter Notebooks, the following resources can get you started: