Using plots and charts in data visualization

This is the second part of a two-part tutorial. Before beginning this tutorial, you should complete the first part, Identify patterns, relationships, and connections using data visualization.

This tutorial focuses on the functions, settings, applications, and rendering effects of charts in data visualization. Data visualization provides a variety of charts, including the commonly used line, bar, pie, and scatter charts and histograms. It also includes many statistical charts, such as box plots, error bars, Q-Q plots, time plots, 3D charts, evaluation charts, and t-SNE charts. You don't have to decide in advance which analysis to run: import the data, and data visualization automatically suggests an appropriate chart type. After the analysis is complete, you can download the current chart and its attributes to reuse or reference in your next analysis.

Estimated time

Completing this tutorial should take about 60 minutes.

Steps

Step 1: Main screen selections

On the main screen, after you select the data column to analyze, you can view different charts directly, and the calculated values each chart requires are displayed in the chart for reference. You can also select and replace the data according to your requirements, and you can adjust the parameters at any time.

Here we use the histogram as an example to show how different parameter settings allow you to get the charts you need.
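The tutorial does not document how the histogram bins its data, so the sketch below only illustrates the effect of one typical parameter, the number of equal-width bins. The function name `histogram` and its `bins` argument are hypothetical, not the tool's settings.

```python
def histogram(values, bins):
    """Count values into `bins` equal-width bins spanning [min, max].

    Returns (edges, counts). The last bin is closed on the right so the
    maximum value is counted.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    return edges, counts
```

With `data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 10]`, three bins give counts `[6, 3, 1]`, while nine bins spread the same values over more, narrower bars and expose the gap before the value 10.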

Figure 1: Chart panel

Step 2: Analytic visualizations

Visualizations in the Data view focus on exploring data characteristics to gain insight. Charts let you analyze data from different views and with different methods.

Time plot

The time plot chart illustrates time series data (data values recorded at equally spaced points in time). It displays the series in both a Cartesian and a polar coordinate system to give an overview and a decomposition of the series.

Figure 2: Time plot – Overview

A single time series can be decomposed into three components: trend-cycle, seasonal, and irregular. Turning points are based on the trend-cycle component, which represents variation over a long period of time. Outlier points are based on the irregular component, which captures abnormal values that the trend-cycle and seasonal components do not explain.
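The tutorial does not say how the time plot computes these components. As a rough illustration, the sketch below uses classical additive decomposition (a 2×12 centered moving average for the trend-cycle, averaged detrended values for the seasonal component) and flags irregular values more than three standard deviations from their mean as outliers. The function names and the three-standard-deviation threshold are assumptions, not the tool's method.

```python
import math

def decompose(x, period=12):
    """Classical additive decomposition: x = trend_cycle + seasonal + irregular."""
    n, half = len(x), period // 2
    trend = [None] * n  # undefined at the ends, where the window does not fit
    for t in range(half, n - half):
        window = x[t - half:t + half + 1]
        if period % 2 == 0:
            # 2 x m centered moving average: half weight on the two end points
            s = sum(window[1:-1]) + 0.5 * (window[0] + window[-1])
        else:
            s = sum(window)
        trend[t] = s / period
    # Seasonal index: mean detrended value at each cycle position, centered to sum to 0
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(x[t] - trend[t])
    raw = [sum(b) / len(b) for b in buckets]
    mean_raw = sum(raw) / period
    seasonal = [raw[t % period] - mean_raw for t in range(n)]
    irregular = [None if trend[t] is None else x[t] - trend[t] - seasonal[t]
                 for t in range(n)]
    return trend, seasonal, irregular

def outliers(irregular, k=3.0):
    """Indices whose irregular component lies more than k standard deviations from its mean."""
    vals = [v for v in irregular if v is not None]
    mu = sum(vals) / len(vals)
    sd = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))
    return [t for t, v in enumerate(irregular) if v is not None and abs(v - mu) > k * sd]
```

For a synthetic monthly series with a linear trend, a 12-month sine wave, and one injected spike, the spike shows up in the irregular component and is the only index that `outliers` returns.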

Figure 3: Time plot – ADF test

The Augmented Dickey-Fuller (ADF) test uses a t-statistic to test the null hypothesis that a unit root is present in a time series sample.

There are three models to be tested:

  • Type 1: No intercept, no trend
  • Type 2: Intercept, no trend
  • Type 3: Intercept plus trend
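In the standard textbook form of the ADF test, the three types correspond to the regressions below, where p is the number of lagged difference terms. This is the usual formulation, not taken from the tool's documentation; the statistic τ follows a non-standard Dickey-Fuller distribution, so its p-value comes from special tables rather than the ordinary t distribution.

```latex
\begin{aligned}
\text{Type 1:}\quad \Delta y_t &= \gamma\, y_{t-1} + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t \\
\text{Type 2:}\quad \Delta y_t &= \alpha + \gamma\, y_{t-1} + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t \\
\text{Type 3:}\quad \Delta y_t &= \alpha + \beta t + \gamma\, y_{t-1} + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t
\end{aligned}
\qquad
H_0:\ \gamma = 0 \ (\text{unit root}), \quad H_1:\ \gamma < 0, \quad
\tau = \frac{\hat{\gamma}}{\operatorname{SE}(\hat{\gamma})}
```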

The null hypothesis for all three models is that the series has a unit root and is therefore not stationary. If the p-value is lower than 0.05, the null hypothesis is rejected, and the series can be treated as stationary under that model (for Type 3, stationary around a linear trend). If the null hypothesis is not rejected, the series can often be made stationary by differencing. Lag is the number of lagged difference terms included in the test regression.
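To make the decision rule concrete, here is a minimal sketch of the plain Dickey-Fuller test for Type 2 with zero lags. It is a simplification: it omits the lagged difference terms that make the test "augmented", and instead of a p-value it compares the statistic with the commonly tabulated asymptotic 5% critical value for the intercept-only case (about -2.86). The function names are hypothetical.

```python
import math

CRIT_5PCT_TYPE2 = -2.86  # asymptotic 5% Dickey-Fuller critical value, intercept only

def df_tstat(y):
    """t-statistic of gamma in the OLS regression  dy_t = alpha + gamma * y_{t-1} + e_t."""
    x = y[:-1]                                   # y_{t-1}
    z = [b - a for a, b in zip(y[:-1], y[1:])]   # dy_t = y_t - y_{t-1}
    n = len(z)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((u - mx) ** 2 for u in x)
    sxz = sum((u - mx) * (w - mz) for u, w in zip(x, z))
    gamma = sxz / sxx
    alpha = mz - gamma * mx
    rss = sum((w - alpha - gamma * u) ** 2 for u, w in zip(x, z))
    se = math.sqrt(rss / (n - 2) / sxx)          # standard error of gamma
    return gamma / se

def rejects_unit_root(y):
    """True when the unit-root null is rejected at roughly the 5% level."""
    return df_tstat(y) < CRIT_5PCT_TYPE2
```

White noise is stationary, so its statistic is strongly negative and the null is rejected; a random walk usually is not rejected.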

ACF (autocorrelation function) and PACF (partial autocorrelation function) plots show the correlation coefficients between a time series and lagged copies of itself; the PACF removes the effect of the intermediate lags. In this sample, the ACF decays slowly and the PACF cuts off sharply after lag 1, a pattern that indicates an AR(1) model. You'll also find significant spikes at lags 12 and 24, which suggest a seasonal pattern with period 12.
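The two functions can be sketched in plain Python: the sample ACF divides each lagged covariance by the variance, and the PACF is obtained from the ACF with the Durbin-Levinson recursion. This is a generic sketch, not the tool's implementation; the ±1.96/√n band is a common approximation for judging which spikes are significant.

```python
import math

def acf(x, nlags):
    """Sample autocorrelations r_0 .. r_nlags (biased estimator, divides by n)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d) / n
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n / c0 for k in range(nlags + 1)]

def pacf(x, nlags):
    """Partial autocorrelations for lags 1 .. nlags via the Durbin-Levinson recursion."""
    r = acf(x, nlags)
    phi_prev, out = [], []  # phi_prev[j] holds phi_{k-1, j+1}
    for k in range(1, nlags + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out

def significance_band(n):
    """Approximate 95% band for white noise: spikes outside +/- this value stand out."""
    return 1.96 / math.sqrt(n)
```

For a simulated AR(1) series with coefficient 0.7, the lag-1 ACF is close to 0.7 and the PACF is close to zero from lag 2 on, which is the sharp cutoff described above.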

Figure 4: Time plot chart – ACF/PACF

EACF (extended autocorrelation function) is a useful tool for identifying the orders of a mixed ARMA model. In some cases, a “mixed” model with both AR and MA terms may provide the best fit to the data.

Figure 5: Time plot – EACF

The table shows a triangle of “O” symbols with its vertex at (1,1), which suggests an ARMA(1,1) model.
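In an EACF table, rows are AR orders p, columns are MA orders q, and “O” marks a coefficient that is not significant. The usual reading rule is to find the upper-left vertex of a triangle of “O” symbols. The hypothetical helper below applies that rule to a table of symbols; it reads a table but does not compute the EACF itself.

```python
def eacf_vertex(table):
    """Return (p, q) of the upper-left vertex of the first all-'O' triangle, or None.

    table[p][q] is 'O' (not significant) or 'X' (significant). The triangle with
    vertex (p, q) covers every cell (p + i, q + j) with j >= i inside the table.
    """
    n_ar, n_ma = len(table), len(table[0])
    # Prefer the simplest model: smallest p + q first, then smallest p.
    candidates = sorted(((p, q) for p in range(n_ar) for q in range(n_ma)),
                        key=lambda pq: (pq[0] + pq[1], pq[0]))
    for p, q in candidates:
        if all(table[p + i][q + j] == "O"
               for i in range(n_ar - p)
               for j in range(i, n_ma - q)):
            return p, q
    return None
```

For a made-up table whose rows read `XXXXX`, `XOOOO`, `XXOOO`, `XXXOO`, the helper returns `(1, 1)`, matching the ARMA(1,1) reading above.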

Spectral analysis is used to identify periodic behavior in the time series. In this example, each data point in the time series represents a month, so an annual periodicity corresponds to a period of 12 in the current data set. Because period and frequency are reciprocals of each other, a period of 12 corresponds to a frequency of 1/12 (about 0.083). An annual component therefore implies a peak in the periodogram at 0.083, which is consistent with the peak just below a frequency of 0.1.
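The reciprocal relationship can be checked with a raw periodogram computed directly from its definition at the Fourier frequencies k/n. This is a minimal sketch with a synthetic monthly sine wave, not the tool's spectral estimator.

```python
import cmath
import math

def periodogram(x):
    """Raw periodogram I(f_k) = |sum_t x_t e^{-2 pi i f_k t}|^2 / n at f_k = k / n."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]  # remove the mean so frequency 0 does not dominate
    out = []
    for k in range(1, n // 2 + 1):
        f = k / n
        s = sum(v * cmath.exp(-2j * math.pi * f * t) for t, v in enumerate(d))
        out.append((f, abs(s) ** 2 / n))
    return out

# Ten years of monthly data with a 12-month cycle
monthly = [math.sin(2 * math.pi * t / 12) for t in range(120)]
peak_freq, _ = max(periodogram(monthly), key=lambda fp: fp[1])
```

The peak lands at `peak_freq` of 10/120, that is 1/12 ≈ 0.083, which corresponds to a period of 1 / 0.083 = 12 months.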

Figure 6: Time plot – Spectral analysis

t-SNE chart

t-distributed Stochastic Neighbor Embedding (t-SNE) is a tool for visualizing high-dimensional data. It converts similarities between data points to joint probabilities and tries to represent the high-dimensional data nonlinearly in fewer dimensions (typically 2D or 3D, so that complex data can be visualized). You can use the t-SNE result to reveal clusters, and the boundaries between them, in the reduced dimensions.
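The core quantities of t-SNE can be sketched in plain Python: Gaussian conditional probabilities in the original space (with a bandwidth tuned per point to match a target perplexity), symmetrized into joint probabilities P; Student-t joint probabilities Q in the embedding; and the Kullback-Leibler divergence KL(P‖Q) that the optimization minimizes. This sketch omits the gradient-descent loop that moves the embedded points, and the helper names are hypothetical.

```python
import math

def _sq_dists(X):
    return [[sum((a - b) ** 2 for a, b in zip(xi, xj)) for xj in X] for xi in X]

def _cond_probs(d_row, i, perplexity, tol=1e-5, max_iter=100):
    """p_{j|i} from a Gaussian kernel whose entropy is tuned to log(perplexity)."""
    target = math.log(perplexity)
    d_min = min(d for j, d in enumerate(d_row) if j != i)  # shift for numerical stability
    beta, lo, hi = 1.0, 0.0, math.inf  # beta = 1 / (2 * sigma_i^2)
    for _ in range(max_iter):
        w = [0.0 if j == i else math.exp(-beta * (d - d_min)) for j, d in enumerate(d_row)]
        total = sum(w)
        p = [v / total for v in w]
        entropy = -sum(v * math.log(v) for v in p if v > 0)
        if abs(entropy - target) < tol:
            break
        if entropy > target:  # too flat: narrow the kernel
            lo = beta
            beta = beta * 2 if hi == math.inf else (beta + hi) / 2
        else:                 # too peaked: widen the kernel
            hi = beta
            beta = (beta + lo) / 2
    return p

def joint_p(X, perplexity=5.0):
    """Symmetrized affinities p_ij = (p_{j|i} + p_{i|j}) / (2n) in the original space."""
    n = len(X)
    D = _sq_dists(X)
    cond = [_cond_probs(D[i], i, perplexity) for i in range(n)]
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]

def joint_q(Y):
    """Affinities in the embedding from a Student-t kernel with one degree of freedom."""
    n = len(Y)
    D = _sq_dists(Y)
    w = [[0.0 if i == j else 1.0 / (1.0 + D[i][j]) for j in range(n)] for i in range(n)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]

def kl_divergence(P, Q):
    """Cost minimized by t-SNE: sum over pairs of p_ij * log(p_ij / q_ij)."""
    return sum(p * math.log(p / q)
               for prow, qrow in zip(P, Q) for p, q in zip(prow, qrow) if p > 0)
```

The heavy-tailed Student-t kernel in the low-dimensional space lets moderately distant points sit far apart in the embedding, which is what separates clusters so visibly.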

Figure 7: t-SNE chart – Handwritten digits

The following figures illustrate an embedding of the handwritten digits dataset. The samples were written by 44 writers and make up 1797 8×8 images, each showing one handwritten digit. In the dataset, each image is transformed into a vector of length 64.
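Turning an 8×8 image into a vector of length 64 is a row-major flattening: pixel (row r, column c) becomes entry 8·r + c, so each image becomes one row with 64 columns. A minimal sketch, with a hypothetical helper name:

```python
def flatten(image):
    """Row-major flattening: pixel (r, c) of a width-w image goes to index w * r + c."""
    return [pixel for row in image for pixel in row]

# An 8 x 8 "image" whose pixel values equal their flattened index
image = [[8 * r + c for c in range(8)] for r in range(8)]
vector = flatten(image)
```

Stacking one such vector per image gives a table of 1797 rows and 64 columns, which is the high-dimensional input that t-SNE reduces to two dimensions.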

Figure 8: t-SNE chart – Handwritten digits transformed into 64 columns dataset

After t-SNE is applied, the dataset produces the following result:

Figure 9: t-SNE chart – 2D plot result after applied t-SNE

The different digits, stored as high-dimensional vectors in the dataset, are clearly separated in the 2D plot.

Relationship chart

We’ll use the Relationship chart to analyze weather data. Relationship charts show how columns of data relate to one another and how strong each relationship is, using lines of varying thickness.

Select Relationship in the charts list.

Click the dropdown arrow to add all supported columns to the Columns box (only discrete columns are supported). The Weather data has four discrete columns: LEVEL, KEY_POLLUTANT, WEATHER1, and WEATHER2.

Figure 10: Relationship chart – Weather level relationship with pollutant factors

The largest point in the figure is LEVEL.good, and the thickest line connected to it leads to KEY_POLLUTANT.NO2. Move the mouse over the large green dot labelled LEVEL.good: all values that are not related to LEVEL.good are filtered out. The thickest remaining line, from LEVEL.good to KEY_POLLUTANT.NO2, means that when air quality is good, the key pollutant is most likely NO2.
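The tutorial does not state how the Relationship chart measures node size or line strength. One plausible reading, sketched below as an assumption, is that node size reflects how often a column.value label occurs and line thickness reflects how often two labels occur in the same row. The rows and the helper name are hypothetical.

```python
from collections import Counter
from itertools import combinations

def relationship_edges(rows, columns):
    """Count label occurrences (nodes) and same-row co-occurrences (edges).

    Labels take the form 'COLUMN.value'; edge keys are sorted label pairs.
    """
    nodes, edges = Counter(), Counter()
    for row in rows:
        labels = [f"{c}.{row[c]}" for c in columns]
        nodes.update(labels)
        for a, b in combinations(sorted(labels), 2):
            edges[(a, b)] += 1
    return nodes, edges

rows = [  # hypothetical weather records
    {"LEVEL": "good", "KEY_POLLUTANT": "NO2"},
    {"LEVEL": "good", "KEY_POLLUTANT": "NO2"},
    {"LEVEL": "good", "KEY_POLLUTANT": "PM10"},
    {"LEVEL": "light", "KEY_POLLUTANT": "PM2.5"},
]
nodes, edges = relationship_edges(rows, ["LEVEL", "KEY_POLLUTANT"])
# Hovering over LEVEL.good keeps only the edges that contain it
strongest = max((e for e in edges if "LEVEL.good" in e), key=edges.get)
```

Here `strongest` is the pair linking LEVEL.good and KEY_POLLUTANT.NO2, the analogue of the thickest line in Figure 10.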

Parallel chart

Parallel charts display and compare rows of data (called profiles) to find similarities. Each row is drawn as a line, and the value in each column of the row is represented by a point on that line. Click Parallel chart to show its settings, as in Figure 11. In the Columns box, add “PM10”, “SO2”, “CO”, “NO2”, “O3_8h”, and “WINDSPEED_MEAN”. Next, select “LEVEL” as the Color map.

Figure 11: Parallel chart

Click the “WINDSPEED_MEAN” axis and drag along it with the left mouse button held down to define a filter from 13 to 30. All the remaining lines are green or blue, which means the air quality is either good or excellent when the wind speed is higher than 13.
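Two mechanics sit behind this interaction: each value is placed on its own vertical axis (commonly by min-max scaling so every axis spans the same height), and dragging on an axis keeps only the rows whose value falls inside the selected range. A minimal sketch with hypothetical rows and helper names; the tool's actual axis scaling is not documented here.

```python
def axis_positions(rows, columns):
    """Place each value on its axis: 0 at the column minimum, 1 at the maximum."""
    bounds = {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in columns}
    positions = []
    for r in rows:
        pos = {}
        for c in columns:
            lo, hi = bounds[c]
            pos[c] = 0.0 if hi == lo else (r[c] - lo) / (hi - lo)
        positions.append(pos)
    return positions

def brush(rows, column, low, high):
    """Keep the rows (lines) whose value on `column` lies within [low, high]."""
    return [r for r in rows if low <= r[column] <= high]

rows = [  # hypothetical daily records
    {"WINDSPEED_MEAN": 15, "LEVEL": "good"},
    {"WINDSPEED_MEAN": 20, "LEVEL": "excellent"},
    {"WINDSPEED_MEAN": 5, "LEVEL": "light pollution"},
]
levels_when_windy = {r["LEVEL"] for r in brush(rows, "WINDSPEED_MEAN", 13, 30)}
```

Collecting the LEVEL values of the brushed rows mirrors reading the line colors after filtering: in this toy data only "good" and "excellent" remain.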

Figure 12: Parallel chart – filter for discrete data

Click the filter and drag it down while holding the left mouse button. When the wind speed is lower than 13, lines of all colors appear, which means the air quality is uncertain at lower wind speeds.

Change the Color map to AQI (a continuous column) and keep all of the columns above. The parallel chart is displayed as shown below.

Figure 13: Parallel chart – filter for continuous data

Drag on the AQI axis to define a filter from 100 to 205. It is then easy to see that AQI is higher (that is, air quality is worse) when the wind speed is low, between 0 and 10.
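With a continuous column such as AQI as the color map, each line's color is interpolated from its value instead of being picked from a fixed set of categories. A minimal sketch of linear interpolation between two endpoint colors; the endpoint RGB values and the function name are hypothetical, not the tool's palette.

```python
def color_for(value, vmin, vmax, low_rgb=(0, 128, 0), high_rgb=(128, 0, 128)):
    """Linearly interpolate an RGB color for `value` within [vmin, vmax]."""
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = max(0.0, min(1.0, t))  # clamp values outside the range to the end colors
    return tuple(round(a + (b - a) * t) for a, b in zip(low_rgb, high_rgb))
```

For example, with an AQI range of 0 to 100, `color_for(50, 0, 100)` returns the midpoint color `(64, 64, 64)`, and any value above 100 gets the high-end color.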

Candlestick chart

Candlestick charts are a type of financial chart that displays the price movements of a security, derivative, or currency. A set of stock data is used here to illustrate the candlestick chart; the configuration of each parameter is shown in Figure 14. You can choose to display the opening price, closing price, highest and lowest prices, trading volume, and daily average.
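Each candle summarizes one period with four prices: the body spans the open and close, and the wicks extend to the high and low. The sketch below aggregates one period's trades, given in time order, into such a candle. The trade format and the names are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Candle:
    open_price: float
    high: float
    low: float
    close: float
    volume: float

    @property
    def bullish(self) -> bool:
        """True when the period closed at or above its open (often drawn green or hollow)."""
        return self.close >= self.open_price

    @property
    def body(self) -> tuple:
        """Bottom and top of the body; the wicks run from the body to low and high."""
        return (min(self.open_price, self.close), max(self.open_price, self.close))


def to_candle(trades):
    """Aggregate (price, volume) trades, in time order, into one candle."""
    prices = [price for price, _ in trades]
    return Candle(
        open_price=prices[0],   # first trade of the period
        high=max(prices),
        low=min(prices),
        close=prices[-1],       # last trade of the period
        volume=sum(volume for _, volume in trades),
    )
```

For hypothetical trades `[(10.0, 100), (12.0, 50), (9.5, 70), (11.0, 30)]`, the candle opens at 10.0, closes at 11.0, has a body from 10.0 to 11.0, wicks up to 12.0 and down to 9.5, and a volume of 250.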

Figure 14: Candlestick chart

3D chart

Data visualization also refines how you interact with charts. 3D charts display data in a 3D coordinate system, drawing each column as a cuboid to create a 3D effect. You can zoom and pan the chart within the coordinate system to see details, and color and transparency are applied to distinguish data along different dimensions.
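Drawing a bar as a cuboid and then zooming or panning it comes down to two steps: generating the cuboid's eight corners, and projecting 3D points onto the 2D screen with a scale and an offset. The sketch below uses a simple isometric-style projection; the tool's actual camera model, and the names used here, are assumptions.

```python
import math

def bar_cuboid(x, y, height, width=0.8):
    """Eight corner vertices (x, y, z) of the cuboid drawn for the bar at grid cell (x, y)."""
    h = width / 2
    return [(x + dx, y + dy, z)
            for z in (0.0, height)
            for dx in (-h, h)
            for dy in (-h, h)]

def project(point, zoom=1.0, pan=(0.0, 0.0)):
    """Isometric-style projection to screen coordinates, then zoom (scale) and pan (shift)."""
    x, y, z = point
    sx = (x - y) * math.cos(math.radians(30))
    sy = z + (x + y) * math.sin(math.radians(30))
    return (zoom * sx + pan[0], zoom * sy + pan[1])

corners_2d = [project(v, zoom=2.0, pan=(10.0, 10.0)) for v in bar_cuboid(2, 3, 5.0)]
```

Changing `zoom` and `pan` rescales and shifts every projected corner uniformly, which is what scaling and shifting the chart amount to on screen.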

3D visualization is eye-catching and offers several levels of display configuration options. Configuring just a few rows of data is enough to produce a chart, as shown below.

Figure 15: 3D chart – surface

Figure 16: 3D chart – bar

Summary

In this tutorial, you learned more about the functions, settings, applications, and rendering effects of charts in data visualization. We also showed how data visualization provides a powerful chart library that contains a variety of charts and related statistical calculations.