The recipe titled ‘Weather Data meets IBM Cloud. Part 1 – Ingestion and Processing’ showed how to ingest and process weather data from the Weather Company into the IBM Cloud Platform using IBM Cloud Functions and IBM Event Streams. A second recipe titled ‘Weather data meets IBM Cloud. Part 2 – Storage and Query of Weather Data’ followed up by showing how to store and query weather data using IBM Cloud Object Storage and IBM SQL Query. The third recipe titled ‘Weather Data meets IBM Cloud. Part 3 – Transformation and Aggregation’ continued by showing how weather data could be transformed and aggregated, making it available for further analysis and visualization using e.g. traditional Business Intelligence techniques.
In this recipe we shall show how the weather data can be used to generate typical statistical diagrams using Jupyter Notebooks, Python, pandas and matplotlib in a first step. In a second step, we will then use IBM Cognos Dashboard Embedded to create two dashboards: one showing weather observation data and another showing lightning strikes world-wide at a given point in time. For the dashboards we will use CSV files as well as PostgreSQL as data sources. The key component for all activities will be IBM Watson Studio.
In this recipe you will go through the following:
- Section 3: Get started by provisioning IBM Cognos Dashboard Embedded service and configuring the IBM Watson Studio project.
- Section 4: Import, configure and run a Jupyter notebook using matplotlib to generate statistical diagrams.
- Section 5: Create the dashboards using IBM Cognos Dashboard Embedded.
Before the hands-on exercises we will briefly introduce the overall architecture and the main new components: matplotlib and IBM Cognos Dashboard Embedded.
The underlying architecture is a typical IoT architecture with services for event data ingestion, event handling (message bus), landing zone (object storage) and event data processing to transform, aggregate, load and visualize event data in a form so that it becomes business relevant. In other words, it is a data refinery pipeline turning “raw oil” into “business relevant information” covering data at rest.
In the first part of this series, weather data was retrieved from the Weather Company Data service by IBM Cloud Functions and passed on to IBM Event Streams. In the second part, IBM Cloud Object Storage Bridge was configured to write data from IBM Event Streams to IBM Cloud Object Storage. Once the weather data had been stored in IBM Cloud Object Storage, it became possible to query, transform and aggregate the data using IBM SQL Query, pandas or IBM Analytics Engine (running Apache Spark). Key components in this architecture were IBM Watson Studio and its underlying support for Jupyter notebooks and Python, which were used to define the required Extract, Transform and Load (ETL) jobs that turned the raw event data into something that could be used for analysis. The resulting data was then stored either in IBM Cloud Object Storage or in a PostgreSQL database. As a prerequisite for writing and reading data from IBM services like IBM Cloud Object Storage and IBM Databases for PostgreSQL, proper access permissions had to be created (behind the scenes) using IBM Identity and Access Management.
In this part we will look into how to analyse and visualize the data using two additional components of IBM Watson Studio and the IBM Data Platform: the open source library matplotlib and IBM Cognos Dashboard Embedded. The first will be used to go through a few typical steps of a statistical analysis of weather data using functions of pyplot. The second will be used to generate two dashboards: one showing weather observation data and another showing lightning strikes world-wide at a given point in time.
Matplotlib is a plotting library for Python that was originally developed by John Hunter. It provides a large variety of plots such as pie charts, line charts, bar charts, area plots, histograms, boxplots, scatter plots and polar plots, to mention a few. It has been designed with an open three-layer architecture in mind, consisting of the Backend Layer (canvas, rendering, event handling), the Artist Layer (simple and composite elements like lines, text, axes and rectangles) and the Scripting Layer (pyplot). Thanks to a large community contributing to Matplotlib, extensions are available, e.g. for waffle charts and word clouds. There are also complete libraries built on top of Matplotlib, such as Seaborn, that can generate visualizations with significantly less code than Matplotlib itself.
In this recipe we shall, however, use pyplot, a Matplotlib module that provides a MATLAB-like interface. It has the advantage of being well supported within Jupyter Notebooks, and pandas data structures can be rendered with it directly by calling their plot function.
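The following minimal sketch illustrates that interplay. It is not code taken from the recipe's notebook. It calls the `plot` method of a pandas Series, which draws onto a matplotlib `Axes` through pyplot. The temperature values are invented. The non-interactive Agg backend is selected only so that the snippet also runs outside Jupyter, where figures are rendered inline instead.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a Jupyter notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented sample temperatures, for illustration only
df = pd.DataFrame({"temp": [4, 7, 12, 9]})

fig, ax = plt.subplots()
# Series.plot hands the data to matplotlib and draws on the given Axes
df["temp"].plot(ax=ax, kind="line", title="Temperature")
ax.set_xlabel("Observation")
ax.set_ylabel("Temperature (°C)")
```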
IBM Cognos Dashboard Embedded
IBM Cognos Dashboard Embedded provides end users with the ability to explore data and create visualizations of their data. It is based on capabilities of IBM Cognos Analytics and allows a user to:
- Connect to data sources such as IBM Cloud databases and comma-separated values (CSV) files.
- Create dashboards to monitor events or activities that provide key insights and analysis about the data.
- Explore the data that is shown in a visualization by using interactive filters, drilling up or down, and viewing the details of a data point.
- Add visualizations of various kinds to a dashboard and fine tune the rendering using filters, sorting and calculations.
- Embed dashboards in applications.
IBM Cognos Dashboard Embedded is integrated into IBM Watson Studio (see e.g. the blog ‘Cognos Dashboards Now Available Within Watson Studio’) and allows you to interactively connect to data sources available in IBM Watson Studio. Once this is done, dashboards and visualizations can be constructed to render the data as appropriate for the purpose at hand. In this recipe we shall use IBM Cognos Dashboard Embedded to create a dashboard showing lightning strikes world-wide:
This recipe builds upon the first three parts of the series, in which weather data has been ingested, processed, stored, queried and transformed. However, if you would like to start directly with this recipe, you can do so by downloading the input files from the GitHub repository ‘https://github.com/EinarKarlsen/weather-data-demo’. You will then need to manually upload the files ‘lightning-data.csv’, ‘weather-observation-data.csv’ and ‘weather-observation-temperatures.csv’ to your IBM Watson Studio project and use these files for the exercises. In this case you can also skip the provisioning of the PostgreSQL database and use the file ‘weather-observation-temperatures.csv’ instead.
In this section of the recipe you will get started by doing the following:
- Create an instance of IBM Cognos Dashboard Embedded service.
- Associate the service with your project.
- Add a data connection to IBM Databases for PostgreSQL to enable reporting over this data source.
Provision IBM Cognos Dashboard Embedded Service
To provision the service, go through the following steps:
- Select the IBM Cloud Catalog in the IBM Cloud portal.
- Enter ‘cognos’ as search string.
- Select the ‘IBM Cognos Dashboard Embedded’ service.
- On the IBM Cognos Dashboard Embedded page:
- Provide a name for the service (optionally using ‘Weather Data Demo’ as prefix).
- As region, choose the one that you used for the other services, or a nearby one if the service is not available in your preferred data center.
- As resource group, choose the one that you created in the previous recipe (e.g. ‘weather-data-demo’).
- Add a tag (e.g. ‘weather-data-demo’) to the service.
- Scroll down and select the Free pricing plan.
- Click Create to create the service.
If you go to the list of resources in the IBM Cloud dashboard and set the tag filter to ‘weather-data-demo’, you should now see the following list of services:
Add the Service to the IBM Watson Studio Project
Add the newly created service to your project by doing the following:
- Go to the dashboard of IBM Watson Studio.
- Select the ‘Weather Data Demo’ project.
- Select the Settings tab on the top of the page.
- Scroll down and select Add Service in the Associated Services section.
- Select Dashboard from the pull-down menu.
- On the IBM Cognos Dashboard Embedded page:
- Select the Existing tab to choose between existing Cognos services.
- Select the Cognos Embedded service that you just created.
- Click Select.
The service is now associated with your project and will be used within that project to create any dashboards.
Add Connections to IBM Databases for PostgreSQL
One of the dashboards to be created will use data in the PostgreSQL database. Alternatively, you could use the data in the file ‘weather-observation-temperatures.csv’. However, if the PostgreSQL database is available, it is better to use it, because you will then become acquainted with how to establish connections to databases in IBM Watson Studio.
To create the connection, you will need the credentials for the PostgreSQL service that was created in the previous recipe. Having obtained them, continue as follows:
- In IBM Watson Studio, select the Assets tab in the top part of the screen.
- Invoke the Add to Project command.
- Select Connection in the ‘Choose asset type’ dialog.
- In the section listing IBM Services, select the connection named ‘Compose for PostgreSQL’. This will open up a dialog for specifying the details of the connection.
- In the New Connection dialog do the following:
- As name for the connection, enter ‘PostgreSQLWeatherData’.
- For the User name, use ‘postgres.authentication.username’ property from the credentials.
- For the Password, use ‘postgres.authentication.password’ property from the credentials.
- For the Host name or IP Address, use the ‘hosts.hostname’ property from the credentials.
- For the Database, use the ‘database’ property from the credentials.
- For the Port, use the ‘hosts.port’ property.
- Click Create to create a new connection.
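If you want to double-check which credential value goes into which dialog field, the mapping can be sketched in Python. The sketch assumes the credentials are available as JSON in which `authentication`, `database` and `hosts` sit inside a `postgres` section, with `hosts` being a list. The sample values and the helper name `connection_fields` are hypothetical, so compare the layout against the credentials you actually obtained.

```python
import json

# Hypothetical credentials excerpt; the layout is an assumption mirroring the
# property paths above (postgres.authentication.*, database, hosts.*)
credentials_json = """
{
  "postgres": {
    "authentication": {"username": "admin", "password": "secret"},
    "database": "ibmclouddb",
    "hosts": [{"hostname": "my-postgres.example.com", "port": 31234}]
  }
}
"""

def connection_fields(creds: dict) -> dict:
    """Map credential properties onto the fields of the New Connection dialog (hypothetical helper)."""
    pg = creds["postgres"]
    host = pg["hosts"][0]  # 'hosts' is a list; use the first entry
    return {
        "User name": pg["authentication"]["username"],
        "Password": pg["authentication"]["password"],
        "Host name or IP Address": host["hostname"],
        "Database": pg["database"],
        "Port": host["port"],
    }

fields = connection_fields(json.loads(credentials_json))
print(fields)
```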
Your project should now look like:
Analysing Weather Data using Jupyter Notebooks
In this section you will import a Jupyter notebook into your project. The notebook will generate various matplotlib plots related to weather observation temperatures: bar charts, boxplots, scatter plots, histograms and line charts. The notebook needs two data sets: ‘weather-observations.csv’ and ‘weather-observation-temperatures.csv’. Beyond importing the notebook into the project, you will therefore also need to add code to two cells for reading the data into pandas data frames.
Start by importing the notebook into your project:
- In the Assets tab, click the command Add to Project.
- Select the Notebook asset type.
- In the New Notebook dialog, configure the notebook as follows:
- Select the “From URL” tab and enter ‘https://github.com/EinarKarlsen/weather-data-demo/blob/master/Weather%20Observation%20Data%20Analysis.ipynb’ as the URL for the notebook.
- Enter the name for the notebook, e.g. “Weather Observation Data Analysis”.
- Select the runtime system (e.g. the default Python runtime system, which is free of charge).
- Optionally, enter a short description for the notebook.
- Click Create Notebook.
Next insert the code that reads the CSV file named ‘weather-observations.csv’ into a pandas data frame:
- Select the second code cell (it is empty in your notebook).
- In the right part of the window, select the ‘weather-observations.csv’ data set.
- Click Insert to code and select Insert pandas DataFrame. This will add code to the cell for reading the data set into a pandas data frame.
- Change the generated variable name df_data_1 for the data frame to df.
- Save the notebook by invoking File > Save.
- Run cells 1 to 12 (inclusive) one by one.
The first two cells import the libraries and the data set. The main libraries used are numpy, pyplot and pandas:
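The notebook's own cells are not reproduced here. The following is a rough local stand-in. It assumes the CSV can be read directly with `pd.read_csv` rather than through the code that Watson Studio generates for the project's storage. With that assumption, the imports and the loading step might look like this. The inline CSV text and its column names (`obs_name`, `day_ind`, `temp`, `pressure`) are assumptions made so that the sketch runs on its own.

```python
import io

import matplotlib
matplotlib.use("Agg")  # only needed outside Jupyter
import matplotlib.pyplot as plt  # main libraries named above; pyplot is used by later cells
import numpy as np
import pandas as pd

# Hypothetical excerpt standing in for 'weather-observations.csv'
csv_text = """obs_name,day_ind,temp,pressure
Hamburg-Finkenwerder,D,9,1012.5
Hamburg-Finkenwerder,N,1,1014.2
Hamburg-Finkenwerder,D,12,1010.9
"""

# With the real file in the working directory: df = pd.read_csv("weather-observations.csv")
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```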
The next cells in the notebook use typical pandas functions for generating descriptive statistics, investigating the type of each column and determining if there are any missing data in the data set.
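The notebook's exact cells may differ. Typical pandas calls for these three checks are `describe()`, `dtypes` and `isnull()`, sketched here on a few hypothetical rows in which one temperature is deliberately missing.

```python
import pandas as pd

# Hypothetical rows standing in for the weather observations; one temperature is missing
df = pd.DataFrame({
    "obs_name": ["Hamburg-Finkenwerder"] * 4,
    "day_ind": ["D", "N", "D", "N"],
    "temp": [9, 1, 12, None],
    "pressure": [1012.5, 1014.2, 1010.9, 1013.3],
})

summary = df.describe()      # count, mean, std, min, quartiles and max of the numeric columns
column_types = df.dtypes     # data type of each column
missing = df.isnull().sum()  # number of missing values per column
print(summary, column_types, missing, sep="\n\n")
```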
The next section shows the distribution of the weather observations with respect to location using a bar chart.
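A bar chart of this kind can be sketched by counting the rows per location with `value_counts()` and plotting the result. The sketch assumes the location is held in a column named `obs_name`. That name and the sample rows are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # only needed outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical observations; 'obs_name' is assumed to hold the location
df = pd.DataFrame({
    "obs_name": ["Hamburg-Finkenwerder", "Hamburg-Finkenwerder", "Berlin-Tegel"],
    "temp": [9, 1, 12],
})

counts = df["obs_name"].value_counts()  # number of observations per location
fig, ax = plt.subplots()
counts.plot(kind="bar", ax=ax, title="Observations per location")
ax.set_ylabel("Number of observations")
```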
Boxplot diagrams are then used to show the statistical distribution of the data. Two diagrams are generated: one showing the distribution of day temperatures and another showing the distribution of night temperatures. For this purpose, the data frame must first be filtered using a condition on the column ‘day_ind’. Moreover, only the ‘temp’ column is of interest:
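The filtering step and the two boxplots can be sketched as follows. The sketch assumes that `day_ind` holds ‘D’ for day and ‘N’ for night, as in the dashboard section later on, and that temperatures are in `temp`. The sample values are invented and include one cold outlier.

```python
import matplotlib
matplotlib.use("Agg")  # only needed outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical observations; column names day_ind and temp are assumptions
df = pd.DataFrame({
    "day_ind": ["D", "D", "D", "N", "N", "N"],
    "temp": [14, 17, 1, 5, 3, -1],
})

# Keep only the rows for day (or night) and only the 'temp' column
day_temps = df.loc[df["day_ind"] == "D", "temp"]
night_temps = df.loc[df["day_ind"] == "N", "temp"]

fig, (ax_day, ax_night) = plt.subplots(1, 2, figsize=(8, 4))
day_temps.plot(kind="box", ax=ax_day, title="Day temperatures")
night_temps.plot(kind="box", ax=ax_night, title="Night temperatures")
```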
The boxplot diagram shows the temperature distribution in Hamburg Finkenwerder from the end of April to the beginning of May 2019. The diagram shows outliers caused by low temperatures around the freezing point (0 °C). This reflects what inhabitants of the Hamburg area actually experienced: it was an exceptionally cold start of the month. Usually temperatures around 20 °C would be expected (or hoped for) at that time of the year. Can we then, based on this observation, debunk the theory of global warming? Not at all: weather is one thing, climate another. Besides that, the amount of data collected is far too small to support such a conclusion.
The notebook continues by using a scatter plot to determine whether there is any obvious linear or non-linear relationship between the observed temperature and the pressure. As the following distribution shows, this is clearly not the case:
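A scatter plot like this can be sketched with `DataFrame.plot(kind="scatter")`. The Pearson correlation coefficient is added alongside it. The coefficient only quantifies a *linear* relationship, so non-linear patterns still have to be judged from the plot itself. The column names `temp` and `pressure` and the values are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # only needed outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical observations
df = pd.DataFrame({
    "temp": [4, 9, 12, 7, 1, 10],
    "pressure": [1012.0, 1008.5, 1014.2, 1010.1, 1011.7, 1009.3],
})

fig, ax = plt.subplots()
df.plot(kind="scatter", x="temp", y="pressure", ax=ax, title="Temperature vs. pressure")

# Pearson coefficient in [-1, 1]; values near 0 indicate no linear relationship
r = df["temp"].corr(df["pressure"])
print(f"Pearson r = {r:.2f}")
```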
The final plots deal with aggregated temperatures on a daily basis. Since these plots use aggregated information as their basis, you first need to insert code that reads the data asset ‘weather-observation-temperatures.csv’:
- Select cell number 13.
- Select the Find and add data command in the toolbar.
- For the file ‘weather-observation-temperatures.csv’, click the arrow to the left of the command Insert to code.
- Select the menu item Insert pandas DataFrame.
- Run the cells 13-17.
The notebook will import the data and associate it with a pandas data frame:
Before the notebook can pass the data to the plots it will however sort it and change the index to be the date of the observation. This operation is carried out ‘in place’ treating the data frame as a mutable object.
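The in-place sorting and re-indexing can be sketched with `sort_values(..., inplace=True)` and `set_index(..., inplace=True)`, which modify the data frame rather than returning a copy. The column names `date`, `day_ind`, `mintemp`, `avgtemp` and `maxtemp` follow the table used in the dashboard section. Parsing the dates first is an added assumption so that the sort order is chronological, and the rows themselves are invented.

```python
import pandas as pd

# Hypothetical rows standing in for 'weather-observation-temperatures.csv'
df_temps = pd.DataFrame({
    "date": ["2019-05-03", "2019-05-01", "2019-05-02"],
    "day_ind": ["D", "D", "D"],
    "mintemp": [5, 2, 4],
    "avgtemp": [9, 6, 8],
    "maxtemp": [13, 10, 12],
})

df_temps["date"] = pd.to_datetime(df_temps["date"])  # parse so dates sort chronologically
df_temps.sort_values("date", inplace=True)          # sorts the existing object, returns None
df_temps.set_index("date", inplace=True)            # the observation date becomes the index
print(df_temps)
```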
The final plots show the temperatures day by day as minimum, average and maximum temperatures. One plot shows the day temperatures, the other the night temperatures:
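The two line charts can be sketched by filtering on the day/night indicator and plotting the three temperature columns against the date index. One line is drawn per column. The column names are taken from the dashboard section. The data is invented and already indexed by date.

```python
import matplotlib
matplotlib.use("Agg")  # only needed outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical daily aggregates, indexed by date as in the previous step
df_temps = pd.DataFrame({
    "date": pd.to_datetime(["2019-05-01", "2019-05-01", "2019-05-02", "2019-05-02"]),
    "day_ind": ["D", "N", "D", "N"],
    "mintemp": [6, 0, 8, 2],
    "avgtemp": [10, 2, 12, 4],
    "maxtemp": [14, 4, 16, 6],
}).set_index("date")

columns = ["mintemp", "avgtemp", "maxtemp"]
fig, (ax_day, ax_night) = plt.subplots(2, 1, figsize=(8, 6))
df_temps.loc[df_temps["day_ind"] == "D", columns].plot(ax=ax_day, title="Day temperatures")
df_temps.loc[df_temps["day_ind"] == "N", columns].plot(ax=ax_night, title="Night temperatures")
```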
According to German folklore, the “Ice Saints” is a period in May after which night frost has disappeared and it is safe to start planting sensitive crops outdoors. In 2019 that period started on 11 May and ended on 15 May. The collected data suggests that such folklore may indeed contain relevant information.
Visualization of Weather Data using Cognos Dashboards
In this section we shall define a map showing lightning strikes world-wide at a given point (or period) in time. You will need to go through the following steps:
- Create an empty dashboard.
- Add a data source for the lightning data to the dashboard.
- Create an initial map showing lightning strikes.
- Fine-tune the layout of the map with respect to the colours and style used.
To create a dashboard, do the following:
- Open the ‘Weather Data Demo’ project in IBM Watson Studio.
- Select the Assets tab.
- In the Assets tab, click the command Add to Project.
- Select the Dashboard asset type in the ‘Choose Asset type’ dialog.
- In the New Dashboard dialog, configure the dashboard as follows:
- Provide a name for the dashboard, e.g. ‘Weather Data Dashboard’.
- For the Cognos Dashboard Embedded Service, select the service that you created in section 3.
- Click Save.
- Wait for the dashboard to be created.
- Select the template Freeform on the ‘Select a template’ page.
- Click Ok to create an empty dashboard.
- Select the tab named ‘Tab 1’.
- In the pop-up toolbar, select the pencil icon on the right to edit the title.
- Provide a name for the tab, e.g. ‘Lightnings WW’.
Having created an empty dashboard, you can now continue by adding a data source for the lightning data:
- Select the + button to the right of the screen (see screen shot above). This will open the ‘Select connection source’ dialog.
- Select Data assets in the ‘Select connection source’ dialog.
- Select the asset named ‘lightning-data.csv’.
- Click the Select button.
- The asset will now be added to the set of selected resources in the dashboard.
- Select the asset ‘lightning-data.csv’ to view the data asset.
- Expand the asset to view the schema by clicking the arrow to the left of the name:
Notice that if you click the dots to the right of a column in the table, a pop-up menu will appear with menu items for creating calculations or for changing the properties of the column. The column with no name is actually the pandas index column, so feel free to rename it to ‘index’ by selecting the Properties menu item.
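A nameless first column like this typically appears when a DataFrame is written with pandas' `to_csv` and its default `index=True`. That this is how ‘lightning-data.csv’ was produced is an assumption. The sketch below uses hypothetical lightning records to show where the column comes from and two ways to avoid it if you regenerate the file.

```python
import io

import pandas as pd

# Hypothetical lightning records
lightning = pd.DataFrame({
    "latitude": [27.9, 28.1],
    "longitude": [-81.3, -81.2],
    "intensity": [-12.5, 30.1],
})

# Default index=True writes the unnamed index as a leading column with an empty header
with_index = lightning.to_csv()
print(with_index.splitlines()[0])  # ,latitude,longitude,intensity

# Reading it back, pandas labels that column "Unnamed: 0"
reread = pd.read_csv(io.StringIO(with_index))

# Option 1: name the index so the column gets a header
named = lightning.rename_axis("index").to_csv()

# Option 2: leave the index out of the file altogether
without_index = lightning.to_csv(index=False)
```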
Having imported the data source, we can continue by creating a visualization in the form of a world map to display the lightning strikes:
- Select the Visualizations tab to the left of the screen.
- Select the Map visualization, which will create an empty map where you will need to define the properties to be rendered.
- Expand Latitude/longitude to define the properties of the map.
- Observe that after the map has been created, the view to the left will have swapped back to the Sources view that displays the data source and its properties.
- Next, define the properties of the map using the columns of the data source:
- Drag and drop the column ‘lightning_data_csv.latitude’ onto the map property ‘Latitude’.
- Drag and drop the column ‘lightning_data_csv.longitude’ onto the map property ‘Longitude’. You should now start seeing the lightning strikes, but in grey.
- Drag and drop the column ‘lightning_data_csv.intensity’ onto the map property ‘Point size’.
- Drag and drop the column ‘lightning_data_csv.intensity’ onto the map property ‘Point color’.
You have now created an initial map visualization showing the lightnings:
However, the colour of the circles is not right and the map layout needs polishing. Moreover, the top right corner of the map shows intensity values ranging from negative to positive. Electricity has a polarity of plus and minus, and so does lightning. To make the size of each circle reflect the intensity of the lightning correctly, we need to compute the absolute value of the intensity and use that as the basis for the size of the circles:
- Click the three dots to the right of the intensity property defining the size of the circle. This will cause a pull-down menu to appear.
- Select Calculation from the pull-down menu. This will open up the Create calculation dialog.
- Click the link Use calculation editor to open up an editor.
- In the calculation editor:
- Set the Name of the computed attribute to ‘absolute_intensity’.
- Set the Expression to ‘abs(lightning_data_csv.intensity)’.
- Click the Validate link to see if the expression is valid.
- Click OK to save the changes.
- You should now see a new computed column for the data source in the sources view.
- Drag and drop the column ‘lightning_data_csv.absolute_intensity’ onto the map property ‘Point size’.
- Drag and drop the column ‘lightning_data_csv.absolute_intensity’ onto the map property ‘Point color’.
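For comparison outside Cognos, the same idea can be sketched in pandas and matplotlib. Taking `abs()` of the intensity drops the polarity and keeps the magnitude, which then drives both point size and colour. This is only an analogue of the dashboard calculation, not how Cognos evaluates it. The records are invented and the red colour map is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # only needed outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical lightning records
lightning = pd.DataFrame({
    "latitude": [27.9, 28.1, 27.7],
    "longitude": [-81.3, -81.2, -81.4],
    "intensity": [-12.5, 30.1, -45.0],
})

# Analogue of the Cognos calculation abs(intensity): polarity removed, magnitude kept
lightning["absolute_intensity"] = lightning["intensity"].abs()

fig, ax = plt.subplots()
points = ax.scatter(
    lightning["longitude"], lightning["latitude"],
    s=lightning["absolute_intensity"],  # point size
    c=lightning["absolute_intensity"],  # point colour
    cmap="Reds",
)
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
fig.colorbar(points, ax=ax, label="Absolute intensity")
```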
We have now essentially finished the definition of the map, at least as far as its semantics is concerned. However, the rendering can be polished by changing the style of the map to a satellite view and the colour of the lightning strikes to red, so that it looks like the following:
Do the following:
- Select Properties in the toolbar of IBM Watson Studio. This will open up an editor to the right of the map where you can fine tune the visualization properties.
- Click the Style property and set the style to ‘Satellite’.
- Expand the section Latitude/Longitude.
- Change the Heat palette to red.
- Select the down-arrow in the upper right corner of the map editor to collapse it.
- Resize the map visualization by dragging its bottom right corner down and to the right so that the map fills the canvas.
The final dashboard can be fine-tuned further. If you select the canvas, options are available to change the skin of the dashboard to dark. There are further options for setting the style of the map itself (like ‘Bright’) that may look less cool at first glance but are more informative than the satellite style. You can also drill down, e.g. to view lightning strikes in a specific area. Here’s a closer look at Lake Kissimmee in Florida:
In a final step you can create an additional tab on the dashboard to show aggregated temperatures, either taking the data from the PostgreSQL database or from the CSV file named ‘weather-observation-temperatures.csv’ that has been uploaded to GitHub.
The following provides a list of short instructions for creating the dashboard tab using PostgreSQL as data source:
- Select the + button to the right of the ‘Lightnings WW’ tab and create a new Freeform tab.
- Set the name of the tab to ‘Weather Temperatures’.
- Create a connection to the table ‘weather-observation-temperatures’ in the PostgreSQL database, which you created in Part 3 of the recipe series.
- Expand the data source in the Sources tab to view the columns of the table.
- Drag and drop the property ‘weather-observation-temperatures.day_ind’ onto the canvas. This will render a list with two elements ‘D’ and ‘N’.
- Select the Visualizations tab in the toolbar to the left and select the Line Chart visualization.
- For the Line Chart:
- Set the x-axis to ‘weather-observation-temperatures.date’.
- Set the y-axis to ‘weather-observation-temperatures.mintemp’, ‘weather-observation-temperatures.avgtemp’ and ‘weather-observation-temperatures.maxtemp’.
- Order the temperatures (using drag and drop) so that the diagram becomes appropriately ordered starting with the minimum temperature.
- Collapse the chart.
- Resize the chart.
Notice that you can now select ‘D’ or ‘N’ in the visualization to the left to filter the line chart so that it shows only day or night temperatures. To remove the filter, click the filter icon in the top right part of the visualization, then delete the filter using the pop-up menu that appears.
In this recipe it has been shown how weather data can be analysed and visualized using IBM Watson Studio and its support for Jupyter notebooks, Python, pandas, matplotlib and IBM Cognos Dashboard Embedded. By doing so, we have turned raw IoT data that had been ingested, processed, stored and transformed in previous recipes into relevant information.
Moreover, we have adopted a generic IoT architecture for the Cloud using a mixture of open source and IBM-specific solutions. From an architectural perspective, this is interesting for two reasons. First and foremost, processing and using temperatures applies to many use cases and is not restricted to capturing and processing weather data. Other examples are connected cars, where the engine temperature becomes relevant, or retailers, where the temperature of the fridges in the shops is of interest. Second, the architecture adopted, with front-end ingestion as well as back-end IoT event data processing, storage and transformation, is generally applicable to IoT use cases beyond measuring temperatures.