This code pattern is part of the 2020 Call for Code Global Challenge.
In this code pattern, we’ll demonstrate how to analyze a large air quality dataset provided by the EPA; this can be considered a “smart cities” use case. We demonstrate how to analyze large datasets with Watson Studio and Python data science packages. The Jupyter notebook offers several examples of how to take advantage of open source software packages to analyze datasets.
This pattern requires a structured dataset. This data can be generated in a variety of ways. One way is to follow our related pattern titled “Setting up the hardware platform for long-range IoT systems that use LoRaWAN networking,” which goes through the process of deploying a long range network to collect sensor data.
As an alternative, we’ll use a dataset generated by the EPA, which measures pollutant levels at several locations throughout the United States. Measurements are taken hourly throughout the year, which enables us to apply time series analysis.
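Because the EPA readings arrive at a fixed hourly cadence, pandas can treat them as a time series. The sketch below, using made-up values and illustrative column names (the real names come from the EPA export), shows the basic move: index by timestamp, then downsample hourly readings to daily means.

```python
import pandas as pd

# Hypothetical hourly pollutant readings standing in for the EPA export;
# the real column names depend on the dataset's schema.
df = pd.DataFrame(
    {
        "date_time": pd.date_range("2020-01-01", periods=72, freq="h"),
        "sample_measurement": [float(i % 24) for i in range(72)],
    }
)

# Index by timestamp so pandas time-series tools apply.
df = df.set_index("date_time")

# Downsample hourly readings to daily means to expose longer-term trends.
daily = df["sample_measurement"].resample("D").mean()
print(daily)
```

The same `resample` call accepts other rules (weekly, monthly), which is how the notebook can zoom out from hourly noise to seasonal trends.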
When you have completed this code pattern, you will understand how to:
- Create a Jupyter notebook in Watson Studio.
- Clean datasets by removing non-essential data.
- Find patterns within datasets using pandas (Python Data Analysis Library).
- Create graphs using the matplotlib library to visualize high-level data trends.
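The cleaning, pattern-finding, and plotting steps above can be sketched in a few lines of pandas and matplotlib. The rows and column names here are illustrative toy data, not the EPA schema:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Toy rows standing in for the EPA export; column names are illustrative.
df = pd.DataFrame(
    {
        "site_id": [1, 1, 2, 2],
        "sample_measurement": [10.0, None, 12.0, 14.0],
        "qualifier": ["", "", "", ""],  # non-essential metadata column
    }
)

# Clean: drop a column we do not analyze and rows missing a measurement.
clean = df.drop(columns=["qualifier"]).dropna(subset=["sample_measurement"])

# Find a simple pattern: mean measurement per monitoring site.
per_site = clean.groupby("site_id")["sample_measurement"].mean()

# Visualize the per-site means with matplotlib.
per_site.plot(kind="bar", title="Mean measurement by site")
plt.savefig("per_site.png")
```

In the notebook the same calls run against the full EPA dataset, and the plots render inline rather than being saved to a file.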
- The end node devices capture sensor data in the field.
- The captured data is sent through a wireless protocol to a gateway.
- The gateway forwards the sensor data to the Watson IoT Platform.
- The data packets received by Watson IoT Platform are archived in Cloudant.
- Watson Studio imports the archived data and processes it using Jupyter notebooks.
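At the last step of this flow, the archived Cloudant documents land in the notebook as JSON with nested sensor payloads. A minimal sketch of flattening them into a DataFrame, assuming illustrative field names rather than the platform’s exact event schema:

```python
import pandas as pd

# Documents shaped like Watson IoT Platform events archived in Cloudant;
# deviceId/timestamp/data field names here are illustrative assumptions.
docs = [
    {"deviceId": "node-1", "timestamp": "2020-01-01T00:00:00Z",
     "data": {"pm25": 8.4, "temperature": 21.0}},
    {"deviceId": "node-1", "timestamp": "2020-01-01T01:00:00Z",
     "data": {"pm25": 9.1, "temperature": 20.5}},
]

# Flatten the nested sensor payloads into DataFrame columns.
df = pd.json_normalize(docs)

# Parse timestamps so the time-series tools shown earlier apply.
df["timestamp"] = pd.to_datetime(df["timestamp"])
print(df[["deviceId", "data.pm25"]])
```

In practice the documents would be fetched with a Cloudant client library rather than defined inline; the flattening step is the same either way.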
Ready to get started? For detailed instructions, especially a walkthrough of the analyses done by the Jupyter notebook, please see the README.