by Joy Patra, Amitranjan Gantait, Ayan Mukherjee | Published April 4, 2018
IoT solution governance involves having strategies, architectures, teams, and processes in place to develop, deliver, and maintain successful IoT solutions. Part 1 of this series helped you define your overall IoT governance practices. Two key components of any IoT solution are the devices and the data. Part 2 of this series focused on how to govern your devices; this article, Part 3, focuses on how to govern your data.
IoT devices generate and transfer a huge amount of data over the internet. An effective governance mechanism is required, both at the enterprise level and at the platform level, to ensure that this data is effectively used by intended stakeholders and not misused by others.
Data governance in IoT has the same characteristics as data governance in an enterprise, such as data collection, data quality, data storage, data processing, and data consumption. However, IoT solutions need to address a few additional areas in the data lifecycle:
When defining your IoT data governance practices, you must take into account the IoT context or use case, such as managing a large-scale infrastructure (a smart city), managing a utility infrastructure (an energy grid), or managing production or manufacturing lines.
Each layer of an IoT architecture must manage the IoT data throughout its lifecycle. The IoT data lifecycle starts at the physical layer (devices and gateways) with data collection, proceeds through the communication layer with data transportation, then through the platform services layer with data storage and data processing, and ends at the application layer with data visualization or data reporting.
IoT data management is not only about the sensors, but also about the actuators and the edge gateways. As an example, a visual door lock may have a camera to take a photo of the visitor, may process the photo against a set of pre-approved faces on the edge gateway to automatically recognize a visitor, and actuate the lock to open the door. Thus, you have to manage the sensor data that is collected, manage the reference data that is used to process or filter the sensor data, and finally manage the control data that is sent to the actuator.
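The three kinds of data in the door lock example can be sketched as follows. This is a minimal illustration, not a real face-recognition pipeline: the class names, the `face_signature` string (a stand-in for a real face embedding), and the `APPROVED_FACES` set are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorReading:
    """Sensor data: what the camera on the lock captured."""
    device_id: str
    face_signature: str  # hypothetical stand-in for a real face embedding


@dataclass(frozen=True)
class ControlCommand:
    """Control data: what the gateway sends to the lock actuator."""
    device_id: str
    action: str  # "unlock" or "stay_locked"


# Reference data: the pre-approved faces held on the edge gateway (assumed values).
APPROVED_FACES = frozenset({"sig-alice", "sig-bob"})


def process_on_gateway(reading, approved=APPROVED_FACES):
    """Filter the sensor data against the reference data and emit control data."""
    action = "unlock" if reading.face_signature in approved else "stay_locked"
    return ControlCommand(device_id=reading.device_id, action=action)
```

Keeping the three as separate types makes it easier to apply different governance policies to each, for example stricter retention rules for the photos than for the control commands.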
To process the data (either real-time data or historical data), data coming from sensors must be stored in appropriate storage. Depending on the data format, the frequency of the data, and the processing need, you must choose suitable storage and transport the data securely to the storage platform. An end-to-end IoT data management solution needs to address secure transportation, storage, and consumption of the data.
Subsequent sections provide more details about managing the IoT data lifecycle.
IoT data collection begins from the hundreds, possibly thousands, of sensors. These sensors might be incorporated in static devices, such as a smoke alarm, or in mobile devices, such as in a navigation device. These sensors might be manufactured by a widely varying set of manufacturers, and, therefore, might have a wide variety of standards in data formats and protocols. They will likely need regular calibration. Depending on the sensor standards, the data might need to be aggregated or otherwise filtered before sending to the server.
You need to model the data from these disparate sources to make it uniform based on the specific need of the IoT solution and you must ensure that the data meets the required quality requirements in order to transform the data from “raw” data to “usable” data. This phase is often seen as the first gateway between the worlds of unknown, unseen, or unorganized IoT data and structured, meaningful, or usable inputs for the next lifecycle phases.
You need to begin with standard metadata for your IoT solution context, and possibly also at the enterprise level for your organization, that defines the data structure that you want to enforce. While the metadata standards for sensors in one factory or a section of that factory might be unique to that solution, you might want to establish enterprise-level metadata standards for a supply chain that begins on one continent and ends on another.
Due to the varied nature of the devices and of the business solutions, it is quite challenging to have a suitable metadata standard that is at the right level of detail. You might end up generalizing too much or too little. It’s important not to take the “one size fits all” approach. Depending on the business context or solution context, choose the granularity that works across the devices and manufacturers but that allows you to enforce your governance policies.
It could be critical for you to ensure that the sensors are well calibrated, especially in demanding industrial situations or in highly sensitive instruments. You will want to collect data at a sufficient granularity and at a sufficient frequency to allow you to determine, or at least infer, whether the sensor calibration is off. If it is not possible to determine whether calibration is off, you might have to set policies for regular manual verification, which might be more expensive.
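One way to infer calibration drift from the collected data is to compare each sensor with peers that measure the same quantity in the same place. The sketch below assumes such co-located sensors exist; the function name, the `tolerance` parameter, and the three-peer minimum are illustrative choices, not part of any platform.

```python
from statistics import mean, median


def find_drifting_sensors(readings, tolerance):
    """Return IDs of sensors whose recent average deviates from the peer median.

    readings: dict mapping sensor ID -> list of recent values from
              co-located sensors measuring the same quantity (assumption).
    tolerance: largest acceptable absolute deviation from the peer median.
    """
    means = {sid: mean(vals) for sid, vals in readings.items() if vals}
    if len(means) < 3:
        return []  # too few peers to infer which sensor is off
    baseline = median(means.values())
    return sorted(sid for sid, m in means.items() if abs(m - baseline) > tolerance)
```

A sensor that is flagged repeatedly over several windows is a candidate for recalibration. When no peers are available, this approach does not apply and manual verification remains the fallback.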
IBM Watson IoT Platform supports defining a logical device schema to abstract out complexity and isolate IoT applications from vendor-specific device details. In device schemas, you can use meaningful attribute names for different data that is coming from devices, especially devices from different vendors. Once you have the schema defined, you can use that to create data rules and corresponding actions based on the logical device schema.
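The idea of a logical device schema can be illustrated independently of any platform. The following is a generic sketch and does not use the Watson IoT Platform API; the vendor names, field names, and the `VENDOR_MAPPINGS` table are hypothetical.

```python
def fahrenheit_to_celsius(value):
    """Convert a Fahrenheit reading to Celsius, rounded to two decimals."""
    return round((float(value) - 32) * 5 / 9, 2)


# Hypothetical mappings: vendor field -> (logical attribute name, converter).
VENDOR_MAPPINGS = {
    "vendor_a": {
        "tmp_c": ("temperature_c", float),
        "hum": ("humidity_pct", float),
    },
    "vendor_b": {
        "tempF": ("temperature_c", fahrenheit_to_celsius),
        "rh": ("humidity_pct", float),
    },
}


def to_logical(vendor, payload):
    """Translate a vendor-specific payload into the logical schema.

    Fields that have no mapping are dropped so that applications
    only ever see the logical attribute names.
    """
    mapping = VENDOR_MAPPINGS[vendor]
    logical = {}
    for field, value in payload.items():
        if field in mapping:
            name, convert = mapping[field]
            logical[name] = convert(value)
    return logical
```

With this in place, a rule such as "alert when `temperature_c` exceeds a threshold" can be written once against the logical attribute and applied to devices from either vendor.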
In addition to data modeling, you must consider data quality, in the form of consistency, completeness, timeliness, and reliability.
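Some of these quality dimensions can be checked automatically as records arrive. The sketch below covers completeness, timeliness, and a simple plausibility check as a proxy for consistency; the required fields, the 60-second age limit, and the valid range are assumed values that you would replace with your own policy. Reliability usually needs longer-term statistics and is not covered here.

```python
REQUIRED_FIELDS = ("device_id", "timestamp", "temperature_c")  # assumed schema


def quality_issues(record, now, max_age_s=60.0, valid_range=(-40.0, 85.0)):
    """Return a list of quality problems found in one record (empty if none).

    record: dict with the logical attributes of one reading.
    now: current time in seconds, in the same clock as record["timestamp"].
    """
    issues = []

    # Completeness: every required field must be present and not None.
    missing = [f for f in REQUIRED_FIELDS if record.get(f) is None]
    if missing:
        issues.append("incomplete: missing " + ", ".join(missing))

    # Timeliness: the reading must not be older than the allowed age.
    ts = record.get("timestamp")
    if ts is not None and now - ts > max_age_s:
        issues.append("stale")

    # Consistency (plausibility): the value must lie inside the expected range.
    value = record.get("temperature_c")
    if value is not None and not (valid_range[0] <= value <= valid_range[1]):
        issues.append("out_of_range")

    return issues
```

Records with a non-empty result can be quarantined or tagged rather than silently dropped, so that the governance team can audit how much data fails each check.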
Review the IoT networking guide for more information about transport protocols. Also, you can read more about securing IoT data over the network in part 2 of our other series, “Design and build secure IoT solutions.”
IoT raw data comes from different discrete sources with varying formats, structures, and importance. Transmitting a high volume of raw data across a network in real time and storing that raw data is expensive. Data is typically sent from one step to the next by using either wired or wireless protocols. When you design your IoT system, you must carefully select an appropriate transport based on the nature of the data and your IoT devices. There are many transport protocols, including the standard protocols HTTP and MQTT, but also a number of proprietary protocols.
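Whatever transport you choose, one common way to reduce the cost of sending raw data is to summarize it on the device or gateway first. The following sketch collapses a window of samples into a single compact JSON message; the field names and the choice of summary statistics are assumptions, and the resulting string could be sent as the payload over HTTP, MQTT, or another protocol.

```python
import json
from statistics import mean


def summarize_window(device_id, samples):
    """Collapse a window of raw numeric samples into one compact JSON message."""
    if not samples:
        raise ValueError("cannot summarize an empty window")
    summary = {
        "device_id": device_id,
        "count": len(samples),
        "min": min(samples),
        "max": max(samples),
        "mean": round(mean(samples), 3),
    }
    # Compact separators avoid spending bytes on whitespace.
    return json.dumps(summary, separators=(",", ":"))
```

The trade-off is that individual samples are lost, so this only suits use cases where the downstream analytics can work from summaries.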
Refer to Watson IoT Platform and IBM Message Hub for more information on IoT message transportation and transformation.
For more details on IoT data storage and analytics, read the article “Making sense of IoT data.”
When we start talking about storing IoT data, several interesting questions must be addressed in your IoT governance:
Devices and sensors generate a huge amount of raw data that needs to be processed to extract meaningful information for users and applications to consume. Before you can apply advanced analytics to data that comes from devices from different vendors in different formats, you may need to:
Processing data for an IoT solution requires that you:
Therefore, the IoT governance team has to direct the available computing power with these IoT data governance policies:
IBM Watson IoT Platform provides a rich set of analytical services to analyze IoT data that is produced by a wide variety of devices. Depending on the needs of a specific application, developers can start with basic real-time processing that uses rules and actions and progress to creating advanced machine learning models that predict possible outcomes based on data sent by the devices.
The data consumption phase covers data selection, enrichment, cleansing, reporting, visualizing, and other housekeeping activities. At a very high level, there are two strategies driving this phase.
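The rules-and-actions style of basic real-time processing can be sketched generically as below. This is not the Watson IoT Platform rules API; the `Rule` class, the `evaluate` function, and the 80-degree threshold are hypothetical and only show the pattern of a condition paired with an action.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """A named condition on a device event, paired with an action to run."""
    name: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], str]


def evaluate(rules, event):
    """Run the action of every rule whose condition matches the event."""
    return [rule.action(event) for rule in rules if rule.condition(event)]


# Example rule (assumed threshold): raise an alert when a device overheats.
overheat = Rule(
    name="overheat",
    condition=lambda e: e.get("temperature_c", 0) > 80,
    action=lambda e: f"alert:{e['device_id']}:overheat",
)
```

Here the action only returns a string; in a real solution it might publish a message, open a ticket, or send a command back to the device.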
To get value from the IoT solution, raw data or data processed by an IoT application needs to be made available to external users and other applications in a secure way. The main objectives of this phase and the related governance policies are as follows:
IBM Watson IoT Platform has built-in support for visualizing real-time data by using boards and cards. Developers can build custom visualizations based on the data stored in Watson IoT Platform, accessing that data through the secured API that the platform provides.
The data that is collected and stored by different IoT devices, services, and systems is increasingly being scrutinized by regulatory agencies and governments. Tighter laws and regulations are being introduced around the protection of sensitive and personal data. It is becoming more important for developers and designers to know exactly what data is being stored and why it is being stored. Also, legislation or regulations that are specific to countries or regions can make the physical location of the IoT data an important consideration. In the US, there is the Health Insurance Portability and Accountability Act (HIPAA). In the EU, the General Data Protection Regulation (GDPR) was adopted in April 2016 and becomes enforceable in May 2018. And, if your IoT solution involves the use of drones, there are cyber laws for drones in Israel.
You can read more about securing IoT data over the network in part 2 of our other series, “Design and build secure IoT solutions.”
As you design your IoT solution, you need to keep these IoT data privacy and security guidelines in mind:
Data governance for IoT requires well-documented policies to ensure that the data that is generated and used by the IoT solution conforms to all the requirements and standards. Data governance requires a significant focus on security policies to allow for the valid consumption of data.
The Data Policy Reference Model (see the diagram below) can assist organizations in understanding and creating a complete set of data governance policies. The model consists of:
Policy life cycle management, which manages the authoring, transformation, enforcement, and monitoring of the policies.
Data policy layers, which is a vertical dimension for policy classification and provides a level of abstraction for policies including business, architectural, and operational.
Data policy domains, which is a horizontal dimension for policy classification and identifies the policy domains that each of the policy layers must address or at least consider. This includes business, process, service, information, and non-functional requirements.
Data policy enablers, which assist in the proper management of the policy life cycle. They include policy auditing and logging (which provides traceability), distribution and transformation, and monitoring and reporting.
Data lifecycle management, which covers the policies related to data collection, transportation, storage, and processing.
Indicative policies for IoT data governance include the following:
In this article, we have discussed why data governance is a key aspect of designing and operating an IoT solution. We have described key data lifecycle phases, such as how data is to be collected, how data is transported and stored, and how data is to be processed in an IoT solution. Data security is a key dimension of the data governance solution, and proper care must be taken to address data security issues at each phase of the data lifecycle. Finally, we presented a common set of IoT data governance policies.