Every data project starts with data. Data is a very broad term. It can be structured or unstructured, big or small, fast or slow, and accurate or noisy.

IoT analytics solutions like anomaly detection require deep learning, as I explained in my previous article where I introduced deep learning and long short-term memory (LSTM) networks. To effectively demo the process of creating a deep learning solution, I need data. I need structured, fast, and big data, which can be noisy too.

To have a simple framework for creating data, I've written a test data simulator.

This simulator generates data by sampling various physical models. You can set the degree of noise and switch between different states (healthy and broken) of the physical model for anomaly detection and classification tasks.

I’m using the Lorenz attractor model. This is a very simple, but still very interesting, physical model. Lorenz was one of the pioneers of chaos theory, and he showed that a model consisting of just three equations and three model parameters can produce a chaotic system that is highly sensitive to initial conditions and that oscillates between multiple semi-stable states, with state transitions that are very hard to predict.
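To make the model concrete, here is a minimal Python sketch of the Lorenz equations, integrated with a simple forward Euler step. It is not the code used inside the Node-RED flow. The function name lorenz_samples, the step size h, the starting point, and the classic parameter values (sigma = 10, rho = 28, beta = 8/3) are assumptions chosen for illustration.

```python
import math
from typing import Iterator, Tuple

Point = Tuple[float, float, float]


def lorenz_samples(
    n: int,
    sigma: float = 10.0,      # classic textbook values, assumed here
    rho: float = 28.0,
    beta: float = 8.0 / 3.0,
    h: float = 0.01,          # Euler step size, an assumption
    start: Point = (1.0, 1.0, 1.0),
) -> Iterator[Point]:
    """Yield n samples of the Lorenz system using forward Euler steps."""
    x, y, z = start
    for _ in range(n):
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + h * dx, y + h * dy, z + h * dz
        yield x, y, z


# Two runs whose starting points differ by one millionth in z.
baseline = list(lorenz_samples(3000))
perturbed = list(lorenz_samples(3000, start=(1.0, 1.0, 1.000001)))

# Largest distance between the two trajectories over the whole run.
max_gap = max(math.dist(p, q) for p, q in zip(baseline, perturbed))
print(f"largest separation between the two runs: {max_gap:.2f}")
```

Running the two trajectories side by side is a quick way to see the sensitivity to initial conditions: the tiny difference in the starting point grows until the runs no longer resemble each other.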

I'm using Node-RED as the runtime platform for the simulator because it is a very fast way of implementing data-centric applications. Node-RED is open source and runs entirely on Node.js.

If you want to learn more about Node-RED, check out these videos.

Prerequisites

To complete this tutorial, you need an IBM Cloud Pay-As-You-Go account. To upgrade your Lite account, go to your account settings. In the Account Upgrade section, click Add credit card to upgrade to a Pay-As-You-Go account, or click Upgrade for a Subscription account. See Upgrading your account for more information.

This tutorial provides instructions for deploying an app to IBM Cloud Code Engine, which is a fully managed, serverless platform that runs your containerized workloads and manages the underlying infrastructure for you on top of Kubernetes and Knative. IBM Cloud Code Engine provides 100,000 vCPU seconds per month at no charge. Because your Node-RED application will often scale to zero, light to moderate usage should not incur charges. Make sure that you review your consumption and confirm your billing on a regular basis.
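To get a feel for what the free vCPU allowance means in practice, the following back-of-the-envelope Python sketch converts vCPU seconds into running hours. The instance sizes are hypothetical, and the calculation looks only at the vCPU allowance quoted above, so treat the result as a rough guide rather than a billing estimate.

```python
FREE_VCPU_SECONDS = 100_000  # monthly allowance quoted in this tutorial


def free_hours(vcpu_per_instance: float, instances: int = 1) -> float:
    """Hours per month that the vCPU allowance covers while instances are running."""
    return FREE_VCPU_SECONDS / (vcpu_per_instance * instances) / 3600


print(round(free_hours(1.0), 1))   # 27.8 hours for one instance with 1 vCPU
print(round(free_hours(0.25), 1))  # 111.1 hours for one instance with 0.25 vCPU
```

Time during which the application is scaled to zero does not count toward these hours, which is why an idle flow stays inexpensive.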

Steps

Log in to your IBM Cloud account.
Open IBM Cloud Code Engine.
Click Projects in the left nav, and create a project and give it a name.
Open the project, click Applications in the left nav, and then create an application.
In the Choose the code to run section, make sure Container image is selected, and in the Image reference field, specify romeokienzler/node-red-codeengine. In the Listening port field, specify 1880.
Scroll to the bottom of this Create application page. In the Runtime settings > Environment variables section, click the Add button. Then, add a literal value environment variable that contains the link to your Node-RED flow file. In our example, specify FLOW_FILE as the Environment variable name and https://github.com/romeokienzler/ibm-developer/blob/master/lorenzattractor/simulator_flow_complete.json as the Value.
On the Create application page, click the Create button.
After your application has been created and is in the Ready state, in the Test application section, click Application URL.

You should be able to see Node-RED and the test data generator application:

Screen capture of IBM Cloud Code Engine Create application

Initially, the test data generator is set up in the “healthy” state and sends 3000 measurements to an MQTT broker. To have it create “broken” data, click the “broken” inject node. After you click the “reset” node, you’ll get another 3000 messages in whichever state, “healthy” or “broken”, is currently selected.
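The behavior described above can be approximated outside of Node-RED with a short Python sketch. This is not the flow's actual implementation: the parameter sets in PARAMS (in particular what "broken" means), the message field names, the noise level, and the step size h are all assumptions made for illustration.

```python
import json
import random

# Hypothetical parameter sets (sigma, rho, beta). The real flow may
# define the "broken" state differently.
PARAMS = {
    "healthy": (10.0, 28.0, 8.0 / 3.0),
    "broken": (10.0, 35.0, 8.0 / 3.0),
}


def generate_messages(state, count=3000, noise_std=0.1, h=0.01, seed=None):
    """Return `count` JSON messages sampled from a noisy Lorenz model."""
    sigma, rho, beta = PARAMS[state]
    rng = random.Random(seed)
    x, y, z = 1.0, 1.0, 1.0
    messages = []
    for i in range(count):
        # One forward Euler step; the right-hand side uses the old x, y, z.
        x, y, z = (
            x + h * sigma * (y - x),
            y + h * (x * (rho - z) - y),
            z + h * (x * y - beta * z),
        )
        reading = {
            "ts": i,          # sample index, a hypothetical field name
            "state": state,   # label that is useful for classification tasks
            "x": x + rng.gauss(0.0, noise_std),
            "y": y + rng.gauss(0.0, noise_std),
            "z": z + rng.gauss(0.0, noise_std),
        }
        messages.append(json.dumps(reading))
    return messages


healthy = generate_messages("healthy", seed=42)
broken = generate_messages("broken", seed=42)
print(len(healthy), len(broken))  # 3000 3000
```

Keeping the state label in each message is a design choice for this sketch: it gives you ground truth when you later evaluate a classifier or an anomaly detector.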

The test data generator is implemented to send messages (data) to a remote MQTT broker. With Node-RED, you can easily change this behavior to write to a (relational) database, a cloud object store, or a Kafka message queue instead. The mqtt-broker node uses the free HiveMQ broker service, so you might want to change the MQTT topic in the MQTT connector at the end of this Node-RED flow to avoid colliding with other people taking this tutorial.
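One simple way to pick a topic that is unlikely to collide with other users on a shared public broker is to append a random suffix. The Python sketch below only builds the topic string, which you would then paste into the MQTT node. The base name "lorenz" and the helper name private_topic are hypothetical.

```python
import uuid


def private_topic(base="lorenz", suffix=None):
    """Build a topic name with a random suffix to avoid collisions."""
    suffix = suffix or uuid.uuid4().hex[:8]
    topic = f"{base}/{suffix}"
    # MQTT reserves + and # as wildcards for subscriptions, so they
    # must not appear in a topic that messages are published to.
    if "+" in topic or "#" in topic:
        raise ValueError("publish topics must not contain MQTT wildcards (+ or #)")
    return topic


print(private_topic())  # a different topic on every run
```

Any client that consumes the data has to subscribe to exactly the same topic string, so note it down after you generate it.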

Conclusion

You've successfully deployed a test data simulator creating a time series of events sampled from a physical model. You can also switch between two states (healthy and broken) for anomaly detection and classification.

In the next tutorial, I'll focus on using TensorFlow and Keras to implement a scalable anomaly detection algorithm for time series, building on the concepts above. We’ll containerize this algorithm so that it can serve as the model inside a Docker, Kubernetes, or serverless Knative environment like IBM Cloud Code Engine.

Generating data for anomaly detection
