
Analyzing IoT device movement data

This tutorial is based on the Harlem Shake game that Romeo Kienzler developed and presented in his tutorial titled Create a fun, simple IoT accelerometer game. Kienzler’s game uses simple movement data from a smartphone, streams the data to the cloud, captures the data in a Cloudant database, and then analyzes and determines the winner using IBM Watson Studio (formerly IBM Data Science Experience).

In this tutorial, we’ll start with Kienzler’s basics, also using IBM Cloud (formerly IBM Bluemix) and the IBM Watson IoT Platform services, including Node-RED, MQTT (in Watson IoT Platform), and Cloudant. We’ll differ from his Harlem Shake game in these ways:

  • We’ll use additional sensor data besides simple accelerometer input.
  • We’ll use SPSS Modeler (instead of IBM Watson Studio) to analyze the data. SPSS Modeler is a code-free tool for analysis, model generation, and deployment. It is perfect for a business user with general knowledge about the data, but no coding background. (IBM Watson Studio, which might be preferred by data scientists familiar with coding in open source environments, uses R, Python, or both.)

Playing with sensor input from a smartphone is fun, but are there business applications for sensor data? The short answer is yes. Imagine you have a production facility and the sensor data tells you that anytime a truck drives near it, the production quality drops. That would be useful information to know. A “movement” is nothing more than a combination of three-dimensional sensor data about position and speed. Keep in mind that there is more to sensors than movement: you can also consider input like temperature, pressure, or even optical recognition of the person who enters a shop. All this data can influence business key performance indicators (KPIs) like quality, throughput, sales, or the efficiency of people working. And once you know what the situation is, you can act on it and improve it.

We will build our game in several steps:

  • Create the base application, connectivity, and data storage using the Internet of Things Platform service, Cloudant, and Node-RED.
  • Check that the data is stored in Cloudant.
  • Modify the database to collect data for three different movement types.
  • Use SPSS Modeler to create an analysis stream using data from the Cloudant database.
  • Create a classification model in SPSS Modeler by transforming the timestamp, calculating a new measure of the energy used, telling SPSS Modeler what the target is to be predicted, and by adding a classification algorithm.
  • Deploy and test the classification model manually by collecting scoring data, modifying the existing stream in SPSS Modeler to analyze and score the movement data.
  • Add a new branch to the SPSS stream for live scoring.
  • Use IBM Machine Learning in IBM Cloud to deploy the SPSS Modeler stream.

What you’ll need to build your app


Create the base application and verify the data in the Cloudant database

This first step is a big one. As we mentioned before, this tutorial is based on the Harlem Shake game published by Romeo Kienzler. To get started, go to Create a fun, simple IoT accelerometer game and complete Steps 1 through 5. When you’re asked to name your application, you can call it anything you like, but for the examples in this tutorial I called my application anythinguniquewilldods123.

Before you come back to this tutorial to extend Kienzler’s work, you’ll have deployed a game application using one-click deployment, replaced the Internet of Things Platform service, ensured the MQTT message broker can receive data, set up a Cloudant NoSQL database to store the data, and streamed the data to Cloudant using Node-RED. (Don’t worry – it’s not as much work as it sounds, and it’s fun!)

After completing those steps in Kienzler’s article, we next check whether data arrives in the database. Ensure the game app on your smartphone is still sending data by looking at the debug tab in Node-RED.

Everything should look good. The smartphone shakes, the data streams up to the cloud, and the database is holding the data. But at this point, we still do not know if data actually arrives in the Cloudant instance. Kienzler’s approach was to use IBM Watson Studio, but we’re going to use SPSS Modeler. Before we get to that point, however, we need to make a few more changes.

Remember, Cloudant is a NoSQL (“Not-Only-SQL”) database system that we use to store the data. We can perform a basic check to see if data is arriving by using the Cloudant Dashboard.

  1. You should already be logged in to your IBM Cloud account. If not, log in.
  2. From the hamburger menu, select Dashboard.
  3. The Dashboard shows your applications and services. Look in the Services section to see your instance of a Cloudant NoSQL database. This was created with the IoT Starter Kit in Step 2 of Kienzler’s tutorial.
  4. Click the line with your Cloudant NoSQL DB instance.
  5. On the Cloudant Service Details page, click Launch. The Cloudant Dashboard opens in a new window.
  6. From the left menu, click the database icon to see a list of your Cloudant databases.
  7. Find the harlemshake database and note the # of Docs stored.
  8. Activate the IoT sensors on your smartphone. To activate your IoT sensors on your smartphone, do the following:
    1. On your smartphone, use the link you wrote down in Step 1 of Kienzler’s tutorial.
    2. Log in using your own unique alphanumeric ID and an 8-character alphabetic password (a-z). Do not include blank spaces.
    3. Wait until the app status shows “Connected” and starts counting the published messages being submitted.
  9. Refresh the Cloudant Dashboard page and observe that the # of Docs increased. (Later, we see the data using SPSS Modeler.)
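The dashboard’s “# of Docs” figure can also be read programmatically: Cloudant exposes CouchDB-style database metadata at `GET https://<host>/<dbname>`, which returns a JSON document containing a `doc_count` field. The sketch below (host and credentials are placeholders, not values from this tutorial) shows the idea:

```javascript
// Minimal sketch: read the document count from Cloudant's database
// metadata endpoint (GET /<dbname>), the same number the dashboard shows.
const docCount = (dbMetadata) => JSON.parse(dbMetadata).doc_count;

// Example metadata payload in the shape CouchDB/Cloudant returns:
const sample = '{"db_name":"harlemshake","doc_count":512,"doc_del_count":0}';
console.log(docCount(sample)); // prints 512

// Live check -- replace USER, PASS, and HOST with your service credentials:
// const https = require("https");
// https.get("https://USER:PASS@HOST/harlemshake", (res) => {
//   let body = "";
//   res.on("data", (chunk) => (body += chunk));
//   res.on("end", () => console.log("docs:", docCount(body)));
// });
```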

Replace the existing Cloudant database so you can collect extended data points

We need to modify the Harlem Shake application to not only record the x, y, z position data of the smartphone, but also the acceleration data and a few more data points.


Replace the existing Cloudant database

  1. You should already have your Node-RED instance open. If not, open it. To open your Node-RED instance, do the following:
    1. Go to the IBM Cloud Dashboard.
    2. Under Cloud Foundry Apps, click the link for your app (for example, mine is called anythinguniquewilldods123).
    3. Next to the application’s name, click Visit App.
    4. Click Go to your Node-RED flow editor.
    5. Log in to your game app’s Node-RED instance using the user name and password you assigned yourself during Kienzler’s tutorial. The following flow should be displayed.
  2. To store more data points, we need a new database. We’ll replace the existing harlemshake database with one of our own called iotmovements.
    1. Double-click the Cloudant node named harlemshake.
    2. In the Database field, enter the name for the new database: iotmovements.
    3. Click Done. Note: Do not deploy and test yet.
  3. Disconnect the Cloudant node. We want to avoid “unclean data” in our new database. We reconnect it later. Leave the debug node connected.
  4. Double-click the function node and extend the JavaScript code to include acceleration data and the timestamp of the message. Replace the existing code with this code:

msg.payload = {
    X : msg.payload.d.ax,
    Y : msg.payload.d.ay,
    Z : msg.payload.d.az,
    alpha : msg.payload.d.oa,
    beta : msg.payload.d.ob,
    gamma : msg.payload.d.og,
    stime : msg.payload.d.ts,
    mtype : "roll",
    SENSORID : msg.payload.d.id
};
return msg;

The app running on your smartphone delivers the parameters X, Y, Z, alpha, beta, gamma, and stime. The mtype parameter is set to the first of our experimental movement types. We go deeper into this parameter in the coming steps. Optionally, you can give the function a useful name (I used flatten_for_training). The final result should look like this.

  5. Click Done.
  6. Click Deploy. When you activate the smartphone app, the output on the debug tab should look similar to this.

Note: The SENSORID value should reflect the alphanumeric ID you used when you activated your smartphone. Do not worry about the format of the timestamp. We transform it later into a more readable format.
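The timestamp format mentioned in the note is simply milliseconds since the Unix epoch (1 January 1970), which JavaScript’s `Date` accepts directly. As a quick illustration of the transformation we perform later in SPSS Modeler:

```javascript
// The smartphone app sends stime as milliseconds since 1 Jan 1970.
// JavaScript's Date constructor takes milliseconds directly, so a
// Node-RED function node could preview a readable form like this:
function toReadable(stimeMs) {
  return new Date(stimeMs).toISOString();
}

console.log(toReadable(0));             // "1970-01-01T00:00:00.000Z"
console.log(toReadable(1500000000000)); // "2017-07-14T02:40:00.000Z"
```

SPSS Modeler’s `datetime_timestamp` expects seconds, which is why the stream later divides `stime` by 1000.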


Modify the database to collect data for three different movement types

We record some example data for three different movement types:

  • Roll – hold the smartphone horizontally and turn the smartphone away from you continuously so you quickly alternate between seeing the front and back of the smartphone.
  • Turn – hold the smartphone vertically and turn it to the right continuously so you quickly alternate between seeing the front and the back of the smartphone.
  • Wiggle – hold the smartphone vertically and then rotate your hand from side to side (or left to right).

    Figure 1. Three different movement types illustrated

You can experiment with your own ideas later.

To identify the movement types in the data records, we have to tag them inside the code. We already did this for the first movement type roll in the previous step. For testing, we disconnected the Cloudant node. We will finalize the pre-work now, and start the collection of sample data.

  1. Reconnect the Cloudant node to the function node. The flow should look like this.
  2. Click Deploy.
  3. Collect data for the roll mtype.
    1. Re-activate the IoT sensors on your smartphone.
    2. As soon as the app starts sending data, hold your smartphone horizontally, as if you want to take a landscape picture. Start rolling your smartphone at a steady pace for at least 45-60 seconds. (In my case, 45 seconds resulted in an increase of about 500 documents in the database.) Note: If you watch the debug tab in your Node-RED flow editor while doing this, you might see red error messages. The free edition of Cloudant only handles 10 events per second, and sometimes the smartphone is faster. We can ignore this slight loss of data.
    3. De-activate the IoT sensors on your smartphone by turning it off or deactivating the browser app.
  4. Collect data for the turn mtype.
    1. Go to the Node-RED flow editor and double-click the function node.
    2. In the code, find roll and replace it with turn.
    3. Click Done.
    4. Click Deploy.
    5. Re-activate the IoT sensors on your smartphone.
    6. As soon as the app starts sending data, hold the smartphone vertically and start rotating it at a steady pace. (Try to use approximately the same timeframe as before.)
    7. Check the test results of the new movement in the debug tab.
    8. De-activate the IoT sensors on your smartphone.
  5. Collect data for the wiggle mtype.
    1. Go to the Node-RED flow editor and double-click the function node.
    2. In the code, find turn and replace it with wiggle.
    3. Click Done.
    4. Click Deploy.
    5. Re-activate the IoT sensors on your smartphone.
    6. As soon as the app starts sending data, hold your arm vertically, the smartphone slightly tilted, and start rotating the hand with the smartphone back and forth (or left to right) at a steady pace. (Try to use approximately the same timeframe as before.)
    7. Check the test results of the new movement in the debug tab.
  6. Disconnect the Cloudant node to avoid further data being recorded.
  7. Click Deploy.

Now that we recorded sample data for different movement types in the database, we need to connect our database to our analysis tool, SPSS Modeler. We build a statistical model and “teach it” the data structure of the movements.
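Before moving on, you may want to verify how many samples each movement type produced. Cloudant supports the CouchDB query API (`POST /<db>/_find` with a selector), so a request body like the one built below can retrieve, say, all `roll` documents. This is an optional sanity check, not part of the tutorial flow; the field names match the function node above.

```javascript
// Sketch: build a Cloudant query-API request body that selects all
// documents recorded for one movement type in the iotmovements database.
function findByMovementType(mtype) {
  return {
    selector: { mtype: mtype },              // match the tag set in Node-RED
    fields: ["X", "Y", "Z", "mtype", "stime"], // keep the response small
    limit: 2000
  };
}

console.log(JSON.stringify(findByMovementType("roll")));
// POST this body to https://<host>/iotmovements/_find with your credentials.
```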


Install and configure the Cloudant Extension to connect SPSS Modeler to the database

Before we can connect our database to our modeling tool, we first have to install SPSS Modeler. The exact steps depend on your version of SPSS Modeler and your operating system. If you already have SPSS Modeler installed, you can use your existing installation.

SPSS Modeler can be extended using public assets. We use one of these public assets to connect to the Cloudant database. With a working installation in place, install the necessary extensions.

  1. Go to the Downloads section of the SPSS Predictive Analytics website, scroll down and click Get R/Python. Use the table to find the correct version of R, and follow the link provided to download and install R on your computer. (I am using SPSS Modeler 18.1 and R 3.2.2.)
  2. In GitHub, go to R essentials for Modeler. Find the correct version for your platform and SPSS Modeler and install.
  3. In GitHub, install Cloudant Extension for SPSS Modeler.
  4. Extract the Cloudant Extension file. The Cloudant Extension comes in a compressed file (.zip) with an example SPSS stream. We need to modify the example stream to connect to our own Cloudant database. Note: The compressed file also contains documentation in PDF format in the Example folder. See the document named Mining Cloud Data with SPSS.
  5. In SPSS Modeler, in the Example folder, open the cloudant_import_demo_complete_real_sensor.str stream. The upper left of the example stream contains a simple connection to the Cloudant database and an output to a table.
  6. Delete everything from the stream except the User Input node and the simple connection to the Cloudant node.
  7. Save the stream with a new name. Keep the SPSS Modeler window open. Note: The Cloudant node is based on a general R node and therefore needs an input node. For this reason, we find the User Input node here. Do not delete it, even though it does not do anything relevant. The stream in SPSS Modeler should now look like this.
  8. To connect to the Cloudant node, we need to replace the connection credentials with the ones from our own Cloudant instance. You can look them up in your IBM Cloud dashboard:
    1. Log in to IBM Cloud.
    2. From the hamburger menu, select Dashboard.
    3. From the IBM Cloud Dashboard, under Cloud Foundry Apps, click your application name.
    4. Click Connections.
    5. On the Cloudant tile, click View credentials. We need some of the string values.
    6. Use the Copy button to copy the complete text. Paste it into a text editor of your choice to keep it for later reference.
  9. Return to the open SPSS Modeler stream, double-click the Cloudant node to open the configuration for the node.
  10. Copy the values for host, username, and password from your text note and paste the values into the matching fields in the Cloudant node. Note: Do not copy the quotation marks.
  11. In the database field, enter iotmovements. The result should look similar to this.
  12. Click OK.
  13. Right-click the Table node and click Run to look at the movement data from your movement type recordings.

We now have a working connection between our desktop-based SPSS Modeler analysis workbench and a cloud-based database. The data can be analyzed just like any local database or file.

We next create a predictive model that “knows” how to identify the movement type out of the raw data.


Create a classification model stream in SPSS Modeler

SPSS Modeler “learns” from existing data to create a model and uses the resulting model to apply what it has learned to new data.

In this step, we teach the model about the data we recorded. The resulting model “knows” the combination of parameters to identify the different movement types.

In statistical terms, this is a classification or decision tree model. The huge advantage of SPSS Modeler is that we do not have to know anything more about statistics – the tool finds the correct model automatically.

Our tutorial reflects a fact about real-life data science projects: eighty to ninety percent of the work is getting the data and transforming it in some way. In our case, we need to perform two simple steps:

  • Make the timestamp usable for further analysis.
  • Calculate a new measure out of the recorded raw data.

Transform the timestamp

The timestamp is a good example of raw data provided by a sensor that has to be transformed to be usable. The timestamp from the smartphone app is just a serial number based on 1 January 1970 as starting point. We can set this reference point in the stream’s properties.

  1. In SPSS Modeler, select File > Stream Properties.
  2. On the Options tab, select Date/Time.
  3. For Date baseline, enter 1970.
  4. Click OK.
  5. Save the stream.
  6. In the stream, select the Cloudant node.
  7. In the lower palette, on the Field Ops tab, double-click the Derive node to add it after the Cloudant node.
  8. Double-click the Derive node.
  9. On the Annotations tab, enter a Custom name timestamp for the node.
  10. Select the Settings tab.
  11. For the Derive field, enter timestamp.
  12. In the Formula field, type or paste this code: datetime_timestamp(stime/1000) The result should look like this.
  13. Click OK.

The new field contains a time and date value that SPSS Modeler can use for graphs and analysis. We do not use it in this tutorial – it is just an example transformation – but you can pick it up later if you want to chart some data.


Calculate a new measure for the energy used

The raw data for the movement in X, Y, and Z direction might not be sufficient for a good prediction. It is very typical for a data science project to use the raw data to calculate a new measure. In his original tutorial, Kienzler calculates the overall energy (something like the relative movement in all directions). We pick up this example here again by adding another Derive node.

  1. Select the timestamp node.
  2. From the lower palette, double-click the Derive node. This inserts the new node with a connection right after the timestamp node.
  3. On the Annotations tab of the new node, enter the Custom name energy for the node.
  4. Select the Settings tab.
  5. For the Derive field, enter energy.
  6. In the Formula field, type or paste this code: sqrt((X*X)+(Y*Y)+(Z*Z)) The result should look like this.
  7. Click OK.
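The Derive node’s formula is just the Euclidean magnitude of the three-axis vector, the same “overall energy” measure Kienzler used. For clarity, the equivalent calculation in JavaScript:

```javascript
// Equivalent of the SPSS Derive formula sqrt((X*X)+(Y*Y)+(Z*Z)):
// the Euclidean magnitude of the three-axis acceleration vector.
function energy(x, y, z) {
  return Math.sqrt(x * x + y * y + z * z);
}

console.log(energy(3, 4, 0)); // 5 -- a quick plausibility check
console.log(energy(0, 0, 0)); // 0 -- no movement, no energy
```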

Tell SPSS Modeler what the target is to be predicted

In this step, we tell SPSS Modeler what exactly should be predicted – what is our target? This is done using a Type node.

After the second Derive node called energy, add a new Type node.

  1. Drag a Type node to the stream after the Energy node.
  2. To connect the node to the stream, select the Energy node, press F2, and drag the connection to the new Type node.
  3. Double-click the Type node and click Read Values. Wait for the process to finish.
  4. On the Types tab, find the rows for stime and timestamp.
  5. Change the value for both in the Role column to None. Note: We keep the timestamp and date/time fields primarily for graphing results later. These fields are not considered in the automatic creation of the model.
  6. For mtype, change the value in the Role column to Target.
  7. Click OK.
  8. Click Save.

Add a classification model

We use a classification model to “learn” the connection between the raw data, the calculated energy, and the movement types. The model learns which combination of input data is typically observed for the three different movement types.

SPSS Modeler knows different algorithms for classification (or decision trees) and can even find the best working model automatically. Because we want to deploy the model later to IBM Machine Learning in IBM Cloud, we use one specific algorithm here.

  1. From the Modeling palette, add a C&R Tree node after the Type node. It is automatically named mtype, following the target selected before.
  2. Right-click the new mtype model node and select Run. Wait for the model nugget to appear. Note: I selected the C&R Tree model as an example. In some cases – depending on your individual data – this model might not be usable and you will get an error message. You can try other models, but be aware that not all models run in the cloud deployment. You can, for example, use the Logistic model (regression). This might not be the best algorithm, but it should work in many cases.
  3. Double-click the model nugget named mtype to inspect the result. The expanded decision tree on the left might look similar to this. (The exact results depend heavily on the movement data you recorded yourself.)
  4. Close the window.

Deploy and test the classification model manually

All the knowledge about how the smartphone moves is now captured in the classification model nugget. We can collect some new data (without the movement type) and let the model tell us which movement type it might be. This is called “deploying the model.”

In this step, we collect some new data and find out how the smartphone moved.


Create a Cloudant database to collect scoring data

We set up a new Cloudant database to collect the scoring data (raw data as before, but without the movement type identification – imagine someone moved the smartphone in a hidden place) and run it manually in SPSS Modeler.

First, create the new database instance.

  1. You should already be logged in to your IBM Cloud account. If not, log in.
  2. From the hamburger menu, select Dashboard.
  3. The Dashboard shows your applications and services. In the Services section, you see your instance of a Cloudant NoSQL database service.
  4. Click the line with the Cloudant NoSQL DB instance.
  5. On the Cloudant Service Details page, click Launch. The Cloudant Dashboard opens in a new window.
  6. From the left menu, click the database icon to see a list of your Cloudant databases.
  7. Click Create Database and enter iotmovements_scoring as the name.
  8. Click Create.

Modify the existing Cloudant node

Now that we have created a new, empty database instance, we redirect the sensor data collected from the smartphone into it.

  1. Start the Node-RED flow editor for your Cloud Foundry application. Your flow should look similar to this.
  2. Double-click the Cloudant node.
  3. Change the Database name to iotmovements_scoring to match the name used in IBM Cloud.
  4. Click Done.
  5. Double-click the function node.
  6. Change the string value for mtype to something neutral: mtype: "-", Note: For technical reasons, do not delete the line!
  7. Connect the function node to the Cloudant node.
  8. Click Deploy.

Test to be sure the movement is being received by the database

We’ve created the database and given it the code to collect the data. Now it’s time to test it.

  1. Re-activate the IoT sensors on your smartphone by opening the browser app as before.
  2. After the application connects, perform some different test movements.
  3. Remember what you did and de-activate the smartphone. This produces some new data in the database that we use for scoring now.
  4. After you record some data, you might want to disconnect the Cloudant database in the Node-RED flow, to avoid accidentally recording data. Note: If you do, don’t forget to deploy!

Create a new stream in SPSS Modeler to analyze the movement data (scoring)

We need to create a working deployment for predicting the movement types.

  1. In SPSS Modeler, duplicate the first three nodes: User Input, Cloudant, and Table. (Select the nodes; copy and paste them.)
  2. Double-click the new Cloudant node.
  3. On the Connection settings tab, change the name of the database to iotmovements_scoring.
  4. Click OK.
  5. Test the connection by right-clicking the Table node and selecting Run.
  6. Copy and paste the Derive nodes date_time and Energy and connect both new nodes to the new Cloudant node.
  7. Connect the last node (the Energy node) from the new stream to the mtype model nugget from the old stream. Note: The model nugget was automatically connected to the original data. We are replacing this connection with the new stream part that carries the scoring data.
    1. Right-click the Energy node, select Connect, and then click the mtype model nugget. Note: If needed, you might want to first move the nugget node on the canvas by dragging it.
    2. Select Replace.
    3. Select the mtype model nugget, and in the lower palette area on the Output tab, find the Table node and double-click it to add it right after the mtype model nugget.
    4. Right-click the Table node and select Run.
    5. In the resulting table, scroll to the right and you see the predicted movement types plus a score for each prediction. Note: The nearer this score is to 1.0, the better the prediction.

We now have a working deployment for finding a “best guess” on the movement types. Sometimes it might be more useful to score the sensor data “on the fly” instead of recording it in a database before scoring.


Add a new branch to the SPSS stream for live scoring

The IBM Cloud platform lets you quickly and easily deploy a predictive data stream to the cloud. It is immediately available for scoring without the need for your own software infrastructure.

The Machine Learning service in IBM Cloud can use an SPSS Modeler stream file (.str) for cloud-based scoring. You can use a Cloud Foundry app (like the one we are already using) to feed data into this stream. We again use Node-RED to set up the needed data flow. The Machine Learning service cuts off the first and the last node from the stream and places what remains into the Node-RED data flow. Let’s walk through this to completely understand the idea.

Review the existing SPSS Modeler stream. It currently looks like this.


Imagine how the stream would look if the first and the last node were cut off. As it stands, this wouldn’t work. The User Input node at the beginning is purely technical (remember, the Cloudant node needs an input), and without the last node there is no output at all. What we don’t need in the deployment is the Cloudant node, because the data flows in directly from the Cloud Foundry app. However, we still need the same structure of fields for the rest of the stream. There is an easy way to get this.

  1. If you want to keep your stream as it is so far, make a backup copy or copy and paste the part shown above inside the same .str file.
  2. Delete the Table node below the Cloudant node because we do not need this.
  3. Right-click the Cloudant node, and select Generate User Input Node.
  4. Rename the new node from user input to scoringdata.
  5. Connect the new node to the date_time node. This provides the correct field structure for the stream. The new node will not be used, as it will be cut off when the stream is deployed in the cloud.
  6. After the mtype model node, add a Table as an output. This table will also be cut off later and replaced by the post-scoring data flow in Node-RED.
  7. Specify which node is to be used for scoring in this stream: Right-click the new Table node and select Use as scoring branch. The relevant part of the stream for cloud-based scoring is highlighted in green. The result should look like this. Note: We could delete the User Input and Cloudant nodes at the left, but I always keep these fragments because they document what I did before.
  8. Save the file with a new name and remember the location in your file system.

Use IBM Machine Learning in IBM Cloud to deploy the SPSS Modeler stream

We prepared the SPSS stream file for deployment in the IBM Cloud Machine Learning service. Now we have everything prepared for the deployment itself.

  1. Log in to IBM Cloud and, if necessary, open the IBM Cloud Dashboard.
  2. Click the name of your Cloud Foundry app to open its console. Note: Do not click the link in the Route column.
  3. From the left menu, click Connections.
  4. In the upper right of the window, click Connect New.
  5. From the list of services, select Machine Learning service.
  6. In the Connect to field, check that your app is selected and click Create.
  7. On the console for the newly created service, from the menu on the left, click Service credentials. No credentials are listed so far.
  8. On the top of the empty list, click New Credentials.
  9. In the line of the new entry, click View Credentials and copy the contents of the text field into a text file. We need the access_key and the url later.
  10. From the menu on the left, click Manage to go back to the console of the Machine Learning service.
  11. Below SPSS Streams Service, click Launch Dashboard.
  12. Click the Dashboard tab. Here, you can upload the .str file that you edited and saved in the previous step.
  13. Either drag and drop the .str file or use the Select File button to find the file.
  14. Specify a Context id. Enter a unique name like iotmovements into the field and click Deploy.

The list now should look similar to this.


Set up the data flow for live scoring

We can now connect all the pieces:

  • The IoT app contains the sensor simulation that runs on a smartphone.
  • Each data set from the sensor is delivered to the Machine Learning service that contains the SPSS scoring model, and not stored in a database.
  • This data stream needs to be created now with Node-RED.

Log in to IBM Cloud and launch the Cloud Foundry App in Node-RED

First, we have to work in Node-RED again.

  1. Go back to the IBM Cloud dashboard and again launch the console of the Cloud Foundry App by clicking the app name.
  2. Click Visit App URL and then click Go to your Node-RED flow editor and log in, if necessary. Your flow diagram should look similar to this – with or without the connection to the Cloudant node on the right. (You might have disconnected it earlier.)

Replace the existing database with the online scoring database

We replace the current database storage with an online scoring. If you are familiar with the environment, you might also establish both database storage and online scoring.

  1. Disconnect the existing function node from the IBM IoT node, but do not delete the node.
  2. Add a new function node and connect the IBM IoT node to it.
  3. Insert an http request node and connect it after the new function node.
  4. The URL to call the API of the Machine Learning service has the following structure: https://<URL>/pm/v1/score/<CONTEXTID>?accesskey=<ACCESS_KEY> Use a text editor to fill in the variables.
    1. Use the url and access_key from the credentials for the Machine Learning service that you created in the previous section (step 9).
    2. Use the Context id that you specified when uploading the SPSS stream (step 14). For example, my complete string looks like this.
  5. Copy and paste the completed string into the URL field of the http request node.
  6. In the Return field, select a parsed JSON object.
  7. Click Done.
  8. The input for the http request node is a complex object that contains a tuple for the headers of the fields and a matching tuple with the necessary values. We use the function node before the http request node for the necessary transformation and do not go into too much detail here. This function corresponds to the scoringdata node that we previously created and renamed.
    a. Double-click the function node and copy the following code into the Function field:

     msg.headers = { "Content-type" : "application/json" };
     msg.payload = {
       "header" : ["X_id", "X_rev", "X", "Y", "Z", "alpha", "beta", "gamma", "stime", "mtype", "SENSORID"],
       "data" : [["X_id", "x_rev", msg.payload.d.ax, msg.payload.d.ay, msg.payload.d.az, msg.payload.d.oa, msg.payload.d.ob, msg.payload.d.og, msg.payload.d.ts, "-", msg.payload.d.id]]
     };
     return msg;

    b. Name the function node with a useful name.
    c. Click Done. The final result should look like this.
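The steps above can be sketched in code: the scoring URL is assembled from the Machine Learning credentials (url and access_key) and the Context id chosen at deployment. The values below are placeholders, not real credentials.

```javascript
// Sketch: assemble the scoring URL the http request node will call,
// following the structure https://<URL>/pm/v1/score/<CONTEXTID>?accesskey=<ACCESS_KEY>.
function scoringUrl(baseUrl, contextId, accessKey) {
  return baseUrl + "/pm/v1/score/" + contextId + "?accesskey=" + accessKey;
}

// Placeholder values -- substitute your own service credentials:
const url = scoringUrl("https://ibm-watson-ml.example.net", "iotmovements", "ABC123");
console.log(url);
// https://ibm-watson-ml.example.net/pm/v1/score/iotmovements?accesskey=ABC123
```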


Test the scoring

After replacing the database with the cloud-based scoring, we can test to see if the scoring is working.

  1. Connect a debug node after the http request node.
  2. Click Deploy.
  3. If it is not already active, re-activate the IoT sensors on your smartphone.
  4. In Node-RED, go to the debug tab and compare it to the following example. You can expand the object shown as needed.

Make the JSON object more usable

Transform the resulting JSON object into a more usable format that contains only the necessary information.

  1. Re-use the older, disconnected function node (or add a new one, if you deleted it) and copy the following code into the Function field. This picks the needed entries out of the nested arrays within the JSON object.

msg.payload = {
X : msg.payload[0].data[0][2],
Y : msg.payload[0].data[0][3],
Z : msg.payload[0].data[0][4],
alpha : msg.payload[0].data[0][5],
beta : msg.payload[0].data[0][6],
gamma : msg.payload[0].data[0][7],
timestamp : msg.payload[0].data[0][8],
device : msg.payload[0].data[0][10],
energy : msg.payload[0].data[0][12],
pred_mtype : msg.payload[0].data[0][13],
pred_score : msg.payload[0].data[0][14]
};
return msg;

  2. Click Done.
  3. Connect the debug node at the end of the flow.
  4. Click Deploy. Another test using the smartphone should result in some debug output similar to this. Take special note of the predicted movement type and the prediction score.

Finally, for later reporting you might want to display the results on a web page, collect them in another database, or both. For the database, you should know by now what to do. For the web page, this might be an interesting tutorial for you to write yourself.


This tutorial was born from my personal confusion about IoT, cloud, sensors, and especially the question “where is the magic?” Technically, we connected a sensor (our smartphones) to the cloud, stored the produced data in the cloud, and analyzed the data using SPSS Modeler. We created a model that is able to decide about “what happened with the device” just by looking at the sensor data. And, we learned different ways to bring the created model to life.

For me, the best learning was about all the interconnections on the IBM Cloud platform. Before IBM Cloud, I built many similar test setups, and much of the configuration and setup work was confusing. There are a lot of moving parts to consider, and it helps to know that IBM Cloud is one place to go to get started. Furthermore, it was great to use the statistical and predictive workbench SPSS Modeler in connection with a cloud setup.

What should you do next or how can you learn more? Feel free to extend in all directions. The first logical step would be to make the model more accurate. Consider working on your own real-world example. If you find your own realistic case with real-world sensor data, start with something small to get a basic understanding of the data and especially the business case behind it. For me, there is absolutely no need to work with or on data without a proper goal or business case.

From a business standpoint, the questions are very down to earth, for example: how do we get sensor data from older machines without replacing them? On the data side, ask yourself: do we really need all the detailed sensor data the world will produce in the coming years? If so, how do we decide which data is important? One answer could be to extend the architecture with the cognitive methods that you can also find in IBM Cloud.