By Beth Hoffman, Rupashree Bhattacharya | Published July 20, 2016
Raw data in its unprocessed state does not offer much value, but with the right analytics techniques it can yield rich insights that aid many aspects of life, such as making business decisions, running political campaigns, and advancing medical science.
As shown in Figure 1, the analytics cycle can be broadly classified into four categories or phases: descriptive, diagnostic, predictive and prescriptive. Machine Learning is an approach to data analysis that automates analytical model building and is used in all four types of analytics.
Four types of analytics
Prescriptive analytics: This type goes beyond prediction to provide suggestions on how to best change future situations to meet your goals.
The relevance and growing use of analytics with machine learning is demonstrated by its widespread use in the 2016 US presidential election campaign. Unprecedented growth in the availability of useful information, coupled with advancements in technology, is making it attractive to use analytics to build and run a better campaign. Campaign teams analyze voter sentiment, population segmentation, and historic voting patterns, and use this information to decide which states and voter profiles to focus their campaign efforts on to maximize turnout. Machine Learning is at the core of what makes this possible. With this new trend, the real asset in any political campaign has rapidly shifted from funds to voter data, which is collected from pollsters, fundraisers, field workers, consumer databases, and private companies, as well as through cookies and tracker programs on campaign websites and social media apps.
Applying Machine Learning algorithms to this colossal repository of voter data has changed the landscape of election campaigns by delivering action-oriented insights: predictions for each individual voter. Campaigns use these insights to strategize ways to raise funds, better target advertisements, and create detailed models of swing-state voters. Machine Learning has the potential to increase the effectiveness of campaign efforts by calculating the likelihood that a voter will show up at the polls, the likelihood that a supporter who has not voted consistently can be motivated to go to the polls this time, and how persuadable someone is by the various means of campaign contact. As a result, Machine Learning enables campaigns to be more metrics-driven.
Machine Learning algorithms iteratively learn from data, thus allowing computers to find hidden insights without being explicitly programmed where to look. Machine Learning is essentially teaching the computer to solve problems by creating algorithms that learn by looking at hundreds or thousands of examples, and then using that experience to solve the same problem in new situations. Machine Learning tasks are typically classified into the following three broad categories, depending on the nature of the learning signal or feedback available to a learning system:
Supervised learning: The algorithm trains on labeled historic data and learns general rules that map input to output (the target). For example, based on historic voter data (voter details labeled with the votes they cast in previous years), presidential campaigns can predict which kinds of voters are likely to vote for a given candidate, or which kinds of voters are persuadable by campaign efforts, and use this information to better plan resource utilization.
In supervised learning, the discovery of relationships between the input variables (for example, the voter details such as age and income) and the label/target variable (for example, the vote cast by a particular voter in the last election) is done with a training set. The computer/machine learns from the training data.
A test set is used to evaluate whether the discovered relationships hold. The strength and utility of the predictive relationship are assessed by feeding the model the input variables of the test data and comparing the label predicted by the model with the actual label of the data.
Deciding on the proportional split between train data and test data is often considered tricky. Having a greater proportion of data as test data ensures a better validation of model performance, but leaves less training data for the model to learn from. Opinions on a good split generally range from a 60:40 to an 80:20 ratio of train to test data.
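The train/test workflow described above can be sketched in a few lines of Python. This is a minimal illustration rather than a recommended implementation: the voter records are synthetic, the helper names `train_test_split` and `accuracy` are hypothetical, and the "model" is only a majority-class baseline standing in for a real learner.

```python
import random

def train_test_split(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and split it into train and test lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(model, test_set):
    """Fraction of test records whose predicted label equals the actual label."""
    correct = sum(1 for features, label in test_set if model(features) == label)
    return correct / len(test_set)

# Synthetic (made-up) records: ((age, income in thousands), voted_last_time)
voters = [((20 + i, 30 + 2 * i), i % 3 != 0) for i in range(50)]

# 80:20 split, one end of the range of ratios mentioned above
train, test = train_test_split(voters, train_fraction=0.8)

# Baseline "model": always predict the majority label seen in the training set
majority_label = 2 * sum(label for _, label in train) >= len(train)

def baseline(features):
    return majority_label

print(len(train), len(test))
print(accuracy(baseline, test))
```

Changing `train_fraction` to 0.6 reproduces the other end of the range: more records are held back for validation, and fewer remain for the model to learn from.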
The most widely used supervised learning algorithms are Support Vector Machines, Linear Regression, Logistic Regression, Naive Bayes, and Neural Networks (multilayer perceptron).
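To make one of these algorithms concrete, here is a small sketch of Logistic Regression trained by batch gradient descent, written in plain Python. The data is invented for illustration (two already-standardized, hypothetical voter features), and the function names, learning rate, and epoch count are assumptions chosen for this toy example; in practice a library implementation would normally be used.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, bias, x):
    """Estimated probability that the label is 1 for feature vector x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=1000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_proba(weights, bias, x) - y  # gradient of log loss w.r.t. z
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

# Invented training data: (standardized age, standardized past turnout) -> voted (1) or not (0)
rows = [(-1.0, -0.8), (-0.8, -1.0), (-0.6, -0.7), (-0.9, -0.5),
        (0.7, 0.9), (0.9, 0.6), (0.6, 0.8), (1.0, 1.0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_logistic_regression(rows, labels)
predictions = [int(predict_proba(weights, bias, x) >= 0.5) for x in rows]
print(predictions)

# Estimated turnout probability for a new, hypothetical voter
print(predict_proba(weights, bias, (0.8, 0.7)))
```

The output of `predict_proba` is a probability rather than a hard label, which is the kind of per-voter likelihood score that campaign models rely on.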
Deep Learning is a special type of Machine Learning that involves a deeper level of automation. One of the great challenges of Machine Learning is feature extraction: the programmer needs to tell the algorithm what kinds of things it should look for in order to make a decision, because just feeding the algorithm raw data is rarely effective. Feature extraction places a huge burden on the programmer, especially in complex problems such as object recognition, and the algorithm’s effectiveness relies heavily on the skill of the programmer. Deep Learning models address this problem because they are capable of learning to focus on the right features by themselves and require little guidance from the programmer, which can make the analysis better than what humans can do. Deep Learning models have been very effective in complex tasks, such as sentiment analysis and computer vision. However, because of the slow learning process associated with a deep layered hierarchy that learns data abstractions and representations from lower-level layers to higher-level layers, Deep Learning algorithms are often prohibitively compute-intensive.
Using Machine Learning requires a variety of technical and engineering skills. Making use of Machine Learning at your company will likely require a team of experts with knowledge and skills in different aspects of data and analytics. These skills include understanding and having access to the data to be used, knowing how to use data cleansing tools, understanding Machine Learning concepts and algorithms, having experience with analytics tools, programming applications, and setting up the hardware and software needed to implement and deploy the Machine Learning processing environment.
Here is a view of the common steps for using Machine Learning:
There are many benefits of running Machine Learning and Deep Learning workloads on IBM Power Systems. These workloads can be floating-point compute-intensive and require a lot of memory and I/O bandwidth, and thus can take advantage of graphics processing unit (GPU) acceleration for increased performance. IBM Power servers’ larger caches, along with their ability to push data to the numerical coprocessor or GPU, make them suitable for running these workloads. Prebuilt open source options are also available for easily installing the Deep Learning frameworks on POWER8 processor-based servers with GPUs. Because IBM Power® platforms are able to converge Big Data and Deep Learning on the same platform, these workloads can run directly alongside the Big Data / Hadoop infrastructure on an IBM Power server without using extract, transform, and load (ETL).
There are several options for using Machine Learning and Deep Learning on IBM Power systems.
IBM SPSS on POWER / Linux or AIX
IBM SPSS® supports several Machine Learning algorithms. Use SPSS Modeler to create and test the model along with SPSS Collaboration and Deployment Services to run the model. Refer to the blog that describes how to run and tune SPSS Modeler on IBM POWER8 processor-based servers to achieve superior performance.
SAS on POWER / AIX
SAS supports a broad analytics tooling portfolio on IBM Power Systems. Enterprise Miner supports building and testing models with several Machine Learning algorithms.
For more information, refer to http://www.sas.com.
Open source Machine Learning and Deep Learning libraries available on POWER / Linux
Many open source Machine Learning libraries have become popular. Several open source Machine Learning and Deep Learning libraries are available to run on IBM Power Systems including Caffe, Torch, and Theano, and others are coming in the future.
Apache Spark MLlib on POWER / Linux
Apache Spark is a distributed processing environment. One of the key components of Spark is MLlib, which is a Machine Learning library. The library can be used from Spark’s supported programming languages: Java, Scala, Python, and R (through SparkR). MLlib supports dozens of algorithms and utilities, which are listed in the Spark MLlib guide.
Read the blog by Raj Krishnamurthy and Randy Swanberg about how Apache Spark Runs 2X Faster on IBM POWER8.
The blog summarizes three key Machine Learning workloads (Logistic Regression, Support Vector Machine, and Matrix Factorization) and the recommended configuration to achieve superior performance on IBM POWER8 over x86 processors.
Join IBM Watson Studio to interact and collaborate with other data scientists as you get started using Machine Learning and Deep Learning on IBM Power Systems.
Last month OpenPOWER announced a hackathon called OpenPOWER Developer’s Challenge which is open for submissions through September 1. One of the tracks is on Deep Learning with Apache Spark on OpenPOWER servers (The Accelerated Spark Rally). This is a great chance to try out Deep Learning! To participate, go here for more information and to register.