This article originally appeared on IBM Data Magazine.
Connecting everything to the Internet—the Internet of Everything—brings an interesting problem to the forefront: data onslaught. Examples of data onslaught in the new digital economy include the 2.5 quintillion bytes of new data collected every single day (a figure expected to triple by 2017), the 2.5 PB of data a major retailer collects every hour, and the projection that by 2015, 1 trillion devices will be connected to the Internet, generating data for consumption.
A key point that almost every organization seems to miss in the data economy is that just because they are collecting so much data doesn’t mean they are collecting the right data, or even enough data. They may be collecting very little of something very important, or not collecting the right data at all. Even more troubling are situations in which organizations collect huge amounts of data and do absolutely nothing with it. People often make the mistake of equating voluminous data with valuable data.
A major reason these situations arise is that the digital society has figured out how to effectively collect massive amounts of data—from the Internet of Things, devices, sensors, and so on—and how to efficiently process and store it (big data). However, many organizations are still trying to figure out how to extract meaningful insights from that immense amount of data.
[Image courtesy: Piccolo]
As they move toward building data-driven enterprises, insights into collected data are a must. To effectively predict the future based on past events, organizations need to undertake a deep-learning exercise on their data and its sequences of events. If they don’t, they will end up in a garbage-in, garbage-out (GIGO) situation.
Scaling to the data onslaught
Essentially, avoiding this GIGO situation means organizations need to analyze the data both when it is collected—stream analysis—and at a later time—data mining—to identify patterns based on circumstances, or pattern recognition. They also need a solution that can scale to the data onslaught. This point is where many in-house business intelligence (BI) systems tend to fail: they are not only unable to scale to this volume of data, but they also cannot perform dynamic analysis of streaming data. An organization that waits 24 hours, or a week, to analyze its data will be far too late in the digital economy.
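The two analysis modes above—catching anomalies the moment data arrives, and archiving everything for later mining—can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation; the sensor values and the 3-sigma threshold are invented for the example.

```python
from collections import deque
from statistics import mean, stdev

WINDOW = 20
window = deque(maxlen=WINDOW)  # recent readings for stream analysis
archive = []                   # stands in for long-term storage (data mining later)
alerts = []

def ingest(reading):
    """Process one sensor reading as it arrives (stream analysis)."""
    archive.append(reading)
    if len(window) == WINDOW:
        m, s = mean(window), stdev(window)
        # Simple 3-sigma rule: flag readings far outside recent behavior.
        if s > 0 and abs(reading - m) > 3 * s:
            alerts.append(reading)
    window.append(reading)

# Simulated stream: a mildly noisy signal, then one anomalous spike.
for value in [9.9, 10.1] * 20:
    ingest(value)
ingest(55.0)  # caught immediately, not in tomorrow's batch report
```

The point of the sketch is the division of labor: the sliding window reacts in real time, while the archive preserves the full history for deeper pattern recognition offline.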
A related issue is that the data is often too disparate, and many BI solutions look into a narrow scope to analyze it. These solutions cannot see the big picture, and they often cannot handle machine data that is too diversified, time sequenced, or in multiple machine formats.
Cognitive analytics alternatives such as the IBM Watson™ platform and machine-learning solutions such as BigML can help organizations face these challenges. They are complementary solutions that can be deployed to help solve the big puzzle that data onslaught creates.
One problem with machine data or sensor data is that it is normally difficult to interpret, because it is highly cryptic and very voluminous; moreover, processing machine data out of sequence can produce undesirable results. The lack of industry standards only exacerbates the situation, making it difficult for humans and for existing systems to make any sense of such data. When it comes to machine learning, however, if a machine can produce it, then a machine can learn from it, analyze it, and extract insights from it. This is precisely where human data scientists tend to fail.
Matchmaking based on action and data insights
I want to present an interesting use case that I heard recently at the GigaOm Structure 2014 conference. It was fascinating to listen to Match.com’s SVP of analytics speak about how the company uses analytics and machine learning to match people. Match.com has about 15 years of very sensitive, deeply personal data, with millions of samples to draw from.
First, they tried to apply basic psychology for matches—such as the concept of human behavior presented in John Gray’s book, Men Are from Mars, Women Are from Venus (HarperCollins, 1992). Then they tried the concept of Pavlov’s classical conditioning and other concepts with varying success. But, in order to create more successful match-ups, they went back and re-analyzed successful relationships and conducted a deep-learning exercise to find out the ultimate success formula.
[Image courtesy: TimoElliott.com]
Specifically, Match.com figured out that building models based on what people say—their stated wants—is not enough. In other words, it determined that users’ actions, and the actual needs behind them, are totally different from their stated wants. Ultimately, Match.com was able to predict user behavior based on users’ actions on its website with a much higher success rate.
For example, people often rated income as the number-one criterion for a perfect match. Yet when it came down to actually selecting a partner, many accepted someone who made far less money than they had specified—in effect showing that they were not there for the money, but to find that perfect match. Smoking showed the opposite pattern: even though users ranked nonsmoking as their least important criterion, they drew a line in the sand on that parameter. Every single time, they rejected an otherwise would-be perfect match if that match was a smoker.
After Match.com adjusted its algorithms to be based on its customers’ actions instead of their stated profile preferences, the predictive accuracy of its algorithms increased two to three times. This result is an incredibly powerful testament to the strength of machine learning. It shows that when it comes to matchmaking, machine learning from historical data can be more successful at choosing a match than trained relationship counselors, PhDs, or even one’s own stated “preferences.”
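One simple way to surface this gap between stated and revealed preferences is to measure, for each criterion, how often users still accept a candidate who violates it. The sketch below is purely illustrative—the records are invented, and Match.com's actual models are certainly far more sophisticated—but it captures the idea: income violations get tolerated, smoking violations never do.

```python
# Stated preference ranking, most important first (smoking ranked last).
stated_ranking = ["income", "education", "smoking"]

# Invented behavioral records: (criteria the candidate violated, accepted?).
observed = [
    ({"income"}, True),
    ({"income"}, True),
    ({"income", "education"}, True),
    ({"education"}, True),
    ({"smoking"}, False),
    ({"smoking"}, False),
    ({"smoking", "income"}, False),
    (set(), True),
]

def tolerance(criterion):
    """Share of matches violating `criterion` that were still accepted."""
    hits = [accepted for violated, accepted in observed if criterion in violated]
    return sum(hits) / len(hits) if hits else None

for criterion in stated_ranking:
    print(criterion, tolerance(criterion))
# Revealed behavior: income violations tolerated 75% of the time,
# smoking violations tolerated 0% of the time—despite the stated ranking.
```

A model trained on these revealed tolerances, rather than on the stated ranking, would weight smoking as the true deal-breaker—exactly the kind of adjustment described above.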
Now consider bringing this premise to the business world. If an organization applies machine learning to what its customers say alone, it may not get the perfect results. Applying machine learning to customers’ actions—rather than just their spoken words—can far more reliably predict the business’s future and its customers’ needs.
Visualizing successful outcomes
Thomas J. Watson Sr. of IBM once said, “Analyze the past, consider the present, and visualize the future.” Approximately one century later, the digital world is now in a position to do exactly that. Organizations can mine past events from big data, dynamically update predictive models with the stream of current and live data, and thereby foresee and predict the future.
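That sequence—seed a model from historical data, then keep it current with the live stream—can be illustrated with something as small as an exponentially weighted moving average. This is a deliberately minimal sketch with invented numbers, not a production forecasting method; real systems would use far richer online-learning models.

```python
# How quickly the model adapts: higher alpha favors recent observations.
ALPHA = 0.3

def update(estimate, observation, alpha=ALPHA):
    """Blend one incoming observation into the running forecast."""
    return alpha * observation + (1 - alpha) * estimate

# Analyze the past: seed the model from mined historical data ...
history = [100, 102, 98, 101]
estimate = sum(history) / len(history)

# ... consider the present: fold in the live stream, one event at a time.
for live_value in [110, 112, 115]:
    estimate = update(estimate, live_value)

# ... visualize the future: the estimate is always current, with no
# overnight batch job standing between the data and the prediction.
print(round(estimate, 2))
```

The design point is that the model is never stale: every event nudges the forecast immediately, which is what distinguishes stream-updated prediction from the 24-hour BI cycle criticized earlier.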
These certainly are good times!
To help you with the digitization of your enterprise and your engagement in the digital economy, IBM provides the necessary tools, such as IBM Watson Predictive APIs, and intelligent platforms, such as IBM API Management, to design, create, assemble, manage, publish, secure, and monetize your enterprise-grade APIs. Reach out to me at @AndyThurai to continue this conversation.