by Divakar Mysore, Shrikant Khupat, Shweta Jain | Updated November 7, 2013 - Published October 7, 2013
Before making the decision to invest in a big data solution, evaluate the data available for analysis; the insight that might be gained from analyzing it; and the resources available to define, design, create, and deploy a big data platform. Asking the right questions is a good place to start. Use the questions in this article to guide your investigation. The answers will begin to reveal more about the characteristics of the data and the problem you’re trying to solve.
Although organizations generally have a vague understanding of the type of data that needs to be analyzed, it’s quite possible that the specifics are not as clear. After all, the data might hold keys to patterns that have not been noticed before, and once a pattern is recognized, the need for additional analysis becomes obvious. To help uncover these unknown unknowns, start by implementing a few basic use cases, and in the process, collect data that was not previously available. As the data repository is built and more data is collected, a data scientist can more easily identify the key data and build predictive and statistical models that will generate more insight.
It may also be the case that the organization already knows what it does not know. To address these known unknowns, the organization must start by working with a data scientist to identify the external or third-party data sources and to implement a few use cases that rely on this external data.
This article first tries to answer some of the questions CIOs typically raise before taking on a big data initiative, then focuses on a dimensions-based approach that will help in assessing the viability of a big data solution for an organization.
For the most part, organizations choose to implement a big data solution incrementally. Not every analytical and reporting requirement requires a big data solution. For projects that perform parallel processing on a large dataset or ad-hoc reporting from multiple data sources, a big data solution may not be necessary.
With the advent of big data technologies, organizations are asking themselves: “Is big data the right solution to my business problem, or does it provide me with a business opportunity? Are business opportunities hiding in the big data?” Here are some of the typical questions we hear from CIOs:
To answer these questions, this article proposes a structured approach for evaluating the viability of a big data solution according to the dimensions shown in the following figure.
For each dimension, we include key questions. Assign a weight and priority for each dimension, according to the business context. The assessment will vary by business case and by organization. Consider working through these questions in a series of workshops with the relevant business and IT stakeholders.
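Once the workshops have produced a weight and a score for each dimension, the results can be rolled up into one number that is easy to compare across business cases. The following is a minimal sketch, assuming a hypothetical 0-5 scoring scale and a weighted average as the aggregation method; the dimension names, weights, and scores in `EXAMPLE` are illustrative placeholders, not values from this article.

```python
from dataclasses import dataclass


@dataclass
class Dimension:
    name: str
    weight: float  # relative importance agreed in the stakeholder workshops
    score: float   # assessment result on a hypothetical 0-5 scale


def viability_score(dimensions: list[Dimension]) -> float:
    """Weighted average of the dimension scores, normalized by total weight."""
    total_weight = sum(d.weight for d in dimensions)
    if total_weight <= 0:
        raise ValueError("at least one dimension needs a positive weight")
    return sum(d.weight * d.score for d in dimensions) / total_weight


def ranked(dimensions: list[Dimension]) -> list[Dimension]:
    """Dimensions ordered by weighted contribution, largest first."""
    return sorted(dimensions, key=lambda d: d.weight * d.score, reverse=True)


# Hypothetical dimension names, weights, and scores for one business case.
EXAMPLE = [
    Dimension("business insight", weight=3, score=4),
    Dimension("data availability", weight=2, score=3),
    Dimension("cost", weight=2, score=2),
    Dimension("governance", weight=1, score=5),
    Dimension("skills", weight=2, score=2),
]

if __name__ == "__main__":
    print(f"overall viability: {viability_score(EXAMPLE):.2f}")
    for d in ranked(EXAMPLE):
        print(f"{d.name}: {d.weight * d.score}")
```

Normalizing by total weight keeps the result on the same scale as the individual scores, so a business case assessed with a different set of weights can still be compared side by side. The ranked list shows which dimensions drive the result and deserve the most attention in later workshops.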
Many organizations wonder if the business insights they are seeking can be addressed by a big data solution. There are no definitive guidelines that define the insights that can be derived from big data. The scenarios need to be identified by the organization, and they evolve over time. A data scientist is key to identifying the business use cases and scenarios that, if implemented, will bring significant value to the business.
The data scientist must be able to understand the key performance indicators and apply statistical models and complex algorithms to the data to derive a list of use cases. The use cases vary by industry and business. It’s helpful to study the market for what competitors are doing, which market forces are at work, and, above all, what customers are looking for. The following examples show use cases from various industries.
E-retailers like eBay are constantly creating targeted offers to boost customer lifetime value (CLV); deliver consistent cross-channel customer experiences; harvest customer leads from sales, marketing, and other sources; and continuously optimize back-end processes.
Fraud management helps improve customer profitability by predicting the likelihood that a given transaction or customer account is experiencing fraud. Solutions analyze transactions in real time and generate recommendations for immediate action, which is critical to stopping third-party fraud, as well as first-party fraud and deliberate misuse of account privileges. Solutions are typically designed to detect and prevent a wide variety of fraud and risk types across multiple industries, including:
Web and digital media
Much of the data we currently work with is the direct consequence of increased social media and digital marketing. Customers generate a trail of “data exhaust” that can be mined and put to use.
Utilities run big, expensive, complicated systems to generate power. Each grid includes sophisticated sensors that monitor voltage, current, frequency and other important operating characteristics. Efficiency means paying careful attention to all of the data streaming from the sensors.
Utilities are now leveraging Hadoop clusters to analyze power generation (supply) and power consumption (demand) data via smart meters.
The adoption of smart meters has resulted in a deluge of data flowing at unprecedented levels. Most utilities are ill-prepared to analyze the data once the meters are turned on.
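To make the supply-and-demand analysis concrete, the sketch below mimics the map and reduce phases a Hadoop job would run over smart meter readings, but in plain in-memory Python. It is not Hadoop's API: the record layout `(meter_id, timestamp, kwh, kind)`, the `"supply"`/`"demand"` labels, and the sample readings are all hypothetical, and the grouping by key that Hadoop's shuffle phase performs is simulated with a dictionary.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical readings: (meter_id, ISO timestamp, kWh, "supply" or "demand").
SAMPLE = [
    ("gen-1", "2013-10-07T08:15:00", 120.0, "supply"),
    ("m-001", "2013-10-07T08:20:00", 1.5, "demand"),
    ("m-002", "2013-10-07T08:45:00", 2.5, "demand"),
    ("gen-1", "2013-10-07T09:05:00", 100.0, "supply"),
    ("m-001", "2013-10-07T09:10:00", 3.0, "demand"),
]


def map_reading(record):
    """Map step: emit one ((hour, kind), kWh) pair per reading."""
    _meter_id, timestamp, kwh, kind = record
    hour = datetime.fromisoformat(timestamp).strftime("%Y-%m-%d %H:00")
    yield (hour, kind), kwh


def reduce_readings(pairs):
    """Reduce step: sum the kWh values that share a key."""
    totals = defaultdict(float)
    for key, kwh in pairs:
        totals[key] += kwh
    return dict(totals)


def hourly_balance(records):
    """Supply minus demand for each hour seen in the input."""
    pairs = (pair for record in records for pair in map_reading(record))
    totals = reduce_readings(pairs)
    hours = sorted({hour for hour, _kind in totals})
    return {
        h: totals.get((h, "supply"), 0.0) - totals.get((h, "demand"), 0.0)
        for h in hours
    }


if __name__ == "__main__":
    for hour, balance in hourly_balance(SAMPLE).items():
        print(hour, balance)
```

Because the map step looks at one reading at a time and the reduce step only combines values with the same key, both phases can be spread across many machines, which is what makes this pattern suitable for the volume of data smart meters produce.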
In the cable industry, large cable operators such as Time Warner, Comcast, and Cox Communications can use big data to analyze set-top box data daily. This data can be leveraged to adjust advertising or promotional activity.
Potential customers are generating huge amounts of new data on social networks and review sites. Within the enterprise, transactional data and web logs are growing as customers switch to online channels to conduct business and interact with companies.
Start by creating an inventory of the data that exists within the enterprise. Identify data that exists in the internal systems and applications and data coming in from third parties. If a business problem can be solved with existing data, data from external sources might not be required.
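One way to act on this advice is to record, for each internal system and current third-party feed, which data elements it supplies, and then compare that inventory against the elements a use case needs. This is a minimal sketch; the source names and data element names are hypothetical examples.

```python
def find_gaps(required: set[str], inventory: dict[str, set[str]]) -> set[str]:
    """Data elements the use case needs that no inventoried source supplies."""
    covered = set().union(*inventory.values())
    return required - covered


def sources_for(element: str, inventory: dict[str, set[str]]) -> list[str]:
    """Names of the inventoried sources that already supply an element."""
    return sorted(name for name, fields in inventory.items() if element in fields)


# Hypothetical inventory: internal systems plus a third-party feed already received.
INVENTORY = {
    "crm": {"customer_id", "segment", "email"},
    "web_logs": {"customer_id", "page_views"},
    "partner_feed": {"demographics"},
}

if __name__ == "__main__":
    needed = {"customer_id", "page_views", "sentiment"}
    print("missing:", find_gaps(needed, INVENTORY))
    print("customer_id from:", sources_for("customer_id", INVENTORY))
```

An empty result from `find_gaps` suggests the problem can be attempted with existing data; any remaining elements are candidates for external acquisition, which can then be costed.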
Consider the cost of building a big data solution and weigh it against the value of the new insight to the business.
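A simple way to weigh cost against value is to compare the expected yearly benefit with the one-time build cost and the recurring running cost. The sketch below is deliberately undiscounted (it ignores the time value of money) and uses hypothetical figures; a real business case would usually apply the organization's own financial model.

```python
from typing import Optional


def net_value(annual_benefit: float, one_time_cost: float,
              annual_running_cost: float, years: int) -> float:
    """Undiscounted benefit minus cost over the evaluation period."""
    return years * (annual_benefit - annual_running_cost) - one_time_cost


def payback_years(annual_benefit: float, one_time_cost: float,
                  annual_running_cost: float) -> Optional[float]:
    """Years until the one-time cost is recovered, or None if it never is."""
    net_annual = annual_benefit - annual_running_cost
    if net_annual <= 0:
        return None  # running costs consume the whole benefit
    return one_time_cost / net_annual


if __name__ == "__main__":
    # Hypothetical figures for a first-iteration project.
    print(net_value(500_000, 600_000, 200_000, years=3))
    print(payback_years(500_000, 600_000, 200_000))
```

Keeping the running cost separate from the build cost matters because hardware, software, and maintenance for big data tools recur every year, while the build cost is paid once.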
When this new data is analyzed in the context of the archived data about existing customers, businesses gain insight into new business opportunities.
Big data can offer a viable solution if:
When evaluating the business value to be gained by a big data solution, consider whether your current environment can be expanded and weigh the cost of this investment.
Ask the following questions to determine whether you can augment the existing data warehouse platform:
If the answer to any of these questions is yes, explore ways to augment the existing data warehouse environment.
The cost and feasibility of extending an existing data warehouse platform or IT environment vs. implementing a big data solution depends on:
It also depends on the volume of data that will be gathered from new data sources, the complexity of business use cases, the analytical complexity of processing, and how expensive it is to acquire the data and people with the right skill set. Can existing staff develop new big data skills, or can people with niche skills be hired externally?
Keep in mind the effect of a big data initiative on other projects under way. Acquiring data from new sources is costly, so it’s important to first identify any data that already exists internally in systems and applications, or in third-party data currently being received. If a business problem can be solved with existing data, data from external sources may not be required.
Assess the application portfolio of the organization before procuring new tools and applications. For example, a plain vanilla Hadoop platform may not be sufficient for the requirements, and it may be necessary to buy specialized tools. Conversely, a commercial version of Hadoop may be expensive for the current use case, but may be needed as a long-term investment to support a strategic big data platform. Consider the cost of the infrastructure, hardware, software, and maintenance required for big data tools and technologies.
When deciding whether to implement a big data platform, an organization might be looking at new data sources and new types of data elements where the ownership of the data is not clearly defined. Certain industry regulations govern the data that is acquired and used by an organization. For example, in the case of healthcare, is it legitimate to access patient data to derive insight from it? Similar rules apply in every industry. In addition to issues of IT governance, the business processes of an organization may also need to be redefined or modified to enable the organization to acquire, store, and access external data.
Consider the following governance-related issues in the context of your situation:
A big data solution can be incrementally implemented. It’s helpful to clearly define the scope of the business problem and to set, in measurable terms, the expected business revenue gain.
For the foundational business case, take care to outline the scope of the problem and the projected benefits from the solution. If the scope is too small, the business benefits will not be realized, and if it’s too large, it will be challenging to get the funding and complete the project within an appropriate timeframe. Define the core functions in the first iteration of the project, so that it’s easier to win the confidence of stakeholders.
Specific skills are required to understand and analyze the requirements and maintain the big data solution. These skills include industry knowledge, domain expertise, and technical knowledge of big data tools and technologies. Data scientists with expertise in modeling, statistics, analytics, and math are key to the success of any big data initiative.
Before undertaking a new big data project, make sure the right people are on board:
Most organizations have a great deal of data that is not being harnessed for business insight. Pockets include log files, error files, and operational data from applications. Don’t overlook this data as a potential source of valuable information.
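As an illustration of how little effort it can take to start mining such data, the sketch below counts error entries per application component in a log. The log line format (`date time LEVEL [component] message`) is a hypothetical example; real application logs vary, and the regular expression would need to match your own format.

```python
import re
from collections import Counter

# Hypothetical format: "2013-10-07 08:15:02 ERROR [payment] Timeout contacting gateway"
LOG_LINE = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) (?P<level>[A-Z]+) "
    r"\[(?P<component>[^\]]+)\] (?P<message>.*)$"
)


def error_counts(lines) -> Counter:
    """Count ERROR entries per component, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("level") == "ERROR":
            counts[match.group("component")] += 1
    return counts


SAMPLE_LOG = [
    "2013-10-07 08:15:02 ERROR [payment] Timeout contacting gateway",
    "2013-10-07 08:15:09 INFO [catalog] Cache refreshed",
    "2013-10-07 08:16:44 ERROR [payment] Card declined by issuer",
    "2013-10-07 08:17:30 ERROR [search] Index shard unavailable",
    "not a log line",
]

if __name__ == "__main__":
    print(error_counts(SAMPLE_LOG).most_common())
```

Even a simple tally like this can point to the components whose failures most affect customers, which is the kind of previously unnoticed pattern worth feeding into a larger analysis.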
Look for hints that the complexity of data has increased, especially with regard to volume, variety, velocity, and veracity.
You may want to consider a big data solution if:
The variety of data might demand a big data solution if:
Consider whether your data:
Consider a big data solution if:
A big data solution might be appropriate if there is reasonable complexity in the volume, variety, velocity, or veracity of the data. For more complex data, assess any risks associated with implementing a big data solution. For less complex data, traditional solutions should be assessed.
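The check described above can be expressed as a small screening function: profile the data along the four dimensions, flag any that exceed a threshold, and route the decision accordingly. This is a minimal sketch; the profile fields and every threshold value are hypothetical and should be replaced with limits that reflect what your current environment can handle.

```python
from dataclasses import dataclass


@dataclass
class DataProfile:
    volume_tb: float          # total data to be analyzed, in terabytes
    formats: int              # distinct data formats or structures
    events_per_second: float  # peak arrival rate
    invalid_fraction: float   # share of records failing validation (0-1)


# Hypothetical thresholds; set these from the limits of your own environment.
CHECKS = {
    "volume": lambda p: p.volume_tb >= 10,
    "variety": lambda p: p.formats >= 3,
    "velocity": lambda p: p.events_per_second >= 1_000,
    "veracity": lambda p: p.invalid_fraction >= 0.05,
}


def complex_dimensions(profile: DataProfile) -> list[str]:
    """Names of the dimensions whose threshold the profile meets or exceeds."""
    return [name for name, exceeds in CHECKS.items() if exceeds(profile)]


def recommendation(profile: DataProfile) -> str:
    flagged = complex_dimensions(profile)
    if flagged:
        return "assess big data solution and its risks: " + ", ".join(flagged)
    return "assess traditional solutions first"


if __name__ == "__main__":
    print(recommendation(DataProfile(2, 5, 12_000, 0.01)))
    print(recommendation(DataProfile(0.5, 1, 50, 0.0)))
```

Keeping each dimension as a separate named check makes it easy to see which characteristic triggered the recommendation, which in turn indicates where the risk assessment should focus.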
Not all big data situations require a big data solution. Look for hints in the market. What are competitors doing? What market forces are at work? What are the customers demanding?
Use the questions in this article to help you determine whether a big data solution is appropriate for your business situation and for the business insight you need. If you’ve decided it’s time to embark on a big data project, watch for the next article on defining a logical architecture and determining the key components required for your big data solution.