by Divakar Mysore, Shrikant Khupat, Shweta Jain | Updated October 14, 2013 - Published October 15, 2013
Part 2 of this “Big data architecture and patterns” series describes a dimensions-based approach for assessing the viability of a big data solution. If you have already explored your own situation using the questions and pointers in the previous article and decided it’s time to build a new (or update an existing) big data solution, the next step is to identify the components your big data solution will require.
Logical layers offer a way to organize your components into groups that perform specific functions. The layers are merely logical; they do not imply that the functions supporting each layer run on separate machines or in separate processes. A big data solution typically comprises these logical layers:
Big data sources: Think in terms of all of the data available for analysis, coming in from all channels. Ask the data scientists in your organization to clarify what data is required to perform the kind of analyses you need. The data will vary in format and origin:
Format— Structured, semi-structured, or unstructured.
Velocity and volume— The speed at which data arrives and the amount delivered vary by data source.
Collection point— Where the data is collected, directly or through data providers, in real time or in batch mode. The data can come from a primary source, such as weather conditions, or it can come from a secondary source, such as a media-sponsored weather channel.
Location of data source— Data sources can be inside the enterprise or external to it. Identify the data to which you have limited access, because access restrictions affect the scope of data available for analysis.
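The four source characteristics above can be captured in a simple descriptor so that downstream components can reason about every source uniformly. A minimal Python sketch; the class and field names are illustrative, not part of any product:

```python
from dataclasses import dataclass

@dataclass
class DataSource:
    """Describes one big data source along the four dimensions above."""
    name: str
    fmt: str          # "structured", "semi-structured", or "unstructured"
    velocity: str     # e.g. "real-time" or "batch"
    collection: str   # "primary" (e.g. weather sensors) or "secondary" (e.g. a weather channel)
    internal: bool    # inside the enterprise, or external to it
    limited_access: bool = False  # limits the scope of data available for analysis

def analyzable(sources):
    """Return the names of the sources that analysts can use in full."""
    return [s.name for s in sources if not s.limited_access]

sources = [
    DataSource("pos-transactions", "structured", "real-time", "primary", True),
    DataSource("weather-feed", "semi-structured", "batch", "secondary", False, limited_access=True),
]
print(analyzable(sources))  # only sources without access restrictions
```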
Each layer includes several types of components, as illustrated below.
Big data sources layer: This layer includes all the data sources necessary to provide the insight required to solve the business problem. The data is structured, semi-structured, and unstructured, and it comes from many sources:
Data massaging and store layer: Because incoming data characteristics can vary, components in this layer must be capable of reading data at various frequencies, in various formats and sizes, and over various communication channels:
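One way to meet this requirement is a single ingest entry point that dispatches to a format-specific reader. A Python sketch using standard-library parsers; the format keys and function names are illustrative:

```python
import csv
import io
import json

def parse_json(raw):
    return json.loads(raw)

def parse_csv(raw):
    return list(csv.DictReader(io.StringIO(raw)))

# Registry of supported formats; a real layer would also cover XML, logs, binary feeds, etc.
PARSERS = {"json": parse_json, "csv": parse_csv}

def ingest(raw, fmt):
    """Normalize one incoming payload, whatever its format, into Python objects."""
    try:
        return PARSERS[fmt](raw)
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}")

records = ingest("id,amount\n1,9.99\n", "csv")
```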
Analysis layer: This is the layer where business insight is extracted from the data:
Consumption layer: This layer consumes the business insight derived from the analytics applications. The outcome of the analysis is consumed by various users within the organization and by entities external to it, such as customers, vendors, partners, and suppliers. This insight can be used to target customers with product offers. For example, with the business insight gained from analysis, a company can use customer preference data and location awareness to deliver personalized offers to customers as they walk down the aisle or pass by the store.
The insight can also be used to detect fraud by intercepting transactions in real time and correlating them with the view that has been built using the data already stored in the enterprise. A customer can be notified of a possible fraud while the fraudulent transaction is happening, so corrective actions can be taken immediately.
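Real-time interception of this kind can be sketched as a rule that compares a live transaction with the customer view built from stored enterprise data. A hedged Python sketch; the fields and thresholds are illustrative, not from any fraud-detection product:

```python
def is_suspicious(txn, profile):
    """Compare a live transaction against the stored customer view.

    Flags the transaction when it is far larger than the customer's
    historical maximum or originates from an unfamiliar country.
    (The 3x multiplier is an invented threshold.)
    """
    too_large = txn["amount"] > 3 * profile["max_amount"]
    odd_place = txn["country"] not in profile["countries"]
    return too_large or odd_place

# Customer view built from data already stored in the enterprise
profile = {"max_amount": 120.0, "countries": {"US", "CA"}}
# Transaction intercepted in real time
txn = {"amount": 950.0, "country": "BR"}

if is_suspicious(txn, profile):
    print("notify customer: possible fraud")  # so corrective action can be taken immediately
```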
In addition, business processes can be triggered based on the analysis done in the data massaging layer. Automated steps can be launched; for example, if a customer accepts an offer, the process to create a new order can be triggered automatically, and if a customer reports fraud, the process to block the credit card can be triggered.
The output of analysis can also be consumed by a recommendation engine that can match customers with the products they like. The recommendation engine analyzes available information and provides personalized and real-time recommendations.
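A minimal version of such matching can rank products by how well they overlap with a customer's known preferences. A Python sketch under that simplifying assumption (real engines use far richer models; all names here are invented):

```python
def recommend(preferences, catalog, top_n=2):
    """Rank products by how many of the customer's preference tags they match."""
    scored = [(len(preferences & tags), product) for product, tags in catalog.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best match first, ties alphabetical
    return [product for score, product in scored if score > 0][:top_n]

# Toy product catalog: product name -> descriptive tags
catalog = {
    "trail-shoes": {"outdoor", "running"},
    "yoga-mat": {"fitness", "indoor"},
    "rain-jacket": {"outdoor"},
}
print(recommend({"outdoor", "running"}, catalog))
```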
The consumption layer also gives internal users the ability to understand, find, and navigate federated data within and outside the enterprise. For internal consumers, the ability to build reports and dashboards enables stakeholders to make informed decisions and to design appropriate strategies. To improve operational effectiveness, real-time business alerts can be generated from the data, and operational key performance indicators can be monitored:
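Alert generation from operational KPIs can be as simple as comparing current values against thresholds. A Python sketch; the KPI names and limits are invented for illustration:

```python
def kpi_alerts(kpis, thresholds):
    """Return the operational KPIs that breach their configured thresholds."""
    return [name for name, value in kpis.items()
            if name in thresholds and value > thresholds[name]]

# Invented operational thresholds and current readings
thresholds = {"order_latency_ms": 500, "error_rate": 0.01}
kpis = {"order_latency_ms": 820, "error_rate": 0.002}

print(kpi_alerts(kpis, thresholds))  # KPIs that need a real-time business alert
```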
Aspects that affect all of the components of the logical layers (big data sources, data massaging and storage, analysis, and consumption) are covered by the vertical layers:
Big data applications acquire data from various origins and providers and store it in data storage systems such as HDFS and NoSQL stores such as MongoDB. This vertical layer is used by various components (data acquisition, data digest, model management, and transaction interceptor, for example) and is responsible for connecting to the various data sources. Integrating information across data sources with varying characteristics (protocols and connectivity, for example) requires quality connectors and adapters. Accelerators are available to connect to most known and widely used sources, including social media adapters and weather data adapters. Components can also use this layer to store information in big data stores and to retrieve it for processing. Most big data stores provide services and APIs for storing and retrieving information.
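The connector-and-adapter idea can be sketched as a common store/retrieve interface that components program against, with one adapter per backing store. The Python below uses an in-memory stand-in rather than a real HDFS or MongoDB client; all names are illustrative:

```python
class Store:
    """Common store/retrieve interface that components program against."""
    def put(self, key, value):
        raise NotImplementedError
    def get(self, key):
        raise NotImplementedError

class DictStore(Store):
    """In-memory stand-in for a key-value big data store.

    A real adapter would wrap the store's own client API
    (an HDFS or MongoDB driver, for example) behind this interface.
    """
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

def archive(store, records):
    """A component that needs storage depends only on the Store interface."""
    for key, value in records:
        store.put(key, value)

store = DictStore()
archive(store, [("order-1", {"total": 42})])
print(store.get("order-1"))
```

Swapping in a different backing store then means writing a new adapter, not changing every component that stores or retrieves data.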
Data governance is about defining guidelines that help enterprises make the right decisions about their data. Big data governance helps in dealing with the complexity, volume, and variety of data that is within the enterprise or coming in from external sources. Strong guidelines and processes are required to monitor, structure, store, and secure the data from the time it enters the enterprise until it is processed, stored, analyzed, and purged or archived.
In addition to normal data governance considerations, governance for big data includes additional factors:
This layer is responsible for defining data quality, policies around privacy and security, frequency of data, size per fetch, and data filters:
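Policies such as size per fetch and privacy filters can be enforced at a single fetch boundary. A Python sketch with invented policy fields:

```python
def fetch(records, policy):
    """Apply a per-fetch size limit and mask fields named in the privacy policy.

    The policy keys ("size_per_fetch", "mask_fields") are illustrative.
    """
    batch = records[: policy["size_per_fetch"]]
    masked = []
    for rec in batch:
        rec = dict(rec)  # copy, so the stored records stay untouched
        for field in policy["mask_fields"]:
            if field in rec:
                rec[field] = "***"
        masked.append(rec)
    return masked

policy = {"size_per_fetch": 2, "mask_fields": ["ssn"]}
rows = [{"id": 1, "ssn": "123-45-6789"}, {"id": 2}, {"id": 3}]
print(fetch(rows, policy))
```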
Systems management is critical for big data because it involves many systems across clusters and boundaries of the enterprise. Monitoring the health of the overall big data ecosystem includes:
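A first cut at ecosystem-wide monitoring is to aggregate per-component health checks into one overall status. A Python sketch; the component names and check results are illustrative:

```python
def ecosystem_health(checks):
    """Run each component's health check; the ecosystem is healthy only if all pass."""
    results = {name: check() for name, check in checks.items()}
    return all(results.values()), results

# Invented components; real checks would probe clusters, queues, and storage
checks = {
    "ingest-cluster": lambda: True,
    "analytics-cluster": lambda: False,  # e.g. a node is down
}

healthy, detail = ecosystem_health(checks)
print(healthy, [name for name, ok in detail.items() if not ok])
```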
For developers, layers offer a way to categorize the functions that must be performed by a big data solution, and suggest an organization for the code that must address these functions. For business users wanting to derive insight from big data, however, it’s often helpful to think in terms of big data requirements and scope. Atomic patterns, which address the mechanisms for accessing, processing, storing, and consuming big data, give business users a way to address requirements and scope. The next article introduces atomic patterns for this purpose.