Sandip Chowdhury | Updated May 28, 2014 - Published May 27, 2014
This article describes Hadoop-based big data technologies that can be implemented to augment existing data warehouses. Traditional data warehouses are built primarily on relational databases that analyze data from the perspective of business processes.
Part 1 of this series describes the current state of the data warehouse, its landscape, technology, and architecture. It identifies the technical and business drivers for moving to big data technologies and identifies use cases for augmenting existing data warehouses by incorporating big data technologies.
As organizations look for the business value that is hidden within non-structured data, they encounter the challenge of how to analyze complex data. Because business decisions are influenced by many factors, analysis models grow increasingly complex to account for them all.
A traditional IT infrastructure is not able to capture, manage, and process big data within a reasonable time. It cannot accommodate data sets with volumes that range from a few dozen terabytes to many petabytes.
Traditionally, data warehouses analyze structured, transactional data that is contained within relational databases. These warehouses apply key performance indicators and model-driven architecture.
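As a rough sketch of the kind of KPI query such a warehouse serves, the following uses SQLite as a stand-in for a relational warehouse database; the star-schema table and column names are hypothetical, not from this article.

```python
# Minimal sketch of a KPI query over a star schema, using SQLite
# as a stand-in for a warehouse database. Table and column names
# are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (product_id INTEGER, qty INTEGER, revenue REAL);
INSERT INTO dim_product VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO fact_sales VALUES (1, 10, 1000.0), (2, 5, 2500.0), (1, 3, 300.0);
""")

# KPI: total revenue per product category.
rows = conn.execute("""
    SELECT p.category, SUM(f.revenue) AS total_revenue
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(rows)  # → [('Hardware', 1300.0), ('Software', 2500.0)]
```

Queries like this work well precisely because the data is structured and transactional; the limits appear when the inputs stop fitting a predefined schema.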
Until recently, the data management landscape that is shown in Figure 1 was simple.
Usually, enterprises analyze the structured data sources that are generated within the organization.
Each layer performs a particular function:
The current BI reference architecture that is shown in Figure 2 is supported by many products:
IBM® Cognos® Business Intelligence: Provides reports, analysis, dashboards, and scorecards to help support the way people think and work when they are trying to understand business performance.
Changes in demand for data analysis are driving the need to implement technology to handle new requirements. Examples of new demands include:
Organizations built data warehouses to analyze business activity and to produce insights that enable decision makers to act on and improve business performance and operational effectiveness. Despite the maturity of the market, business intelligence (BI) technology remains at the forefront of IT investment. As more data is created, advances in analytical relational database technology improve BI software.
Businesses are driven to adopt big data technology for many reasons:
Decision makers in business organizations can ask themselves the following questions to gauge the need for big data technology:
The situations that are described by these questions can be improved by augmenting the existing data warehouse environment with big data technologies.
For many organizations, Apache Hadoop offers a first step to begin implementing big data analysis. This open source software enables distributed processing of large data sets across clusters of commodity servers.
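The distributed model that Hadoop popularized can be illustrated in miniature with a single-process sketch of the MapReduce word-count pattern. This is an illustration of the programming model only, not Hadoop code; in a real cluster the map, shuffle, and reduce phases run in parallel across many commodity servers.

```python
# Single-process illustration of the MapReduce pattern that Hadoop
# distributes across a cluster: map emits (key, value) pairs, the
# framework shuffles them by key, and reduce aggregates each group.
from collections import defaultdict

def map_phase(records):
    for line in records:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

lines = ["Big data big insights", "data at rest"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'big': 2, 'data': 2, 'insights': 1, 'at': 1, 'rest': 1}
```

Because each map and reduce task is independent, the same logic scales out simply by adding nodes, which is what makes commodity clusters economical for large data sets.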
IBM InfoSphere BigInsights™ combines Apache Hadoop (including the MapReduce framework and the Hadoop Distributed File System) with unique, enterprise-ready technologies and capabilities from across IBM, including Big SQL, built-in analytics, visualization, BigSheets, and security. InfoSphere BigInsights is a single platform to manage all of the data. InfoSphere BigInsights offers many benefits:
To implement Hadoop, you need guidance on how to build, configure, administer, and manage production-quality Hadoop clusters at scale (potentially more than 1,000 nodes). IBM® PureData™ for Hadoop, an integrated platform for Hadoop, provides information and resources to help overcome these implementation challenges. PureData for Hadoop offers:
To explore and implement a big data project, you can augment existing data warehouse environments by introducing one or more use cases at a time, as the business requires. This approach enables organizations to act with agility, reduce cost of ownership, and achieve faster time to market with increased business value and competitiveness.
Consider applying big data technologies in the following ways:
In the past, inadequate tools and technologies for handling big data forced organizations to build analytics solutions based on structured data alone. As a result, existing data processing engines and data storage solutions were designed for modest data throughput, not for the volume and variety of data that constitute big data.
Faced with an expanding analytical ecosystem, BI architects need to make many technology choices. Perhaps the most difficult involves selecting a data processing system to power various analytical applications.
With new technologies such as Hadoop, organizations can cost-effectively consume and analyze large volumes of semi-structured data. Big data technology complements traditional, top-down, data-delivery methods with more flexible, bottom-up approaches that promote ad hoc exploration and rapid application development.
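The bottom-up, ad hoc exploration described above often starts from semi-structured records such as JSON event logs, where fields vary from record to record. A minimal sketch of that style of analysis (record shapes and field names are hypothetical):

```python
# Ad hoc exploration of semi-structured records: fields may be
# missing or vary per record, so the code tolerates absent keys.
# Record shapes and field names are hypothetical.
import json

raw_logs = [
    '{"user": "a", "action": "click", "ms": 120}',
    '{"user": "b", "action": "click"}',  # no timing field on this record
    '{"user": "a", "action": "purchase", "ms": 340, "amount": 19.99}',
]

records = [json.loads(line) for line in raw_logs]

# A question posed ad hoc: average latency where it was recorded.
timings = [r["ms"] for r in records if "ms" in r]
avg_ms = sum(timings) / len(timings)
print(avg_ms)  # → 230.0
```

A relational warehouse would require a fixed schema before loading such data; the flexible, schema-on-read approach lets analysts pose questions first and impose structure only where the question needs it.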
Part 2 of this series describes Use case 1: Using big data technologies to build an enterprise landing zone. It also explains how the enterprise can reuse raw data (structured and unstructured) to support ad hoc and real-time analytics.