Building your first data warehouse doesn’t have to be as enterprise-y and scary as it sounds, especially in the era of cloud services. A good first iteration of your warehousing environment is a simple architecture that we BI architects like to call an “Operational Data Store” (ODS).

Data warehouse go RAWR!

Old-school DWs can be scary beasts. But don’t worry! Your first DW doesn’t have to be if you land data in its native form.

In this post, I discuss how my job in the new IBM Cloud Data Services (CDS) group led me to design our first data warehouse. SPOILER ALERT: It was an ODS.

Simple Data Pipe Origin Story

It all starts with the Simple Data Pipe project that David Taieb and I designed. “The Pipe” was born out of solving a real-world business reporting problem for the new IBM CDS group. We needed to consolidate a number of data sources into a central database and hook up Looker business intelligence visualizations to it.

At the top of my list was Salesforce.com data. Of all the things business managers want reports on, CRM data is the obvious place to start. So I sketched some architectures. On a flight home from a conference one night, David and I outlined a piece of software we called the Simple Data Pipe, which would land our Salesforce data in IBM’s cloud data warehouse service dashDB, where we could view it alongside data from other sales and billing systems. It has since become the one-click solution for data movement in our Enterprise Data Warehouse environment for CDS.

Feed the Beast Using ETL

What we needed was a simple ETL tool to pump data from operational data sources into an operational data store (hence the “O” in ODS). If you don’t know, ETL stands for “Extract, Transform and Load,” and it’s the process that forms the “plumbing” of a data warehouse system.

In the case of some old-school enterprise data warehouses, the “Transform source format” step can get pretty beastly. That’s where the enterprise ETL software vendors make their money: Informatica, Oracle, Pentaho, and, of course, IBM ;-)

In the case of building an ODS, you don’t need all that. Your ETL process is simplified to just “EL.” Since the goal of an ODS is to consolidate data while preserving the format of the operational source, you only need a tool that “Extracts from source” and “Loads to warehouse.” This is what the Simple Data Pipe does. It’s an “EL” tool that works well for cloud operational data sources.

We have made one connector thus far for Salesforce. We have connectors for Stripe, SugarCRM, and Amazon Redshift under development. Of course, you could also fork our pipes GitHub repo and start working on your own. Contributions are most welcome!

Data Consolidation Is a Common Pain

Business reporting is the reason the practice of data warehousing exists in the first place. Without a warehouse, someone in your organization, maybe even you, is going to ten different systems, grabbing different exports and metrics, and consolidating that data on some giant Excel spreadsheet. They’re spending at least one day a month doing it.

What management really wants is a dashboard that sees across all your operational systems for a holistic view of their business. In fact they’re probably asking for this view, wistfully looking out into the void of all the disparate operational systems they know their business depends on. This situation is exactly what I encountered upon joining IBM CDS.

An Operational Data Store Solves Data Consolidation Headaches

In order to collect data from disparate systems, you need to land them in a common data store. Then you connect your analytics platform of choice to this new data store where you’re consolidating information, which now becomes an official ODS. It is the most basic and most achievable version of data warehouse architecture.

High-level ODS.

A closer look at our architecture shows that a proper ODS sources tables from your operational systems and creates its own copies of this data. The simplest first approach is to take a fresh copy every night.

Sourcing tables to the ODS.

Now with all the tables landed next to each other in the ODS, you can produce the business intelligence reports your management has been dreaming of. We used Looker in our Simple Data Pipe tutorial, but you can use the BI software of your choice: Tableau, Cognos, Qlik — whatever!

Coming Soon: More Connectors

Check back here soon for an update on our progress creating data connectors for Stripe, SugarCRM, and Amazon Redshift. If all goes as planned, we’ll provide an overview of the implementation and share a code template that will help you design your own connectors for Simple Data Pipe. The more operational data sources, the better!

Join The Discussion

Your email address will not be published. Required fields are marked *