Building your first data warehouse doesn't have to be as enterprise-y and scary as it sounds, especially in the era of cloud services. A good first iteration of your warehousing environment is a simple architecture that we BI architects like to call an "Operational Data Store" (ODS).
In this post, I discuss how my job in the new IBM Cloud Data Services (CDS) group led me to design our first data warehouse. SPOILER ALERT: It was an ODS.
Simple Data Pipe Origin Story
It all starts with the Simple Data Pipe project that David Taieb and I designed. "The Pipe" was born out of solving a real-world business reporting problem for the new IBM CDS group. We needed to consolidate a number of data sources into a central database and hook up Looker business intelligence visualizations to it.
At the top of my list was Salesforce.com data. Of all the things business managers want reports on, CRM data is the obvious place to start. So I sketched some architectures. On a flight home from a conference one night, David and I outlined a piece of software we called the Simple Data Pipe, which would land our Salesforce data in IBM's cloud data warehouse service dashDB, where we could view it alongside data from other sales and billing systems. It has since become the one-click solution for data movement in our Enterprise Data Warehouse environment for CDS.
Feed the Beast Using ETL
What we needed was a simple ETL tool to pump data from operational data sources into an operational data store (hence the "O" in ODS). If you don't know, ETL stands for "Extract, Transform and Load," and it's the process that forms the "plumbing" of a data warehouse system.
In the case of some old-school enterprise data warehouses, the "Transform source format" step can get pretty beastly. That's where the enterprise ETL software vendors make their money: Informatica, Oracle, Pentaho, and, of course, IBM ;-)
In the case of building an ODS, you don't need all that. Your ETL process is simplified to just "EL." Since the goal of an ODS is to consolidate data while preserving the format of the operational source, you only need a tool that "Extracts from source" and "Loads to warehouse." This is what the Simple Data Pipe does. It's an "EL" tool that works well for cloud operational data sources.
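To make the "EL" idea concrete, here's a minimal extract-and-load loop in Python. This is an illustrative sketch, not the Simple Data Pipe's actual implementation: sqlite3 stands in for a cloud warehouse like dashDB, and a plain list of dicts stands in for paging through a REST source such as the Salesforce API.

```python
import json
import sqlite3

def extract(records):
    """Extract step: in a real pipe this would page through the
    source's REST API; here a list of dicts is the stand-in."""
    for record in records:
        yield record

def load(conn, table, records):
    """Load step: land each record as-is, preserving the source
    format -- no Transform step, hence "EL" rather than "ETL"."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (id TEXT PRIMARY KEY, doc TEXT)"
    )
    for record in records:
        conn.execute(
            f"INSERT OR REPLACE INTO {table} (id, doc) VALUES (?, ?)",
            (record["Id"], json.dumps(record)),
        )
    conn.commit()

# Usage: land two Salesforce-like contact rows in the warehouse stand-in.
conn = sqlite3.connect(":memory:")
contacts = [
    {"Id": "003A", "Name": "Ada", "Email": "ada@example.com"},
    {"Id": "003B", "Name": "Grace", "Email": "grace@example.com"},
]
load(conn, "sf_contacts", extract(contacts))
```

Note there is deliberately no transform logic anywhere in the path: the record lands in the ODS exactly as the source emitted it.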
So far we have built one connector, for Salesforce. Connectors for Stripe, SugarCRM, and Amazon Redshift are under development. Of course, you could also fork our pipes GitHub repo and start working on your own. Contributions are most welcome!
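If you're considering writing your own connector, the contract is conceptually small: list what the source offers, and stream its records unchanged. The sketch below is purely illustrative; the real Simple Data Pipe connectors live in the GitHub repo and are not Python, and every name here (`Connector`, `tables`, `fetch`, `InMemoryConnector`) is hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Iterator

class Connector(ABC):
    """Hypothetical "EL" connector contract: enumerate source
    objects, then stream their records without transforming them."""

    @abstractmethod
    def tables(self) -> list[str]:
        """List the source objects available for copying."""

    @abstractmethod
    def fetch(self, table: str) -> Iterator[dict]:
        """Stream records for one source object, unchanged."""

class InMemoryConnector(Connector):
    """Toy connector over a dict of lists, standing in for a
    source such as Stripe or SugarCRM."""

    def __init__(self, data):
        self.data = data

    def tables(self):
        return sorted(self.data)

    def fetch(self, table):
        yield from self.data[table]

# Usage: a pretend Stripe-like source with one "charges" object.
src = InMemoryConnector({"charges": [{"id": "ch_1"}, {"id": "ch_2"}]})
```

Keeping the interface this narrow is what makes new sources cheap to add: no per-source transform code, just extract and stream.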
Data Consolidation Is a Common Pain
Business reporting is the reason the practice of data warehousing exists in the first place. Without a warehouse, someone in your organization, maybe even you, is going to ten different systems, grabbing different exports and metrics, and consolidating that data on some giant Excel spreadsheet. They're spending at least one day a month doing it.
What management really wants is a dashboard that sees across all of their operational systems for a holistic view of the business. In fact, they're probably asking for this view, wistfully looking out into the void of all the disparate operational systems they know their business depends on. This situation is exactly what I encountered upon joining IBM CDS.
An Operational Data Store Solves Data Consolidation Headaches
In order to collect data from disparate systems, you need to land it in a common data store. Then you connect your analytics platform of choice to this new data store where you're consolidating information, which now becomes an official ODS. It is the most basic and most achievable version of data warehouse architecture.
A closer look at our architecture shows that a proper ODS sources tables from your operational systems and creates its own copies of this data. The simplest first approach is to take a fresh copy every night.
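That nightly "fresh copy" can be as blunt as dropping last night's table and reloading it from source. Here's a minimal sketch of that pattern; again, sqlite3 stands in for the warehouse and an in-memory row list stands in for an operational billing system, so nothing here is the Pipe's real code.

```python
import sqlite3

def nightly_refresh(conn, table, rows):
    """Simplest-possible ODS refresh: drop the previous copy and
    reload the table from scratch, so the ODS mirrors the source
    as of the last run. No deltas, no merge logic."""
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(f"CREATE TABLE {table} (id TEXT, amount REAL)")
    conn.executemany(
        f"INSERT INTO {table} (id, amount) VALUES (?, ?)", rows
    )
    conn.commit()

conn = sqlite3.connect(":memory:")

# Night one: two invoices exist in the source billing system.
nightly_refresh(conn, "billing", [("inv-1", 9.99), ("inv-2", 20.0)])

# Night two: the same call replaces the copy wholesale,
# picking up the new invoice with zero incremental logic.
nightly_refresh(conn, "billing",
                [("inv-1", 9.99), ("inv-2", 20.0), ("inv-3", 5.0)])
```

Full reloads obviously don't scale forever, but as a first iteration they're hard to beat: no change tracking in the source, and the ODS can never drift out of sync by more than a day.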
Now with all the tables landed next to each other in the ODS, you can produce the business intelligence reports your management has been dreaming of. We used Looker in our Simple Data Pipe tutorial, but you can use the BI software of your choice: Tableau, Cognos, Qlik — whatever!
Coming Soon: More Connectors
Check back here soon for an update on our progress creating data connectors for Stripe, SugarCRM, and Amazon Redshift. If all goes as planned, weâ€™ll provide an overview of the implementation and share a code template that will help you design your own connectors for Simple Data Pipe. The more operational data sources, the better!