
IBM Developer Blog


Leverage automation to install Cloudera Data Platform


In this second blog post in our series, we talk about Cloudera Data Platform for IBM Cloud Pak for Data. Much like IBM Cloud Pak for Data, the Cloudera Data Platform is a data and AI platform that can be installed on-premises. In fact, many IBM customers are also Cloudera customers. IBM Cloud Pak for Data is built on Red Hat OpenShift and breaks down silos to enable all of your data users to collaborate from a single, unified interface.

As with most modern platforms, installing Cloudera Data Platform takes much more than unzipping a file or clicking a “next” button in a wizard. Luckily, the Cloudera team recently announced it would open source its Ansible playbooks, and we leveraged them to make the whole process easier.

This blog post shares our experience using Ansible to install Cloudera Data Platform on IBM Cloud. It’s worth mentioning that the automation we used is open source and follows the best practices recommended by the Cloudera Professional Services team.

Our environment

We used Virtual Servers on IBM Cloud as the target for our Cloudera Data Platform installation. We selected a total of 8 VMs running CentOS, each with 32 vCPUs and 128 GB of RAM. We also had a separate Windows-based VM running Active Directory, to best mimic what customers most often use in their environments, and a single bastion node to simplify communication between the user and the hosts. IBM Cloud Pak for Data was also provisioned, but the details of that are out of scope for this post.

Figure 1. List of virtual servers on IBM Cloud

When put together, our environment resembled the architecture diagram below.

Figure 2. An architecture diagram of the environment used for integrating Cloudera Data Platform and IBM Cloud Pak for Data
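To make the environment concrete, here is a minimal Python sketch that models the 8 cluster VMs described above and renders a generic Ansible INI inventory that routes SSH through the bastion node. The hostnames, the `cluster` group name, and the `root` user are hypothetical placeholders, not the values we used, and the Cloudera playbooks expect their own inventory groups, so treat this only as an illustration of the layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Host:
    name: str
    vcpus: int
    ram_gb: int


def build_cluster(count: int = 8, vcpus: int = 32, ram_gb: int = 128) -> List[Host]:
    """Model the cluster VMs; hostnames are hypothetical placeholders."""
    return [
        Host(f"cdp-node-{i}.example.com", vcpus, ram_gb)
        for i in range(1, count + 1)
    ]


def render_inventory(
    hosts: List[Host],
    bastion: str = "bastion.example.com",  # hypothetical bastion hostname
    user: str = "root",                    # assumed SSH user
) -> str:
    """Render a generic Ansible INI inventory with one group for all hosts."""
    lines = ["[cluster]"]
    lines += [h.name for h in hosts]
    lines += [
        "",
        "[cluster:vars]",
        f"ansible_user={user}",
        # ansible_ssh_common_args passes extra options to ssh; ProxyJump
        # sends every connection through the bastion node first.
        f"ansible_ssh_common_args='-o ProxyJump={user}@{bastion}'",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    cluster = build_cluster()
    print("Total vCPUs:", sum(h.vcpus for h in cluster))
    print("Total RAM (GB):", sum(h.ram_gb for h in cluster))
    print(render_inventory(cluster))
```

With the defaults, the cluster adds up to 256 vCPUs and 1,024 GB of RAM across the 8 VMs. The Active Directory VM is left out because it is not a target of the installation.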

The Ansible playbooks

As mentioned earlier, to install Cloudera Data Platform on IBM Cloud, we leveraged the Ansible playbooks that the Cloudera team open sourced.

The installation takes approximately 30-60 minutes to complete, depending on machine specifications. The longest part is when the installer pulls down the necessary artifacts and pushes them to each host.
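If you want to measure the installation time on your own machines, a small wrapper like the following sketch could help. It builds a standard `ansible-playbook` command line and times the run. The playbook name `site.yml` and the inventory name `inventory.ini` are hypothetical placeholders. Check the repository for the actual entry point and any required variables.

```python
import shlex
import subprocess
import time
from typing import Dict, List, Optional


def build_command(
    playbook: str,
    inventory: str,
    extra_vars: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build an ansible-playbook command line as an argument list."""
    cmd = ["ansible-playbook", "-i", inventory, playbook]
    for key, value in (extra_vars or {}).items():
        # -e passes an extra variable to the playbook as key=value.
        cmd += ["-e", f"{key}={value}"]
    return cmd


def run_timed(cmd: List[str]) -> float:
    """Run the command, raise if it fails, and return elapsed minutes."""
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    return (time.monotonic() - start) / 60.0


if __name__ == "__main__":
    # File names below are placeholders, not the names used in the repo.
    cmd = build_command("site.yml", "inventory.ini")
    print("Would run:", shlex.join(cmd))
    # On a machine with Ansible installed and a real playbook:
    # print(f"Finished in {run_timed(cmd):.1f} minutes")
```

Building the command as a list rather than a single string avoids shell quoting problems when variable values contain spaces.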

Figure 3. A screenshot of the Cloudera Manager installing Cloudera Data Platform

Next steps

If you’re an IBMer looking to get your hands on Cloudera, or interested in learning more about using Ansible playbooks to install Cloudera, check out the GitHub repo. If you enjoyed this post, check out “A technical deep-dive on integrating Cloudera Data Platform and IBM Cloud Pak for Data.” You can also learn more about the Cloudera Data Platform for IBM Cloud Pak for Data joint offering.