Task

This article will walk you through the steps required to setup Ambari Hadoop Cluster to use a single PostgreSQL database instance.

Introduction

While setting up a Hadoop cluster using Ambari for IOP 4.x (BigInsights 4.x), different services namely Ambari, Oozie and Hive require databases to persist their data. By default, Ambari uses PostgreSQL, Oozie uses Derby and Hive uses MySQL. At an enterprise level implementation these default settings demands the customer to maintain three different databases. This can be eliminated by using one single database for all the services. In this article we will use PostgreSQL as a single database (single instance) to store information for all the three services.

Test-Environment

The environment used for this article are 4 servers with Redhat. Installation has not been done yet since the services will be setup with PostgreSQL during installation. The environment after installation will have 3 management nodes and 1 datanode. All the servers are running Redhat 7.

Using PostgreSQL with Ambari

By default, when Ambari-server is installed, it installs postgres database if its already not installed on the server. Since we start with installing Ambari-server as the first step, we are going to proceed with using this instance of postgres for other services in the cluster.

Setting up Postgres for Oozie & Hive Pre-requisite

To use Postgres that is being used by Ambari-server for Oozie/Hive service, you need to install the jdbc driver package on the linux machine. This machine will be the node where postgres is installed for Ambari-server so that other services can connect to postgres using a jdbc connection.

Install PostgreSQL Jdbc Driver package

To install the jdbc driver package, please run the following command on the postgres installed node:


yum install postgresql-jdbc

Setup Ambari-server to Use PostgreSQLJDBC Driver

After the postgresql-jdbc package is installed, the next step is to set the environment variable to this location so that when Oozie tries to connect with Postgres via Ambari-installation wizard, it is able to find this driver.
To do so, please run the following command:


ambari-server setup –jdbc-db=postgres –jdbc-driver=/path/to/jdbc/driver.jar

Add Oozie and Hive database connection Authorization in PostgreSQL

The Oozie services uses Oozie named database to persist its metadata and similarly hive uses Hive named database to persist its metadata. We will see in detail how to create these databases. But before that we need to add authorization for these databases in Postgres configuration file. This step will allow a remote server to connect with Oozie/Hive database in Postgres server respectively. To do so, please locate the pg_hba.conf file on your Postgres server. Once you find this file, please add following lines in the file to allow the server (which will host oozie/hive service) to connect with Oozie/Hive database on Postgres server.
Please see the screenshot for better understanding.

PosgreSQL conf file
PostgreSQL conf file

Note: If you do not perform above step correctly, you will receive an error in Ambari while trying to use Postgres for either of the services as shown below


ERROR: Unable to connect to the DB. Please check DB connection properties.
org.postgresql.util.PSQLException: FATAL: no pg_hba.conf entry for host “xxx.xxx.xxx.xxx”, user “hive”, database “hive”, SSL off

Restart Ambari-Server

After the above mentioned steps,
– Stop Ambari Server
– Stop Postgres
– Start Postgres
– Start Ambari server
We need to restart Postgres so that our changes can take effect.

Setup Oozie To Use PostgreSQL

Create Oozie Database and Grant privileges on it to User Oozie

After setting up the ambari-server to user the jdbc driver for connecting to Postgres, the next step is to create a database ‘Oozie’. This needs to be done since you are not installed Derby and using an existing database.
Next step is to create a role Oozie (user) with a password. This user/role will access the Oozie database.
Last step is to Grant all privileges to user Oozie on the database Oozie.
Please see the commands run to achieve these steps below:


[root@STAMPEDE01A ~]# su – postgres
Last login: Tue Oct 27 20:56:25 CDT 2015
-bash-4.2$ psql
psql (9.2.13)
Type “help” for help.
postgres=# CREATE DATABASE oozie;
CREATE DATABASE
postgres=# CREATE USER oozie with PASSWORD ‘bigdata’;
CREATE ROLE
postgres=# GRANT ALL PRIVILEGES ON DATABASE oozie to oozie;
GRANT

Once this has been achieved, you need to use the same user/password in Ambari and test your connection for Oozie service. See the screenshot below for better understanding.
oozie-jdbc
This confirms that Postgres is reachable and can be configured correctly for Oozie service.

Setup Hive to Use Postgres

Create Hive Database and Grant all privileges to User Hive

As show below, create hive database, hive user and grant all privileges on hive database to user hive.


[root@STAMPEDE01A ~]# su – postgres
Last login: Tue Oct 27 20:56:25 CDT 2015
-bash-4.2$ psql
postgres=# CREATE DATABASE HIVE;
CREATE DATABASE
postgres=# CREATE USER HIVE with PASSWORD ‘bigdata’;
CREATE ROLE
postgres=# GRANT ALL PRIVILEGES ON DATABASE hive to hive;
GRANT

Once the above steps are completed successfully, you need to use the same credentials in Ambari-Hive setup and test the connection. Please see the screenshot below:
hive_jdbc

Conclusion

In this article, I have successfully demonstrated that how PostgreSQL can be for Ambari, Oozie and Hive. This requires the customer/Hadoop admin to maintain just one Database service compared to default setup of 3 different databases.

4 comments on"Configure an Ambari Hadoop cluster to use a single PostgreSQL database instance"

  1. KaliyugAntagonist June 13, 2016

    Hi Vinayak,

    This post written by you was exactly what I needed – worked like a breeze and saved time that I would have invested on trying out the steps myself

    Thanks and regards !

  2. […] use of single instance of either RDBMS to be used by the entire cluster for metadata management.–Use PostgreSQL instance –Use MySQL […]

  3. Hi, Vinayak

    This article was pretty much useful to me in order to simulate a production environment. Thank you for sharing!
    For the ones who are working with RHEL 7.x, there are some changes on how to restart PostgreSQL:
    systemctl restart postgresql.service
    All the rest works smoothly =)

    Best regards,
    Ana Paula

Join The Discussion

Your email address will not be published. Required fields are marked *