Overview

Ambari based Hadoop distributions such as IBM Open Platform with Apache Hadoop (IOP) and Hortonworks Data Platform (HDP) will configure multiple database servers to manage the metadata for the different components by default, .  In IOP 4.0 and 4.1, by default the Ambari Server is configured to use PostgreSQL, the Hive Metastore service is configured to use MySQL, and the Oozie service is configured to use Derby.  Having multiple and heterogeneous database servers in a cluster makes it difficult for an administrator to perform tasks such as performance tuning, back-up and restore due to differences in admin commands and configurations.

An IOP or HDP cluster can be configured to only a single database server instance for all its components requiring databases.  Managing a single database instance is much simpler, and can also facilitate delegation of administrative tasks to a centralized database administration team in the company.   In this blog, I will present a step-by-step guide on how to configure an IOP or HDP cluster to use a single database instance.

Setting up the database server instance

To setup the Ambari Hadoop cluster to use a single database instance,  we need to settle on a database server.   There are two common open source database choices that support by all the components:  MySQL and PostgreSQL.  In this blog, we will choose MySQL as the common database server to use for the cluster.

Install and configure MySQL server instance

First, you need to choose the node that you want the MySQL instance to run on.  This node can be part of the Hadoop cluster or not, but it will need to be accessible by the Hadoop cluster, specifically the nodes running the Ambari-server, the Hive Metastore and the Oozie services.

  • As root or a sudo user, log into the node planned for running MySQL instance
  • Install MySQL server binary if not already installed.¬† We will assume that the cluster is being installed inRHEL 6 environment:
    [root@host]# yum install mysql-server
  • Start the MySQL instance service:
    [root@host]# /etc/init.d/mysqld start
  • Set the password for the local root user to secure the MySQL Server instance:
    [root@host]# mysqladmin -u root password $rootpwd

By default, MySQL will be listening on port 3306, but it can be configured to listen on another port.  Make sure the configured port is open in your firewall.

Setting up the databases

Create databases:

Since we plan to set up the Ambari Server, the Hive Metastore service and the Oozie service to use the newly created MySQL instance,  we will create a separate database for each.  Alternatively, you can use a single database for all the services, but this is not recommended as it would add more complexity in term of service authorization isolation and potential database object naming conflicts.

  • Create databases forAmbari, Hive andOozie on the same node that MySQL was installed:

    [root@host]# mysqladmin -u root -p$rootpwd create ambaridb
    [root@host]# mysqladmin -u root -p$rootpwd create hivedb
    [root@host]# mysqladmin -u root -p$rootpwd create ooziedb

Configure users and permission:

After creating the databases, we will need create the users for accessing these databases by the components.  We will also need to grant the proper permissions so that the services can perform CRUD operations against their respective databases.   You will need to go into MySQL command shell to do this.

  • Invoke the MySQL command shell from the node where MySQL was installed:

    [root@host]# mysql-u root -p$rootpwd

  • Create database owner forambaridb and grant permission to the database:

    mysql> USE ambaridb;
    mysql> CREATE USER 'ambari'@'localhost' IDENTIFIED BY 'ambari_password';
    mysql> GRANT ALL PRIVILEGES ON *.* TO 'ambari'@'localhost';
    mysql> CREATE USER 'ambari'@'%' IDENTIFIED BY 'ambari_password';
    mysql> GRANT ALL PRIVILEGES ON *.* TO 'ambari'@'%';
    mysql> FLUSH PRIVILEGES;

  • Create database owner forhivedb and grant permission to the database:

    mysql> USE hivedb;
    mysql> CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive_password';
    mysql> GRANT ALL PRIVILEGES ON *.* TO 'hive'@'localhost';
    mysql> CREATE USER 'hive'@'%' IDENTIFIED BY 'hive_password';
    mysql> GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%';
    mysql> FLUSH PRIVILEGES;

  • Create database owner forooziedb and grant permission to the database:

    mysql> USE ooziedb;
    mysql> CREATE USER 'oozie'@'localhost' IDENTIFIED BY 'oozie_password';
    mysql> GRANT ALL PRIVILEGES ON *.* TO 'oozie'@'localhost';
    mysql> CREATE USER 'oozie'@'%' IDENTIFIED BY 'oozie_password';
    mysql> GRANT ALL PRIVILEGES ON *.* TO 'oozie'@'%';
    mysql> FLUSH PRIVILEGES;

  • Exit out of MySQL command shell:

    mysql> EXIT;

Now, all the databases, users are created and proper permissions are granted to the users.  Next, we need to configure the different components and services to use the databases.

Setting up the components

Setup Ambari server to use ambaridb:

Before configuring any of the components to use MySQL, the MySQL JDBC driver will need to be installed on the Ambari Server node for later deployment to the cluster.

[root@host]# yum install mysql-connector-java*

Setting up Ambari server to use the MySQL database is done during the setup of Ambari-server after its installation.    The following output shows the steps to configure Ambari server to use the ambaridb database created in MySQL previously.  The blue text highlight the input required for this step:

[root@host]# ambari-server setup
 Using python /usr/bin/python2.6
 Setup ambari-server
 Checking SELinux...
 SELinux status is 'disabled'
 Customize user account for ambari-server daemon [y/n] (n)?
 Adjusting ambari-server permissions and ownership...
 Checking firewall status...
 WARNING: iptables is running. Confirm the necessary Ambari ports are accessible. Refer to the Ambari documentation for more details on ports.
 OK to continue [y/n] (y)?
 Checking JDK...
 [1] OpenJDK 1.8.0
 [2] OpenJDK 1.7.0 (deprecated)
 [3] Custom JDK
 ==============================================================================
 Enter choice (1):
 Downloading JDK from http://birepo-build.svl.ibm.com/repos/IOP-UTILS/RHEL6/x86_64/1.1/openjdk/jdk-1.8.0.tar.gz to /var/lib/ambari-server/resources/jdk-1.8.0.tar.gz
 jdk-1.8.0.tar.gz... 100% (56.5 MB of 56.5 MB)
 Successfully downloaded JDK distribution to /var/lib/ambari-server/resources/jdk-1.8.0.tar.gz
 Installing JDK to /usr/jdk64/
 Successfully installed JDK to /usr/jdk64/
 Completing setup...
 Configuring database...
 Enter advanced database configuration [y/n] (n)? y
 Configuring database...
 ==============================================================================
 Choose one of the following options:
 [1] - PostgreSQL (Embedded)
 [2] - Oracle
 [3] - MySQL
 [4] - PostgreSQL
 ==============================================================================
 Enter choice (1): 3
 Hostname (localhost): host
 Port (3306): 3306
 Database name (ambari): ambaridb
 Username (ambari): ambari
 Enter Database Password (bigdata): 
 Re-enter password: 
 Configuring ambari database...
 Copying JDBC drivers to server resources...
 Configuring remote database connection properties...
 WARNING: Before starting Ambari Server, you must run the following DDL against the database to create the schema: /var/lib/ambari-server/resources/Ambari-DDL-MySQL-CREATE.sql
 Proceed with configuring remote database connection properties [y/n] (y)? y
 Extracting system views...
 ....ambari-admin-2.1.0_IBM_5.jar
 .
 Adjusting ambari-server permissions and ownership...
 Ambari Server 'setup' completed successfully.
 [root@host]#

Next,  we need to create the schema for Ambari Server in the ambaridb database  and start ambari-server.  If your MySQL host is different from the node running Ambari server,  you can use -h option to supply the host, and -P option to supply the port.  Review MySQL documentation for more details: Connecting to MySQL

[root@host]# mysql -u ambari -pambari_password ambaridb <  /var/lib/ambari-server/resources/Ambari-DDL-MySQL-CREATE.sql
[root@host]# ambari-server start

Finally, ensure that the MySQL driver is available to Ambari and the Hadoop cluster.  On the node running Ambari server, issue:

[root@host]# ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar

ambari-server is now started and ready for you to perform cluster installation and configuration.  If you are not familiar with Ambari cluster installation,  please review the appropriate documentation specific to your distribution.  For IOP,  see IOP 4.1 Installation.

Setup Hive Metastore to use hivedb:

During the cluster creation, at the “Customize Services” step, you will be asked to configure the Hive.¬† You will need to configure the Hive Metastore to use the hivedb database created before:

Customize Services

As seen in the above image, both Hive and Oozie are flagged as needing attention since they both need to be configured for database access.

Click on the [Hive] tab to configure Hive.¬† Click on the [Advanced] tab to show the advanced configuration, then choose the “Existing MySQL Database” radio button option and provide the appropriate values for the database host, database name, user and password.

Configure Hive

Click on the [Test Connection] button to verify that your connection to the hivedb is configured correctly.

Setup Oozie to use ooziedb:

After configuring the Hive service, proceed to configure Oozie to use the ooziedb database created in MySQL previously.

To configure Oozie, click on the [Oozie] tab, and choose the “Existing MySQL Database” radio button option.¬†¬† Enter the appropriate values for the database host, database name, user and password for Oozie.

Configure Oozie

Again, click on the [Test Connection] button to verify that your connection to the hivedb is configured correctly.

After customizing the configuration for all the services, click the [Next] button to deploy the cluster.  After cluster deployment is successful,  the Hive and Oozie services should be using the configured MySQL databases.   Note that both Hive and Oozie services will automatically create their schema in the configured databases by default.  If you intentionally configure Hive and Oozie to not do so by default,  you will need to manually create the schema for the respective databases following the documentation for Hive (Hive Schema Tool) and Oozie (Oozie Database Configuration).

Summary

This blog shows users how to configure a single MySQL database instance to provide database services for the Hadoop cluster.   With such a setup in your Hadoop cluster, database administration is simplified to dealing with a single database server on a single host.  This will further help facilitating delegation and automation of backup and restore tasks for the databases.

If there are any custom service needing a database to be installed after the initial cluster installation,  you should be able to follow the steps described in this blog to create the new database and setup the service to use the database.

1 comment on"Configure an Ambari Hadoop cluster to use a single MySQL database instance"

  1. Very useful, at the date of 07/01/2019 with a HDP 3.1 and APACHE AMBARI 2.7.3.0 is still up to date.

Join The Discussion

Your email address will not be published. Required fields are marked *