Ambari based Hadoop distributions such as IBM Open Platform with Apache Hadoop (IOP) and Hortonworks Data Platform (HDP) will configure multiple database servers to manage the metadata for the different components by default, .¬† In IOP 4.0 and 4.1, by default the Ambari Server is configured to use PostgreSQL, the Hive Metastore service is configured to use MySQL, and the Oozie service is configured to use Derby.¬† Having multiple and heterogeneous database servers in a cluster makes it difficult for an administrator to perform tasks such as performance tuning, back-up and restore due to differences in admin commands and configurations.
An IOP or HDP cluster can be configured to only a single database server instance for all its components requiring databases.¬† Managing a single database instance is much simpler, and can also facilitate delegation of administrative tasks to a centralized database administration team in the company.¬†¬† In this blog, I will present a step-by-step guide on how to configure an IOP or HDP cluster to use a single database instance.
Setting up the database server instance
To setup the Ambari Hadoop cluster to use a single database instance,¬† we need to settle on a database server.¬†¬† There are two common open source database choices that support by all the components:¬† MySQL and PostgreSQL.¬† In this blog, we will choose MySQL as the common database server to use for the cluster.
Install and configure MySQL server instance
First, you need to choose the node that you want the MySQL instance to run on.¬† This node can be part of the Hadoop cluster or not, but it will need to be accessible by the Hadoop cluster, specifically the nodes running the Ambari-server, the Hive Metastore and the Oozie services.
- As root or a sudo user, log into the node planned for running MySQL instance
- Install MySQL server binary if not already installed.¬† We will assume that the cluster is being installed inRHEL 6 environment:
- Start the MySQL instance service:
- Set the password for the local root user to secure the MySQL Server instance:
mysqladmin -u root password $rootpwd
By default, MySQL will be listening on port 3306, but it can be configured to listen on another port.¬† Make sure the configured port is open in your firewall.
Setting up the databases
Since we plan to set up the Ambari Server, the Hive Metastore service and the Oozie service to use the newly created MySQL instance,¬† we will create a separate database for each.¬† Alternatively, you can use a single database for all the services, but this is not recommended as it would¬†add more complexity in term of service authorization isolation and potential database object naming conflicts.
- Create databases forAmbari, Hive andOozie on the same node that MySQL was installed:
[root@host]# mysqladmin -u root -p$rootpwd create ambaridb
[root@host]# mysqladmin -u root -p$rootpwd create hivedb
[root@host]# mysqladmin -u root -p$rootpwd create ooziedb
Configure users and permission:
After creating the databases, we will need create the users for accessing these databases by the components.¬† We will also need to grant the proper permissions so that the services can perform CRUD operations against their respective databases.¬†¬† You will need to go into MySQL command shell to do this.
- Invoke the MySQL command shell from the node where MySQL was installed:
[root@host]# mysql-u root -p$rootpwd
- Create database owner forambaridb and grant permission to the database:
mysql> USE ambaridb;
mysql> CREATE USER 'ambari'@'localhost' IDENTIFIED BY 'ambari_password';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'ambari'@'localhost';
mysql>¬†CREATE USER 'ambari'@'%' IDENTIFIED BY 'ambari_password';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'ambari'@'%';
mysql> FLUSH PRIVILEGES;
- Create database owner forhivedb and grant permission to the database:
mysql> USE hivedb;
mysql> CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive_password';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'hive'@'localhost';
mysql> CREATE USER 'hive'@'%' IDENTIFIED BY 'hive_password';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%';
mysql> FLUSH PRIVILEGES;
- Create database owner forooziedb and grant permission to the database:
mysql> USE ooziedb;
mysql> CREATE USER 'oozie'@'localhost' IDENTIFIED BY 'oozie_password';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'oozie'@'localhost';
mysql> CREATE USER 'oozie'@'%' IDENTIFIED BY 'oozie_password';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'oozie'@'%';
mysql> FLUSH PRIVILEGES;
- Exit out of MySQL command shell:
Now, all the databases, users are created and proper permissions are granted to the users.¬† Next, we need to configure the different components and services to use the databases.
Setting up the components
Setup Ambari server to use ambaridb:
Before configuring any of the components to use MySQL, the MySQL JDBC driver will need to be installed on the Ambari Server node for later deployment to the cluster.
yum install mysql-connector-java*
Setting up Ambari server to use the MySQL database is done during the setup of Ambari-server after its installation.¬†¬†¬† The following output shows the steps to configure Ambari server to use the ambaridb database created in MySQL previously.¬† The blue text highlight the input required for this step:
[root@host]# ambari-server setup Using python /usr/bin/python2.6 Setup ambari-server Checking SELinux... SELinux status is 'disabled' Customize user account for ambari-server daemon [y/n] (n)? Adjusting ambari-server permissions and ownership... Checking firewall status... WARNING: iptables is running. Confirm the necessary Ambari ports are accessible. Refer to the Ambari documentation for more details on ports. OK to continue [y/n] (y)? Checking JDK...  OpenJDK 1.8.0  OpenJDK 1.7.0 (deprecated)  Custom JDK ============================================================================== Enter choice (1): Downloading JDK from http://birepo-build.svl.ibm.com/repos/IOP-UTILS/RHEL6/x86_64/1.1/openjdk/jdk-1.8.0.tar.gz to /var/lib/ambari-server/resources/jdk-1.8.0.tar.gz jdk-1.8.0.tar.gz... 100% (56.5 MB of 56.5 MB) Successfully downloaded JDK distribution to /var/lib/ambari-server/resources/jdk-1.8.0.tar.gz Installing JDK to /usr/jdk64/ Successfully installed JDK to /usr/jdk64/ Completing setup... Configuring database... Enter advanced database configuration [y/n] (n)? y Configuring database... ============================================================================== Choose one of the following options:  - PostgreSQL (Embedded)  - Oracle  - MySQL  - PostgreSQL ============================================================================== Enter choice (1): 3 Hostname (localhost): host Port (3306): 3306 Database name (ambari): ambaridb Username (ambari): ambari Enter Database Password (bigdata): Re-enter password: Configuring ambari database... Copying JDBC drivers to server resources... Configuring remote database connection properties... WARNING: Before starting Ambari Server, you must run the following DDL against the database to create the schema: /var/lib/ambari-server/resources/Ambari-DDL-MySQL-CREATE.sql Proceed with configuring remote database connection properties [y/n] (y)? y Extracting system views... ....ambari-admin-2.1.0_IBM_5.jar . Adjusting ambari-server permissions and ownership... Ambari Server 'setup' completed successfully. [root@host]#
Next,¬† we need to create the schema for Ambari Server in the ambaridb database¬† and start ambari-server.¬† If your MySQL host is different from the node running Ambari server,¬† you can use -h option to supply the host, and -P option to supply the port.¬† Review MySQL documentation for more details: Connecting to MySQL
[root@host]# mysql -u ambari -pambari_password ambaridb <¬† /var/lib/ambari-server/resources/Ambari-DDL-MySQL-CREATE.sql
[root@host]# ambari-server start
Finally, ensure that the MySQL driver is available to Ambari and the Hadoop cluster.¬† On the node running Ambari server, issue:
[root@host]# ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
ambari-server is now started and ready for you to perform cluster installation and configuration.¬† If you are not familiar with Ambari cluster installation,¬† please review the appropriate documentation specific to your distribution.¬† For IOP,¬† see IOP 4.1 Installation.
Setup Hive Metastore to use hivedb:
During the cluster creation, at the “Customize Services” step, you will be asked to configure the Hive.¬† You will need to configure the Hive Metastore to use the hivedb database created before:
As seen in the above image, both Hive and Oozie are flagged as needing attention since they both need to be configured for database access.
Click on the [Hive] tab to configure Hive.¬† Click on the [Advanced] tab to show the advanced configuration, then choose the “Existing MySQL Database” radio button option and provide the appropriate values for the database host, database name, user and password.
Click on the [Test Connection] button to verify that your connection to the hivedb is configured correctly.
Setup Oozie to use ooziedb:
After configuring the Hive service, proceed to configure Oozie to use the ooziedb database created in MySQL previously.
To configure Oozie, click on the [Oozie] tab, and choose the “Existing MySQL Database” radio button option.¬†¬† Enter the appropriate values for the database host, database name, user and password for Oozie.
Again, click on the [Test Connection] button to verify that your connection to the hivedb is configured correctly.
After customizing the configuration for all the services, click the [Next] button to deploy the cluster.¬† After cluster deployment is successful,¬† the Hive and Oozie services should be using the configured MySQL databases. ¬† Note that both Hive and Oozie services will automatically create their schema in the configured databases by default.¬† If you intentionally configure Hive and Oozie to not do so by default,¬† you will need to manually create the schema for the respective databases following the documentation for Hive (Hive Schema Tool) and Oozie (Oozie Database Configuration).
This blog shows users how to configure a single MySQL database instance to provide database services for the Hadoop cluster. ¬† With such a setup in your Hadoop cluster, database administration is simplified to dealing with a single database server on a single host.¬† This will further help facilitating delegation and automation of backup and restore tasks for the databases.
If there are any custom service needing a database to be installed after the initial cluster installation,¬† you should be able to follow the steps described in this blog to create the new database and setup the service to use the database.