Apache Hive is an open-source data warehouse system built on top of Hadoop. Hive provides data summarization, querying, and analysis of large distributed datasets. It offers a simple SQL-like language called Hive Query Language (HQL) for querying and analyzing data stored in Hadoop clusters.
In IOP 4.1, you can create a Hive view that makes writing and running Hive queries easy. This view is one of a set of user views that are pre-deployed in the Ambari web user interface. In this post, we will see how to deploy an instance of a Hive view from Ambari.
Cluster configuration for the view:
Before creating the Hive view, you need to perform the following steps to configure your cluster for the view:
Note: In the example used in this post, the logged-in user is admin and the Ambari server is running as root.
1. Set up an HDFS proxy user for the Ambari daemon user account:
To allow the Ambari server process user to impersonate the logged-in user, set up a proxy user for root.
- From the Ambari dashboard, navigate to the Advanced tab under the HDFS service's Configs tab.
- Expand the Custom core-site section and add the following two new properties:
- Save the configuration changes and restart services as recommended.
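The two custom core-site properties are the standard Hadoop proxy-user settings for the account the Ambari server runs as (root in this example). The wildcard values below are a common starting point, but you may want to restrict them to specific hosts and groups in production:

```
hadoop.proxyuser.root.hosts=*
hadoop.proxyuser.root.groups=*
```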
2. Create the /user/admin folder on HDFS:
Since the view stores user metadata in HDFS under the /user/&lt;logged-in-user&gt; folder, create this folder if it does not exist. To do this, execute the following commands as the hdfs user (admin is the logged-in user in this case):
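For example, assuming you can switch to the hdfs account and that hdfs is the group you want to own the folder (adjust the user and group names to your environment), the commands would look like this:

```shell
su - hdfs
hdfs dfs -mkdir /user/admin
hdfs dfs -chown admin:hdfs /user/admin
```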
Now that the cluster configuration is complete, we are ready to create an instance of the view.
Creating a view instance:
1. Navigate to admin -> Manage Ambari. This takes you to the Ambari admin view. To see the pre-deployed views, click on the Views link under the Views section.
2. Expand HIVE and click on Create Instance to open the Create Instance UI. In the Details section, specify the instance name, display name, and description for the view.
3. In the Settings section, specify settings that match your Hive configuration. In the example used in this post, Hive uses the default authentication mode, so we will use the defaults and no changes need to be made to the view settings.
4. In the Cluster Configuration section, you can select the local Ambari-managed cluster or specify a custom URL. We will use the local cluster for this post.
5. Click on Save. You will see a popup that notifies you that the view was successfully created.
Executing the Hive view:
Note: Before executing the Hive view, make sure to grant permissions to the appropriate users and groups. For the example in this post, we will grant user ‘ambari-qa’ permissions to use the view.
To do this, open up the definition of HiveView and specify the user in the Permissions section.
Now that we’ve created a Hive view, you can open it in one of the following ways:
1. Clicking on Go to Instance in the Hive instance creation UI
2. Clicking on HiveView under the views icon, where the view is now listed.
This will show you the user interface of the Hive view. Now you are ready to author and run queries on Hive.
Using the Hive view:
The Hive view has tabs for browsing databases and running queries, viewing saved queries and history of queries executed, and a tab for adding UDFs. Some of the functionality available in the Hive view is shown in the screenshots below.
For the example used in this post, we use a simple table created in the default database. The table students stores the names, GPAs, and ages of students, and a view topofclass, created on the table, fetches the names and GPAs of students with a GPA above 3.9.
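Although the post does not show the DDL for this example, the table and view could be created with HQL along the following lines (the column names and types are assumptions based on the description above):

```sql
-- Hypothetical DDL for the example table; adjust column types as needed.
CREATE TABLE students (name STRING, gpa DOUBLE, age INT);

-- View returning the name and GPA of students with a GPA above 3.9.
CREATE VIEW topofclass AS
SELECT name, gpa
FROM students
WHERE gpa > 3.9;
```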
1. Query Tab:
The Query tab includes the Database Explorer, Query Editor and Query Process Results window (which is visible when you execute a query in the Query Editor).
a. You can browse the default database and other databases in the Database Explorer and issue queries against database objects in the Query Editor. In the example below, we insert records into the students table.
b. The Query Process Results window shows you the log and the results of the query execution. You can see the details of the MapReduce job that was submitted and the URL to track it.
c. You can save the results of the query execution to HDFS or download them as a CSV file.
d. You can also create new worksheets to execute queries. The Results tab in the Query Process Results window displays the results of executing queries.
e. To save the query for later reference, click on the “Save as…” button.
f. Worksheets can be renamed by double-clicking on the worksheet tab.
g. You can also get a textual explanation of the query plan by clicking on the Explain button.
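As a sketch, the statements run in the Query Editor for the steps above might look like the following HQL (the sample rows are made up, and INSERT ... VALUES requires Hive 0.14 or later):

```sql
-- Insert a few sample rows into the students table (hypothetical data).
INSERT INTO TABLE students VALUES ('alice', 3.95, 20), ('bob', 3.60, 21);

-- Query the topofclass view; only rows with a GPA above 3.9 are returned.
SELECT * FROM topofclass;

-- Show the execution plan for a query instead of running it.
EXPLAIN SELECT * FROM topofclass;
```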
2. Saved Queries:
The Saved Queries tab lists any queries that may have been previously saved.
In the History tab, you can see a history of all queries executed. You can also filter the results as required.
In the UDFs tab, you can add user-defined functions (whose definitions are in a JAR file on HDFS) to your query.