Overview
Hue is a set of web applications that enable users to interact with a Hadoop cluster through a web UI. It provides applications to create Oozie workflows, run Hive queries, access HBase, run Spark programs, browse HDFS, view Hadoop job information, and more.
This article describes how Hue can be installed and configured along with a cluster that is based on IBM Open Platform with Apache Hadoop. In addition, the article provides a script that automates the configuration of Hue to work with the IOP cluster.
Installation
Prerequisites
Hue needs to be downloaded and installed on a node where the Hadoop, Hive, and Spark configurations are available. This can be a node in the IOP cluster where the clients are installed.
IOP 4.2 has been tested with Hue 3.10. Download the Hue 3.10 tarball from http://gethue.com/hue-3-10-with-its-new-sql-editor-is-out/.
The following dependencies are required to install and run Hue. In RHEL/CentOS distributions, the dependencies may be installed via yum:
# yum install ant
# yum install python-devel
# yum install krb5-devel
# yum install krb5-libs
# yum install libxml2
# yum install python-lxml
# yum install libxslt-devel
# yum install mysql-devel
# yum install openssl-devel
# yum install libffi-devel
# yum install sqlite-devel
# yum install openldap-devel
# yum install gmp-devel
# yum install python
Install Hue
After installing all the dependencies, extract the tarball as the root user and run make install to build and install Hue. This creates the Hue home directory under /usr/local/hue.

sudo su
tar -xzf hue-3.10.0.tgz
cd hue-3.10.0
make install
Create a hue user and group, and change the owner of /usr/local/hue to hue.

groupadd hue
useradd -g hue hue
chown -R hue:hue /usr/local/hue
Configuration Changes Required in Ambari UI
To use some of the services in Hue, configuration changes are required in Ambari.
- HDFS
- Ensure WebHDFS is enabled
- Add properties to custom core-site.xml
- Oozie
- Add properties to custom oozie-site.xml
- Hive
- Add properties to custom webhcat-site.xml
Restart the three services from the Ambari UI.
Without these changes, Hue will not be able to access the applications and the Hue UI will show potential misconfigurations.
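The article does not list the exact property values to add. In a typical Hue setup, the additions are proxy-user (impersonation) settings for the hue user; a sketch is shown below with permissive `*` values as an assumption — tighten the hosts and groups to match your security policy.

```xml
<!-- Custom core-site.xml: allow the hue user to impersonate end users -->
<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>

<!-- Custom oozie-site.xml -->
<property>
  <name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
  <value>*</value>
</property>

<!-- Custom webhcat-site.xml -->
<property>
  <name>webhcat.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>webhcat.proxyuser.hue.groups</name>
  <value>*</value>
</property>
```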
Hue Configurations
The Hue configuration needs to be updated with values from the cluster. The configuration is stored in /usr/local/hue/desktop/conf/hue.ini. This article explains two ways to configure hue.ini: automatically, via a provided script, or manually.
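Whichever route is taken, it is prudent to back up hue.ini first; the automatic script does this with a timestamped copy, and a minimal equivalent for the manual route (assuming the install path used in this article) is:

```shell
# Back up hue.ini with a timestamp suffix before editing it.
HUE_INI=/usr/local/hue/desktop/conf/hue.ini
cp "$HUE_INI" "${HUE_INI}_$(date +%Y%m%d_%H%M).bak"
```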
Automatic Configuration
This article provides a script that retrieves the configurations using the Ambari REST APIs and updates hue.ini automatically. If the Ambari cluster has Kerberos enabled, follow the steps in Security in Hue before running this script.
Download the archive containing the scripts at configHueIOP42.zip. To use this script, run these commands as the root user:
# unzip configHueIOP.zip
# cd configHueIOP
# ./config_hue_iop.sh -ambariuser=admin -ambaripassword=admin -ambariserver=hostname.abc.com -ambariport=8080 -services=ALL
This script takes five parameters:
- ambariuser: Ambari admin user ID.
- ambaripassword: Password of the Ambari admin user.
- ambariserver: Hostname of the Ambari server.
- ambariport: Port of the Ambari server.
- services: Comma-separated list of Ambari services that should be configured in Hue. To configure all services, specify ALL.
$ ./config_hue_iop.sh -help
Usage:
config_hue_iop.sh -help
config_hue_iop.sh -ambariuser=admin -ambaripassword=admin -ambariserver=hostname.abc.com -ambariport=8080 -services=HDFS
config_hue_iop.sh -ambariuser=admin -ambaripassword=admin -ambariserver=hostname.abc.com -ambariport=8080 -services=HDFS,YARN,HIVE
config_hue_iop.sh -ambariuser=admin -ambaripassword=admin -ambariserver=hostname.abc.com -ambariport=8080 -services=ALL
The script will prompt for additional user input, for example whether to start the HBase Thrift Server. After the script finishes, check the generated log output for errors that may have occurred while starting the services, such as port conflicts.
Example of the output generated by the script:
./config_hue.sh -ambariuser=admin -ambaripassword=admin -ambariserver=hostname.abc.com -ambariport=8081 -services=ALL
Logging output to hue_20151022_1327.log
Use default hue.ini configuration file /usr/local/hue/desktop/conf/hue.ini ? y/n (y)
Creating backup of hue.ini...hue.ini_20151022_1327.bak
We will start the HBASE Thrift Server on this node: (hostname.abc.com), if you would like to start it on a different host, please select (n)o
Do you want to configure and start HBASE Thrift Server now? y/n (y)
Use default Hue Cluster name and port number for HBASE Thrift Server? y/n (y)
nohup: redirecting stderr to stdout
Kerberos is enabled in the cluster, will set security_enabled=true for all services
TODO:: Update hue.ini manually to configure Kerberos keytab and principal for the hue user.
[[kerberos]]
hue_keytab=/etc/security/keytabs/hue.service.keytab
hue_principal=hue/hostname.abc.com
kinit_path=/path/to/kinit
Do you want to start the Spark Livy Server? y/n (y)
Do you want to start Zookeeper Rest Services? y/n (y)
nohup: redirecting stderr to stdout
Configuring hue.ini at /usr/local/hue/desktop/conf/hue.ini
blocklisted Applications: impala,sqoop
Updated HBASE
hbase_clusters=(C1|hostname.abc.com:9090)
Started HBASE Thrift Server, pid: 3633270
Please check hue_20151022_1327.log for any port conflict errors
Updated HDFS
fs_defaultfs=hdfs://hostname.abc.com:8020
fs_defaultfs=hostname.abc.com:50070
Updated HIVE
hive_server_host=hostname.abc.com
hive_server_port=10000
Updated MAPREDUCE2
history_server_api_url=hostname.abc.com:19888
Updated OOZIE
oozie_url=http://hostname.abc.com:11000/oozie
Updated PIG
local_sample_dir=/usr/iop/current/pig-client/piggybank.jar
Updated SOLR
solr_url=http://hostname.abc.com:8983/solr/
Updated SPARK
spark_history_server_url=http://hostname.abc.com:18080
livy_server_host=hostname.abc.com
Started Hue Spark Livy Server, pid: 3633539
Updated YARN
resourcemanager_host=hostname.abc.com
resourcemanager_port=8050
resourcemanager_api_url=hostname.abc.com:8088
proxy_api_url=http://hostname.abc.com:8088
Updated ZOOKEEPER
host_ports=hostname.abc.com:2181
Started Zookeeper Rest Services, pid: 3633717
rest_url=http://hostname.abc.com:9998/
Manual Configuration
This section explains how to update the Hue configuration manually, without using the provided script.
hue.ini has a section for each configured service; the properties shown below need to be updated with values from the IOP cluster.
- desktop.secret_key
[desktop]
# Set this to a random string, the longer the better.
# This is used for secure hashing in the session store.
secret_key=
If this key is not configured, Hue will show a misconfiguration warning in the UI.
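When configuring the key manually, any sufficiently long random string will do. One way to generate one (this assumes openssl is installed, which it is as a Hue build dependency):

```shell
# Print a 64-character random hex string suitable for secret_key.
openssl rand -hex 32
```

Paste the output after secret_key= in the [desktop] section.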
In the automatic configuration, the secret_key is randomly generated.
- Hadoop
[hadoop]

# Configuration for HDFS NameNode
# ------------------------------------------------------------------------
[[hdfs_clusters]]
# HA support by using HttpFs
[[[default]]]
# Enter the filesystem uri
fs_defaultfs=hdfs://hostname.abc.com:8020
# Use WebHdfs/HttpFs as the communication mechanism.
# Domain should be the NameNode or HttpFs host.
# Default port is 14000 for HttpFs.
webhdfs_url=http://hostname.abc.com:50070/webhdfs/v1
# Directory of the Hadoop configuration
## hadoop_conf_dir=$HADOOP_CONF_DIR when set or '/etc/hadoop/conf'

# Configuration for YARN (MR2)
# ------------------------------------------------------------------------
[[yarn_clusters]]
[[[default]]]
# Enter the host on which you are running the ResourceManager
resourcemanager_host=hostname.abc.com
# The port where the ResourceManager IPC listens on
resourcemanager_port=8050
...
# URL of the ResourceManager API
resourcemanager_api_url=http://hostname.abc.com:8088
# URL of the ProxyServer API
proxy_api_url=http://hostname.abc.com:8088
# URL of the HistoryServer API
history_server_api_url=http://hostname.abc.com:19888
# URL of the Spark History Server
spark_history_server_url=http://hostname.abc.com:18080
- Oozie
[liboozie]
# The URL where the Oozie service runs on. This is required in order for
# users to submit jobs. Empty value disables the config check.
oozie_url=http://hostname.abc.com:11000/oozie
- Hive
[beeswax]
# Host where HiveServer2 is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
hive_server_host=hostname.abc.com
# Port where HiveServer2 Thrift server runs on.
hive_server_port=10000
# Hive configuration directory, where hive-site.xml is located
## hive_conf_dir=/etc/hive/conf
- Pig
[pig]
# Location of piggybank.jar on local filesystem.
local_sample_dir=/usr/iop/current/pig-client/piggybank.jar
# Location piggybank.jar will be copied to in HDFS.
## remote_data_dir=/user/hue/pig/examples
- HBase
To be able to use the HBase application, the HBase Thrift server needs to be started. The HBase Thrift server is not managed by Ambari. To start the server, run the following command as the root user:

nohup hbase thrift start &
By default, the HBase thrift server runs on port 9090. To use a different port, pass in a new port with the start command:
nohup hbase thrift start --port <custom_port> &
If the HBase Thrift server is not started, Hue shows the following error:
HBase Thrift 1 server cannot be contacted: Could not connect to hostname.abc.com:9090
Once the HBase Thrift Server is up and running, update hue.ini with the hostname and port number of the thrift server.
[hbase]
# Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.
# Use full hostname with security.
# If using Kerberos we assume GSSAPI SASL, not PLAIN.
hbase_clusters=(Cluster|hostname.abc.com:9090)
- Solr
[search]
# URL of the Solr Server
solr_url=http://hostname.abc.com:8983/solr/
- Zookeeper
To enable ZNode browsing in the Zookeeper application, the Zookeeper REST service needs to be started. The Zookeeper REST service is not managed by Ambari. To start the service, run the following command as the hue user:

/usr/jdk64/java-1.8.0-openjdk-1.8.0.45-28.b13.el6_6.x86_64/bin/java -cp /usr/iop/current/zookeeper-server/contrib/rest/*:/usr/iop/current/zookeeper-server/contrib/rest/lib/*:/usr/iop/current/zookeeper-server/zookeeper.jar:/usr/iop/current/zookeeper-server/conf/rest org.apache.zookeeper.server.jersey.RestMain
By default, the Zookeeper REST service runs on port 9998. To use a different port, update the value for rest.port in /usr/iop/current/zookeeper-server/conf/rest/rest.properties.
Once the Zookeeper REST service is up and running, update hue.ini with the hostname and port number of the Zookeeper REST service.

[zookeeper]
[[clusters]]
[[[default]]]
# Zookeeper ensemble. Comma separated list of Host/Port.
# e.g. localhost:2181,localhost:2182,localhost:2183
host_ports=hostname.abc.com:2181
# The URL of the REST contrib service (required for znode browsing).
rest_url=http://hostname.abc.com:9998
- Spark
To be able to use the Spark application, the Spark Livy server needs to be started. Livy is a REST service on top of Spark and is not managed by Ambari. To start the server, run the following commands as the hue user:

cd /usr/local/hue
/usr/local/hue/build/env/bin/hue livy_server
By default, Livy server runs on port 8998.
The Livy server has to be started in a directory that the hue user has write access to. When starting the Livy server, the logs are written to logs/livy_server.log. Once the Livy server is up and running, update hue.ini with the hostname and port number of the Livy server.

[spark]
# Host address of the Livy Server.
livy_server_host=hostname.abc.com
# Port of the Livy Server.
## livy_server_port=8998
Start Hue
After completing the configuration, start Hue by running the following command as the root user:
/usr/local/hue/build/env/bin/supervisor
By default, Hue runs on port 8888. To update the port number, configure http_port in hue.ini.
To access the Hue web UI, open a browser to http://hostname.abc.com:8888.
User Management
On first login, Hue prompts you to create a Hue superuser who will have admin access to the web UI.
As the Hue superuser, create additional users who can log in to Hue. These users may be granted permissions to access certain Hue applications, for example allowing a user Bob access only to the File Browser application. Below is an example of how to create a user hive, belonging to the hadoop group, in Hue.
- Log in to the Hue UI as your Hue superuser and navigate to the top right corner of the taskbar. Click the down arrow next to your superuser name, and click Manage Users.
- First create a new group named hadoop and assign the permissions for the group.
- Next, create a user, hive, that will be part of the hadoop group.
- On step 2 of creating a user, assign the hive user to the hadoop group.
- On step 3 of creating a user, make sure the user is “active.” The “superuser status” option will give the new user the same superuser status as the first user created when starting Hue for the first time.
Security in Hue
To configure Hue with LDAP, follow the instructions from http://gethue.com/making-hadoop-accessible-to-your-employees-with-ldap.
To configure Hue with Kerberos, follow the steps below:
- Follow the steps from Setting Up Kerberos for Use with Ambari to set up a KDC and kerberize the Ambari cluster.
- After kerberizing the Ambari cluster, configure Hue.
To create a keytab and principal for the hue user, run the following commands:
# kadmin
Authenticating as principal root/admin@IBM.COM with password.
Password for root/admin@IBM.COM:
kadmin: addprinc -randkey hue/hostname.abc.com@IBM.COM
WARNING: no policy specified for hue/hostname.abc.com@IBM.COM; defaulting to no policy
Principal "hue/hostname.abc.com@IBM.COM" created.
kadmin: xst -k /etc/security/keytabs/hue.service.keytab hue/hostname.abc.com@IBM.COM
Entry for principal hue/hostname.abc.com@IBM.COM with kvno 2, encryption type aes256-cts-hmac-sha1-96 added to keytab WRFILE:/etc/security/keytabs/hue.service.keytab.
Entry for principal hue/hostname.abc.com@IBM.COM with kvno 2, encryption type aes128-cts-hmac-sha1-96 added to keytab WRFILE:/etc/security/keytabs/hue.service.keytab.
Entry for principal hue/hostname.abc.com@IBM.COM with kvno 2, encryption type des3-cbc-sha1 added to keytab WRFILE:/etc/security/keytabs/hue.service.keytab.
Entry for principal hue/hostname.abc.com@IBM.COM with kvno 2, encryption type arcfour-hmac added to keytab WRFILE:/etc/security/keytabs/hue.service.keytab.
# kinit -k -t /etc/security/keytabs/hue.service.keytab hue/hostname.abc.com@IBM.COM
# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: hue/hostname.abc.com@IBM.COM
Valid starting     Expires            Service principal
09/08/15 19:18:48  09/09/15 19:18:48  krbtgt/IBM.COM@IBM.COM
        renew until 09/08/15 19:18:48
- Update /usr/local/hue/desktop/conf/hue.ini.
Uncomment the line ## security_enabled=false in all the services and set it to true:

security_enabled=true
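Flipping the flag in every service section can also be scripted; a minimal sketch, assuming the flag appears exactly as ## security_enabled=false in each section (as in a stock hue.ini):

```shell
# Uncomment and enable the security flag in every section of hue.ini.
# Path assumes the install layout used in this article.
HUE_INI=/usr/local/hue/desktop/conf/hue.ini
sed -i 's/^\([[:space:]]*\)## security_enabled=false/\1security_enabled=true/' "$HUE_INI"
```

The backreference preserves each line's indentation, so the section-relative formatting of hue.ini is kept.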
Modify the kerberos section with the location of the hue keytab and the principal that were created in step 2.

[[kerberos]]
# Path to Hue's Kerberos keytab file
hue_keytab=/etc/security/keytabs/hue.service.keytab
# Kerberos principal name for Hue
hue_principal=hue/hostname.abc.com@IBM.COM
# Path to kinit
kinit_path=/usr/bin/kinit
- Configure HBase hbase-site.xml in Ambari to add the Kerberos principal name, and restart the HBase service in Ambari.
- Restart Hue as root:

/usr/local/hue/build/env/bin/supervisor
Potential Kerberos-Related Problems when starting the Hue Server
- Permission denied while getting credentials
[09/Sep/2015 10:02:00 -0700] kt_renewer INFO Reinitting kerberos from keytab: /usr/bin/kinit -r 3600m -k -t /etc/security/keytabs/hue.service.keytab -c /tmp/hue_krb5_ccache hue/hostname.abc.com@IBM.COM
[09/Sep/2015 10:02:00 -0700] kt_renewer ERROR Couldn't reinit from keytab! `kinit' exited with 1.
kinit: Permission denied while getting initial credentials
Solution: Ensure the hue keytab is readable by the hue user.
# chown hue:hue /etc/security/keytabs/hue.service.keytab
- Permission denied: ‘/tmp/hue_krb5_ccache’
IOError: [Errno 13] Permission denied: '/tmp/hue_krb5_ccache'.
Solution: Ensure /tmp/hue_krb5_ccache is writable by the hue user.
# chown hue:hue /tmp/hue_krb5_ccache
- Error: Couldn’t renew Kerberos ticket
[09/Sep/2015 10:22:23 -0700] kt_renewer ERROR Couldn't renew kerberos ticket in order to work around Kerberos 1.8.1 issue.
Please check that the ticket for 'hue/hostname.abc.com@IBM.COM' is still renewable:
  $ kinit -f -c /tmp/hue_krb5_ccache
If the 'renew until' date is the same as the 'valid starting' date, the ticket cannot be renewed.
Please check your KDC configuration, and the ticket renewal policy (maxrenewlife) for the 'hue/hostname.abc.com@IBM.COM' and `krbtgt' principals.
Solution: Modify the Hue principal to allow renewable tickets:
# kadmin.local
Authenticating as principal root/admin@IBM.COM with password.
kadmin.local: modprinc -maxrenewlife 7day krbtgt/IBM.COM@IBM.COM
Principal "krbtgt/IBM.COM@IBM.COM" modified.
kadmin.local: modprinc -maxrenewlife 7day +allow_renewable hue/hostname.abc.com@IBM.COM
Principal "hue/hostname.abc.com@IBM.COM" modified.
Limitations and Workarounds
Limitations
- Sqoop2 and Impala applications are not supported in Hue when installing over IOP. Modify /usr/local/hue/desktop/conf/hue.ini to blocklist those applications.

# Comma separated list of apps to not load at server startup.
# e.g.: pig,zookeeper
app_blocklist=impala,sqoop
- The Solr examples are not working with the version of Solr in IOP.
- Spark Yarn mode is not supported; instead, use the default spark process mode.
- Knox is not supported; see https://issues.apache.org/jira/browse/KNOX-44
Workarounds
- When installing the Pig example, the following error may occur when executing the Pig script:
ERROR 1070: Could not resolve org.apache.pig.piggybank.evaluation.string.UPPER using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Could not resolve org.apache.pig.piggybank.evaluation.string.UPPER using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Solution: Modify one line in the example Pig script

from:
upper_case = FOREACH data GENERATE org.apache.pig.piggybank.evaluation.string.UPPER(text);

to:
upper_case = FOREACH data GENERATE UPPER(text);

- When installing the Hive examples as the Hue admin user, a permission denied error may occur because the Hue admin user does not have access to the Hive warehouse directory in HDFS.
Solution: Install the Hive examples using another user who has access to the Hive warehouse directory.