How to Install Hue on top of IOP 4.2 - Hadoop Dev

Technical Blog Post


Abstract

How to Install Hue on top of IOP 4.2 - Hadoop Dev

Body

Overview

Hue is a set of web applications that enables users to interact with a Hadoop cluster through a web UI. It provides applications to create Oozie workflows, run Hive queries, access HBase, run Spark programs, browse HDFS, view Hadoop job information, and more.
This article describes how to install and configure Hue alongside a cluster based on IBM Open Platform with Apache Hadoop (IOP). In addition, it provides a script that automates the configuration of Hue to work with the IOP cluster.

Installation

Prerequisites

Hue needs to be downloaded and installed on a node where the Hadoop, Hive, and Spark configurations are available. This can be a node in the IOP cluster where the clients are installed.

IOP 4.2 has been tested with Hue 3.10. Download the Hue 3.10 tarball from http://gethue.com/hue-3-10-with-its-new-sql-editor-is-out/.

The following dependencies are required to install and run Hue. In RHEL/CentOS distributions, the dependencies may be installed via yum, either individually as listed below or with the single command shown after the list:

  1. # yum install ant
  2. # yum install python-devel
  3. # yum install krb5-devel
  4. # yum install krb5-libs
  5. # yum install libxml2
  6. # yum install python-lxml
  7. # yum install libxslt-devel
  8. # yum install mysql-devel
  9. # yum install openssl-devel
  10. # yum install libffi-devel
  11. # yum install sqlite-devel
  12. # yum install openldap-devel
  13. # yum install gmp-devel
  14. # yum install python
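
For convenience, the same set of packages can be installed with a single yum command (equivalent to the numbered list above):

# yum install -y ant python-devel krb5-devel krb5-libs libxml2 python-lxml libxslt-devel mysql-devel openssl-devel libffi-devel sqlite-devel openldap-devel gmp-devel python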

Install Hue

After installing all the dependencies, as the root user, extract the tarball and run make install to build and install Hue. This will create the Hue home directory under /usr/local/hue.

sudo su
tar -xzf hue-3.10.0.tgz
cd hue-3.10.0
make install

Create a Hue user and group and change the owner of /usr/local/hue to hue.

groupadd hue
useradd -g hue hue
chown -R hue:hue /usr/local/hue
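
To verify that the account and ownership are set up as expected, a quick illustrative check:

id hue
ls -ld /usr/local/hue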

Configuration Changes Required in Ambari UI

To use some of the services in Hue, configuration changes are required in Ambari.

  1. HDFS
      • Ensure WebHDFS is enabled

    [Screenshot: WebHDFS enabled in the Ambari HDFS configuration]

      • Add properties to custom core-site.xml

    [Screenshot: custom core-site.xml properties in Ambari; see the sketch after this list]

  2. Oozie
      • Add properties to custom oozie-site.xml

    [Screenshot: custom oozie-site.xml properties in Ambari]

  3. Hive
      • Add properties to custom webhcat-site.xml

    [Screenshot: custom webhcat-site.xml properties in Ambari]

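The exact property names and values were shown in Ambari screenshots in the original article. For a typical Hue setup they are the standard proxy-user settings sketched below (an assumption based on Hue's documented requirements; the wildcard values are illustrative and should be narrowed for production):

Custom core-site.xml:
  hadoop.proxyuser.hue.hosts=*
  hadoop.proxyuser.hue.groups=*

Custom oozie-site.xml:
  oozie.service.ProxyUserService.proxyuser.hue.hosts=*
  oozie.service.ProxyUserService.proxyuser.hue.groups=*

Custom webhcat-site.xml:
  webhcat.proxyuser.hue.hosts=*
  webhcat.proxyuser.hue.groups=*
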
Restart the three services from the Ambari UI.
Without these changes, Hue will not be able to access these services, and the Hue UI will flag potential misconfigurations.
[Screenshot: Hue misconfiguration warning shown when the proxy users are not added in Ambari]

Hue Configurations

The Hue configurations need to be updated with values from the cluster. The configuration for Hue is stored in /usr/local/hue/desktop/conf/hue.ini. This article explains two ways to configure hue.ini: automatically via the provided script, or manually.

Automatic Configuration

The provided script retrieves the configurations using the Ambari REST API and updates hue.ini automatically. If the Ambari cluster has Kerberos enabled, follow the steps in Security in Hue before running this script.

Download the archive containing the scripts at configHueIOP42.zip. To use this script, run these commands as the root user:

# unzip configHueIOP42.zip
# cd configHueIOP
# ./config_hue_iop.sh -ambariuser=admin -ambaripassword=admin -ambariserver=hostname.abc.com -ambariport=8080 -services=ALL

This script takes in five parameters.

  • ambariuser: Ambari admin user ID.
  • ambaripassword: Password of Ambari admin user.
  • ambariserver: Hostname of Ambari server.
  • ambariport: Port of Ambari server.
  • services: Comma-separated list of Ambari services that should be configured in Hue. To configure all services, specify ALL. Usage examples:
$ ./config_hue_iop.sh -help
Usage:
config_hue_iop.sh -help
config_hue_iop.sh -ambariuser=admin -ambaripassword=admin -ambariserver=hostname.abc.com -ambariport=8080 -services=HDFS
config_hue_iop.sh -ambariuser=admin -ambaripassword=admin -ambariserver=hostname.abc.com -ambariport=8080 -services=HDFS,YARN,HIVE
config_hue_iop.sh -ambariuser=admin -ambaripassword=admin -ambariserver=hostname.abc.com -ambariport=8080 -services=ALL

The script will prompt for additional user input, for example whether to start the HBase Thrift Server. After the script finishes, check the generated log output for errors that may have occurred while starting the services, such as port conflicts.

Example of the output generated by the script:

./config_hue_iop.sh -ambariuser=admin -ambaripassword=admin -ambariserver=hostname.abc.com -ambariport=8081 -services=ALL
Logging output to hue_20151022_1327.log
Use default hue.ini configuration file /usr/local/hue/desktop/conf/hue.ini ? y/n (y)
Creating backup of hue.ini...hue.ini_20151022_1327.bak
We will start the HBASE Thrift Server on this node: (hostname.abc.com), if you would like to start it on a different host, please select (n)o
Do you want to configure and start HBASE Thrift Server now? y/n (y)
Use default Hue Cluster name and port number for HBASE Thrift Server? y/n (y)
nohup: redirecting stderr to stdout
Kerberos is enabled in the cluster, will set security_enabled=true for all services
TODO:: Update hue.ini manually to configure Kerberos keytab and principal for the hue user.
  [[kerberos]]
    hue_keytab=/etc/security/keytabs/hue.service.keytab
    hue_principal=hue/hostname.abc.com
    kinit_path=/path/to/kinit
Do you want to start the Spark Livy Server? y/n (y)
Do you want to start Zookeeper Rest Services? y/n (y)
nohup: redirecting stderr to stdout

Configuring hue.ini at /usr/local/hue/desktop/conf/hue.ini
blocklisted Applications: impala,sqoop
Updated HBASE
  hbase_clusters=(C1|hostname.abc.com:9090)
Started HBASE Thrift Server, pid: 3633270
Please check hue_20151022_1327.log for any port conflict errors
Updated HDFS
  fs_defaultfs=hdfs://hostname.abc.com:8020
  webhdfs_url=hostname.abc.com:50070
Updated HIVE
  hive_server_host=hostname.abc.com
  hive_server_port=10000
Updated MAPREDUCE2
  history_server_api_url=hostname.abc.com:19888
Updated OOZIE
  oozie_url=http://hostname.abc.com:11000/oozie
Updated PIG
  local_sample_dir=/usr/iop/current/pig-client/piggybank.jar
Updated SOLR
  solr_url=http://hostname.abc.com:8983/solr/
Updated SPARK
  spark_history_server_url=http://hostname.abc.com:18080
  livy_server_host=hostname.abc.com
Started Hue Spark Livy Server, pid: 3633539
Updated YARN
  resourcemanager_host=hostname.abc.com
  resourcemanager_port=8050
  resourcemanager_api_url=hostname.abc.com:8088
  proxy_api_url=http://hostname.abc.com:8088
Updated ZOOKEEPER
  host_ports=hostname.abc.com:2181
Started Zookeeper Rest Services, pid: 3633717
  rest_url=http://hostname.abc.com:9998/

Manual Configuration

This section explains how to update the Hue configurations manually if you are not using the provided script.
hue.ini has a section for each configured service; the properties that need to be updated with values from the IOP cluster are shown in the excerpts below.

  1. desktop.secret_key
    [desktop]
      # Set this to a random string, the longer the better.
      # This is used for secure hashing in the session store.
      secret_key=

    If this key is not configured, Hue will show a misconfiguration in the UI:
    [Screenshot: secret_key misconfiguration warning in the Hue UI]
    In the automatic configuration, the secret_key is randomly generated.
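
    One way to generate a suitable random string when configuring manually (an illustrative command using the openssl package installed earlier; any random-string generator works):

    openssl rand -base64 48

    Paste the generated value into secret_key.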

  2. Hadoop
    [hadoop]

      # Configuration for HDFS NameNode
      # ------------------------------------------------------------------------
      [[hdfs_clusters]]
        # HA support by using HttpFs

        [[[default]]]
          # Enter the filesystem uri
          fs_defaultfs=hdfs://hostname.abc.com:8020

          # Use WebHdfs/HttpFs as the communication mechanism.
          # Domain should be the NameNode or HttpFs host.
          # Default port is 14000 for HttpFs.
          webhdfs_url=http://hostname.abc.com:50070/webhdfs/v1

          # Directory of the Hadoop configuration
          ## hadoop_conf_dir=$HADOOP_CONF_DIR when set or '/etc/hadoop/conf'

      # Configuration for YARN (MR2)
      # ------------------------------------------------------------------------
      [[yarn_clusters]]

        [[[default]]]
          # Enter the host on which you are running the ResourceManager
          resourcemanager_host=hostname.abc.com

          # The port where the ResourceManager IPC listens on
          resourcemanager_port=8050
          ...
          # URL of the ResourceManager API
          resourcemanager_api_url=http://hostname.abc.com:8088

          # URL of the ProxyServer API
          proxy_api_url=http://hostname.abc.com:8088

          # URL of the HistoryServer API
          history_server_api_url=http://hostname.abc.com:19888

          # URL of the Spark History Server
          spark_history_server_url=http://hostname.abc.com:18080
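
    To confirm that WebHDFS is reachable from the Hue node, an illustrative check against the standard WebHDFS REST API (substitute your NameNode host; user.name=hue assumes the proxy-user setup above):

    curl "http://hostname.abc.com:50070/webhdfs/v1/?op=LISTSTATUS&user.name=hue"

    A JSON FileStatuses response indicates WebHDFS is enabled and reachable.
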
  3. Oozie
    [liboozie]
      # The URL where the Oozie service runs on. This is required in order for
      # users to submit jobs. Empty value disables the config check.
      oozie_url=http://hostname.abc.com:11000/oozie
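
    The Oozie URL can be verified with the standard Oozie CLI status check (illustrative):

    oozie admin -oozie http://hostname.abc.com:11000/oozie -status

    A healthy server reports: System mode: NORMAL
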
  4. Hive
    [beeswax]
      # Host where HiveServer2 is running.
      # If Kerberos security is enabled, use fully-qualified domain name (FQDN).
      hive_server_host=hostname.abc.com

      # Port where HiveServer2 Thrift server runs on.
      hive_server_port=10000

      # Hive configuration directory, where hive-site.xml is located
      ## hive_conf_dir=/etc/hive/conf
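
    Connectivity to HiveServer2 can be verified with beeline (illustrative; on a Kerberized cluster, append the HiveServer2 principal to the JDBC URL):

    beeline -u "jdbc:hive2://hostname.abc.com:10000"
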
  5. Pig
    [pig]
      # Location of piggybank.jar on local filesystem.
      local_sample_dir=/usr/iop/current/pig-client/piggybank.jar

      # Location piggybank.jar will be copied to in HDFS.
      ## remote_data_dir=/user/hue/pig/examples
  6. HBase
    To be able to use the HBase application, the HBase Thrift server needs to be started. The HBase Thrift server is not managed by Ambari. To start the server, run the following command as the root user:

    nohup hbase thrift start &  

    By default, the HBase thrift server runs on port 9090. To use a different port, pass in a new port with the start command:

    nohup hbase thrift start --port <custom_port> &  

    If HBase thrift server is not started, Hue shows the following error:

    HBase Thrift 1 server cannot be contacted: Could not connect to hostname.abc.com:9090  

    Once the HBase Thrift Server is up and running, update hue.ini with the hostname and port number of the thrift server.

    [hbase]
      # Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.
      # Use full hostname with security.
      # If using Kerberos we assume GSSAPI SASL, not PLAIN.
      hbase_clusters=(Cluster|hostname.abc.com:9090)
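
    To confirm that the Thrift server is listening on the configured port (illustrative; 9090 is the default):

    netstat -tlnp | grep 9090
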
  7. Solr
    [search]
      # URL of the Solr Server
      solr_url=http://hostname.abc.com:8983/solr/
  8. Zookeeper
    To enable ZNode browsing in the Zookeeper application, the Zookeeper REST service needs to be started. The Zookeeper REST service is not managed by Ambari. To start the service, run the following command (a single line) as the hue user:

    /usr/jdk64/java-1.8.0-openjdk-1.8.0.45-28.b13.el6_6.x86_64/bin/java -cp /usr/iop/current/zookeeper-server/contrib/rest/*:/usr/iop/current/zookeeper-server/contrib/rest/lib/*:/usr/iop/current/zookeeper-server/zookeeper.jar:/usr/iop/current/zookeeper-server/conf/rest org.apache.zookeeper.server.jersey.RestMain  

    By default, the Zookeeper REST service runs on port 9998. To use a different port, update the value for rest.port in /usr/iop/current/zookeeper-server/conf/rest/rest.properties.
    Once the Zookeeper REST service is up and running, update hue.ini with the hostname and port number of the Zookeeper REST service.

    [zookeeper]

      [[clusters]]

        [[[default]]]
          # Zookeeper ensemble. Comma separated list of Host/Port.
          # e.g. localhost:2181,localhost:2182,localhost:2183
          host_ports=hostname.abc.com:2181

          # The URL of the REST contrib service (required for znode browsing).
          rest_url=http://hostname.abc.com:9998
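
    Once started, the REST service can be checked with curl (illustrative; the contrib REST service exposes znodes under /znodes/v1):

    curl http://hostname.abc.com:9998/znodes/v1/
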
  9. Spark
    To be able to use the Spark application, the Spark Livy server needs to be started. Livy is a REST service on top of Spark and is not managed by Ambari. To start the server, run the following commands as the hue user:

    cd /usr/local/hue
    /usr/local/hue/build/env/bin/hue livy_server

    By default, Livy server runs on port 8998.
    The Livy server has to be started in a directory that the Hue user has write access to. When starting the Livy server, the logs will be written to logs/livy_server.log. Once the Livy server is up and running, update hue.ini with the hostname and port number of the Livy server.

    [spark]
      # Host address of the Livy Server.
      livy_server_host=hostname.abc.com

      # Port of the Livy Server.
      ## livy_server_port=8998
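
    To verify that Livy is up, query its REST API (illustrative; assumes the default port 8998):

    curl http://hostname.abc.com:8998/sessions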

Start Hue

After completing the configuration, start Hue by running the following command as the root user:

/usr/local/hue/build/env/bin/supervisor  

By default, Hue runs on port 8888. To change the port number, configure http_port in hue.ini.
To access the Hue web UI, open a browser with URL http://hostname.abc.com:8888.
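
A quick way to confirm that the server is listening before opening a browser (illustrative):

curl -I http://hostname.abc.com:8888/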

User Management

On the first login, Hue prompts you to create a Hue superuser who will have admin access to the web UI.

[Screenshot: Hue initial login page prompting for superuser credentials]

As the Hue superuser, create additional users who can log in to Hue. These users may be granted permissions to access specific Hue applications; for example, a user Bob could be allowed only to launch the File Browser application. Below is an example of how to create a user hive, who is part of the hadoop group, in Hue.

    • Log in to the Hue UI as the Hue superuser and navigate to the top right corner of the taskbar. Click the down arrow next to your superuser name, then click Manage Users.

[Screenshot: Manage Users entry in the Hue user menu]

    • First, create a new group named hadoop and assign permissions to the group.

[Screenshot: creating the hadoop group and assigning permissions]

    • Next, create a user, hive, who will be part of the hadoop group.

[Screenshot: creating the hive user, step 1]

    • On step 2 of creating a user, assign the hive user to the hadoop group.

[Screenshot: assigning the hive user to the hadoop group, step 2]

    • On step 3 of creating a user, make sure the user is “active.” The “superuser status” option will give the new user the same superuser status as the first user created when starting Hue for the first time.

[Screenshot: user status options, step 3]

Security in Hue

To configure Hue with LDAP, follow the instructions from http://gethue.com/making-hadoop-accessible-to-your-employees-with-ldap.
To configure Hue with Kerberos, follow the steps below:

    1. Follow the steps from Setting Up Kerberos for Use with Ambari to set up a KDC and kerberize the Ambari cluster.
    2. After kerberizing the Ambari cluster, configure Hue as described below.

To create a keytab and principal for the Hue user, run the following kadmin commands:

# kadmin
Authenticating as principal root/admin@IBM.COM with password.
Password for root/admin@IBM.COM:
kadmin:  addprinc -randkey hue/hostname.abc.com@IBM.COM
WARNING: no policy specified for hue/hostname.abc.com@IBM.COM; defaulting to no policy
Principal "hue/hostname.abc.com@IBM.COM" created.
kadmin:  xst -k /etc/security/keytabs/hue.service.keytab hue/hostname.abc.com@IBM.COM
Entry for principal hue/hostname.abc.com@IBM.COM with kvno 2, encryption type aes256-cts-hmac-sha1-96 added to keytab WRFILE:/etc/security/keytabs/hue.service.keytab.
Entry for principal hue/hostname.abc.com@IBM.COM with kvno 2, encryption type aes128-cts-hmac-sha1-96 added to keytab WRFILE:/etc/security/keytabs/hue.service.keytab.
Entry for principal hue/hostname.abc.com@IBM.COM with kvno 2, encryption type des3-cbc-sha1 added to keytab WRFILE:/etc/security/keytabs/hue.service.keytab.
Entry for principal hue/hostname.abc.com@IBM.COM with kvno 2, encryption type arcfour-hmac added to keytab WRFILE:/etc/security/keytabs/hue.service.keytab.

# kinit -k -t /etc/security/keytabs/hue.service.keytab hue/hostname.abc.com@IBM.COM
# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: hue/hostname.abc.com@IBM.COM

Valid starting     Expires            Service principal
09/08/15 19:18:48  09/09/15 19:18:48  krbtgt/IBM.COM@IBM.COM
        renew until 09/08/15 19:18:48
  • Update /usr/local/hue/desktop/conf/hue.ini.

Uncomment the line ## security_enabled=false in each service section and set it to true:

security_enabled=true  

Modify the [[kerberos]] section with the location of the Hue keytab and the principal created above.

[[kerberos]]
  # Path to Hue's Kerberos keytab file
  hue_keytab=/etc/security/keytabs/hue.service.keytab
  # Kerberos principal name for Hue
  hue_principal=hue/hostname.abc.com@IBM.COM
  # Path to kinit
  kinit_path=/usr/bin/kinit
  • Configure hbase-site.xml in Ambari to add the Kerberos principal name, then restart the HBase service in Ambari.

[Screenshot: HBase Kerberos principal configuration in hbase-site.xml]

  • Restart Hue as root: /usr/local/hue/build/env/bin/supervisor

Potential Kerberos-Related Problems When Starting the Hue Server

    • Permission denied while getting credentials
[09/Sep/2015 10:02:00 -0700] kt_renewer   INFO     Reinitting kerberos from keytab: /usr/bin/kinit -r 3600m -k -t /etc/security/keytabs/hue.service.keytab -c /tmp/hue_krb5_ccache hue/hostname.abc.com@IBM.COM
[09/Sep/2015 10:02:00 -0700] kt_renewer   ERROR    Couldn't reinit from keytab! `kinit' exited with 1.
kinit: Permission denied while getting initial credentials

Solution: Ensure the Hue keytab is readable by the hue user.

# chown hue:hue /etc/security/keytabs/hue.service.keytab  
  • Permission denied: ‘/tmp/hue_krb5_ccache’
IOError: [Errno 13] Permission denied: '/tmp/hue_krb5_ccache'.  

Solution: Ensure /tmp/hue_krb5_ccache is writable by the hue user.

# chown hue:hue /tmp/hue_krb5_ccache  
  • Error: Couldn’t renew Kerberos ticket
[09/Sep/2015 10:22:23 -0700] kt_renewer   ERROR    Couldn't renew kerberos ticket in order to work around Kerberos 1.8.1 issue. Please check that the ticket for 'hue/hostname.abc.com@IBM.COM' is still renewable:
    $ kinit -f -c /tmp/hue_krb5_ccache
If the 'renew until' date is the same as the 'valid starting' date, the ticket cannot be renewed. Please check your KDC configuration, and the ticket renewal policy (maxrenewlife) for the 'hue/hostname.abc.com@IBM.COM' and 'krbtgt' principals.

Solution: Modify the krbtgt and Hue principals to allow renewable tickets:

# kadmin.local
Authenticating as principal root/admin@IBM.COM with password.
kadmin.local:  modprinc -maxrenewlife 7day krbtgt/IBM.COM@IBM.COM
Principal "krbtgt/IBM.COM@IBM.COM" modified.
kadmin.local:  modprinc -maxrenewlife 7day +allow_renewable hue/hostname.abc.com@IBM.COM
Principal "hue/hostname.abc.com@IBM.COM" modified.

Limitations and Workarounds

Limitations

  • Sqoop2 and Impala applications are not supported in Hue when installing over IOP. Modify /usr/local/hue/desktop/conf/hue.ini to blocklist those applications.
      # Comma separated list of apps to not load at server startup.
      # e.g.: pig,zookeeper
      app_blocklist=impala,sqoop
  • The Solr examples do not work with the version of Solr in IOP.
  • Spark YARN mode is not supported; use the default Spark process mode instead.
  • Knox is not supported; see https://issues.apache.org/jira/browse/KNOX-44.

Workarounds

  • When installing the Pig example, the following error may occur when executing the Pig script:
    ERROR 1070: Could not resolve org.apache.pig.piggybank.evaluation.string.UPPER using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
    org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Could not resolve org.apache.pig.piggybank.evaluation.string.UPPER using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

    Solution: Modify one line in the example Pig script
    from: upper_case = FOREACH data GENERATE org.apache.pig.piggybank.evaluation.string.UPPER(text);
    to: upper_case = FOREACH data GENERATE UPPER(text);

  • When installing the Hive examples as the Hue admin user, a permission denied error may occur because the Hue admin user does not have access to the Hive warehouse directory in HDFS.
    Solution: Install the Hive examples as another user who has access to the Hive warehouse directory.

[{"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSCRJT","label":"IBM Db2 Big SQL"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

UID

ibm16260061