By default, the HDFS, YARN, and MapReduce web interfaces in Hadoop are not “Kerberized”. In IOP, access to the Hadoop web interfaces goes through Apache Knox, a proxy built for access to Hadoop components. Without Knox, we would have to access the Hadoop web interfaces directly, with no authentication protection at all. To make that authentication protection end-to-end, Kerberos must also be enabled on the Hadoop web interfaces themselves.
This means enabling the SPNEGO protocol, since web and REST access are HTTP access. Knox authenticates users to the Hadoop web interfaces with either PAM or LDAP through HTTP Basic authentication; the Knox proxy then accesses the Hadoop web interfaces using SPNEGO.
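As a rough illustration of the client side of this flow, the sketch below builds the two pieces a client would send to Knox: a gateway URL and an HTTP Basic Authorization header. The host name, port 8443, the "default" topology, the /gateway/ path layout, and the guest credentials are placeholder assumptions, not values taken from this setup. Knox then performs the SPNEGO exchange with the Hadoop service on the client's behalf.

```python
import base64


def basic_auth_header(user: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header (RFC 7617)."""
    raw = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def knox_webhdfs_url(host: str, path: str, port: int = 8443,
                     topology: str = "default") -> str:
    """Build a WebHDFS LISTSTATUS URL routed through a Knox gateway.

    Assumes the common /gateway/<topology>/ layout; adjust to your deployment.
    """
    clean_path = path.lstrip("/")
    return (f"https://{host}:{port}/gateway/{topology}"
            f"/webhdfs/v1/{clean_path}?op=LISTSTATUS")


# Placeholder values, for illustration only.
headers = basic_auth_header("guest", "guest-password")
url = knox_webhdfs_url("knox.mysample.com", "/tmp")
print(url)
print(headers)
```

The client never handles a Kerberos ticket here; it only presents the PAM or LDAP credentials to Knox.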
The steps below, with some discussion, enable Kerberos on the Hadoop web interfaces. Perform them after you have enabled Kerberos on your cluster.
1. Create a “secret” file to be used by the “Hadoop SPNEGO” filter.
dd if=/dev/urandom of=/etc/security/your_secret_data_file bs=1024 count=1
chown hdfs:hadoop /etc/security/your_secret_data_file
chmod 440 /etc/security/your_secret_data_file
The file your_secret_data_file is needed only on the NameNode servers. The Hadoop HTTP SPNEGO filter uses it to sign the HTTP cookie used for the SPNEGO protocol; see “Hadoop Auth, Java HTTP SPNEGO” in the Hadoop documentation.
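To show why this file must stay private, here is a minimal sketch of signing and verifying a cookie value with a shared secret. It is a generic HMAC-SHA256 illustration and not Hadoop's actual signer, whose digest and encoding may differ; the function names, the "&s=" separator, and the sample cookie value are assumptions chosen for the example.

```python
import base64
import hashlib
import hmac
import os


def sign(value: str, secret: bytes) -> str:
    """Append a base64 HMAC-SHA256 signature to a cookie value."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    return value + "&s=" + base64.b64encode(digest).decode("ascii")


def verify(signed: str, secret: bytes) -> str:
    """Return the original value if the signature matches, else raise ValueError."""
    value, sep, _ = signed.rpartition("&s=")
    expected = sign(value, secret).encode("utf-8")
    if not sep or not hmac.compare_digest(expected, signed.encode("utf-8")):
        raise ValueError("invalid cookie signature")
    return value


# A random 1024-byte secret stands in for the contents of the secret file.
secret = os.urandom(1024)
cookie = sign("u=alice&t=kerberos", secret)
print(verify(cookie, secret))
```

Anyone who can read the secret can forge a cookie that passes verification, which is the reason for the restrictive ownership and mode set above.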
2. Change the HDFS configuration.
The following entry gives the Knox proxy user access to the secure paths of the Hadoop servlets. Change it in hdfs-site (through Ambari, under Advanced hdfs-site).
dfs.cluster.administrators hdfs,knox,hadoop
3. The following entries set up the Hadoop SPNEGO filter.
Add these settings to the core-site file (using the Ambari Custom core-site section of HDFS).
hadoop.http.authentication.simple.anonymous.allowed false
hadoop.http.authentication.signature.secret.file /etc/security/your_secret_data_file
hadoop.http.authentication.type kerberos
hadoop.http.authentication.kerberos.keytab /etc/security/keytabs/spnego.service.keytab
hadoop.http.authentication.kerberos.principal HTTP/_HOST@YOUR_REALM
hadoop.http.filter.initializers org.apache.hadoop.security.AuthenticationFilterInitializer
hadoop.http.authentication.cookie.domain YOUR_REALM
Replace YOUR_REALM with your own realm, e.g. MYSAMPLE.COM.
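Ambari writes these settings into core-site.xml for you. If you ever need the same entries as property elements, for example to compare them against the generated file, the following sketch renders them. The helper name to_hadoop_xml and the sample realm are illustrative assumptions.

```python
from xml.sax.saxutils import escape

REALM = "MYSAMPLE.COM"  # assumption: replace with your own realm

SPNEGO_PROPS = {
    "hadoop.http.authentication.simple.anonymous.allowed": "false",
    "hadoop.http.authentication.signature.secret.file":
        "/etc/security/your_secret_data_file",
    "hadoop.http.authentication.type": "kerberos",
    "hadoop.http.authentication.kerberos.keytab":
        "/etc/security/keytabs/spnego.service.keytab",
    "hadoop.http.authentication.kerberos.principal": f"HTTP/_HOST@{REALM}",
    "hadoop.http.filter.initializers":
        "org.apache.hadoop.security.AuthenticationFilterInitializer",
    "hadoop.http.authentication.cookie.domain": REALM,
}


def to_hadoop_xml(props: dict) -> str:
    """Render name/value pairs as a Hadoop-style <configuration> document."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


xml_text = to_hadoop_xml(SPNEGO_PROPS)
print(xml_text)
```

Keeping the settings in one dictionary makes it easy to swap a single value, such as the filter initializer, and render the result again.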
4. Change the YARN configuration using either of the following options.
The first option is an entry in yarn.admin.acl that gives the user knox access to the YARN REST API and UI when the Hadoop web interface is secured (as opposed to dr.who, the static default user Hadoop applies to unauthenticated requests).
To make this change, modify the YARN ResourceManager settings.
The other option is configured in the Ambari custom yarn-site settings for YARN.
The second alternative relies on the RMAuthenticationFilter class, which extends the Hadoop security class DelegationTokenAuthenticationFilter. This accomplishes two things. First, it allows the YARN console to use Delegation Token authentication (rather than Kerberos SPNEGO). Second, it lets the name of the user making the request appear under "Logged in as", instead of the proxy super user "knox". Using RMAuthenticationFilter may be the more secure option, since yarn.admin.acl may grant the super user knox, or its group, more access than is desired.
5. From the Ambari admin server, restart all the services involved: HDFS, MapReduce, YARN, and Knox.
One thing to notice: if the Hadoop console uses Kerberos authentication (not Delegation Token), it shows the user as "knox", the proxy "super user". This is because the Hadoop web code displays the user making the request (the request's remote user), which in the case of Knox is the proxy user "knox".
An alternative for YARN is to use a Delegation Token filter, which means we authenticate not with the SPNEGO protocol but with Delegation Tokens over a Kerberos channel (we assume a Kerberized installation). In this case the requesting user is set in the request wrapper of the DelegationTokenAuthenticationFilter (the parent class of RMAuthenticationFilter), which uses the short name of the logged-in user (the short name of the Kerberos principal).
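The "short name" is what Hadoop derives from a Kerberos principal. As a simplified sketch of the default mapping only (real clusters may define custom auth_to_local rules, which this ignores), the first component of a principal in the default realm becomes the user name. The function name and the sample principals are illustrative assumptions.

```python
def short_name(principal: str, default_realm: str) -> str:
    """Approximate the default principal-to-short-name mapping.

    "alice@REALM" and "alice/host@REALM" both map to "alice" when REALM is
    the default realm. Principals from other realms are rejected here, since
    mapping them would need explicit rules.
    """
    name, sep, realm = principal.partition("@")
    if sep and realm != default_realm:
        raise ValueError(f"no default mapping for realm {realm!r}")
    # Keep only the first component, dropping any "/host" instance part.
    return name.split("/", 1)[0]


print(short_name("alice@MYSAMPLE.COM", "MYSAMPLE.COM"))  # alice
print(short_name("knox/gateway.mysample.com@MYSAMPLE.COM", "MYSAMPLE.COM"))  # knox
```

With plain Kerberos authentication the console sees only the second kind of principal, the proxy's, which is why it reports "knox"; with the Delegation Token filter it sees the end user's short name.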
It could be argued that the current solution of using "dfs.cluster.administrators: hdfs,knox,hadoop" is not as secure, since this ACL controls who can access the default servlets in the HDFS UI. In addition, because the proxy super user "knox" has access to web resources such as "/logs", any user authenticated through Knox would have access to these logs. We should explore with the HDFS community whether it makes sense to add a setting that enables Delegation Token authentication on the NameNode web UI, and/or to provide finer-grained authorization for access to the logs from the HDFS UI.
We re-opened Hadoop Jira 13119, which relates to our solution above of using "dfs.cluster.administrators" to gain access to the "/logs" servlet. This web path is not secured, since it is not included in the secure section of the HDFS web descriptor; it therefore does not take advantage of web authentication in the HDFS web application and relies instead on dfs.cluster.administrators membership. This appears to be an inconsistency in the HDFS web application that we would need to reconcile through the open source community.
Finally, another option my team discovered is that both issues, the proxied user name and the access to "/logs", can be fixed by using the org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler class, which is also an AuthenticationHandler. This class implements Kerberos and supports Delegation Token authentication; thus, in step 3, we could set "hadoop.http.filter.initializers" to the DelegationTokenAuthenticationFilter, which uses the DelegationTokenAuthenticationHandler when Kerberos is enabled. In this case, authentication with Kerberos is through Delegation Tokens using RPC SASL rather than the SPNEGO protocol.
hadoop.http.authentication.simple.anonymous.allowed false
hadoop.http.authentication.signature.secret.file /etc/security/your_secret_data_file
hadoop.http.authentication.type kerberos
hadoop.http.authentication.kerberos.keytab /etc/security/keytabs/spnego.service.keytab
hadoop.http.authentication.kerberos.principal HTTP/_HOST@YOUR_REALM
hadoop.http.filter.initializers org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter
hadoop.http.authentication.cookie.domain YOUR_REALM