Big Data environments are characterized by a multiplicity of technologies, distributed data repositories, and parallel computation systems with different deployment models. Amid this complexity, organizations must maintain data privacy and prevent unauthorized access at all levels. They also need a unified security mechanism that provides single sign-on, ensuring that every service connected to the data cluster is authenticated before it is permitted to access the data.
This article describes the series of steps required to set up an IBM Big Data environment using Kerberos for host validation and authentication of client applications. The environment settings were based on the requirements of an actual IBM customer, as described in the next section of this article.
The following are the system requirements for this tutorial:
- The system must manage a large number of documents and the metadata for those documents. The documents are classified into a variety of topics and categories.
- The system should handle many different document types (such as HTML, PDF, and spreadsheets) originating from many systems.
- The system should provide a federated search that considers the documents as well as the relevant topics that are associated with them.
- The document categories are mapped to different authorization groups. Users belonging to those groups will have access to the corresponding documents.
- Metadata is added throughout the document's life cycle.
The solution documented in this article demonstrates the ability to apply a single sign-on mechanism in a subset of the proposed environment while using a Kerberos ticket to authenticate hosts, users, and add-on services to the BigInsights Hadoop cluster. Check out the table of contents below and download the full article using the link at the bottom.
Table of Contents
Topology solution and hosts
Setting up users and groups in OpenLDAP
Step 1: Setting up the Linux machines
1. Host name setup
Host name requirements:
2. Passwordless SSH for root user
3. Install LDAP client (on each Linux node)
4. Install DB2 prerequisites (on each Linux node)
5. Install Kerberos V5 client libraries on each of the Linux machines (4 total)
6. Install various prerequisites
7. Disable IPv6 on all nodes
8. Disable firewall
9. Disable SELinux
10. Create disks for data store
11. Configure sudo permissions for the admin user
12. Configure limits.conf on each BI node
13. Configure /etc/ssh/sshd_config on each BI node
14. Configure the pam_ldap module
15. Configure SSHD at /etc/pam.d/sshd
16. Configure System auth at /etc/pam.d/system-auth
17. Configure the LDAP client at /etc/openldap/ldap.conf
18. Configure name service daemon at /etc/nslcd.conf
19. Configure name service switch at /etc/nsswitch.conf
20. Configure pam_ldap.conf at /etc/pam_ldap.conf
21. Copy certs from openLDAP server to all of the BigInsights nodes
22. Start local name service daemon (nslcd)
Step 2: Setting up IBM JDK and JCE
Download and install IBM JDK and JCE on the Linux servers
Step 3: OpenLDAP time synchronization
Step 4: Configuring Kerberos client on all BigInsights nodes
1. /etc/krb5.conf on each of your Linux machines (4 total)
2. Add Kerberos service definitions to each /etc/services (all Linux machines)
Step 5: Creating and deploying host keytabs
1. Create the host keytabs
2. Configure the SSSD (System Security Services Daemon) file on each node
3. Caching enablement
4. Deploy, initialize, and test the host keytabs
Step 6: Create the service keytabs
Step 7: Initialize the service keytabs
Step 8: Create the cluster hosts file for the BigInsights installer
Step 9: Run BigInsights installer prechecker
Step 10: BigInsights installation
Appendix 1: Complete users LDIF file
Appendix 2: Complete groups LDIF file
Appendix 3: Complete hosts LDIF file
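Step 4 above configures the Kerberos client through /etc/krb5.conf on each node. A minimal sketch of that file, assuming a hypothetical realm EXAMPLE.COM with the KDC and admin server both on a host named kdc.example.com (substitute your own realm and host names):

```ini
[libdefaults]
    default_realm = EXAMPLE.COM
    dns_lookup_kdc = false
    ticket_lifetime = 24h
    forwardable = true

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }

[domain_realm]
    .example.com = EXAMPLE.COM
    example.com = EXAMPLE.COM
```

Once this file is distributed to all BigInsights nodes, the client setup can be verified by obtaining a ticket with `kinit <principal>` and inspecting it with `klist`.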
Download the complete article here
Additional article on how to set up Kerberos on Microsoft Active Directory
Configuring Kerberos for Hadoop with Active Directory
IBM Kerberos Automation Toolkit for Hadoop
An automation toolkit is available for download to ease setting up a secure environment. Download the latest version of the automation toolkit here:
Download the IBM Kerberos Automation Toolkit for Hadoop