In this article we will walk through accessing data stored in a Hadoop encryption zone using the IBM Streams HDFS toolkit.
BigInsights Hadoop supports encryption at the HDFS layer. Once encryption is enabled, data is written to and read from an encryption zone transparently, with no need to modify the application code. Data is encrypted and decrypted by the HDFS client. The key used for encryption and decryption is stored in an external keystore; HDFS itself does not handle key management. The example in this article uses the Apache Hadoop Key Management Server (KMS).
The following software versions were used to test the scenario:
- IBM Streams 4.1
- IBM BigInsights 4.0
The following steps prepare the Streams cluster for accessing the encryption zone in Hadoop:
1. Configure the HDFS toolkit in Streams using the instructions provided in the Streams Knowledge Center.
2. In addition to the configuration from step 1, add the following two HDFS properties, which point the client at the key management server, to core-site.xml and hdfs-site.xml respectively.
a) Set the property hadoop.security.key.provider.path to kms://http@<kms-host>:16000/kms in the file core-site.xml, where <kms-host> is the machine running KMS (use https@ instead of http@ if KMS is served over HTTPS).
b) Set the property dfs.encryption.key.provider.uri to kms://http@<kms-host>:16000/kms in the file hdfs-site.xml.
3. Make sure your Streams cluster can reach the machine and port (16000) where KMS is installed and running.
For example, Linux provides the nc (netcat) utility, which can check whether a host and port are reachable, as shown below.
The echo prints the exit status of nc: 0 if the connection succeeded, 1 otherwise.
    [root@bdavm321 ~]# nc bdavm327 16000 &> /dev/null; echo $?
    0
4. Install the unrestricted JCE policy files for the Streams Java runtime (/opt/ibm/InfoSphere_Streams/220.127.116.11/java). The policy files for IBM Java are available for download from IBM. This step is required to decrypt files stored with encryption keys of 256 bits or longer.
5. The user running the Streams application must be granted access to the encryption keys in the KMS ACLs.
a) The user name can be identified by running the whoami command on Linux.
b) Add an entry for the user in kms-site.xml, specifically for the GENERATE_EEK and DECRYPT_EEK properties.
kms-site.xml can be found on a KMS server node. In my test cluster, it is in the /usr/kms-demo/hadoop/etc/hadoop folder. The entries below grant the Streams user the privileges to generate and decrypt encrypted encryption keys for the key named key1.
More details on kms-site.xml and the significance of these properties can be found in the Apache KMS documentation.
    <property>
      <name>key.acl.key1.GENERATE_EEK</name>
      <value>streamsadmin hdfs</value>
      <description>
        default ACL for GENERATE_EEK operations for all key acls that are not
        explicitly defined.
      </description>
    </property>
    <property>
      <name>key.acl.key1.DECRYPT_EEK</name>
      <value>streamsadmin hdfs</value>
      <description>
        default ACL for DECRYPT_EEK operations for all key acls that are not
        explicitly defined.
      </description>
    </property>
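The client-side parts of the preparation (steps 2 and 3) lend themselves to a quick automated check. The following is a minimal Python sketch, not part of the HDFS toolkit. It reads Hadoop-style XML configuration, reports which of the two KMS properties from step 2 are missing, extracts the host and port from a kms:// URI, and attempts a TCP connection much like the nc check in step 3. The function names, the sample XML, and the host kms.example.com are hypothetical and chosen only for illustration.

```python
import socket
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# File -> property pairs taken from step 2 of this article.
REQUIRED = {
    "core-site.xml": "hadoop.security.key.provider.path",
    "hdfs-site.xml": "dfs.encryption.key.provider.uri",
}

# Hypothetical sample content; kms.example.com is a placeholder host.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>hadoop.security.key.provider.path</name>
    <value>kms://http@kms.example.com:16000/kms</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_kms_properties(configs):
    """configs maps file name -> XML text; returns (file, property) pairs not set."""
    missing = []
    for filename, prop_name in REQUIRED.items():
        props = read_properties(configs.get(filename, "<configuration/>"))
        if not props.get(prop_name):
            missing.append((filename, prop_name))
    return missing


def kms_endpoint(uri):
    """Split a kms://<scheme>@<host>:<port>/<path> URI into (host, port)."""
    parsed = urlparse(uri)
    if parsed.scheme != "kms" or parsed.hostname is None or parsed.port is None:
        raise ValueError("not a kms://...@host:port/... URI: %r" % uri)
    return parsed.hostname, parsed.port


def port_reachable(host, port, timeout=3.0):
    """TCP connect test, similar in spirit to the nc check in step 3."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    configs = {"core-site.xml": SAMPLE_CORE_SITE, "hdfs-site.xml": "<configuration/>"}
    for filename, prop_name in missing_kms_properties(configs):
        print("missing %s in %s" % (prop_name, filename))
    uri = read_properties(SAMPLE_CORE_SITE)["hadoop.security.key.provider.path"]
    host, port = kms_endpoint(uri)
    print("KMS endpoint: %s:%d" % (host, port))
    # port_reachable(host, port) would attempt the actual TCP connection.
```

On a real Streams host, the sample strings would be replaced with the contents of the client's own core-site.xml and hdfs-site.xml, and port_reachable would be called with the extracted host and port. A successful connection only shows that the port is open; it does not confirm that KMS will authorize the user.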
Once all the preparation steps have been completed, an HDFS2 sample application can be used to test the integration.
In this article I used the HDFS2FileSinkSampleLineFormat SPL application from the HDFS toolkit samples folder to test the scenario. Once the job has finished, you can list the contents of the encryption zone to see the files that were created.
    [streamsadmin@streamsqse bin]$ ./hadoop fs -ls hdfs://bdavm327.svl.ibm.com:8020/securelogs
    Found 7 items
    -rw-r--r-- 3 streamsadmin hdfs 96 2016-01-24 20:21 hdfs://bdavm327.svl.ibm.com:8020/securelogs/pattern00.txt
    -rw-r--r-- 3 streamsadmin hdfs 80 2016-01-24 20:21 hdfs://bdavm327.svl.ibm.com:8020/securelogs/pattern01.txt
    -rw-r--r-- 3 streamsadmin hdfs 96 2016-01-24 20:21 hdfs://bdavm327.svl.ibm.com:8020/securelogs/pattern02.txt
    -rw-r--r-- 3 streamsadmin hdfs 64 2016-01-24 20:21 hdfs://bdavm327.svl.ibm.com:8020/securelogs/pattern03.txt
    -rw-r--r-- 3 streamsadmin hdfs 64 2016-01-24 20:21 hdfs://bdavm327.svl.ibm.com:8020/securelogs/pattern04.txt
    -rw-r--r-- 3 streamsadmin hdfs 64 2016-01-24 20:21 hdfs://bdavm327.svl.ibm.com:8020/securelogs/pattern05.txt
    -rw-r--r-- 3 streamsadmin hdfs 32 2016-01-24 20:21 hdfs://bdavm327.svl.ibm.com:8020/securelogs/pattern06.txt
The following are some of the exceptions you may encounter, along with their likely causes:
| Exception | Likely cause |
|---|---|
| java.security.InvalidKeyException: Illegal key size | The unrestricted JCE policy files are not installed for the Streams Java runtime. |
| java.io.IOException: No KeyProvider is configured, cannot access an encrypted file | The KMS provider path and URI properties are not configured on the BigInsights client. |
| User [streamsadmin] is not authorized to perform [DECRYPT_EEK] on key with ACL name [key1]!! | The Streams user (streamsadmin here) has not been added to the KMS ACLs. |
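When scanning job or PE logs, the table above can be applied mechanically. The sketch below is one hypothetical way to do that in Python: it matches a log line against message fragments taken from the three exceptions listed and returns the corresponding likely cause. The names KNOWN_CAUSES and likely_cause are illustrative, and the matching is a plain substring test, so it will miss messages worded differently by other Hadoop or Java versions.

```python
# Message fragments and causes mirror the troubleshooting table above.
KNOWN_CAUSES = [
    ("Illegal key size",
     "Unrestricted JCE policy files are not installed for the Streams Java runtime."),
    ("No KeyProvider is configured",
     "KMS provider path/URI properties are missing from the client's "
     "core-site.xml or hdfs-site.xml."),
    ("is not authorized to perform [DECRYPT_EEK]",
     "The Streams user has not been added to the KMS ACL for the key."),
]


def likely_cause(log_line):
    """Return the likely cause for a known exception message, or None."""
    for fragment, cause in KNOWN_CAUSES:
        if fragment in log_line:
            return cause
    return None


if __name__ == "__main__":
    line = "java.security.InvalidKeyException: Illegal key size"
    print(likely_cause(line))
```

A return value of None means only that the message is not one of the three covered here, not that the setup is correct.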