Hadoop file upload utility for secure BigInsights clusters running on cloud using WebHDFS and Knox Gateway - Hadoop Dev


Recently, customers have asked me whether there is a way for their mobile or desktop applications to upload a file to BigInsights clusters hosted on a secure cloud. Secure clouds do not allow direct connections to HDFS ports; instead, connections from external applications have to be routed through the Apache Knox Gateway, which has REST URL mappings defined for WebHDFS.

The Apache Knox Gateway provides authentication support and a single REST interface for accessing several big data services, namely HDFS, Ambari, and Hive. This makes Knox a natural choice for routing external traffic to big data clusters.

Externally hosted customer applications can connect to the Knox Gateway using any HTTP client and interact with Hadoop through predefined operations and REST API calls. WebHDFS is documented in more detail in the Apache Hadoop documentation:
https://hadoop.apache.org/docs/r1.0.4/webhdfs.html
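
To make the interaction concrete, here is a minimal sketch of such a client using Apache HttpClient. The two-step CREATE flow (an initial PUT that returns a redirect, followed by a PUT of the file contents) is the documented WebHDFS behavior; the gateway context path "gateway/default", the class name, and all host, credential, and path values are illustrative assumptions, and TLS trust setup for the gateway's certificate is omitted for brevity.

  import java.io.File;

  import org.apache.http.Header;
  import org.apache.http.auth.AuthScope;
  import org.apache.http.auth.UsernamePasswordCredentials;
  import org.apache.http.client.CredentialsProvider;
  import org.apache.http.client.methods.CloseableHttpResponse;
  import org.apache.http.client.methods.HttpPut;
  import org.apache.http.entity.ContentType;
  import org.apache.http.entity.FileEntity;
  import org.apache.http.impl.client.BasicCredentialsProvider;
  import org.apache.http.impl.client.CloseableHttpClient;
  import org.apache.http.impl.client.HttpClients;

  public class WebhdfsUploadSketch {
      public static void main(String[] args) throws Exception {
          // Hypothetical values; the actual utility reads these from its properties file.
          String knoxHostPort = "bluemixcluster.ibm.com:8443";
          String user = "guest";
          String password = "guest-password";
          String hdfsFileUrl = "/tmp/hdfsfile.txt";
          File dataFile = new File("/Users/macadmin/Desktop/input.txt");

          // Knox exposes WebHDFS under its gateway context; "default" is a common
          // topology name but may differ on your cluster.
          String createUrl = "https://" + knoxHostPort
                  + "/gateway/default/webhdfs/v1" + hdfsFileUrl + "?op=CREATE";

          CredentialsProvider creds = new BasicCredentialsProvider();
          creds.setCredentials(AuthScope.ANY,
                  new UsernamePasswordCredentials(user, password));

          try (CloseableHttpClient client = HttpClients.custom()
                  .setDefaultCredentialsProvider(creds)
                  .build()) {

              // Step 1: PUT with no body. WebHDFS answers with a 307 redirect whose
              // Location header (rewritten by Knox) is the real write endpoint.
              // HttpClient does not auto-follow redirects for PUT, so handle it manually.
              String writeLocation;
              try (CloseableHttpResponse r = client.execute(new HttpPut(createUrl))) {
                  Header loc = r.getFirstHeader("Location");
                  if (loc == null) {
                      throw new IllegalStateException("No redirect from Knox: " + r.getStatusLine());
                  }
                  writeLocation = loc.getValue();
              }

              // Step 2: PUT the file contents to the redirected location.
              HttpPut upload = new HttpPut(writeLocation);
              upload.setEntity(new FileEntity(dataFile, ContentType.APPLICATION_OCTET_STREAM));
              try (CloseableHttpResponse r = client.execute(upload)) {
                  // WebHDFS returns "201 Created" on success.
                  System.out.println(r.getStatusLine());
              }
          }
      }
  }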

In this article I show how to build your own upload manager for uploading files to HDFS. The logic can be embedded in any desktop or mobile application, allowing users to interact with their big data cluster remotely.

File upload utility

The utility uses the Apache HttpClient library, release 4.5.3, and supports files of up to 5 MB.
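
Assuming the project resolves HttpClient from Maven Central, the corresponding pom.xml dependency would be:

  <dependency>
      <groupId>org.apache.httpcomponents</groupId>
      <artifactId>httpclient</artifactId>
      <version>4.5.3</version>
  </dependency>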

The project can be downloaded from my Git repo: https://github.com/bharathdcs/hadoop_fileuploader

The application expects a properties file as input; the format and a sample are shown below:

  knoxHostPort=bluemixcluster.ibm.com:8443
  knoxUsername=guest
  knoxPassword=guest-password
  hdfsFileUrl=/tmp/hdfsfile.txt
  dataFile=/Users/macadmin/Desktop/input.txt
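
If you are embedding the logic in your own application, the properties file can be read with the standard java.util.Properties API; a minimal sketch (class and method names are illustrative):

  import java.io.FileInputStream;
  import java.io.IOException;
  import java.util.Properties;

  // Minimal sketch: load the utility's input properties file.
  public class UploadConfig {
      public static Properties load(String path) throws IOException {
          Properties props = new Properties();
          try (FileInputStream in = new FileInputStream(path)) {
              props.load(in);
          }
          return props;
      }

      public static void main(String[] args) throws IOException {
          Properties props = load(args[0]);
          System.out.println("Uploading " + props.getProperty("dataFile")
                  + " to " + props.getProperty("hdfsFileUrl"));
      }
  }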

Most of the parameters are self-explanatory. The interface is kept simple compared to uploading the file to WebHDFS manually with utilities like curl, as documented in the following support technote: http://www-01.ibm.com/support/docview.wss?uid=swg21976974

The file is created with a default permission of world-writable. If different permissions are desired, pass the octal value using the following parameter in the properties file:

  hdfsFilePermission=440  
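
WebHDFS supports an octal permission query parameter on the CREATE operation, so a client like the sketch above could wire the optional property through roughly as follows (the helper is hypothetical):

  // Hypothetical helper: append the optional octal permission from the
  // properties file to a WebHDFS CREATE URL ("op=CREATE&permission=<octal>").
  static String withPermission(String createUrl, java.util.Properties props) {
      String permission = props.getProperty("hdfsFilePermission");
      return (permission == null) ? createUrl
              : createUrl + "&permission=" + permission;
  }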

Run the application as follows:

  mvn exec:java -Dexec.mainClass="twc.webhdfs.App" -Dexec.args="/Users/macadmin/Desktop/input.properties"  

Once it finishes, you should see the following message confirming the file creation:

  File creation successfull  

[{"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSCRJT","label":"IBM Db2 Big SQL"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

UID

ibm16260017