Recently, customers have asked me whether their mobile or desktop applications can upload a file to BigInsights clusters hosted on a secure cloud. Secure clouds do not allow direct connections to HDFS ports. Instead, connections from applications outside the cloud have to be routed through the Apache Knox Gateway, which defines REST URL mappings for WebHDFS.
Apache Knox Gateway provides authentication support and a single REST interface to several Big Data services, namely HDFS, Ambari and Hive. This makes Knox a natural choice for routing external traffic to Big Data clusters.
Customer applications hosted externally can connect to the Knox Gateway using any HTTP client and interact with Hadoop through predefined operations and REST API calls. WebHDFS is documented in more detail in the Apache Hadoop documentation:
https://hadoop.apache.org/docs/r1.0.4/webhdfs.html
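Through Knox, a WebHDFS call such as CREATE goes to a gateway URL instead of directly to the NameNode. The sketch below builds such a URL from a host:port and an absolute HDFS path. It assumes the Knox defaults of a `gateway` path prefix and a topology named `default`. Both values are cluster-specific, and the class name `KnoxWebHdfsUrl` is hypothetical.

```java
import java.net.URI;

/**
 * Hypothetical helper that builds a WebHDFS URL routed through a Knox gateway.
 * Assumption: the gateway path is "gateway" and the topology is "default";
 * check your Knox configuration for the actual values.
 */
final class KnoxWebHdfsUrl {
    static final String GATEWAY_PATH = "gateway"; // assumed Knox default
    static final String TOPOLOGY = "default";     // assumed, cluster-specific

    /** Returns https://<hostPort>/gateway/<topology>/webhdfs/v1<hdfsPath>?op=<op>. */
    static URI build(String knoxHostPort, String hdfsPath, String op) {
        if (!hdfsPath.startsWith("/")) {
            throw new IllegalArgumentException("HDFS path must be absolute: " + hdfsPath);
        }
        // Plain concatenation: assumes the path contains no characters needing percent-encoding.
        return URI.create("https://" + knoxHostPort + "/" + GATEWAY_PATH + "/" + TOPOLOGY
                + "/webhdfs/v1" + hdfsPath + "?op=" + op);
    }

    public static void main(String[] args) {
        System.out.println(build("bluemixcluster.ibm.com:8443", "/tmp/hdfsfile.txt", "CREATE"));
    }
}
```

With the sample host and path, this prints `https://bluemixcluster.ibm.com:8443/gateway/default/webhdfs/v1/tmp/hdfsfile.txt?op=CREATE`.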
In this article I show how to build your own upload manager for uploading files to HDFS. The logic can be embedded in any desktop or mobile application, allowing users to interact with their Big Data cluster remotely.
File upload utility.
The utility uses the Apache HttpClient library, release 4.5.3, and supports files of up to 5 MB.
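The utility is limited to files of up to 5 MB, so an application that embeds this logic may want to reject larger files before contacting the gateway. This is a minimal sketch that assumes the limit means 5 * 1024 * 1024 bytes. The class and constant names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical pre-upload check against the utility's 5 MB limit. */
final class UploadSizeCheck {
    // Assumption: "5 MB" is interpreted as 5 MiB (5 * 1024 * 1024 bytes).
    static final long MAX_UPLOAD_BYTES = 5L * 1024 * 1024;

    /** Returns true if the local file is no larger than the limit. */
    static boolean isWithinLimit(Path file) throws IOException {
        return Files.size(file) <= MAX_UPLOAD_BYTES;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]); // path of the local file to upload
        if (!isWithinLimit(file)) {
            System.err.println("File too large for this utility: " + Files.size(file) + " bytes");
            System.exit(1);
        }
        System.out.println("File size OK: " + file);
    }
}
```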
The project can be downloaded from my Git repository: https://github.com/bharathdcs/hadoop_fileuploader
The application expects a properties file as input. A sample showing the format follows:
knoxHostPort=bluemixcluster.ibm.com:8443
knoxUsername=guest
knoxPassword=guest-password
hdfsFileUrl=/tmp/hdfsfile.txt
dataFile=/Users/macadmin/Desktop/input.txt
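The input is a standard Java properties file, so `java.util.Properties` can read it. The sketch below loads the five keys from the sample and treats `hdfsFilePermission` (described further on) as optional. The `UploadConfig` class and its validation rules are hypothetical and are not taken from the project.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical holder for the upload settings read from the properties file. */
final class UploadConfig {
    final String knoxHostPort;
    final String knoxUsername;
    final String knoxPassword;
    final String hdfsFileUrl;
    final String dataFile;
    final String hdfsFilePermission; // optional; null when the key is absent

    private UploadConfig(Properties p) {
        knoxHostPort = required(p, "knoxHostPort");
        knoxUsername = required(p, "knoxUsername");
        knoxPassword = required(p, "knoxPassword");
        hdfsFileUrl = required(p, "hdfsFileUrl");
        dataFile = required(p, "dataFile");
        hdfsFilePermission = p.getProperty("hdfsFilePermission");
    }

    /** Parses properties text, failing fast if a required key is missing or blank. */
    static UploadConfig load(Reader in) throws IOException {
        Properties p = new Properties();
        p.load(in);
        return new UploadConfig(p);
    }

    private static String required(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing required property: " + key);
        }
        return value.trim();
    }

    public static void main(String[] args) throws IOException {
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]))) {
            UploadConfig c = load(in);
            System.out.println("Uploading " + c.dataFile + " to " + c.hdfsFileUrl);
        }
    }
}
```

`Properties.load` treats backslashes as escape characters. A Windows path in `dataFile` would therefore need doubled backslashes or forward slashes.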
Most of the parameters are self-explanatory. This interface is simpler than uploading the file manually to WebHDFS with utilities such as curl, which is documented in the following support technote: http://www-01.ibm.com/support/docview.wss?uid=swg21976974
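With curl, Knox credentials are typically passed as `-u user:password`, which sends an HTTP Basic `Authorization` header. This sketch assumes the Knox topology uses Basic authentication, which is common for username/password setups but depends on configuration. It builds the header from `knoxUsername` and `knoxPassword` using only the JDK. `KnoxBasicAuth` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper that builds an HTTP Basic Authorization header value. */
final class KnoxBasicAuth {
    /** Returns "Basic " followed by base64("username:password"), the RFC 7617 format. */
    static String headerValue(String username, String password) {
        String credentials = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Sample credentials from the properties file.
        System.out.println("Authorization: " + headerValue("guest", "guest-password"));
    }
}
```

HttpClient 4.5.3 offers another route. You can register a `UsernamePasswordCredentials` in a `BasicCredentialsProvider` and let the library respond to the gateway's authentication challenge.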
By default, the file is created world-writable. If you want different permissions, pass the octal value with the following parameter in the properties file:
hdfsFilePermission=440
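WebHDFS accepts a file's permission as the `permission` query parameter of the CREATE request, given in octal in the range 0 to 1777. The sketch below shows one way an `hdfsFilePermission` value might be validated and appended to the request URL. The class name is hypothetical, and it is only an assumption that the project validates the value this way.

```java
/** Hypothetical validation of an octal HDFS permission before adding it to a WebHDFS URL. */
final class HdfsPermissionParam {
    /** Accepts 0 to 1777: an optional leading 0 or 1 followed by one to three octal digits. */
    static boolean isValidOctal(String value) {
        return value != null && value.matches("[01]?[0-7]{1,3}");
    }

    /** Appends "&permission=<octal>" to a URL that already has a query string such as "?op=CREATE". */
    static String withPermission(String url, String octal) {
        if (!isValidOctal(octal)) {
            throw new IllegalArgumentException("Not an octal permission in 0-1777: " + octal);
        }
        return url + "&permission=" + octal;
    }

    public static void main(String[] args) {
        String url = "https://bluemixcluster.ibm.com:8443/gateway/default/webhdfs/v1/tmp/hdfsfile.txt?op=CREATE";
        System.out.println(withPermission(url, "440"));
    }
}
```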
To run the application, use the following command:
mvn exec:java -Dexec.mainClass="twc.webhdfs.App" -Dexec.args="/Users/macadmin/Desktop/input.properties"
When it finishes, you should see the following message confirming that the file was created:
File creation successfull
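WebHDFS creates a file in two steps. First, a PUT with `op=CREATE` and no body receives a `307 Temporary Redirect` response with a `Location` header. Second, the file contents go in another PUT to that location, which returns `201 Created` on success. When the call passes through Knox, the gateway is expected to rewrite the redirect so that it also points at the gateway. The confirmation message above presumably corresponds to that final 201 response. The sketch below shows how an embedding application might classify the two responses. `WebHdfsCreateStatus` and its enum are hypothetical.

```java
/** Hypothetical classification of the HTTP status codes in the two-step WebHDFS CREATE. */
final class WebHdfsCreateStatus {
    enum Outcome { FOLLOW_REDIRECT, CREATED, FAILED }

    /**
     * Step 1 is the body-less PUT ?op=CREATE, expected to return 307 with a Location header.
     * Step 2 is the PUT of the file bytes to that Location, expected to return 201 Created.
     * Any other combination is treated as a failure.
     */
    static Outcome classify(int step, int statusCode) {
        if (step == 1 && statusCode == 307) {
            return Outcome.FOLLOW_REDIRECT;
        }
        if (step == 2 && statusCode == 201) {
            return Outcome.CREATED;
        }
        return Outcome.FAILED;
    }

    public static void main(String[] args) {
        System.out.println(classify(1, 307)); // FOLLOW_REDIRECT
        System.out.println(classify(2, 201)); // CREATED
        System.out.println(classify(2, 403)); // FAILED
    }
}
```

Handling the 307 explicitly, instead of relying on automatic redirect following, makes it clear that the file body is sent only in the second request.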