This article describes how to implement batching for data sets that are sent to Message Hub topics. Data can be sent to a Message Hub topic in two ways: (a) record by record, or (b) in batches of records. The article extends the previous one, which discussed a design pattern for Message Hub; the implementation is extended here to simulate flexible batches based on user parameters. The data sets used in this article have sizes of 100, 1000, 10000 and 100000, and of 5000, 50000 and 500000. The data sets can be found at the following link: CSV Data.
Batch Application Architecture
The batch param, which defines the batch size, is passed to the application as a parameter. The data set is divided into equal-sized batches of that size. If the data set size is not an exact multiple of the batch size, the remaining records are sent as a final batch even though it is smaller than the batch size. Each data batch is wrapped in a JSON string for Message Hub. A simple algorithm divides the data set into batches and ingests them into Message Hub topics; refer to the attached code for details.
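The splitting rule described above can be sketched in a few lines of Java. This is a minimal illustration and not the code from the attached repository: the class name BatchSplitter and the method split are hypothetical, and the records are assumed to be already loaded into memory as strings.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplitter {

    /**
     * Splits the records into consecutive batches of batchSize elements.
     * The last batch holds the remainder and may be smaller.
     */
    public static List<List<String>> split(List<String> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < records.size(); start += batchSize) {
            // Clamp the end index so the final batch stops at the end of the data set.
            int end = Math.min(start + batchSize, records.size());
            batches.add(new ArrayList<>(records.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Illustrative data only: ten dummy records split with a batch size of four.
        List<String> records = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            records.add("record-" + i);
        }
        for (List<String> batch : split(records, 4)) {
            System.out.println(batch.size() + " records: " + batch);
        }
    }
}
```

With ten records and a batch size of four, this produces two full batches and a final batch of two records. A batch size of 1 reduces to record-by-record ingestion.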
Configuring the Application to Interact with Message Hub on Bluemix
Configure server.xml of the local Liberty server with the following:
a) A shared library for the Message Hub libraries.
b) A JAAS login module with the existing Message Hub credentials. The credentials of the bound Message Hub service are taken care of at run time.
c) deferServletLoad, which must be added to make the listener servlet run if a scheduler needs to be configured in a subscriber application running in the Liberty run time. This is optional and depends on the availability of a subscriber: if the customer runs a separate subscriber somewhere else, this parameter is not needed.
Once the server configuration is ready, we can deploy it on Bluemix as a Cloud Foundry application.
Export the package server
Right-click the server in Eclipse; refer to the screenshot below for details.
Go to the package server directory and run the command below after logging in to the respective space in Bluemix.
After the server configuration is uploaded to Bluemix, it appears as an application.
Now bind this application to Message Hub as shown below.
The application setup is now complete. Use the credentials of the bound Message Hub service instance to subscribe to the topics from the application and receive the messages. For the code to run properly, make sure that the following topics have been created in the Message Hub instance:
Simulation and Testing of Batches
To simulate the batches, invoke the REST URL either through a scheduler, as discussed in the earlier article, or from an application. For the sake of simplicity, a standalone Java application is used here. The batch size is sent as a URL parameter in the REST call.
Case I : Line by Line
Case II : Batches
Below is a standalone REST Java client to trigger the ingestion process.
a) load: This attribute specifies the overall load to ingest. The data file is selected and ingested based on this attribute.
b) batchsize: This attribute specifies the size of the chunks (batches) to be ingested.
For this article and the sample code, the attributes can be used as mentioned below:
Here sample corresponds to the SampleTopic topic, loadrange = 10, 100, 1000, 10000, 100000, 5000, 50000, 500000, and the batch size can range from 1 to any value less than the data set size.
The data files for this article are packaged inside the application.
However, the code can be modified to fetch the data from an external repository as well.
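As an illustration of how the load and batchsize attributes could be passed on a REST call, here is a minimal Java sketch. It is not the client from the article's repository. The class name IngestionTrigger, the base URL, and the URL layout (topic as a path segment, load and batchsize as query parameters) are assumptions made for the example; adjust them to the actual REST resource of the deployed application.

```java
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;

public class IngestionTrigger {

    /**
     * Builds the request URL. The layout is hypothetical:
     * baseUrl/topic?load=LOAD&batchsize=BATCHSIZE
     */
    public static String buildUrl(String baseUrl, String topic, int load, int batchSize) {
        // The article allows batch sizes from 1 up to any value below the data set size.
        if (batchSize < 1 || batchSize >= load) {
            throw new IllegalArgumentException("batchsize must be >= 1 and less than load");
        }
        try {
            return baseUrl + "/" + URLEncoder.encode(topic, "UTF-8")
                    + "?load=" + load + "&batchsize=" + batchSize;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Sends a GET request and returns the HTTP status code. */
    public static int trigger(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(60_000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // No endpoint given: only show what a request would look like.
            System.out.println(buildUrl("http://localhost:9080/app", "sample", 1000, 100));
            return;
        }
        // args[0] is the base URL of the deployed application.
        System.out.println("HTTP status: " + trigger(buildUrl(args[0], "sample", 1000, 100)));
    }
}
```

Run without arguments, the program only prints the URL it would call. Passing the application's base URL as the first argument sends the request.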
Configuring Subscriber Application
The existing MessageHubKafkaSASL application has been customized for this article to act as a standalone subscriber. This standalone client can be used to test the functionality of the application. To run the code in standalone mode, the apiKey must be hardcoded, since VCAP is not available on a local system. Update the credentials in jaas.conf and jaas.conf.template in the resources folder.
Make sure that the values are copied from the VCAP variables of the bound Message Hub service.
The subscriber receives the results as follows:
You can also check the sysout messages in the logs of the respective Liberty run time on Bluemix.
By default, Liberty on Bluemix runs with 512 MB of heap. Simulating batches with large batch sizes may consume most of the heap and lead to out-of-memory exceptions. It is best to increase the memory per instance from the existing 512 MB to a suitable value and to scale instances based on workload to achieve the best results.
If you get the exception “Failed to update metadata after 16 ms” (the time in milliseconds varies from case to case), check the following to fix the issue:
a) The topic referred to in the code must have been created in Message Hub.
b) The username and password for Message Hub must match those of the Message Hub instance in the specific deployment zone.
c) If testing the code locally, check the apiKey, the Message Hub servers (EU or US) and the credentials in the local jaas.conf file. On Bluemix this is taken care of through VCAP_SERVICES.
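For local testing, part of this checklist can be automated before a Kafka client is created. The sketch below is an assumption-laden illustration, not part of the article's code. The class LocalConfigCheck and its methods are hypothetical, and the broker address and file path in main are placeholders. It uses only the standard Kafka configuration keys bootstrap.servers and security.protocol and the standard JAAS system property java.security.auth.login.config. It checks local prerequisites only and cannot verify that the topic exists on the server or that the credentials are valid.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class LocalConfigCheck {

    /** Builds client properties using standard Kafka configuration keys. */
    public static Properties clientProperties(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Assumption: the service is reached with SASL over TLS.
        props.setProperty("security.protocol", "SASL_SSL");
        return props;
    }

    /** Returns a list of local configuration problems; empty if none were found. */
    public static List<String> findProblems(Properties props, String topic, String jaasConfPath) {
        List<String> problems = new ArrayList<>();
        if (topic == null || topic.trim().isEmpty()) {
            problems.add("No topic name configured");
        }
        // Every broker entry must have the form host:port.
        String servers = props.getProperty("bootstrap.servers", "");
        for (String server : servers.split(",")) {
            if (!server.trim().matches("[^:\\s]+:\\d+")) {
                problems.add("Broker entry is not host:port: '" + server.trim() + "'");
            }
        }
        if (!new File(jaasConfPath).isFile()) {
            problems.add("jaas.conf not found at " + jaasConfPath);
        }
        return problems;
    }

    public static void main(String[] args) {
        // Placeholder values; replace with the servers and paths of your own setup.
        Properties props = clientProperties("broker.example.invalid:9093");
        String jaasConfPath = "resources/jaas.conf";
        List<String> problems = findProblems(props, "SampleTopic", jaasConfPath);
        if (problems.isEmpty()) {
            // Point JAAS at the local login configuration before creating a client.
            System.setProperty("java.security.auth.login.config", jaasConfPath);
            System.out.println("Local configuration looks complete");
        } else {
            for (String problem : problems) {
                System.out.println("Problem: " + problem);
            }
        }
    }
}
```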
Batch Ingestion Simulator Code: https://github.com/sharadc2001/MessageHubSimulation
Message Hub Standalone Subscriber: https://github.com/sharadc2001/MessageHubStandaloneSubs
Application Deployment on Liberty: https://youtu.be/exE1ECPgIDU
Message Hub Subscriber: https://youtu.be/PMEdBFlwDFk