Performance Testing Strategies for BIG DATA Applications – B – 101



Big Data is an area in which IT organizations are investing heavily, seeking technologies that can handle large amounts of data with speed and accuracy. The objective is not only to process voluminous data but also to maintain speed and security. Where speed matters, “performance testing” needs attention alongside the data processing itself.

Problem Statement:

Performance testing of Big Data is challenging because a Big Data system is composed of different technologies (Hadoop, NoSQL, MapReduce), so a single tool is not enough to test all the components. Big Data deals with large amounts of data, which means a large amount of test data is required. The absence of robust test data management strategies and a lack of performance testing tools within many IT organizations make big data testing one of the most perplexing technical propositions that a business encounters. In addition, replicating the production environment is sometimes difficult and costly.
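As an illustration of the test data requirement described above, the following is a minimal sketch of a repeatable synthetic data generator. The function name `synthetic_records`, the record layouts and the field names are hypothetical and chosen only for illustration; a real project would model them on its production data.

```python
import json
import random
import string
from typing import Iterator


def synthetic_records(count: int, seed: int = 42) -> Iterator[str]:
    """Yield `count` records cycling through three illustrative shapes.

    The shapes and field names are assumptions for this sketch, not a
    standard: CSV-style rows, JSON documents, and free text.
    """
    rng = random.Random(seed)  # fixed seed so a test run can be repeated exactly
    for i in range(count):
        kind = i % 3
        if kind == 0:
            # Structured: fixed columns in CSV style (id, age, country)
            yield f"{i},{rng.randint(18, 90)},{rng.choice(['IN', 'US', 'DE'])}"
        elif kind == 1:
            # Semi-structured: JSON document with an optional field
            doc = {"id": i, "tags": rng.sample(["a", "b", "c", "d"], k=rng.randint(0, 3))}
            if rng.random() < 0.5:
                doc["score"] = round(rng.random(), 3)
            yield json.dumps(doc)
        else:
            # Unstructured: free text made of random lowercase words
            words = [
                "".join(rng.choices(string.ascii_lowercase, k=rng.randint(3, 8)))
                for _ in range(rng.randint(5, 15))
            ]
            yield " ".join(words)


# Records are produced lazily, so large volumes can be streamed to a file
# or a loader without holding everything in memory.
sample = list(synthetic_records(6))
```

Because the generator is seeded, two runs with the same arguments produce identical data, which helps when comparing results between test executions.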


Big data is defined as a collection of very large amounts of data, which can be structured, unstructured or semi-structured, that cannot be processed using traditional computing techniques. Testing these kinds of datasets therefore requires new technologies, tools and techniques.

Big data can be explained with the help of the 4 V’s: Velocity, Variety, Volume and Veracity, that is, the speed, kinds, amount and accuracy of the data being fetched or uploaded. For a big data testing strategy to be effective, all the components should be tested and monitored properly. This paper will elaborate on approaches to testing these four dimensions with different tools such as YCSB (Yahoo! Cloud Serving Benchmark), LoadRunner (with AMQP and STOMP benchmarks), JMeter, Hadoop benchmarks, etc.


Big data testing can be done effectively if all the V’s of big data are tested. Many testing techniques can be applied to obtain results for response time, maximum user data capacity, the GUI and customer requirements for data. Since big data is a collection of structured, semi-structured and unstructured data, the testing solution needs to be selected based on the complexity of the data. Performance testing for big data comes with many challenges, such as the diversity of data, the variety of technologies used and the volume of data. Traditional performance benchmarking methods are not enough for NoSQL databases because these databases differ in fault tolerance, error recovery, load distribution and many other factors. For Big Data, enterprises need to test all the performance-critical areas, such as data ingestion and throughput. One important focus area is testing the performance of the underlying NoSQL database for scalability and reliability.
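To make the throughput and response time measurements mentioned above concrete, here is a minimal sketch of a measurement loop. It assumes a single client thread and uses an in-memory dictionary as a stand-in for a NoSQL store; the names `measure`, `insert` and `store` are hypothetical. Dedicated tools such as YCSB or JMeter add concurrency, workload mixes and reporting on top of this basic idea.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure(operation: Callable[[int], None], n_ops: int) -> Dict[str, float]:
    """Run `operation` n_ops times; report throughput and latency percentiles."""
    latencies: List[float] = []
    start = time.perf_counter()
    for i in range(n_ops):
        t0 = time.perf_counter()
        operation(i)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(latencies, n=100)
    return {
        "ops": n_ops,
        "throughput_ops_per_s": n_ops / elapsed,
        "p50_ms": cuts[49] * 1000,
        "p95_ms": cuts[94] * 1000,
        "p99_ms": cuts[98] * 1000,
    }


# Stand-in for a database client (an assumption for this sketch).
store: Dict[str, Dict[str, str]] = {}


def insert(i: int) -> None:
    # Simulated ingestion: one small record per key.
    store[f"user{i}"] = {"field0": "x" * 100}


result = measure(insert, 10_000)
print(result)
```

Reporting percentiles rather than only the average matters here because a small number of slow operations, which an average hides, is often the first sign of a bottleneck under load.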

In this paper, we describe the challenges that can arise during Big Data performance testing and examine the existing tools and solutions for performance testing Big Data.

BIO – Himika Gupta: Senior Technical Performance Test Specialist at IBM India Pvt Ltd, with 6 years of experience in performance testing across the Telecom, Financial, Insurance and Retail industries.

- Worked as a risk analyst to assess applications for performance testing.
- Worked on end-to-end applications over several protocols (SAP/Siebel/HTTP/HTML) using the LoadRunner tool. Also worked on JMeter and NeoLoad for performance testing.
- Monitored and analyzed applications using the AppDynamics, Grafana and New Relic tools.
- Developed a Script Enhancer Tool in Java to speed up scripting activity in performance testing.

Speaker Bio

BIO – Simran Solanki: A performance test specialist at IBM India Pvt Ltd, with 6 years of experience in performance testing as well as performance engineering. Good exposure to the end-to-end performance test cycle, including non-functional requirement gathering, test planning, estimation, script development, test scenario creation, test execution and summary report preparation. Experienced in performance monitoring, and has successfully identified performance bottlenecks.

Worked on LoadRunner (protocols: HTTP/HTTPS, SAP Web, Ajax TruClient) and JMeter. Certified in LoadRunner. Developed Java tools to speed up scripting tasks during performance test executions.