by Bo Meng | Published October 22, 2018
With the exploding interest in and development of artificial intelligence (AI) and machine learning, this year’s Spark Summit emphasized machine learning experiences and many of its developing techniques. Because of my exploration of deep learning with TensorFlow, I am starting a series of articles to summarize the lessons, tricks, and tips that I learned. This first article covers Apache Kafka.
Apache Kafka is a distributed streaming platform that generally can be used for two broad classes of applications:

- Building real-time streaming data pipelines that reliably move data between systems or applications
- Building real-time streaming applications that transform or react to streams of data
To try out Apache Kafka with other downstream frameworks, such as Apache Spark or TensorFlow, you usually have to set up a cluster on a few (virtual) machines or run a single node in local mode. Instead of installing those components to run Apache Kafka, this article explains how to get an embedded Apache Kafka cluster running on your local machine so that you can focus more on developing the downstream applications.
The Apache Kafka cluster usually includes a few components:

- A Zookeeper ensemble that coordinates the brokers
- One or more Kafka brokers (servers) that store and serve the data
Zookeeper is the service that stores key-value data to maintain server state. Kafka relies on Zookeeper to run, so the first task is to start a Zookeeper instance.
By using another Apache project, Apache Curator, you can start a TestingServer provided by Curator. From its Javadoc, you should notice that TestingServer is FOR TESTING PURPOSES ONLY, but it is sufficient for your use. By creating an instance of TestingServer, you can easily make Zookeeper run in embedded mode and get connection information from that server. Because certain versions of Apache Curator only work with certain versions of Zookeeper, you must ensure that you are using the proper version of Apache Curator. Take a look at the version compatibility before you begin.
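A minimal sketch of starting an embedded Zookeeper this way, assuming the curator-test artifact is on the classpath (the class name and output message are illustrative):

```java
// Start an embedded Zookeeper using Curator's TestingServer.
// TestingServer is for testing only, but that is exactly our use case here.
import org.apache.curator.test.TestingServer;

public class EmbeddedZookeeper {
    public static void main(String[] args) throws Exception {
        // The no-arg constructor picks a random free port and starts the server;
        // it is ready to accept connections when the constructor returns.
        try (TestingServer zk = new TestingServer()) {
            // getConnectString() returns e.g. "127.0.0.1:<port>" — pass this
            // to the Kafka broker configuration as zookeeper.connect
            System.out.println("Zookeeper running at " + zk.getConnectString());
        } // close() stops the server and cleans up its temporary data directory
    }
}
```

Using try-with-resources means the test server is always shut down, even if the test fails midway.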
Before starting the Kafka server, you must set up a minimum number of configuration properties to run it, including the host name and port.
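As a sketch, the minimum broker configuration might look like the following. The property keys are standard Kafka broker config names; the helper class, port, and Zookeeper address are illustrative (in practice you would pass in whatever connect string your embedded Zookeeper reports):

```java
import java.util.Properties;

public class KafkaTestConfig {
    // Build a minimal configuration for a single embedded Kafka broker.
    public static Properties brokerProps(String zkConnect, int port) {
        Properties props = new Properties();
        props.put("zookeeper.connect", zkConnect);          // embedded Zookeeper address
        props.put("broker.id", "0");                        // only one broker, id 0
        props.put("host.name", "localhost");                // bind to the local machine
        props.put("port", Integer.toString(port));          // broker listener port
        props.put("offsets.topic.replication.factor", "1"); // single-node safe default
        return props;
    }

    public static void main(String[] args) {
        Properties p = brokerProps("localhost:2181", 9092);
        System.out.println(p.getProperty("zookeeper.connect"));
        System.out.println(p.getProperty("port"));
    }
}
```

These properties are then wrapped in a `KafkaConfig` when the broker is created.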
Kafka Test provides a very useful TestUtils class that you can use to create a Kafka server. Note that usually you won’t be able to find Kafka Test in your release package, but for testing purposes, you can add it by turning on the <classifier>test</classifier> in your pom.xml file.
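The dependency declaration might look like the following sketch; the artifact name and version here are only examples, so match them to the Kafka and Scala versions you actually use:

```xml
<!-- Pulls in Kafka's test-jar, which contains TestUtils -->
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.11</artifactId>
    <version>1.1.0</version>
    <classifier>test</classifier>
    <scope>test</scope>
</dependency>
```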
With the Kafka server running on your local machine in embedded mode, you can start writing the code to create topics and put some data into them. Later on, you can also stop the server and do some cleanup.
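A sketch of that step, assuming the kafka-clients library is on the classpath; the bootstrap address, topic name, and record contents are illustrative:

```java
// Create a topic on the embedded broker and produce a few string records.
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class EmbeddedKafkaDemo {
    public static void main(String[] args) throws Exception {
        String bootstrap = "localhost:9092"; // address of the embedded broker

        // Create the topic through the admin API: 1 partition, replication factor 1
        Properties adminProps = new Properties();
        adminProps.put("bootstrap.servers", bootstrap);
        try (AdminClient admin = AdminClient.create(adminProps)) {
            admin.createTopics(Collections.singleton(
                    new NewTopic("test-topic", 1, (short) 1))).all().get();
        }

        // Produce a few key/value records to the new topic
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", bootstrap);
        producerProps.put("key.serializer", StringSerializer.class.getName());
        producerProps.put("value.serializer", StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer =
                     new KafkaProducer<>(producerProps)) {
            for (int i = 0; i < 3; i++) {
                producer.send(new ProducerRecord<>("test-topic",
                        "key-" + i, "value-" + i));
            }
            producer.flush(); // make sure everything is sent before closing
        }
    }
}
```

After the test, stop the producer and broker first, then the Zookeeper server, so shutdown happens in the reverse order of startup.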
Developers usually must set up a cluster to test out Apache Kafka and program the downstream applications. But, by using some other open source projects and test utilities, you can avoid downloading and installing those components, speeding up the development process and letting you focus more on the applications. For more information and code, take a look at my GitHub repo.