Kafka Monthly Digest – April 2020
See what's going on in the Kafka community this month
In this 27th edition of the Kafka Monthly Digest, I’ll cover what happened in the Apache Kafka community in April 2020.
For last month’s digest, see Kafka Monthly Digest: March 2020.
2.5.0: After 4 Release Candidates, David Arthur released Apache Kafka 2.5.0 on April 15. This new minor version brings a number of interesting features. A blog post was published on the Apache blog, and as always the full release notes are available on apache.org. The release plan is on the wiki.
- Progress on the removal of Zookeeper: This release includes some of the items required for KIP-500 (KIP-543, KIP-555)
- Support for TLSv1.3. In addition, TLSv1 and TLSv1.1, now considered insecure, are disabled and only TLSv1.2 is enabled by default (KIP-553)
- Upgraded Zookeeper to 3.5.7. Kafka can also be configured to communicate over TLS with Zookeeper (KIP-515)
- Scala 2.11 is no longer supported. Only Scala 2.12 and 2.13 (support added in Kafka 2.4.0) are now supported (KIP-531)
- Kafka clients can now advertise their name and version. This allows administrators to monitor clients used by their users (KIP-511)
- The Kafka Protocol is friendlier with L7 Proxies. The protocol is now self-explanatory and can easily be decoded by Layer 7 proxies (KIP-559)
- Improve reliability of idempotent/transactional producer (KIP-360)
- Improve metadata lookups in producer (KIP-526)
- Add Reset/List offsets operations to AdminClient (KIP-396)
- Added replica assignment operations to AdminClient (KIP-455)
- Use round-robin for assigning partitions in MirrorMaker 2. This can significantly improve the load balancing of mirroring tasks, especially for topics with few partitions (KAFKA-9352)
- Co-groups for Kafka Streams. This simplifies aggregating multiple streams together with the DSL (KIP-150)
- Add toTable() to the DSL. This allows translating a stream of events into a table
- Allow state stores to serve stale reads during rebalance. Instead of failing Interactive Queries during a rebalance, this allows you to favor availability over consistency (KIP-535)
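To make the round-robin item above more concrete, here is a minimal sketch of the general idea in plain Java. It is not MirrorMaker 2's actual code: the class name, method name, and topic names are hypothetical, and the only point is to show how dealing partitions out one at a time spreads each topic across tasks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of round-robin assignment; not taken from MirrorMaker 2.
class RoundRobinAssignment {

    // Deal partitions out one at a time, so task i receives items i, i + n, i + 2n, ...
    static List<List<String>> assign(List<String> partitions, int numTasks) {
        List<List<String>> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new ArrayList<>());
        }
        for (int i = 0; i < partitions.size(); i++) {
            tasks.get(i % numTasks).add(partitions.get(i));
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Made-up topic-partitions, listed topic by topic.
        List<String> partitions = Arrays.asList(
                "orders-0", "orders-1", "orders-2", "payments-0", "payments-1");
        List<List<String>> tasks = assign(partitions, 2);
        for (int i = 0; i < tasks.size(); i++) {
            System.out.println("task " + i + " -> " + tasks.get(i));
        }
    }
}
```

With two tasks, both end up with partitions from both topics, instead of one task owning all partitions of a small topic.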
Last month, the community submitted 17 KIPs (KIP-586 to KIP-603; KIP-600 was skipped), and these are the ones that caught my eye.
These 3 KIPs are part of the ongoing effort to replace Zookeeper with a Self-Managed Metadata Quorum.
KIP-589: Add API to update Replica state in Controller. At the moment, when a log directory fails, brokers notify the controller via a write to Zookeeper. To transition away from Zookeeper, the community decided to only allow the Controller to write to Zookeeper. Hence, this KIP introduces a new RPC, ReplicaStateEvent, that will enable brokers to report failed log directories without using Zookeeper.
KIP-590: Redirect Zookeeper Mutation Protocols to The Controller. Like the previous KIP, this addresses some of the use cases where brokers write to Zookeeper. This KIP proposes introducing a new RPC, Envelope, that brokers will use to forward requests mutating Zookeeper to the controller.
KIP-595: A Raft Protocol for the Metadata Quorum. Currently, Kafka relies on Zookeeper's consensus protocol to enforce consistent replication semantics. This KIP introduces a new consensus protocol, heavily based on Raft, that Kafka will use to keep its current semantics and guarantees without using Zookeeper. It covers the proposed protocol and new RPCs in detail, and it is a great read if you’re interested in Distributed Systems.
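For readers new to Raft, the sketch below shows the majority-quorum rule that Raft-style protocols build on: an entry counts as committed once a majority of voters have replicated it. This is generic Raft reasoning, not code or naming from KIP-595, and it leaves out terms, elections, and everything specific to Kafka; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration of the majority rule in Raft-style protocols.
// Simplified: ignores terms and leader election entirely.
class MajorityQuorum {

    // Smallest number of voters that forms a majority.
    static int majority(int voters) {
        return voters / 2 + 1;
    }

    // Highest offset that at least a majority of voters have replicated.
    static long committedOffset(long[] replicatedOffsets) {
        long[] sorted = replicatedOffsets.clone();
        Arrays.sort(sorted);
        // After sorting ascending, the last majority(n) entries are all >= this one.
        return sorted[sorted.length - majority(sorted.length)];
    }

    public static void main(String[] args) {
        // Three voters that have replicated up to offsets 10, 7 and 9.
        long[] offsets = {10L, 7L, 9L};
        System.out.println("majority of 3 voters: " + majority(3));
        System.out.println("committed offset: " + committedOffset(offsets));
    }
}
```

In the three-voter example, offset 9 is the highest one held by two voters, so the quorum can lose any single voter without losing committed data.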
KIP-598: Augment TopologyDescription with store and source / sink serde information. When building complex Kafka Streams topologies, it can be hard to visualize the exact flow of data through all the processors. Fortunately, Streams can generate a TopologyDescription that is human readable. This KIP proposes to enhance TopologyDescription to include the SerDes used for keys and values by all Sink, Source, and Store operators.
KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations. Administrative operations such as topic/partition creation and topic deletion are handled by the cluster controller. When done in large volumes, these operations can noticeably impact the controller's performance. For example, creating or deleting 5000 topics can take several minutes to complete, during which the controller cannot handle any other events. This KIP proposes adding a throttling mechanism similar to the existing Quotas to allow administrators to control the rate of topic/partition creations and topic deletions.
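To give a feel for what rate-limiting such operations can look like, here is a minimal token bucket sketch in plain Java. The token bucket is my choice of illustration and is not taken from the KIP; the class name, parameters, and numbers are all hypothetical. Time is passed in explicitly so the behavior is easy to follow.

```java
// Hypothetical token bucket used to illustrate throttling; not the KIP's design.
class TokenBucket {
    private final double capacity;        // maximum burst size, in tokens
    private final double refillPerSecond; // sustained rate, in tokens per second
    private double tokens;
    private long lastRefillMs;

    TokenBucket(double capacity, double refillPerSecond, long nowMs) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefillMs = nowMs;
    }

    // Consumes `cost` tokens and returns true if enough are available;
    // otherwise returns false and the caller should retry later.
    synchronized boolean tryAcquire(double cost, long nowMs) {
        double elapsedSec = Math.max(0L, nowMs - lastRefillMs) / 1000.0;
        tokens = Math.min(capacity, tokens + elapsedSec * refillPerSecond);
        lastRefillMs = Math.max(lastRefillMs, nowMs);
        if (tokens >= cost) {
            tokens -= cost;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Allow bursts of 10 partition creations, refilled at 5 per second.
        TokenBucket bucket = new TokenBucket(10, 5, 0L);
        System.out.println(bucket.tryAcquire(8, 0L));    // first request fits in the burst
        System.out.println(bucket.tryAcquire(8, 0L));    // immediate second request is throttled
        System.out.println(bucket.tryAcquire(8, 2000L)); // bucket has refilled after two seconds
    }
}
```

Here each token stands for one partition creation, so a request creating 8 partitions costs 8 tokens. A throttled request is rejected rather than queued, which keeps the sketch short.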
In this section, I will cover releases of some community projects. This only includes projects that are open source.
librdkafka 1.4.0. Librdkafka is a third-party client in C/C++. It’s one of the most advanced and performant libraries after the official Java client, and many third-party clients are based on it. In addition to a number of fixes and small improvements, this new version introduces the Transactional Producer API and adds support for static group membership and client name/version reporting. At the same time, confluent-kafka-python, confluent-kafka-dotnet, and confluent-kafka-go were updated to use this new release of librdkafka.
node-rdkafka 2.8.0. This new version of node-rdkafka is now based on librdkafka 1.3.0. It also added support for Node.js v13 and includes a major update of its TypeScript definitions.
- Monitoring Kafka performance metrics
- Apache Kafka Example: How Rollbar Removed Technical Debt – Part 2
- Using Debezium With the Apicurio API and Schema Registry