In Streams v4.0, we introduced a new feature called Consistent Region. This feature enables applications to guarantee that all tuples are processed.
To achieve guaranteed tuple processing, Streams periodically establishes a checkpoint for all the operators in a consistent region. A checkpoint is established when all tuples in the system have been fully processed and the operator states have been stored in the checkpoint backend. Streams supports two types of checkpoint backend: Redis and file system.
Upon an application failure, all operators are rolled back to their state at the last successful checkpoint. The source operators then replay the tuples received since that checkpoint.
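The checkpoint, rollback, and replay cycle described above can be illustrated with a toy model. This is only a sketch of the general idea, not how Streams implements it: the function name `run_with_checkpoints` and its parameters are hypothetical, the "operator" is a simple running sum, and plain variables stand in for the Redis or file system backend.

```python
def run_with_checkpoints(source, checkpoint_every, fail_at=None):
    """Sum the values in `source`, checkpointing every `checkpoint_every` tuples.

    If `fail_at` is given, simulate a single failure just before the tuple at
    that index is processed: the state is rolled back to the last checkpoint
    and the source replays from the offset saved with that checkpoint.
    Returns (final_total, number_of_replayed_tuples).
    """
    total = 0          # operator state (hypothetical: a running sum)
    saved_total = 0    # operator state stored in the checkpoint
    saved_offset = 0   # source position stored with the checkpoint
    replayed = 0
    failed = False
    i = 0
    while i < len(source):
        if fail_at is not None and i == fail_at and not failed:
            failed = True
            replayed = i - saved_offset   # tuples processed since the checkpoint
            total = saved_total           # roll the operator state back
            i = saved_offset              # the source replays from here
            continue
        total += source[i]
        i += 1
        if i % checkpoint_every == 0:
            # Every tuple up to index i is fully processed: save a checkpoint.
            saved_total, saved_offset = total, i
    return total, replayed


tuples = list(range(1, 11))
print(run_with_checkpoints(tuples, checkpoint_every=4))             # (55, 0)
print(run_with_checkpoints(tuples, checkpoint_every=4, fail_at=6))  # (55, 2)
```

In both runs the final sum is the same, which is the point of the mechanism: the failure costs two replayed tuples but no tuple is lost or counted twice.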
For more details about using and understanding guaranteed processing with Streams, start here.
The checkpoint process consumes additional system resources. The overhead varies by application, but it generally correlates with:
- Checkpoint frequency, and
- Checkpoint size
This article discusses how the checkpoint process affects application performance.
Effect of Checkpoint Frequency
The graph below shows how checkpoint frequency impacts performance. At the baseline, the application processes 440000 tuples per second. When the application checkpoints every 8 seconds, the throughput drops to about 395000 tuples per second, an 8.35% decrease. If we checkpoint less often, every 32 seconds, the throughput improves to about 430000 tuples per second, only a 1.76% degradation from the baseline.
As the graph shows, the more frequently the data is checkpointed, the more pronounced the impact on performance. Note that less frequent checkpointing results in a longer recovery time in the event of a failure, because it takes longer to restore a larger checkpoint and more tuples have to be replayed since the last successful checkpoint.
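To get a feel for the replay side of this trade-off, the worst case can be estimated as throughput multiplied by the checkpoint period. The sketch below assumes a steady tuple rate and a failure that occurs just before the next checkpoint would have been taken; the function name is illustrative, and the inputs are the approximate throughput figures quoted above.

```python
def max_tuples_to_replay(throughput_tps, checkpoint_period_s):
    """Upper bound on the tuples a source must replay after a failure.

    Assumes a steady rate and a failure right before the next checkpoint,
    so a full period's worth of tuples has to be replayed.
    """
    return throughput_tps * checkpoint_period_s


print(max_tuples_to_replay(395_000, 8))    # 3160000
print(max_tuples_to_replay(430_000, 32))   # 13760000
```

Under these assumptions, moving from an 8 second to a 32 second period recovers a few percent of throughput but raises the worst-case replay from roughly 3 million to roughly 14 million tuples.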
Effect of Checkpoint Size
The graph below demonstrates the effect of checkpoint size on performance. Checkpoint size is determined by the operators in use and by the internal state each operator has to save into the checkpoint. In general, the more operators with state an application has, the larger its checkpoint.
Once again, at the application baseline, when consistent region is not enabled, the application processes 440000 tuples per second. With a checkpoint size of 32 MB, the application throughput drops to about 427500 tuples per second, a 2.67% degradation. As more data is stored in the checkpoint, the performance impact becomes more pronounced. With a checkpoint size of 128 MB, the application throughput drops to about 400000 tuples per second, a degradation of about 8.03%.
Consistent regions provide an effective mechanism for recovery from application failures. Instead of keeping track of every single tuple or record that goes through the system, we take a less invasive approach. The checkpoint process has a modest performance impact: in the measurements above, the reported degradation stayed at about 8%, even with checkpoints every 8 seconds or a 128 MB checkpoint.
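The degradation percentages in this article are relative drops from the baseline throughput. A minimal sketch of that calculation follows; the function name is illustrative. Because the throughput figures in the text are given as approximate values, recomputing from them produces percentages that differ somewhat from the quoted ones.

```python
def degradation_percent(baseline_tps, observed_tps):
    """Relative throughput drop from the baseline, as a percentage."""
    return (baseline_tps - observed_tps) / baseline_tps * 100.0


baseline = 440_000
for size_mb, observed in [(32, 427_500), (128, 400_000)]:
    pct = degradation_percent(baseline, observed)
    print(f"{size_mb} MB checkpoint: {pct:.2f}% below baseline")
# 32 MB checkpoint: 2.84% below baseline
# 128 MB checkpoint: 9.09% below baseline
```

The same function applies to the checkpoint frequency measurements, since both experiments use the same 440000 tuples per second baseline.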