In Streams v4.0, we introduced a new feature called consistent region. This feature enables applications to guarantee the processing of all tuples.

To achieve guaranteed tuple processing, Streams periodically establishes a checkpoint for all of the operators in a consistent region.  A checkpoint is established when all tuples in flight have been fully processed and all operator states have been stored in the checkpoint backend.  Streams supports two types of checkpoint backend: Redis and the file system.
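For reference, the checkpoint backend is selected through instance properties rather than in application code. The sketch below shows one way to configure the file system backend with streamtool; the property names follow the Streams 4.x documentation, and the domain, instance, and directory names are placeholders:

    # Select the file system checkpoint backend (verify the property
    # names against your release's documentation). The directory must
    # be accessible from every host that runs operators in the region.
    streamtool setproperty -d MyDomain -i MyInstance \
        instance.checkpointRepository=fileSystem
    streamtool setproperty -d MyDomain -i MyInstance \
        instance.checkpointRepositoryConfiguration='{ "Dir" : "/shared/checkpoints" }'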

Upon an application failure, all operators are rolled back to the state of the last successful checkpoint, and tuples submitted since that checkpoint are replayed by the source operators.
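To make the mechanics concrete, here is a minimal SPL sketch of a consistent region with a periodic trigger. This is not the application measured in this article (that topology is described in the comments below); the parameter values are illustrative:

    use spl.adapter::FileSink;
    use spl.control::JobControlPlane;
    use spl.utility::Beacon;

    composite Main {
        graph
            // Required in any application with a consistent region;
            // it coordinates the checkpoint and reset protocols.
            () as JCP = JobControlPlane() {}

            // Start of the consistent region: a checkpoint is
            // established every 8 seconds. After a failure, Beacon
            // replays tuples submitted since the last checkpoint.
            @consistent(trigger = periodic, period = 8.0)
            stream<uint64 id> Numbers = Beacon() {
                output Numbers : id = IterationCount();
            }

            // The region extends downstream through the sink.
            () as Sink = FileSink(Numbers) {
                param file : "results.txt";
            }
    }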

For more details about using and understanding guaranteed processing with Streams, start here.

The checkpoint process consumes some additional system resources. The overhead varies by application, but is generally correlated with:

  1. Checkpoint frequency, and
  2. Checkpoint size

This article discusses how the checkpoint process affects application performance.

Effect of Checkpoint Frequency

The graph below shows how checkpoint frequency impacts performance.  At the baseline, with no consistent region, the application processes about 440,000 tuples per second.  When the application checkpoints every 8 seconds, throughput drops to about 395,000 tuples per second, an 8.35% decrease.  If we lower the checkpoint frequency to once every 32 seconds, throughput improves to about 430,000 tuples per second, only a 1.76% degradation from the baseline.

As the graph shows, the more frequently the data is checkpointed, the more pronounced the impact on performance.  Note, however, that less frequent checkpointing results in a longer recovery time in the event of a failure, due to the time required to restore from a potentially larger checkpoint and to replay the larger number of tuples processed since the last successful checkpoint.

[Figure: Effects of Checkpoint Frequency]

Effect of Checkpoint Size

The graph below demonstrates the effect of checkpoint size on performance.  Checkpoint size is determined by the operators in use and by the internal state each operator has to save into the checkpoint.  Therefore, the more operators an application has, the larger its checkpoint.
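As an illustration, any variable declared in an operator's state clause is included in its checkpoint. In the hypothetical fragment below, which assumes an input stream In with attributes rstring key and int32 val, the map grows with the number of distinct keys, and the operator's checkpoint grows with it:

    // The state clause is what gets checkpointed for this operator;
    // a growing map means a growing checkpoint.
    stream<rstring key, int32 total> Running = Custom(In) {
        logic
            state : { mutable map<rstring, int32> counts = {}; }
            onTuple In : {
                if (has(counts, In.key)) {
                    counts[In.key] += In.val;
                } else {
                    counts[In.key] = In.val;
                }
                submit({ key = In.key, total = counts[In.key] }, Running);
            }
    }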

Once again, at the application baseline, when the consistent region is not enabled, the application processes about 440,000 tuples per second.  With a 32 MB checkpoint, the tuple rate drops to about 427,500 tuples per second, a 2.67% degradation.  The more data is stored in the checkpoint, the more pronounced the performance impact.  With a 128 MB checkpoint, throughput drops to about 400,000 tuples per second, roughly an 8.03% degradation.

[Figure: Effects of Checkpoint Size]

Conclusion

Consistent regions provide an effective mechanism for recovering from application failures.  Instead of keeping track of every single tuple or record that goes through the system, Streams takes a less invasive, checkpoint-based approach.  As the measurements above show, the checkpoint process has minimal performance impact, even with fairly frequent checkpoints and large amounts of checkpointed data.


4 comments on "Performance of Guaranteed Processing"

  1. Dakshi.Agrawal September 19, 2015

    Informative article – even if the performance impact will vary depending on application characteristics, it is great to have a baseline provided by experts.

    Would it be possible to describe the application used for these experiments? Thanks!!

  2. The application had the following topology:
    Beacon -> Custom (2 output ports) (same PE as Beacon) -> Functor -> Functor (31 times) (1 PE per 8 chained operators) -> Custom (2 input ports) -> FileSink
    The @consistent annotation was on the Beacon operator.

  3. Hello Henry,
    What storage did you use for the checkpoint, the file system or Redis?
    Is there any difference in performance between the two?
    Thanks
