Abstract
Stream processing is an increasingly popular model for the online processing of data that can be partitioned into streams of elements. It is commonly used in data analytics services, such as processing tweets from Twitter. Current stream processing frameworks boast high throughput and low average latency, but users also desire lower tail latencies and better real-time performance. In practice, issues such as garbage collection pauses and resource contention can degrade the performance of these applications and cause unacceptable violations of real-time constraints. In this paper, we propose applying redundancy in the data processing pipeline to increase the resiliency of stream processing applications to timing errors, improving real-time performance and reducing tail latency. We present a methodology and apply this redundancy in a framework based on Twitter's Heron. We then evaluate the effectiveness of this technique against a range of injected timing errors using benchmarks from Intel's Storm Benchmark. We also study the effects of duplication when applied at different stages of the topology, and we evaluate the overhead that duplicating tuples adds to a stream processing topology. Our results show that redundant tuple processing can reduce tail latency by up to 63% and the number of missed deadlines by up to 94% in the best case. Overall, we conclude that redundancy through duplicated tuples is a powerful tool for increasing resiliency to intermittent runtime timing errors.

