International Journal of Networked and Distributed Computing

Volume 1, Issue 4, November 2013, Pages 226 - 238

Fault Tolerant Distributed Stream Processing based on Backtracking

Authors
Qiming Chen, Meichun Hsu, Castellanos Malu
Corresponding Author
Qiming Chen
Available Online 1 November 2013.
DOI
https://doi.org/10.2991/ijndc.2013.1.4.4How to use a DOI?
Keywords
stream processing, fault tolerance, checkpointing, failure recovery
Abstract
Since distributed stream analytics is treated as a kind of cloud service, there exists a pressing need for its reliability and fault-tolerance, to guarantee the streaming data tuples to be processed in the order of their generation in every dataflow path, with each tuple processed once and only once. Currently there exist two kind approaches: one treats the whole process as a single transaction, and therefore suffers from the loss of intermediate results during failures; the other relies on the receipt of acknowledgement (ACK) to decide whether moving forward to emit the next resulting tuple or resending the current one after timeout, on the per-tuple basis, thus incurs extremely high latency penalty. In contradistinction to the above, we propose the backtrack mechanism for failure recovery, which allows a task to process tuples continuously without waiting for ACKs and without resending tuples in the failure-free case, but to request (ASK) the source tasks to resend the missing tuples only when it is restored from a failure which is a rare case thus has limited impact on the overall performance. The specific hard problem for building a transaction layer on-top of an existing stream processing platform consists in how to keep track the physical input/output messaging channels in order to realize re-messaging during failure recovery. Our solution is characterized by tracking physical messaging channels logically, for that we introduce the notions of virtual channel, task alias and messageId-set in reasoning, recording and communicating the channel information. We also provide a designated messaging channel, separated from the regular dataflow channel, for signaling ACK/ASK messages and for resending tuples, in order to avoid interrupting the regular order of data transfer. We have implemented the proposed mechanisms on Fontainebleau, the distributed stream analytics infrastructure we developed on top of Storm. As a principle, we ensure all the transactional properties to be system supported and transparent to users. Our experience shows the novelty and efficiency of the proposed mechanisms.
Open Access
This is an open access article distributed under the CC BY-NC license.

Download article (PDF)

Journal
International Journal of Networked and Distributed Computing
Volume-Issue
1 - 4
Pages
226 - 238
Publication Date
2013/11
ISSN (Online)
2211-7946
ISSN (Print)
2211-7938
DOI
https://doi.org/10.2991/ijndc.2013.1.4.4How to use a DOI?
Open Access
This is an open access article distributed under the CC BY-NC license.

Cite this article

TY  - JOUR
AU  - Qiming Chen
AU  - Meichun Hsu
AU  - Castellanos Malu
PY  - 2013
DA  - 2013/11
TI  - Fault Tolerant Distributed Stream Processing based on Backtracking
JO  - International Journal of Networked and Distributed Computing
SP  - 226
EP  - 238
VL  - 1
IS  - 4
SN  - 2211-7946
UR  - https://doi.org/10.2991/ijndc.2013.1.4.4
DO  - https://doi.org/10.2991/ijndc.2013.1.4.4
ID  - Chen2013
ER  -