TY - GEN
T1 - Sequential change detection on data streams
AU - Muthukrishnan, S.
AU - Van Den Berg, Eric
AU - Wu, Yihua
PY - 2007
Y1 - 2007
N2 - Model-based declarative queries are becoming an attractive paradigm for interacting with many data stream applications. This has led to the development of techniques to accurately answer the queries using distributional models rather than raw values. The quintessential problem with this is that of detecting when there is a change in the input stream, which makes models stale and inaccurate. We adopt the sound statistical method of sequential hypothesis testing to study this problem on streams, without independence assumption. It yields algorithms that are fast, space-efficient, and oblivious to data's underlying distributions. Our experiments demonstrate the effectiveness of our methods to not only determine the existence of a change, but also the point where the change is initiated, relative to the ground truth we obtain. Our methods work seamlessly without window limitations inherent in prior work, thus have clearly shorter delays compared to alternative window-based solutions.
AB - Model-based declarative queries are becoming an attractive paradigm for interacting with many data stream applications. This has led to the development of techniques to accurately answer the queries using distributional models rather than raw values. The quintessential problem with this is that of detecting when there is a change in the input stream, which makes models stale and inaccurate. We adopt the sound statistical method of sequential hypothesis testing to study this problem on streams, without independence assumption. It yields algorithms that are fast, space-efficient, and oblivious to data's underlying distributions. Our experiments demonstrate the effectiveness of our methods to not only determine the existence of a change, but also the point where the change is initiated, relative to the ground truth we obtain. Our methods work seamlessly without window limitations inherent in prior work, thus have clearly shorter delays compared to alternative window-based solutions.
UR - http://www.scopus.com/inward/record.url?scp=49549123783&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=49549123783&partnerID=8YFLogxK
U2 - 10.1109/ICDMW.2007.89
DO - 10.1109/ICDMW.2007.89
M3 - Conference contribution
AN - SCOPUS:49549123783
SN - 0769530192
SN - 9780769530192
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 551
EP - 556
BT - ICDM Workshops 2007 - Proceedings of the 17th IEEE International Conference on Data Mining Workshops
T2 - 17th IEEE International Conference on Data Mining Workshops, ICDM Workshops 2007
Y2 - 28 October 2007 through 31 October 2007
ER -