TY - GEN
T1 - Another outlier bites the dust
T2 - 25th IEEE International Conference on Data Engineering, ICDE 2009
AU - Deligiannakis, Antonios
AU - Kotidis, Yannis
AU - Vassalos, Vasilis
AU - Stoumpos, Vassilis
AU - Delis, Alex
PY - 2009
Y1 - 2009
N2 - Recent work has demonstrated that readings provided by commodity sensor nodes are often of poor quality. In order to provide a valuable sensory infrastructure for monitoring applications, we first need to devise techniques that can withstand "dirty" and unreliable data during query processing. In this paper we present a novel aggregation framework that detects suspicious measurements by outlier nodes and refrains from incorporating such measurements in the computed aggregate values. We consider different definitions of an outlier node, based on the notion of a user-specified minimum support, and discuss techniques for properly routing messages in the network in order to reduce the bandwidth consumption and the energy drain during the query evaluation. In our experiments using real and synthetic traces we demonstrate that: (i) a straightforward evaluation of a user aggregate query leads to practically meaningless results due to the existence of outliers; (ii) our techniques can detect and eliminate spurious readings without any application specific knowledge of what constitutes normal behavior; (iii) the identification of outliers, when performed inside the network, significantly reduces bandwidth and energy drain compared to alternative methods that centrally collect and analyze all sensory data; and (iv) we can significantly reduce the cost of the aggregation process by utilizing simple statistics on outlier nodes and reorganizing accordingly the collection tree.
AB - Recent work has demonstrated that readings provided by commodity sensor nodes are often of poor quality. In order to provide a valuable sensory infrastructure for monitoring applications, we first need to devise techniques that can withstand "dirty" and unreliable data during query processing. In this paper we present a novel aggregation framework that detects suspicious measurements by outlier nodes and refrains from incorporating such measurements in the computed aggregate values. We consider different definitions of an outlier node, based on the notion of a user-specified minimum support, and discuss techniques for properly routing messages in the network in order to reduce the bandwidth consumption and the energy drain during the query evaluation. In our experiments using real and synthetic traces we demonstrate that: (i) a straightforward evaluation of a user aggregate query leads to practically meaningless results due to the existence of outliers; (ii) our techniques can detect and eliminate spurious readings without any application specific knowledge of what constitutes normal behavior; (iii) the identification of outliers, when performed inside the network, significantly reduces bandwidth and energy drain compared to alternative methods that centrally collect and analyze all sensory data; and (iv) we can significantly reduce the cost of the aggregation process by utilizing simple statistics on outlier nodes and reorganizing accordingly the collection tree.
UR - http://www.scopus.com/inward/record.url?scp=67649637306&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67649637306&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2009.100
DO - 10.1109/ICDE.2009.100
M3 - Conference contribution
AN - SCOPUS:67649637306
SN - 9780769535456
T3 - Proceedings - International Conference on Data Engineering
SP - 988
EP - 999
BT - Proceedings - 25th IEEE International Conference on Data Engineering, ICDE 2009
Y2 - 29 March 2009 through 2 April 2009
ER -