TY - GEN

T1 - Frugal streaming for estimating quantiles

AU - Ma, Qiang

AU - Muthukrishnan, S.

AU - Sandler, Mark

PY - 2013

Y1 - 2013

N2 - Modern applications require processing streams of data to estimate statistical quantities such as quantiles with a small amount of memory. In many such applications, in fact, one needs to compute such statistical quantities for each of a large number of groups (e.g., network traffic grouped by source IP address), which additionally restricts the amount of memory available for the stream for any particular group. We address this challenge and introduce frugal streaming, that is, algorithms that work with a tiny (typically sub-streaming) amount of memory per group. We design a frugal algorithm that uses only one unit of memory per group to compute a quantile for each group. For stochastic streams where data items are drawn independently from a distribution, we analyze the algorithm and show that it finds an approximation to the quantile rapidly and remains stably close to it. We also propose an extension of this algorithm that uses two units of memory per group. Experiments with real-world data from an HTTP trace and Twitter show that our frugal algorithms are comparable to existing streaming algorithms for estimating any quantile, but these existing algorithms use far more space per group and are unrealistic in frugal applications; further, the two-memory frugal algorithm converges significantly faster than the one-memory algorithm.

AB - Modern applications require processing streams of data to estimate statistical quantities such as quantiles with a small amount of memory. In many such applications, in fact, one needs to compute such statistical quantities for each of a large number of groups (e.g., network traffic grouped by source IP address), which additionally restricts the amount of memory available for the stream for any particular group. We address this challenge and introduce frugal streaming, that is, algorithms that work with a tiny (typically sub-streaming) amount of memory per group. We design a frugal algorithm that uses only one unit of memory per group to compute a quantile for each group. For stochastic streams where data items are drawn independently from a distribution, we analyze the algorithm and show that it finds an approximation to the quantile rapidly and remains stably close to it. We also propose an extension of this algorithm that uses two units of memory per group. Experiments with real-world data from an HTTP trace and Twitter show that our frugal algorithms are comparable to existing streaming algorithms for estimating any quantile, but these existing algorithms use far more space per group and are unrealistic in frugal applications; further, the two-memory frugal algorithm converges significantly faster than the one-memory algorithm.

UR - http://www.scopus.com/inward/record.url?scp=84894110628&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84894110628&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-40273-9_7

DO - 10.1007/978-3-642-40273-9_7

M3 - Conference contribution

AN - SCOPUS:84894110628

SN - 9783642402722

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 77

EP - 96

BT - Space-Efficient Data Structures, Streams, and Algorithms - Papers in Honor of J. Ian Munro on the Occasion of His 66th Birthday

PB - Springer Verlag

T2 - Conference on Space-Efficient Data Structures, Streams, and Algorithms

Y2 - 15 August 2013 through 16 August 2013

ER -