TY - GEN
T1 - Streaming algorithms for measuring H-impact
AU - Govindan, Priya
AU - Monemizadeh, Morteza
AU - Muthukrishnan, S.
N1 - Publisher Copyright:
© 2017 ACM.
PY - 2017/5/9
Y1 - 2017/5/9
N2 - We consider publication settings with positive user feedback, such as, users publishing tweets and other users retweeting them, friends posting photos and others liking them or even authors publishing research papers and others citing these publications. A well-accepted notion of "impact" for users in these settings is the H-Index, which is the largest k such that at least k publications have k or more (positive) feedback. We study how to calculate H-index on large streams of user publications and feedback. If all the items can be stored, H-index of a user can be computed by sorting. We focus on the streaming setting where as is typical, we do not have space to store all the items. We present the first known streaming algorithm for computing the H-index of a user in the cash register streaming model using space poly(1/e,log(1/δ), logn); this algorithm provides an additive e approximation. For the aggregated model where feedback for a publication is collated, we present streaming algorithms that use much less space, either only dependent on e and even a small constant. We also address the problem of finding "heavy hitters" users in H-index without estimating everyones' H-index. We present randomized streaming algorithms for finding 1 + e approximation to heavy hitters that uses space poly (1/e, log (1/δ), log n) and succeeds with probability at least 1 - δ. Again, this is the first sub-linear space algorithm for this problem, despite extensive research on heavy hitters in general. Our work initiates study of streaming algorithms for problems that estimate impact or identify impactful users.
AB - We consider publication settings with positive user feedback, such as, users publishing tweets and other users retweeting them, friends posting photos and others liking them or even authors publishing research papers and others citing these publications. A well-accepted notion of "impact" for users in these settings is the H-Index, which is the largest k such that at least k publications have k or more (positive) feedback. We study how to calculate H-index on large streams of user publications and feedback. If all the items can be stored, H-index of a user can be computed by sorting. We focus on the streaming setting where as is typical, we do not have space to store all the items. We present the first known streaming algorithm for computing the H-index of a user in the cash register streaming model using space poly(1/e,log(1/δ), logn); this algorithm provides an additive e approximation. For the aggregated model where feedback for a publication is collated, we present streaming algorithms that use much less space, either only dependent on e and even a small constant. We also address the problem of finding "heavy hitters" users in H-index without estimating everyones' H-index. We present randomized streaming algorithms for finding 1 + e approximation to heavy hitters that uses space poly (1/e, log (1/δ), log n) and succeeds with probability at least 1 - δ. Again, this is the first sub-linear space algorithm for this problem, despite extensive research on heavy hitters in general. Our work initiates study of streaming algorithms for problems that estimate impact or identify impactful users.
UR - http://www.scopus.com/inward/record.url?scp=85021223562&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85021223562&partnerID=8YFLogxK
U2 - 10.1145/3034786.3056118
DO - 10.1145/3034786.3056118
M3 - Conference contribution
AN - SCOPUS:85021223562
T3 - Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
SP - 337
EP - 346
BT - PODS 2017 - Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
PB - Association for Computing Machinery
T2 - 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2017
Y2 - 14 May 2017 through 19 May 2017
ER -