TY - GEN
T1 - Graphical model sketch
AU - Kveton, Branislav
AU - Bui, Hung
AU - Ghavamzadeh, Mohammad
AU - Theocharous, Georgios
AU - Muthukrishnan, S.
AU - Sun, Siqi
N1 - Publisher Copyright:
© Springer International Publishing AG 2016.
PY - 2016
Y1 - 2016
N2 - Structured high-cardinality data arises in many domains, and poses a major challenge for both modeling and inference. Graphical models are a popular approach to modeling structured data but they are unsuitable for high-cardinality variables. The count-min (CM) sketch is a popular approach to estimating probabilities in high-cardinality data but it does not scale well beyond a few variables. In this work, we bring together the ideas of graphical models and count sketches; and propose and analyze several approaches to estimating probabilities in structured high-cardinality streams of data. The key idea of our approximations is to use the structure of a graphical model and approximately estimate its factors by “sketches”, which hash high-cardinality variables using random projections. Our approximations are computationally efficient and their space complexity is independent of the cardinality of variables. Our error bounds are multiplicative and significantly improve upon those of the CM sketch, a state-of-the-art approach to estimating probabilities in streams. We evaluate our approximations on synthetic and real-world problems, and report an order of magnitude improvements over the CM sketch.
AB - Structured high-cardinality data arises in many domains, and poses a major challenge for both modeling and inference. Graphical models are a popular approach to modeling structured data but they are unsuitable for high-cardinality variables. The count-min (CM) sketch is a popular approach to estimating probabilities in high-cardinality data but it does not scale well beyond a few variables. In this work, we bring together the ideas of graphical models and count sketches; and propose and analyze several approaches to estimating probabilities in structured high-cardinality streams of data. The key idea of our approximations is to use the structure of a graphical model and approximately estimate its factors by “sketches”, which hash high-cardinality variables using random projections. Our approximations are computationally efficient and their space complexity is independent of the cardinality of variables. Our error bounds are multiplicative and significantly improve upon those of the CM sketch, a state-of-the-art approach to estimating probabilities in streams. We evaluate our approximations on synthetic and real-world problems, and report an order of magnitude improvements over the CM sketch.
UR - http://www.scopus.com/inward/record.url?scp=84988584640&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84988584640&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-46128-1_6
DO - 10.1007/978-3-319-46128-1_6
M3 - Conference contribution
AN - SCOPUS:84988584640
SN - 9783319461274
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 81
EP - 97
BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2016, Proceedings
A2 - Giuseppe, Jilles
A2 - Landwehr, Niels
A2 - Manco, Giuseppe
A2 - Frasconi, Paolo
PB - Springer Verlag
T2 - 15th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2016
Y2 - 19 September 2016 through 23 September 2016
ER -