Graphical model sketch

Branislav Kveton, Hung Bui, Mohammad Ghavamzadeh, Georgios Theocharous, S. Muthukrishnan, Siqi Sun

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Structured high-cardinality data arises in many domains and poses a major challenge for both modeling and inference. Graphical models are a popular approach to modeling structured data, but they are unsuitable for high-cardinality variables. The count-min (CM) sketch is a popular approach to estimating probabilities in high-cardinality data, but it does not scale well beyond a few variables. In this work, we bring together the ideas of graphical models and count sketches, and we propose and analyze several approaches to estimating probabilities in structured high-cardinality streams of data. The key idea of our approximations is to use the structure of a graphical model and approximately estimate its factors by “sketches”, which hash high-cardinality variables using random projections. Our approximations are computationally efficient, and their space complexity is independent of the cardinality of variables. Our error bounds are multiplicative and significantly improve upon those of the CM sketch, a state-of-the-art approach to estimating probabilities in streams. We evaluate our approximations on synthetic and real-world problems, and report order-of-magnitude improvements over the CM sketch.
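
To make the two ingredients concrete, the following is a minimal sketch in Python, not the paper's algorithms (the abstract does not specify them). `CountMinSketch` is the standard textbook structure. `ChainSketch` is a hypothetical illustration that assumes a chain-structured model, P(x) ≈ P(x1) ∏ P(x_k | x_{k-1}), and keeps one sketch per variable and per adjacent pair. All class names, parameters, and the chain assumption are ours.

```python
import random


class CountMinSketch:
    """Textbook count-min sketch: `depth` hash rows of `width` counters each."""

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        # One random salt per row stands in for an independent hash function.
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.rows = [[0] * width for _ in range(depth)]
        self.total = 0

    def _cell(self, salt, item):
        return hash((salt, item)) % self.width

    def update(self, item):
        self.total += 1
        for salt, row in zip(self.salts, self.rows):
            row[self._cell(salt, item)] += 1

    def count(self, item):
        # Collisions only add to a counter, so the minimum over rows
        # never underestimates the true count.
        return min(row[self._cell(salt, item)]
                   for salt, row in zip(self.salts, self.rows))

    def probability(self, item):
        return self.count(item) / self.total if self.total else 0.0


class ChainSketch:
    """Hypothetical: sketches for each variable and each adjacent pair
    of an assumed chain model x1 -> x2 -> ... -> xK."""

    def __init__(self, num_vars, width, depth):
        self.singles = [CountMinSketch(width, depth, seed=k)
                        for k in range(num_vars)]
        self.pairs = [CountMinSketch(width, depth, seed=num_vars + k)
                      for k in range(num_vars - 1)]

    def update(self, x):
        for k, sketch in enumerate(self.singles):
            sketch.update(x[k])
        for k, sketch in enumerate(self.pairs):
            sketch.update((x[k], x[k + 1]))

    def probability(self, x):
        # P(x1) times the product of sketched conditionals P(x_{k+1} | x_k).
        p = self.singles[0].probability(x[0])
        for k, sketch in enumerate(self.pairs):
            parent = self.singles[k].count(x[k])
            if parent == 0:
                return 0.0
            p *= sketch.count((x[k], x[k + 1])) / parent
        return p
```

In this illustration, memory is `width * depth` counters per sketch regardless of how many distinct values each variable takes. Because each conditional is a ratio of two overestimates, an individual factor can exceed 1; a careful estimator would need to account for that.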

    Original language: English (US)
    Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2016, Proceedings
    Editors: Jilles Vreeken, Niels Landwehr, Giuseppe Manco, Paolo Frasconi
    Publisher: Springer Verlag
    Pages: 81-97
    Number of pages: 17
    ISBN (Print): 9783319461274
    State: Published - 2016
    Event: 15th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2016 - Riva del Garda, Italy
    Duration: Sep 19 2016 → Sep 23 2016

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 9851 LNAI
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 15th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2016
    Country/Territory: Italy
    City: Riva del Garda
    Period: 9/19/16 → 9/23/16

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • General Computer Science
