Scalable computation of distributions from large scale data sets

Abon Chaudhuri, Teng Yok Lee, Bo Zhou, Cong Wang, Tiantian Xu, Han Wei Shen, Tom Peterka, Yi Jen Chiang

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    As we approach the era of exascale computing, the role of distributions in summarizing, analyzing and visualizing large-scale data is becoming increasingly important. Since histograms continue to be a popular way of modeling the underlying data distribution, we propose a scalable and distributed framework for computing histograms from scalar and vector data at the different levels of detail required by various types of analysis algorithms. We present efficient parallel techniques for histogram computation from regular as well as rectilinear grid data. We also study a technique called cross-validation to estimate the quality of computed histograms as a model of the actual data distribution. We parallelize cross-validation in a scalable manner to support histogram evaluation and the selection of histogram parameters such as the number of bins. We also present our distributed software framework for supporting science applications that require large-scale distribution-based data analysis. The presented case studies highlight how the proposed algorithms and the related software benefit information-theoretic and other distribution-driven analysis.
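
The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes equal-width bins over a known value range and uses the standard leave-one-out cross-validation risk estimate for histograms. All function names (`block_histogram`, `merge_histograms`, `cv_risk`, `select_bins`) are hypothetical. The point it illustrates is that this score depends only on bin counts, and bin counts are additive across data blocks, so both the histogram and its cross-validation score can be obtained from a simple sum reduction over per-block results.

```python
import random


def block_histogram(values, lo, hi, bins):
    """Count the values of one data block in `bins` equal-width bins over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v <= hi:
            # Clamp the index so that v == hi falls in the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
    return counts


def merge_histograms(partials):
    """Sum per-block counts bin by bin (stands in for a parallel sum reduction)."""
    return [sum(column) for column in zip(*partials)]


def cv_risk(counts, lo, hi):
    """Leave-one-out cross-validation risk estimate; lower is better.

    Assumes equal-width bins and at least two counted samples.
    """
    n = sum(counts)
    h = (hi - lo) / len(counts)
    sum_p2 = sum((c / n) ** 2 for c in counts)
    return 2.0 / ((n - 1) * h) - (n + 1) / ((n - 1) * h) * sum_p2


def select_bins(blocks, lo, hi, candidates):
    """Return (bins, score) for the candidate bin count with the lowest risk."""
    best = None
    for bins in candidates:
        partials = [block_histogram(b, lo, hi, bins) for b in blocks]
        score = cv_risk(merge_histograms(partials), lo, hi)
        if best is None or score < best[1]:
            best = (bins, score)
    return best


# Synthetic data (an assumption for illustration), split into four blocks.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(4000)]
blocks = [data[i::4] for i in range(4)]
lo, hi = min(data), max(data)

best_bins, best_score = select_bins(blocks, lo, hi, [4, 8, 16, 32, 64, 128])
```

In a distributed setting, each process would run `block_histogram` on its own block, and `merge_histograms` would be replaced by a collective sum over processes; the scoring step is unchanged.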

    Original language: English (US)
    Title of host publication: IEEE Symposium on Large Data Analysis and Visualization 2012, LDAV 2012 - Proceedings
    Pages: 113-120
    Number of pages: 8
    State: Published - 2012
    Event: 2nd Symposium on Large-Scale Data Analysis and Visualization, LDAV 2012 - Seattle, WA, United States
    Duration: Oct 14 2012 to Oct 19 2012

    Publication series

    Name: IEEE Symposium on Large Data Analysis and Visualization 2012, LDAV 2012 - Proceedings

    Other

    Other: 2nd Symposium on Large-Scale Data Analysis and Visualization, LDAV 2012
    Country/Territory: United States
    City: Seattle, WA
    Period: 10/14/12 to 10/19/12

    ASJC Scopus subject areas

    • Computer Vision and Pattern Recognition
    • Information Systems
