Engineering the compression of massive tables: An experimental approach

Adam L. Buchsbaum, Donald F. Caldwell, Kenneth W. Church, Glenn S. Fowler, S. Muthukrishnan

    Research output: Contribution to conference › Paper

    Abstract

    We study the problem of compressing massive tables. We devise a novel compression paradigm - training for lossless compression - which assumes that the data exhibit dependencies that can be learned by examining a small amount of training material. We develop an experimental methodology to test the approach. Our result is a system, pzip, which outperforms gzip by factors of two in compression size and both compression and uncompression time for various tabular data. Pzip is now in production use in an AT&T network traffic data warehouse.

    Original language: English (US)
    Pages: 175-184
    Number of pages: 10
    State: Published - 2000
    Event: 11th Annual ACM-SIAM Symposium on Discrete Algorithms - San Francisco, CA, USA
    Duration: Jan 9 2000 → Jan 11 2000

    ASJC Scopus subject areas

    • Software
    • Mathematics(all)

    Cite this

    Buchsbaum, A. L., Caldwell, D. F., Church, K. W., Fowler, G. S., & Muthukrishnan, S. (2000). Engineering the compression of massive tables: An experimental approach. 175-184. Paper presented at 11th Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, CA, USA.