Abstract
We study the problem of compressing massive tables. We devise a novel compression paradigm, training for lossless compression, which assumes that the data exhibit dependencies that can be learned by examining a small amount of training material. We develop an experimental methodology to test the approach. Our result is a system, pzip, which outperforms gzip by factors of two in compressed size and in both compression and uncompression time for various tabular data. Pzip is now in production use in an AT&T network traffic data warehouse.
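To make the training idea concrete, below is a minimal, hypothetical sketch, not the authors' pzip implementation: it tries a few candidate column partitions of fixed-width records on a small training sample, keeps whichever partition compresses best with zlib, and then reuses that partition for the full table. The record layout, candidate partitions, and all function names are illustrative assumptions.

```python
# Hypothetical illustration of training-based lossless table compression.
# Not pzip itself: we simply pick, on a small training sample, whichever
# candidate column partition of fixed-width records yields the smallest
# total zlib output, then reuse that partition for the full table.
import zlib

RECORD_LEN = 12  # assumed fixed record width in bytes (toy layout below)

def split_columns(rows, boundaries):
    """Slice each record at the given byte boundaries and return one
    byte string per column group (each group concatenated record-major)."""
    cuts = [0] + list(boundaries) + [RECORD_LEN]
    return [b"".join(row[lo:hi] for row in rows) for lo, hi in zip(cuts, cuts[1:])]

def compressed_size(rows, boundaries):
    """Total zlib-compressed size of the column groups under this partition."""
    return sum(len(zlib.compress(g, 9)) for g in split_columns(rows, boundaries))

def train_partition(sample_rows, candidates):
    """Training step: pick the candidate partition that compresses the sample best."""
    return min(candidates, key=lambda b: compressed_size(sample_rows, b))

def compress_table(rows, boundaries):
    """Compress the full table group by group using the trained partition."""
    return [zlib.compress(g, 9) for g in split_columns(rows, boundaries)]

if __name__ == "__main__":
    # Toy fixed-width records: a constant prefix, a repeating counter, a flag, newline.
    rows = [b"REC-" + (b"%06d" % (i % 100)) + (b"Y" if i % 2 else b"N") + b"\n"
            for i in range(10_000)]
    sample = rows[:500]                      # small training sample
    candidates = [(), (4,), (4, 10), (10,)]  # a few byte-boundary partitions to try
    best = train_partition(sample, candidates)
    parts = compress_table(rows, best)
    whole = len(zlib.compress(b"".join(rows), 9))
    print("chosen boundaries:", best)
    print("partitioned size:", sum(len(p) for p in parts), "vs monolithic:", whole)
```

The sketch compresses each column group independently so that dependencies learned on the sample (which byte ranges compress well together) carry over to the full data; the real system's partitioning and per-group coding strategies are described in the paper.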
Original language | English (US)
---|---
Pages | 175-184
Number of pages | 10
State | Published - 2000
Event | 11th Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, CA, USA; Duration: Jan 9 2000 → Jan 11 2000
Other
Other | 11th Annual ACM-SIAM Symposium on Discrete Algorithms
---|---
City | San Francisco, CA, USA
Period | 1/9/00 → 1/11/00
ASJC Scopus subject areas
- Software
- General Mathematics