TY - GEN
T1 - Compressing and searching XML data via two zips
AU - Ferragina, P.
AU - Luccio, F.
AU - Manzini, G.
AU - Muthukrishnan, S.
PY - 2006
Y1 - 2006
N2 - XML is fast becoming the standard format to store, exchange and publish over the web, and is getting embedded in applications. Two challenges in handling XML are its size (the XML representation of a document is signifi-cantly larger than its native state) and the complexity of its search (XML search involves path and content searches on labeled tree structures). We address the basic problems of compression, navigation and searching of XML documents. In particular, we adopt recently proposed theoretical algorithms [11] for succinct tree representations to design and implement a compressed index for XML, called XBzipIn-dex, in which the XML document is maintained in a highly compressed format, and both navigation and searching can be done uncompressing only a tiny fraction of the data. This solution relies on compressing and indexing two arrays derived from the XML data. With detailed experiments we compare this with other compressed XML indexing and searching engines to show that XBzipIndex has compression ratio up to 35% better than the ones achievable by those other tools, and its time performance on some path and content search operations is order of magnitudes faster: few milliseconds over hundreds of MBs of XML files versus tens of seconds, on standard XML data sources.
AB - XML is fast becoming the standard format to store, exchange and publish over the web, and is getting embedded in applications. Two challenges in handling XML are its size (the XML representation of a document is signifi-cantly larger than its native state) and the complexity of its search (XML search involves path and content searches on labeled tree structures). We address the basic problems of compression, navigation and searching of XML documents. In particular, we adopt recently proposed theoretical algorithms [11] for succinct tree representations to design and implement a compressed index for XML, called XBzipIn-dex, in which the XML document is maintained in a highly compressed format, and both navigation and searching can be done uncompressing only a tiny fraction of the data. This solution relies on compressing and indexing two arrays derived from the XML data. With detailed experiments we compare this with other compressed XML indexing and searching engines to show that XBzipIndex has compression ratio up to 35% better than the ones achievable by those other tools, and its time performance on some path and content search operations is order of magnitudes faster: few milliseconds over hundreds of MBs of XML files versus tens of seconds, on standard XML data sources.
KW - Labeled trees
KW - XML compression and indexing
UR - http://www.scopus.com/inward/record.url?scp=34250636362&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34250636362&partnerID=8YFLogxK
U2 - 10.1145/1135777.1135891
DO - 10.1145/1135777.1135891
M3 - Conference contribution
AN - SCOPUS:34250636362
SN - 1595933239
SN - 9781595933232
T3 - Proceedings of the 15th International Conference on World Wide Web
SP - 751
EP - 760
BT - Proceedings of the 15th International Conference on World Wide Web
T2 - 15th International Conference on World Wide Web
Y2 - 23 May 2006 through 26 May 2006
ER -