Abstract
An approach to selectivity estimation for generalized Boolean substring queries, with a focus on conjunctive multidimensional and Boolean queries, was presented. Set hashing, a Monte Carlo technique, was used to succinctly represent the set of tuples containing a given substring as a signature vector of hash values. The analysis showed that, using only linear storage, a large number of cross-counts could be generated, including those for complex co-occurrences of substrings.
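The set-hashing idea in the abstract can be illustrated with a minimal min-hash sketch. This is not the paper's implementation: the function names (`hash_value`, `signature`, `resemblance`, `estimate_cooccurrence`), the use of salted BLAKE2b as a stand-in for random permutations, and the default signature length are all assumptions made for illustration. Each substring's tuple set is reduced to a fixed-length signature, and the fraction of matching signature components estimates the resemblance |A ∩ B| / |A ∪ B|, from which a co-occurrence count follows.

```python
import hashlib


def hash_value(tuple_id: int, i: int) -> int:
    """i-th hash function (assumed family): BLAKE2b salted with i, 64-bit output."""
    digest = hashlib.blake2b(
        tuple_id.to_bytes(8, "little"),
        digest_size=8,
        salt=i.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def signature(tuple_ids, k=64):
    """Signature vector: the minimum hash value of the set under each of k functions."""
    if not tuple_ids:
        raise ValueError("signature is undefined for an empty set")
    return [min(hash_value(t, i) for t in tuple_ids) for i in range(k)]


def resemblance(sig_a, sig_b):
    """Fraction of matching components, an estimate of |A ∩ B| / |A ∪ B|."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def estimate_cooccurrence(size_a, size_b, sig_a, sig_b):
    """Estimate |A ∩ B| from the two set sizes and the signatures.

    Uses |A ∪ B| = |A| + |B| - |A ∩ B|, so |A ∩ B| = rho * (|A| + |B|) / (1 + rho).
    """
    rho = resemblance(sig_a, sig_b)
    return rho * (size_a + size_b) / (1 + rho)


# Hypothetical toy relation: each row is one tuple with a single string attribute.
rows = ["database", "datum", "tabard", "based", "update"]
ids_at = {i for i, r in enumerate(rows) if "at" in r}  # tuples containing "at"
ids_ba = {i for i, r in enumerate(rows) if "ba" in r}  # tuples containing "ba"

sig_at = signature(ids_at)
sig_ba = signature(ids_ba)
print(estimate_cooccurrence(len(ids_at), len(ids_ba), sig_at, sig_ba))
```

Only the set size and the signature need to be stored per substring, so storage stays linear in the number of substrings while any pair can still be compared. The estimate is randomized, so its accuracy improves as the signature length `k` grows.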
Original language | English (US) |
---|---|
Pages (from-to) | 98-132 |
Number of pages | 35 |
Journal | Journal of Computer and System Sciences |
Volume | 66 |
Issue number | 1 |
DOIs | |
State | Published - 2003 |
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science
- Applied Mathematics
- Computer Networks and Communications
- Computational Theory and Mathematics