An approach to selectivity estimation for generalized Boolean substring queries was presented, with a focus on conjunctive multidimensional queries and Boolean queries. Set hashing, a Monte Carlo technique, was used to represent succinctly the set of tuples containing a given substring as a signature vector of hash values. The analysis showed that, using only linear storage, a large number of cross-counts (counts of tuples that match several substring predicates at once) can be generated, including those for complex co-occurrences of substrings.
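As an illustration of the signature-vector idea, the following is a minimal Python sketch. It assumes set hashing in its common min-wise form: every tuple has an integer ID, each hash function is modeled as a random permutation of the tuple-ID universe, and a substring's signature stores the smallest permuted ID among the tuples that contain it. Two signatures agree at a coordinate with probability equal to the Jaccard coefficient J = |A ∩ B| / |A ∪ B|. Because |A ∪ B| = |A| + |B| - |A ∩ B|, a cross-count can then be estimated as J(|A| + |B|) / (1 + J). The function names, the signature length of 256, and the toy tuple sets are hypothetical choices for illustration, not the paper's implementation.

```python
import random
from typing import List, Sequence, Set


def make_permutations(universe_size: int, k: int, seed: int = 0) -> List[List[int]]:
    """Draw k independent random permutations of the tuple-ID universe [0, universe_size)."""
    rng = random.Random(seed)
    perms = []
    for _ in range(k):
        p = list(range(universe_size))
        rng.shuffle(p)
        perms.append(p)
    return perms


def signature(tuple_ids: Set[int], perms: List[List[int]]) -> List[int]:
    """Signature vector: for each permutation, the smallest permuted ID in the set.

    The set must be non-empty (min() raises ValueError otherwise).
    """
    return [min(p[t] for t in tuple_ids) for p in perms]


def jaccard_estimate(sig_a: Sequence[int], sig_b: Sequence[int]) -> float:
    """Fraction of coordinates where the two signatures agree (estimates |A∩B| / |A∪B|)."""
    agree = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return agree / len(sig_a)


def cross_count_estimate(sig_a: Sequence[int], sig_b: Sequence[int],
                         size_a: int, size_b: int) -> float:
    """Estimate |A ∩ B| from the Jaccard estimate J via |A∩B| = J * (|A| + |B|) / (1 + J)."""
    j = jaccard_estimate(sig_a, sig_b)
    return j * (size_a + size_b) / (1 + j)


# Hypothetical example: a relation with 1000 tuples and two substrings whose
# tuple-ID sets overlap in exactly 300 tuples.
N = 1000
perms = make_permutations(N, k=256, seed=7)
A = set(range(0, 600))     # tuples containing the first substring
B = set(range(300, 1000))  # tuples containing the second substring
sig_a, sig_b = signature(A, perms), signature(B, perms)
est = cross_count_estimate(sig_a, sig_b, len(A), len(B))
print(f"estimated cross-count: {est:.1f} (true: {len(A & B)})")
```

Each substring's storage is a fixed-length vector of k integers, no matter how many tuples contain it, and any pair of stored signatures can be combined at query time. This sketch covers only the two-substring case. For conjunctions of more substrings, the fraction of coordinates where all signatures agree estimates |A₁ ∩ … ∩ Aₘ| / |A₁ ∪ … ∪ Aₘ|, but turning that ratio into a count also requires an estimate of the union size, which the sketch does not attempt.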