TY - JOUR
T1 - Optimizing Equijoin Queries in Distributed Databases Where Relations are Hash Partitioned
AU - Shasha, Dennis
AU - Wang, Tsong Li
PY - 1991/1/5
Y1 - 1991/1/5
N2 - Consider the class of distributed database systems consisting of a set of nodes connected by a high bandwidth network. Each node consists of a processor, a random access memory, and a slower but much larger memory such as a disk. There is no shared memory among the nodes. The data are horizontally partitioned often using a hash function. Such a description characterizes many parallel or distributed database systems that have recently been proposed, both commercial and academic. We study the optimization problem that arises when the query processor must repartition the relations and intermediate results participating in a multijoin query. Using estimates of the sizes of intermediate relations, we show 1991 optimum solutions for closed chain queries; (2) the NP-completeness of the optimization problem for star, tree, and general graph queries; and (3) effective heuristics for these hard cases. Our general approach and many of our results extend to other attribute partitioning schemes, for example, sort-partitioning on attributes, and to partitioned object databases.
AB - Consider the class of distributed database systems consisting of a set of nodes connected by a high bandwidth network. Each node consists of a processor, a random access memory, and a slower but much larger memory such as a disk. There is no shared memory among the nodes. The data are horizontally partitioned often using a hash function. Such a description characterizes many parallel or distributed database systems that have recently been proposed, both commercial and academic. We study the optimization problem that arises when the query processor must repartition the relations and intermediate results participating in a multijoin query. Using estimates of the sizes of intermediate relations, we show 1991 optimum solutions for closed chain queries; (2) the NP-completeness of the optimization problem for star, tree, and general graph queries; and (3) effective heuristics for these hard cases. Our general approach and many of our results extend to other attribute partitioning schemes, for example, sort-partitioning on attributes, and to partitioned object databases.
KW - NP-complete problems
KW - equijoin
KW - hashing
KW - relational data models
KW - spanning trees
KW - systems
UR - http://www.scopus.com/inward/record.url?scp=0026168378&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0026168378&partnerID=8YFLogxK
U2 - 10.1145/114325.103713
DO - 10.1145/114325.103713
M3 - Article
AN - SCOPUS:0026168378
SN - 0362-5915
VL - 16
SP - 279
EP - 308
JO - ACM Transactions on Database Systems (TODS)
JF - ACM Transactions on Database Systems (TODS)
IS - 2
ER -