TY - JOUR
T1 - Pincer-search
T2 - An efficient algorithm for discovering the maximum frequent set
AU - Lin, Dao I.
AU - Kedem, Zvi M.
N1 - Funding Information:
This research was partially supported by the US National Science Foundation under grant number CCR-94-11590, by Intel Corporation and by Microsoft. The authors thank Rakesh Agrawal, Roberto J. Bayardo, and Ramakrishnan Srikant for kindly providing us with the experimental data. The authors thank Thomas Anantharaman and Sridhar Ramaswamy for their very valuable comments and suggestions. The authors would also like to thank Sridhar Ramaswamy for his proposal for the term Pincer-Search.
PY - 2002/5
Y1 - 2002/5
N2 - Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up, breadth-first search direction. The computation starts from frequent 1-itemsets (the minimum length frequent itemsets) and continues until all maximal (length) frequent itemsets are found. During the execution, every frequent itemset is explicitly considered. Such algorithms perform well when all maximal frequent itemsets are short. However, performance drastically deteriorates when some of the maximal frequent itemsets are long. We present a new algorithm which combines both the bottom-up and the top-down searches. The primary search direction is still bottom-up, but a restricted search is also conducted in the top-down direction. This search is used only for maintaining and updating a new data structure, the maximum frequent candidate set. It is used to prune early candidates that would be normally encountered in the bottom-up search. A very important characteristic of the algorithm is that it does not require explicit examination of every frequent itemset. Therefore, the algorithm performs well even when some maximal frequent itemsets are long. As its output, the algorithm produces the maximum frequent set, i.e., the set containing all maximal frequent itemsets, thus specifying immediately all frequent itemsets. We evaluate the performance of the algorithm using well-known synthetic benchmark databases, real-life census, and stock market databases. The improvement in performance can be up to several orders of magnitude, compared to the best previous algorithms.
AB - Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up, breadth-first search direction. The computation starts from frequent 1-itemsets (the minimum length frequent itemsets) and continues until all maximal (length) frequent itemsets are found. During the execution, every frequent itemset is explicitly considered. Such algorithms perform well when all maximal frequent itemsets are short. However, performance drastically deteriorates when some of the maximal frequent itemsets are long. We present a new algorithm which combines both the bottom-up and the top-down searches. The primary search direction is still bottom-up, but a restricted search is also conducted in the top-down direction. This search is used only for maintaining and updating a new data structure, the maximum frequent candidate set. It is used to prune early candidates that would be normally encountered in the bottom-up search. A very important characteristic of the algorithm is that it does not require explicit examination of every frequent itemset. Therefore, the algorithm performs well even when some maximal frequent itemsets are long. As its output, the algorithm produces the maximum frequent set, i.e., the set containing all maximal frequent itemsets, thus specifying immediately all frequent itemsets. We evaluate the performance of the algorithm using well-known synthetic benchmark databases, real-life census, and stock market databases. The improvement in performance can be up to several orders of magnitude, compared to the best previous algorithms.
KW - Association rule
KW - Data mining
KW - Knowledge discovery
KW - Maximum frequent candidate set
KW - Maximum frequent set
KW - Pincer Search
UR - http://www.scopus.com/inward/record.url?scp=0036565197&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0036565197&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2002.1000342
DO - 10.1109/TKDE.2002.1000342
M3 - Article
AN - SCOPUS:0036565197
SN - 1041-4347
VL - 14
SP - 553
EP - 566
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 3
ER -