TY - JOUR
T1 - Efficient network aware search in collaborative tagging sites
AU - Yahia, Sihem Amer
AU - Benedikt, Michael
AU - Lakshmanan, Laks V.S.
AU - Stoyanovichy, Julia
PY - 2008
Y1 - 2008
N2 - The popularity of collaborative tagging sites presents a unique opportunity to explore keyword search in a context where query results are determined by the opinion of a network of taggers related to a seeker. In this paper, we present the first in-depth study of network-aware search. We investigate efficient top-k processing when the score of an answer is computed as its popularity among members of a seeker's network. We argue that obvious adaptations of top-k algorithms are too space-intensive, due to the dependence of scores on the seeker's network. We therefore develop algorithms based on maintaining score upper-bounds. The global upper-bound approach maintains a single score upper-bound for every pair of item and tag, over the entire collection of users. The resulting bounds are very coarse. We thus investigate clustering seekers based on similar behavior of their networks. We show that finding the optimal clustering of seekers is intractable, but we provide heuristic methods that give substantial time improvements. We then give an optimization that can benefit smaller populations of seekers based on clustering of taggers. Our results are supported by extensive experiments on del.icio.us datasets.
AB - The popularity of collaborative tagging sites presents a unique opportunity to explore keyword search in a context where query results are determined by the opinion of a network of taggers related to a seeker. In this paper, we present the first in-depth study of network-aware search. We investigate efficient top-k processing when the score of an answer is computed as its popularity among members of a seeker's network. We argue that obvious adaptations of top-k algorithms are too space-intensive, due to the dependence of scores on the seeker's network. We therefore develop algorithms based on maintaining score upper-bounds. The global upper-bound approach maintains a single score upper-bound for every pair of item and tag, over the entire collection of users. The resulting bounds are very coarse. We thus investigate clustering seekers based on similar behavior of their networks. We show that finding the optimal clustering of seekers is intractable, but we provide heuristic methods that give substantial time improvements. We then give an optimization that can benefit smaller populations of seekers based on clustering of taggers. Our results are supported by extensive experiments on del.icio.us datasets.
UR - http://www.scopus.com/inward/record.url?scp=84859178171&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84859178171&partnerID=8YFLogxK
U2 - 10.14778/1453856.1453934
DO - 10.14778/1453856.1453934
M3 - Article
AN - SCOPUS:84859178171
SN - 2150-8097
VL - 1
SP - 710
EP - 721
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 1
ER -