Abstract
Over the last few years, most major search engines have integrated link-based ranking techniques in order to provide more accurate search results. One widely known approach is the Pagerank technique, which forms the basis of the Google ranking scheme, and which assigns a global importance measure to each page based on the importance of other pages pointing to it. The main advantage of the Pagerank measure is that it is independent of the query posed by a user; this means that it can be precomputed and then used to optimize the layout of the inverted index structure accordingly. However, computing the Pagerank measure requires implementing an iterative process on a massive graph corresponding to billions of web pages and hyperlinks. In this paper, we study I/O-efficient techniques to perform this iterative computation. We derive two algorithms for Pagerank based on techniques proposed for out-of-core graph algorithms, and compare them to two existing algorithms proposed by Haveliwala. We also consider the implementation of a recently proposed topic-sensitive version of Pagerank. Our experimental results show that for very large data sets, significant improvements over previous results can be achieved on machines with moderate amounts of memory. On the other hand, at most minor improvements are possible on data sets that are only moderately larger than memory, which is the case in many practical scenarios.
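As a point of reference for the iterative process the abstract mentions, the following is a minimal in-memory sketch of a Pagerank power iteration. It is not one of the I/O-efficient algorithms studied in the paper. The function name `pagerank`, the damping factor of 0.85, the fixed iteration count, the uniform redistribution of rank from pages without out-links, and the toy edge list are all assumptions made for illustration. The inner loop makes one sequential pass over the link list per iteration, which is the step that becomes I/O-bound once the graph no longer fits in memory.

```python
def pagerank(edges, num_pages, damping=0.85, iterations=50):
    """Power-iteration Pagerank over a list of (source, target) links.

    Hypothetical helper for illustration; damping and iterations are
    assumed defaults, not values taken from the paper.
    """
    out_degree = [0] * num_pages
    for src, _ in edges:
        out_degree[src] += 1

    # Start from a uniform distribution over all pages.
    rank = [1.0 / num_pages] * num_pages
    for _ in range(iterations):
        # Rank held by pages with no out-links is spread uniformly (assumption).
        dangling = sum(rank[p] for p in range(num_pages) if out_degree[p] == 0)
        base = (1.0 - damping) / num_pages + damping * dangling / num_pages
        new_rank = [base] * num_pages
        # One sequential pass over the link structure per iteration.
        for src, dst in edges:
            new_rank[dst] += damping * rank[src] / out_degree[src]
        rank = new_rank
    return rank


# Toy graph: page 2 is linked from pages 0, 1, and 3; page 3 has no in-links.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
ranks = pagerank(edges, num_pages=4)
```

Because the result depends only on the link graph and not on any query, the `ranks` list can be computed once and reused, which is the query-independence property the abstract highlights.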
Original language | English (US) |
---|---|
Title of host publication | International Conference on Information and Knowledge Management, Proceedings |
Editors | K Kalpakis, N Goharian, D Grossman |
Pages | 549-557 |
Number of pages | 9 |
State | Published - 2002 |
Event | Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM 2002) - McLean, VA, United States; Duration: Nov 4 2002 → Nov 9 2002 |
Other
Other | Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM 2002) |
---|---|
Country/Territory | United States |
City | McLean, VA |
Period | 11/4/02 → 11/9/02 |
Keywords
- External memory algorithms
- Link-based ranking
- Out-of-core
- Pagerank
- Search engines
ASJC Scopus subject areas
- General Business, Management and Accounting