Abstract
The Google search engine uses a method called PageRank, together with term-based and other ranking techniques, to order search results returned to the user. PageRank uses link analysis to assign a global importance score to each web page, The PageRank scores of all the pages are usually determined off-line in a large-scale computation on the entire hyperlink graph of the web, and several recent studies have focused on improving the efficiency of this computation, which may require multiple hours on a workstation. However, in some scenarios, such as online analysis of link evolution and mining of large web archives such as the Internet Archive, it may be desirable to quickly approximate or update the PageRanks of individual nodes without performing a large-scale computation on the entire graph. We address this problem by studying several methods for efficiently estimating the PageRank score of a particular web page using only a small subgraph of the entire web. In our model, we assume that the graph is accessible remotely via a link database (such as the AltaVista Connectivity Server) or is stored in a relational database that performs lookups on disks to retrieve node and connectivity information. We show that a reasonable estimate of the PageRank value of a node is possible in most cases by retrieving only a moderate number of nodes in the local neighborhood of the node.
Original language | English (US) |
---|---|
Title of host publication | CIKM 2004: Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management |
Editors | D.A. Evans, L. Gravano, O. Herzog, C. Zhai, M. Ronthaler |
Pages | 381-389 |
Number of pages | 9 |
State | Published - 2004 |
Event | CIKM 2004: Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management - Washington, DC, United States Duration: Nov 8 2004 → Nov 13 2004 |
Other
Other | CIKM 2004: Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management |
---|---|
Country/Territory | United States |
City | Washington, DC |
Period | 11/8/04 → 11/13/04 |
Keywords
- External memory algorithms
- Link database
- Link-based ranking
- Out-of-core
- Pagerank
- Search engines
ASJC Scopus subject areas
- General Business, Management and Accounting