I/O-efficient techniques for computing pagerank

Yen Yu Chen, Qingqing Gan, Torsten Suel

    Research output: Chapter in Book/Report/Conference proceedingConference contribution

    Abstract

    Over the last few years, most major search engines have integrated link-based ranking techniques in order to provide more accurate search results. One widely known approach is the Pagerank technique, which forms the basis of the Google ranking scheme, and which assigns a global importance measure to each page based on the importance of other pages pointing to it. The main advantage of the Pagerank measure is that it is independent of the query posed by a user; this means that it can be precomputed and then used to optimize the layout of the inverted index structure accordingly. However, computing the Pagerank measure requires implementing an iterative process on a massive graph corresponding to billions of web pages and hyperlinks. In this paper, we study I/O-efficient techniques to perform this iterative computation. We derive two algorithms for Pagerank based on techniques proposed for out-of-core graph algorithms, and compare them to two existing algorithms proposed by Haveliwala. We also consider the implementation of a recently proposed topic-sensitive version of Pagerank. Our experimental results show that for very large data sets, significant improvements over previous results can be achieved on machines with moderate amounts of memory. On the other hand, at most minor improvements are possible on data sets that are only moderately larger than memory, which is the case in many practical scenarios.

    Original languageEnglish (US)
    Title of host publicationInternational Conference on Information and Knowledge Management, Proceedings
    EditorsK Kalpakis, N Goharian, D Grossman
    Pages549-557
    Number of pages9
    StatePublished - 2002
    EventProceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM 2002) - McLean, VA, United States
    Duration: Nov 4 2002Nov 9 2002

    Other

    OtherProceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM 2002)
    CountryUnited States
    CityMcLean, VA
    Period11/4/0211/9/02

    Keywords

    • External memory algorithms
    • Link-based ranking
    • Out-of-core
    • Pagerank
    • Search engines

    ASJC Scopus subject areas

    • Business, Management and Accounting(all)

    Fingerprint Dive into the research topics of 'I/O-efficient techniques for computing pagerank'. Together they form a unique fingerprint.

    Cite this