Looking at both the present and the past to efficiently update replicas of Web content

Luciano Barbosa, Ana Carolina Salgado, Francisco De Carvalho, Jacques Robin, Juliana Freire

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Since Web sites are autonomous and independently updated, applications that keep replicas of Web data, such as Web warehouses and search engines, must periodically poll the sites and check for changes. Since this is a resource-intensive task, in order to keep the copies up-to-date, it is important to devise efficient update schedules that adapt to the change rate of the pages and avoid visiting pages not modified since the last visit. In this paper, we propose a new approach that learns to predict the change behavior of Web pages based both on the static features and change history of pages, and refreshes the copies accordingly. Experiments using real-world data show that our technique leads to substantial performance improvements compared to previously proposed approaches.

Original language: English (US)
Title of host publication: WIDM 2005 - Proceedings of the 7th ACM International Workshop on Web Information and Data Management, Co-located with CIKM 2005
Pages: 75-80
Number of pages: 6
State: Published - 2005
Event: 7th ACM International Workshop on Web Information and Data Management, WIDM 2005, Held in Conjunction with the International Conference on Information and Knowledge Management, CIKM 2005 - Bremen, Germany
Duration: Nov 5 2005 to Nov 5 2005

Publication series

Name: Proceedings of the International Workshop on Web Information and Data Management, WIDM

Other

Other: 7th ACM International Workshop on Web Information and Data Management, WIDM 2005, Held in Conjunction with the International Conference on Information and Knowledge Management, CIKM 2005
Country: Germany
City: Bremen
Period: 11/5/05 to 11/5/05

Keywords

  • Indexing update
  • Machine learning
  • Update policy

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Software
