TY - GEN
T1 - Looking at both the present and the past to efficiently update replicas of Web content
AU - Barbosa, Luciano
AU - Salgado, Ana Carolina
AU - De Carvalho, Francisco
AU - Robin, Jacques
AU - Freire, Juliana
PY - 2005
Y1 - 2005
AB - Since Web sites are autonomous and independently updated, applications that keep replicas of Web data, such as Web warehouses and search engines, must periodically poll the sites and check for changes. Since this is a resource-intensive task, in order to keep the copies up-to-date, it is important to devise efficient update schedules that adapt to the change rate of the pages and avoid visiting pages not modified since the last visit. In this paper, we propose a new approach that learns to predict the change behavior of Web pages based both on the static features and change history of pages, and refreshes the copies accordingly. Experiments using real-world data show that our technique leads to substantial performance improvements compared to previously proposed approaches.
KW - Indexing update
KW - Machine learning
KW - Update policy
UR - http://www.scopus.com/inward/record.url?scp=63449098887&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=63449098887&partnerID=8YFLogxK
U2 - 10.1145/1097047.1097062
DO - 10.1145/1097047.1097062
M3 - Conference contribution
AN - SCOPUS:63449098887
SN - 1595931945
SN - 9781595931948
T3 - Proceedings of the International Workshop on Web Information and Data Management WIDM
SP - 75
EP - 80
BT - WIDM 2005 - Proceedings of the 7th ACM International Workshop on Web Information and Data Management, Co-located with CIKM 2005
T2 - 7th ACM International Workshop on Web Information and Data Management, WIDM 2005, Held in Conjunction with the International Conference on Information and Knowledge Management, CIKM 2005
Y2 - 5 November 2005
ER -