TY - GEN
T1 - Geographic web usage estimation by monitoring DNS caches
AU - Akcan, Hüseyin
AU - Suel, Torsten
AU - Brönnimann, Hervé
N1 - Copyright 2010 Elsevier B.V., All rights reserved.
PY - 2008
Y1 - 2008
N2 - DNS is one of the most actively used distributed databases on earth, accessed by millions of people every day to transparently convert host names into IP addresses and vice versa. In order to improve their performance, DNS servers also keep temporary records of all requested domain names in their cache. While most of the DNS servers are configured to be used by their local users only, there still exist many DNS servers that respond to public queries. Querying these DNS servers reveals the recently visited domains. Exploiting the geographically distributed nature of DNS, one can gather usage statistics ranging from a single DNS server to global scale. In particular, this enables collecting statistics about geographic differences in web browsing behavior between different regions of a country or the world. In this paper, we present methods to identify these public DNS servers, discuss how to effectively crawl them, and describe our algorithm to extract usage estimations from the crawl data. We also evaluate our estimation algorithm using extensive simulations, and finally use our algorithms to crawl 150 U.S. universities for various domains, and explore the effects of location and time on the access rate of these domains.
KW - DNS
KW - web access monitoring
KW - web site usage estimation
UR - http://www.scopus.com/inward/record.url?scp=77954429438&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77954429438&partnerID=8YFLogxK
U2 - 10.1145/1367798.1367813
DO - 10.1145/1367798.1367813
M3 - Conference contribution
AN - SCOPUS:77954429438
SN - 9781605581606
T3 - ACM International Conference Proceeding Series
SP - 85
EP - 92
BT - LocWeb 2008 - Proceedings of the 1st International Workshop on Location and the Web, in Conjunction with the WWW 2008 Conference
T2 - 1st International Workshop on Location and the Web, LocWeb 2008, in Conjunction with the WWW 2008 Conference
Y2 - 22 April 2008
ER -