TY - GEN
T1 - An empirical study of web cookies
AU - Cahn, Aaron
AU - Alfeld, Scott
AU - Barford, Paul
AU - Muthukrishnan, S.
PY - 2016
Y1 - 2016
N2 - Web cookies are used widely by publishers and 3rd parties to track users and their behaviors. Despite the ubiquitous use of cookies, there is little prior work on their characteristics such as standard attributes, placement policies, and the knowledge that can be amassed via 3rd party cookies. In this paper, we present an empirical study of web cookie characteristics, placement practices and information transmission. To conduct this study, we implemented a lightweight web crawler that tracks and stores the cookies as it navigates to websites. We use this crawler to collect over 3.2M cookies from the two crawls, separated by 18 months, of the top 100K Alexa web sites. We report on the general cookie characteristics and add context via a cookie category index and website genre labels. We consider privacy implications by examining specific cookie attributes and placement behavior of 3rd party cookies. We find that 3rd party cookies outnumber 1st party cookies by a factor of two, and we illuminate the connection between domain genres and cookie attributes. We find that less than 1% of the entities that place cookies can aggregate information across 75% of web sites. Finally, we consider the issue of information transmission and aggregation by domains via 3rd party cookies. We develop a mathematical framework to quantify user information leakage for a broad class of users, and present findings using real world domains. In particular, we demonstrate the interplay between a domain's footprint across the Internet and the browsing behavior of users, which has significant impact on information transmission.
AB - Web cookies are used widely by publishers and 3rd parties to track users and their behaviors. Despite the ubiquitous use of cookies, there is little prior work on their characteristics such as standard attributes, placement policies, and the knowledge that can be amassed via 3rd party cookies. In this paper, we present an empirical study of web cookie characteristics, placement practices and information transmission. To conduct this study, we implemented a lightweight web crawler that tracks and stores the cookies as it navigates to websites. We use this crawler to collect over 3.2M cookies from the two crawls, separated by 18 months, of the top 100K Alexa web sites. We report on the general cookie characteristics and add context via a cookie category index and website genre labels. We consider privacy implications by examining specific cookie attributes and placement behavior of 3rd party cookies. We find that 3rd party cookies outnumber 1st party cookies by a factor of two, and we illuminate the connection between domain genres and cookie attributes. We find that less than 1% of the entities that place cookies can aggregate information across 75% of web sites. Finally, we consider the issue of information transmission and aggregation by domains via 3rd party cookies. We develop a mathematical framework to quantify user information leakage for a broad class of users, and present findings using real world domains. In particular, we demonstrate the interplay between a domain's footprint across the Internet and the browsing behavior of users, which has significant impact on information transmission.
UR - http://www.scopus.com/inward/record.url?scp=85013084506&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85013084506&partnerID=8YFLogxK
U2 - 10.1145/2872427.2882991
DO - 10.1145/2872427.2882991
M3 - Conference contribution
AN - SCOPUS:85013084506
T3 - 25th International World Wide Web Conference, WWW 2016
SP - 891
EP - 901
BT - 25th International World Wide Web Conference, WWW 2016
PB - International World Wide Web Conferences Steering Committee
T2 - 25th International World Wide Web Conference, WWW 2016
Y2 - 11 April 2016 through 15 April 2016
ER -