TY - GEN
T1 - What's in the community cookie jar?
AU - Cahn, Aaron
AU - Alfeld, Scott
AU - Barford, Paul
AU - Muthukrishnan, S.
N1 - Funding Information:
The authors would like to thank Cookiepedia for supplying the data used in this study. This material is based upon work supported by the DHS grant BAA 11-01 and AFRL grant FA8750-12-2-0328. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not reflect the views of the DHS or AFRL.
Publisher Copyright:
© 2016 IEEE.
PY - 2016/11/21
Y1 - 2016/11/21
AB - Third-party tracking of user behavior via web cookies represents a privacy threat. In this paper, we assess this threat through an analysis of anonymized, crowdsourced cookie data provided by Cookiepedia.co.uk. We find that nearly 45% of the cookies in the corpus are from Facebook, and of the remaining cookies, 25% come from 10 distinct domains. Over 65% are Maximal Permission cookies (i.e., third-party, non-secure, persistent, root-level). Cookiepedia's anonymization of user data presents challenges with respect to modeling site traffic. We further elucidate the privacy issue by conducting targeted crawling campaigns to supplement the Cookiepedia data. We find that the amount of traffic obscured by Cookiepedia's anonymizing procedure varies dramatically from site to site, sometimes obscuring as much as 80% of traffic. We use the crawls to infer the inverse function of the anonymizing procedure, allowing us to enhance the crowdsourced dataset while maintaining user anonymity.
UR - http://www.scopus.com/inward/record.url?scp=85006802456&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85006802456&partnerID=8YFLogxK
U2 - 10.1109/ASONAM.2016.7752292
DO - 10.1109/ASONAM.2016.7752292
M3 - Conference contribution
AN - SCOPUS:85006802456
T3 - Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016
SP - 567
EP - 570
BT - Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016
A2 - Kumar, Ravi
A2 - Caverlee, James
A2 - Tong, Hanghang
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016
Y2 - 18 August 2016 through 21 August 2016
ER -