TY - GEN
T1 - Differentially Private Password Frequency Lists Or, How to release statistics from 70 million passwords (on purpose)
AU - Blocki, Jeremiah
AU - Datta, Anupam
AU - Bonneau, Joseph
N1 - Publisher Copyright:
© 2016 Internet Society.
PY - 2016
Y1 - 2016
N2 - Given a dataset of user-chosen passwords, the frequency list reveals the frequency of each unique password. We present a novel mechanism for releasing perturbed password frequency lists with rigorous security, efficiency, and distortion guarantees. Specifically, our mechanism is based on a novel algorithm for sampling that enables an efficient implementation of the exponential mechanism for differential privacy (naive sampling is exponential time). It provides the security guarantee that an adversary will not be able to use this perturbed frequency list to learn anything of significance about any individual user’s password even if the adversary already possesses a wealth of background knowledge about the users in the dataset. We prove that our mechanism introduces minimal distortion, thus ensuring that the released frequency list is close to the actual list. Further, we empirically demonstrate, using the now-canonical password dataset leaked from RockYou, that the mechanism works well in practice: as the differential privacy parameter ɛ varies from 8 to 0.002 (smaller ɛ implies higher security), the normalized distortion coefficient (representing the distance between the released and actual password frequency list divided by the number of users N) varies from 8.8 × 10-7 to 1.9 × 10-3. Given this appealing combination of security and distortion guarantees, our mechanism enables organizations to publish perturbed password frequency lists. This can facilitate new research comparing password security between populations and evaluating password improvement approaches. To this end, we have collaborated with Yahoo! to use our differentially private mechanism to publicly release a corpus of 50 password frequency lists representing approximately 70 million Yahoo! users. This dataset is now the largest password frequency corpus available. Using our perturbed dataset we are able to closely replicate the original published analysis of this data.
AB - Given a dataset of user-chosen passwords, the frequency list reveals the frequency of each unique password. We present a novel mechanism for releasing perturbed password frequency lists with rigorous security, efficiency, and distortion guarantees. Specifically, our mechanism is based on a novel algorithm for sampling that enables an efficient implementation of the exponential mechanism for differential privacy (naive sampling is exponential time). It provides the security guarantee that an adversary will not be able to use this perturbed frequency list to learn anything of significance about any individual user’s password even if the adversary already possesses a wealth of background knowledge about the users in the dataset. We prove that our mechanism introduces minimal distortion, thus ensuring that the released frequency list is close to the actual list. Further, we empirically demonstrate, using the now-canonical password dataset leaked from RockYou, that the mechanism works well in practice: as the differential privacy parameter ɛ varies from 8 to 0.002 (smaller ɛ implies higher security), the normalized distortion coefficient (representing the distance between the released and actual password frequency list divided by the number of users N) varies from 8.8 × 10-7 to 1.9 × 10-3. Given this appealing combination of security and distortion guarantees, our mechanism enables organizations to publish perturbed password frequency lists. This can facilitate new research comparing password security between populations and evaluating password improvement approaches. To this end, we have collaborated with Yahoo! to use our differentially private mechanism to publicly release a corpus of 50 password frequency lists representing approximately 70 million Yahoo! users. This dataset is now the largest password frequency corpus available. Using our perturbed dataset we are able to closely replicate the original published analysis of this data.
UR - http://www.scopus.com/inward/record.url?scp=85180773932&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85180773932&partnerID=8YFLogxK
U2 - 10.14722/ndss.2016.23328
DO - 10.14722/ndss.2016.23328
M3 - Conference contribution
AN - SCOPUS:85180773932
T3 - 23rd Annual Network and Distributed System Security Symposium, NDSS 2016
BT - 23rd Annual Network and Distributed System Security Symposium, NDSS 2016
PB - The Internet Society
T2 - 23rd Annual Network and Distributed System Security Symposium, NDSS 2016
Y2 - 21 February 2016 through 24 February 2016
ER -