The science of guessing: Analyzing an anonymized corpus of 70 million passwords

Joseph Bonneau

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution


We report on the largest corpus of user-chosen passwords ever studied, consisting of anonymized password histograms representing almost 70 million Yahoo! users, mitigating privacy concerns while enabling analysis of dozens of subpopulations based on demographic factors and site usage characteristics. This large data set motivates a thorough statistical treatment of estimating guessing difficulty by sampling from a secret distribution. In place of previously used metrics such as Shannon entropy and guessing entropy, which cannot be estimated with any realistically sized sample, we develop partial guessing metrics including a new variant of guesswork parameterized by an attacker's desired success rate. Our new metric is comparatively easy to approximate and directly relevant for security engineering. By comparing password distributions with a uniform distribution that would provide equivalent security against different forms of guessing attack, we estimate that passwords provide fewer than 10 bits of security against an online, trawling attack, and only about 20 bits of security against an optimal offline dictionary attack. We find surprisingly little variation in guessing difficulty; every identifiable group of users generated a comparably weak password distribution. Security motivations such as the registration of a payment card have no greater impact than demographic factors such as age and nationality. Even proactive efforts to nudge users towards better password choices with graphical feedback make little difference. More surprisingly, even seemingly distant language communities choose the same weak passwords, and an attacker never gains more than a factor of 2 in efficiency by switching from the globally optimal dictionary to a population-specific list.
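The abstract does not spell out how a password distribution is converted into "bits of security", so the following is only a minimal sketch under stated assumptions. It assumes an attacker who tries passwords in decreasing order of popularity, and it reports the size (in bits) of a uniform distribution in which the same number of guesses would achieve the same success rate. The function names and the toy sample are hypothetical illustrations, not the paper's definitions or data.

```python
import math
from collections import Counter


def sorted_counts(passwords):
    """Return password frequencies, most common first."""
    return sorted(Counter(passwords).values(), reverse=True)


def success_rate(counts, guesses):
    """Fraction of accounts broken by trying the `guesses` most common passwords."""
    return sum(counts[:guesses]) / sum(counts)


def min_guesses(counts, alpha):
    """Smallest number of top passwords whose combined share reaches alpha."""
    total = sum(counts)
    covered = 0
    for guesses, count in enumerate(counts, start=1):
        covered += count
        if covered >= alpha * total:
            return guesses
    return len(counts)


def equivalent_bits(guesses, rate):
    """Bits of a uniform distribution in which `guesses` guesses succeed at `rate`.

    In a uniform distribution over N values, g guesses succeed with
    probability g / N, so N = g / rate and the strength is log2(N).
    """
    return math.log2(guesses / rate)


# Invented toy sample (assumption): 8 accounts, heavily skewed choices.
sample = ["123456"] * 4 + ["password"] * 2 + ["qwerty", "dragon"]
counts = sorted_counts(sample)

# Online-style attack: a single guess per account.
one_guess_bits = equivalent_bits(1, success_rate(counts, 1))

# Offline-style attack: keep guessing until 75% of accounts are broken.
g = min_guesses(counts, 0.75)
partial_bits = equivalent_bits(g, success_rate(counts, g))

print(one_guess_bits, partial_bits)
```

For a uniform distribution this conversion returns the same value, log2 of the number of distinct passwords, for every guess budget and target success rate. A skewed distribution scores lower against small guess budgets, which is consistent with the gap the abstract reports between online and offline attacks.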

Original language: English (US)
Title of host publication: Proceedings - 2012 IEEE Symposium on Security and Privacy, S and P 2012
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 15
ISBN (Print): 9780769546810
State: Published - 2012
Event: 33rd IEEE Symposium on Security and Privacy, S and P 2012 - San Francisco, CA, United States
Duration: May 21 2012 to May 23 2012

Publication series

Name: Proceedings - IEEE Symposium on Security and Privacy
ISSN (Print): 1081-6011




Keywords

  • authentication
  • computer security
  • data mining
  • information theory
  • statistics

ASJC Scopus subject areas

  • Safety, Risk, Reliability and Quality
  • Software
  • Computer Networks and Communications

