ESTIMATING REPORTING BIAS IN 311 COMPLAINT DATA

Kate S. Boxer, Boyeong Hong, Constantine E. Kontokosta, Daniel B. Neill

Research output: Contribution to journal > Article > peer-review

Abstract

Systems such as “311” enable residents of a community to report on their environments and to request nonemergency municipal services. While such systems provide an important link between community and government, resident-generated data suffer from reporting bias, with some subpopulations reporting at lower rates than others. Our research focuses on defining the underreporting of heating and hot water problems to New York City’s 311 system and developing methods to estimate underreporting. First, we estimate nonreporting by fitting a latent variable model, which estimates both the probability of an underlying heating problem conditional on building characteristics, and the probability of reporting a problem conditional on population characteristics. Second, we analyze “less-than-expected” reporting: buildings with fewer 311 calls than expected, as compared to similarly sized buildings with similar estimated problem durations. Together, these analyses determine neighborhoods and neighborhood-level socioeconomic characteristics that are predictive of underreporting of heating and hot water problems. Our approaches can aid government agencies wishing to use resident-generated data to assist in constructing fair public policies.
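The two-part structure of the latent variable model described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the functional form, so the logistic links, the function names (`observed_report_prob`, `neg_log_likelihood`), and the toy feature vectors below are all assumptions made for illustration. The sketch only shows why a building with no complaint is ambiguous: either no problem occurred, or a problem occurred and went unreported.

```python
import math

def sigmoid(t):
    """Logistic function (an assumed link; the abstract does not specify one)."""
    return 1.0 / (1.0 + math.exp(-t))

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def observed_report_prob(x, z, beta, gamma):
    """P(complaint observed) = P(problem | x) * P(report | problem, z).

    x: building characteristics, z: population characteristics
    (both hypothetical feature vectors with a leading 1.0 for the intercept).
    """
    p_problem = sigmoid(dot(x, beta))    # latent: an underlying heating problem exists
    p_report = sigmoid(dot(z, gamma))    # given a problem, a resident reports it
    return p_problem * p_report

def neg_log_likelihood(data, beta, gamma):
    """Bernoulli negative log-likelihood over (x, z, y) triples.

    y = 1 means at least one complaint was observed; y = 0 is 'unlabeled':
    it mixes buildings without a problem and buildings with an unreported one.
    """
    total = 0.0
    for x, z, y in data:
        p = observed_report_prob(x, z, beta, gamma)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# Toy data, invented purely to exercise the functions above.
toy_data = [
    ([1.0, 0.9], [1.0, 0.1], 1),
    ([1.0, 0.2], [1.0, 0.7], 0),
]
print(neg_log_likelihood(toy_data, beta=[0.0, 0.0], gamma=[0.0, 0.0]))
```

In a sketch like this, the coefficients `beta` and `gamma` would be chosen to minimize `neg_log_likelihood`. The two components are only separately identifiable when building and population characteristics carry distinct information, which is one reason the product form is a simplification rather than a description of the published model.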

Original language: English (US)
Pages (from-to): 1691-1713
Number of pages: 23
Journal: Annals of Applied Statistics
Volume: 19
Issue number: 2
State: Published - Jun 2025

Keywords

  • 311 data
  • citizen-generated data
  • city analytics
  • latent variable models
  • positive and unlabeled learning
  • reporting bias
  • resident-generated data

ASJC Scopus subject areas

  • Statistics and Probability
  • Modeling and Simulation
  • Statistics, Probability and Uncertainty
