Abstract
Systems such as “311” enable residents of a community to report on their environments and to request nonemergency municipal services. While such systems provide an important link between community and government, resident-generated data suffer from reporting bias, with some subpopulations reporting at lower rates than others. Our research focuses on defining underreporting of heating and hot water problems to New York City’s 311 system and on developing methods to estimate it. First, we estimate nonreporting by fitting a latent variable model, which estimates both the probability of an underlying heating problem conditional on building characteristics and the probability of reporting a problem conditional on population characteristics. Second, we analyze “less-than-expected” reporting: buildings with fewer 311 calls than expected, as compared to similarly sized buildings with similar estimated problem durations. Together, these analyses identify neighborhoods, and neighborhood-level socioeconomic characteristics, that are predictive of underreporting of heating and hot water problems. Our approaches can help government agencies that wish to use resident-generated data to construct fair public policies.
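The abstract does not state the model's exact form or fitting procedure, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes the observed outcome is a product of two logistic components, P(complaint) = P(problem | building features) × P(report | problem, population features), in the spirit of positive and unlabeled learning: an observed complaint is a positive, while an observed zero may hide an unreported problem. The logistic links, full-batch gradient ascent, and all function names are hypothetical. The second function is an equally hypothetical stand-in for the “less-than-expected” analysis: it flags buildings whose call count falls below the mean of their (size bin, duration bin) stratum, using made-up dictionary keys.

```python
import math
from collections import defaultdict


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def log_likelihood(a, b, X, W, y):
    """Log-likelihood of observed complaints under P = sigmoid(a.x) * sigmoid(b.w)."""
    total = 0.0
    for x, w, yi in zip(X, W, y):
        pq = sigmoid(dot(a, x)) * sigmoid(dot(b, w))
        pq = min(max(pq, 1e-12), 1 - 1e-12)
        total += yi * math.log(pq) + (1 - yi) * math.log(1 - pq)
    return total


def fit_latent_report_model(X, W, y, lr=1.0, epochs=300):
    """Hypothetical fit by full-batch gradient ascent on the log-likelihood.

    X: building features per unit (first entry 1.0 acts as the intercept)
    W: population features per unit (first entry 1.0 acts as the intercept)
    y: 1 if a complaint was observed, else 0 (a 0 may be an unreported problem)
    """
    a = [0.0] * len(X[0])
    b = [0.0] * len(W[0])
    n = len(y)
    for _ in range(epochs):
        ga = [0.0] * len(a)
        gb = [0.0] * len(b)
        for x, w, yi in zip(X, W, y):
            p = sigmoid(dot(a, x))  # assumed P(problem | building)
            q = sigmoid(dot(b, w))  # assumed P(report | problem, population)
            if yi == 1:
                # d/dz log(p*q) for the logit z of each component
                sa, sb = 1 - p, 1 - q
            else:
                # d/dz log(1 - p*q) for the logit z of each component
                denom = max(1 - p * q, 1e-12)
                sa = -q * p * (1 - p) / denom
                sb = -p * q * (1 - q) / denom
            for j, xj in enumerate(x):
                ga[j] += sa * xj
            for k, wk in enumerate(w):
                gb[k] += sb * wk
        a = [aj + lr * g / n for aj, g in zip(a, ga)]
        b = [bk + lr * g / n for bk, g in zip(b, gb)]
    return a, b


def less_than_expected(buildings):
    """Flag buildings with fewer calls than the mean of their stratum.

    buildings: list of dicts with hypothetical keys
    'id', 'size_bin', 'duration_bin', 'calls'.
    """
    totals = defaultdict(lambda: [0, 0])  # stratum -> [sum of calls, count]
    for bldg in buildings:
        key = (bldg["size_bin"], bldg["duration_bin"])
        totals[key][0] += bldg["calls"]
        totals[key][1] += 1
    flagged = []
    for bldg in buildings:
        s, c = totals[(bldg["size_bin"], bldg["duration_bin"])]
        if bldg["calls"] < s / c:
            flagged.append(bldg["id"])
    return flagged
```

Because only the product of the two probabilities is observed, this sketch keeps building and population covariates in separate feature sets so that the components can be told apart; even so, the intercepts trade off against each other, and a real analysis would need a more careful identification strategy than plain gradient ascent provides.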
Field | Value |
---|---|
Original language | English (US) |
Pages (from-to) | 1691-1713 |
Number of pages | 23 |
Journal | Annals of Applied Statistics |
Volume | 19 |
Issue number | 2 |
State | Published - Jun 2025 |
Keywords
- 311 data
- citizen-generated data
- city analytics
- latent variable models
- positive and unlabeled learning
- reporting bias
- resident-generated data
ASJC Scopus subject areas
- Statistics and Probability
- Modeling and Simulation
- Statistics, Probability and Uncertainty