Abstract
Predicting the outcome of a single sporting event is difficult; predicting all of the outcomes for an entire tournament is a monumental challenge. Despite the difficulties, millions of people compete each year to forecast the outcome of the NCAA men's basketball tournament, which spans 63 games over 3 weeks. Statistical prediction of game outcomes involves a multitude of possible covariates and information sources, large performance variations from game to game, and a scarcity of detailed historical data. In this paper, we present the results of a team of modelers working together to forecast the 2014 NCAA men's basketball tournament. We present not only the methods and data used, but also several novel ideas for post-processing statistical forecasts and decontaminating data sources. In particular, we highlight the difficulties in using publicly available data and suggest techniques for improving their relevance.
Field | Value |
---|---|
Original language | English (US) |
Pages (from-to) | 13-27 |
Number of pages | 15 |
Journal | Journal of Quantitative Analysis in Sports |
Volume | 11 |
Issue number | 1 |
State | Published - Mar 1 2015 |
Keywords
- basketball
- data decontamination
- forecasting
- model ensembles
ASJC Scopus subject areas
- Social Sciences (miscellaneous)
- Decision Sciences (miscellaneous)