TY - JOUR
T1 - The Interplay of Demographic Variables and Social Distancing Scores in Deep Prediction of U.S. COVID-19 Cases
AU - Tang, Francesca
AU - Feng, Yang
AU - Chiheb, Hamza
AU - Fan, Jianqing
N1 - Publisher Copyright:
© 2021 American Statistical Association.
PY - 2021
Y1 - 2021
N2 - With the severity of the COVID-19 outbreak, we characterize the nature of the growth trajectories of counties in the United States using a novel combination of spectral clustering and the correlation matrix. As the United States and the rest of the world are still suffering from the effects of the virus, the importance of assigning growth membership to counties and understanding the determinants of the growth is increasingly evident. For the two communities (faster versus slower growth trajectories) we cluster the counties into, the average between-group correlation is 88.4% whereas the average within-group correlations are 95.0% and 93.8%. The average growth rate for one group is 0.1589 and 0.1704 for the other, further suggesting that our methodology captures meaningful differences between the nature of the growth across various counties. Subsequently, we select the demographic features that are most statistically significant in distinguishing the communities: number of grocery stores, number of bars, Asian population, White population, median household income, number of people with the bachelor’s degrees, and population density. Lastly, we effectively predict the future growth of a given county with a long short-term memory (LSTM) recurrent neural network using three social distancing scores. The best-performing model achieves a median out-of-sample R 2 of 0.6251 for a four-day ahead prediction and we find that the number of communities and social distancing features play an important role in producing a more accurate forecasting. This comprehensive study captures the nature of the counties’ growth in cases at a very micro-level using growth communities, demographic factors, and social distancing performance to help government agencies utilize known information to make appropriate decisions regarding which potential counties to target resources and funding to. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
AB - With the severity of the COVID-19 outbreak, we characterize the nature of the growth trajectories of counties in the United States using a novel combination of spectral clustering and the correlation matrix. As the United States and the rest of the world are still suffering from the effects of the virus, the importance of assigning growth membership to counties and understanding the determinants of the growth is increasingly evident. For the two communities (faster versus slower growth trajectories) we cluster the counties into, the average between-group correlation is 88.4% whereas the average within-group correlations are 95.0% and 93.8%. The average growth rate for one group is 0.1589 and 0.1704 for the other, further suggesting that our methodology captures meaningful differences between the nature of the growth across various counties. Subsequently, we select the demographic features that are most statistically significant in distinguishing the communities: number of grocery stores, number of bars, Asian population, White population, median household income, number of people with the bachelor’s degrees, and population density. Lastly, we effectively predict the future growth of a given county with a long short-term memory (LSTM) recurrent neural network using three social distancing scores. The best-performing model achieves a median out-of-sample R 2 of 0.6251 for a four-day ahead prediction and we find that the number of communities and social distancing features play an important role in producing a more accurate forecasting. This comprehensive study captures the nature of the counties’ growth in cases at a very micro-level using growth communities, demographic factors, and social distancing performance to help government agencies utilize known information to make appropriate decisions regarding which potential counties to target resources and funding to. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
KW - Block model
KW - COVID-19
KW - Community detection
KW - Learning
KW - Neural network
KW - Spectral clustering
KW - Stochastic machine
UR - http://www.scopus.com/inward/record.url?scp=85105221128&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105221128&partnerID=8YFLogxK
U2 - 10.1080/01621459.2021.1901717
DO - 10.1080/01621459.2021.1901717
M3 - Article
AN - SCOPUS:85105221128
SN - 0162-1459
VL - 116
SP - 492
EP - 506
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 534
ER -