TY - JOUR
T1 - An information-theoretic approach to study spatial dependencies in small datasets
T2 - Spatial dependencies in small datasets
AU - Porfiri, Maurizio
AU - Ruiz Marín, Manuel
N1 - Funding Information:
Data accessibility. Data used in the analysis of human migration and motor vehicle deaths are available from online documentation of [21] and [22]. Authors’ contributions. M.P. wrote a first draft of the manuscript and M.R.M. developed the computer codes. Both the authors formulated the method, developed the mathematical proofs and analysed the results. Both the authors gave final approval for publication and agree to be held accountable for the work performed therein. The authors contributed equally to the study. Competing interests. We declare we have no competing interests. Funding. This study is part of the collaborative activities carried out under the programs of the region of Murcia (Spain): ‘Groups of Excellence of the region of Murcia, the Fundación Séneca, Science and Technology Agency’ project 19884/GERM/15 and ‘Call for Fellowships for Guest Researcher Stays at Universities and OPIS’ project 21144/IV/19. M.P. would like to express his gratitude to the Technical University of Cartagena for hosting him during a Sabbatical leave and to acknowledge support from the National Science Foundation under grant no. CMMI 1561134. M.R.M. would like to acknowledge support from Ministerio de Ciencia, Innovación y Universidades under grant number PID2019-107800GB-I00/AEI/10.13039/501100011033.
Publisher Copyright:
© 2020 The Author(s).
PY - 2020/10
Y1 - 2020/10
N2 - From epidemiology to economics, there is a fundamental need for statistically principled approaches to unveil spatial patterns and identify their underpinning mechanisms. Grounded in network and information theory, we establish a non-parametric scheme to study spatial associations from limited measurements of a spatial process. Through the lens of network theory, we relate spatial patterning in the dataset to the topology of a network on which the process unfolds. From the available observations of the spatial process and a candidate network topology, we compute a mutual information statistic that measures the extent to which the measurement at a node is explained by observations at neighbouring nodes. For a class of networks and linear autoregressive processes, we establish closed-form expressions for the mutual information statistic in terms of network topological features. We demonstrate the feasibility of the approach on synthetic datasets comprising 25–100 measurements, generated by linear or nonlinear autoregressive processes. Upon validation on synthetic processes, we examine datasets of human migration under climate change in Bangladesh and motor vehicle deaths in the United States of America. For both of these real datasets, our approach is successful in identifying meaningful spatial patterns, begetting statistically principled insight into the mechanisms of important socioeconomic problems.
KW - human migration
KW - information theory
KW - motor vehicle death
KW - network
KW - non-parametric
UR - http://www.scopus.com/inward/record.url?scp=85096039766&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85096039766&partnerID=8YFLogxK
U2 - 10.1098/rspa.2020.0113
DO - 10.1098/rspa.2020.0113
M3 - Article
AN - SCOPUS:85096039766
VL - 476
JO - Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences
JF - Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences
SN - 0080-4630
IS - 2242
M1 - 20200113
ER -