TY - GEN
T1 - Optimal prediction of Markov chains with and without spectral gap conditions
AU - Han, Yanjun
AU - Jana, Soham
AU - Wu, Yihong
N1 - Funding Information:
Y. Wu is supported in part by NSF Grant CCF-1900507, an NSF CAREER award CCF-1651588, and an Alfred Sloan fellowship.
Publisher Copyright:
© 2021 Neural information processing systems foundation. All rights reserved.
PY - 2021
Y1 - 2021
N2 - We study the following learning problem with dependent data: Observing a trajectory of length n from a stationary Markov chain with k states, the goal is to predict the next state. For 3 ≤ k ≤ O(√n), using techniques from universal compression, the optimal prediction risk in Kullback-Leibler divergence is shown to be Θ((k²/n) log(n/k²)), in contrast to the optimal rate of Θ((log log n)/n) for k = 2 previously shown in [FOPS16]. These rates, slower than the parametric rate of O(k²/n), can be attributed to the memory in the data, as the spectral gap of the Markov chain can be arbitrarily small. To quantify the memory effect, we study irreducible reversible chains with a prescribed spectral gap. In addition to characterizing the optimal prediction risk for two states, we show that, as long as the spectral gap is not excessively small, the prediction risk in the Markov model is O(k²/n), which coincides with that of an iid model with the same number of parameters.
AB - We study the following learning problem with dependent data: Observing a trajectory of length n from a stationary Markov chain with k states, the goal is to predict the next state. For 3 ≤ k ≤ O(√n), using techniques from universal compression, the optimal prediction risk in Kullback-Leibler divergence is shown to be Θ((k²/n) log(n/k²)), in contrast to the optimal rate of Θ((log log n)/n) for k = 2 previously shown in [FOPS16]. These rates, slower than the parametric rate of O(k²/n), can be attributed to the memory in the data, as the spectral gap of the Markov chain can be arbitrarily small. To quantify the memory effect, we study irreducible reversible chains with a prescribed spectral gap. In addition to characterizing the optimal prediction risk for two states, we show that, as long as the spectral gap is not excessively small, the prediction risk in the Markov model is O(k²/n), which coincides with that of an iid model with the same number of parameters.
UR - http://www.scopus.com/inward/record.url?scp=85129959171&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85129959171&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85129959171
T3 - Advances in Neural Information Processing Systems
SP - 11233
EP - 11246
BT - Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
A2 - Ranzato, Marc'Aurelio
A2 - Beygelzimer, Alina
A2 - Dauphin, Yann
A2 - Liang, Percy S.
A2 - Wortman Vaughan, Jenn
PB - Neural information processing systems foundation
T2 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
Y2 - 6 December 2021 through 14 December 2021
ER -