Optimal prediction of Markov chains without mixing conditions

With Yihong Wu (Yale University)

Motivated by practical applications such as autocomplete and text generation, we study the following statistical problem with dependent data: observing a trajectory of length $n$ from a stationary first-order Markov chain with $k$ states, how can one predict (the distribution of) the next state? In contrast to the better-studied parameter estimation problem, which relies on assumptions on the mixing time or the minimal probability of the stationary distribution, the prediction problem requires neither. This allows for an assumption-free framework, but it also creates new technical challenges due to the lack of concentration for sample-path statistics.
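
To make the prediction task concrete, here is a minimal baseline sketch: estimate the transition row of the last observed state with add-one smoothing and report it as the predicted distribution of the next state. This is only an illustration of the problem setup, not the minimax-optimal procedure studied in the talk; the function name and the example chain below are hypothetical.

```python
import numpy as np

def predict_next_state_distribution(trajectory, k):
    """Given a length-n trajectory over states {0, ..., k-1}, return an
    estimate of the distribution of the next state via add-one smoothing
    of the empirical transition counts out of the final observed state."""
    counts = np.ones((k, k))  # add-one (Laplace) smoothing of transition counts
    for s, t in zip(trajectory[:-1], trajectory[1:]):
        counts[s, t] += 1
    last = trajectory[-1]
    return counts[last] / counts[last].sum()

# Example: a "sticky" 3-state chain with self-transition probability 0.8
rng = np.random.default_rng(0)
P = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
x = [0]
for _ in range(999):
    x.append(rng.choice(3, p=P[x[-1]]))
print(predict_next_state_distribution(x, 3))
```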

For $3 \leq k \leq O(\sqrt{n})$, using information-theoretic techniques, in particular those rooted in universal compression, we show that the optimal rate of the Kullback-Leibler prediction risk is $\frac{k^2}{n}\log\frac{n}{k^2}$, in contrast to the optimal rate of $\frac{\log\log n}{n}$ for $k=2$ previously shown by Falahatgar et al. These rates, slower than the parametric rate of $O(\frac{k^2}{n})$, can be attributed to the memory in the data. To quantify the memory effect, we give a lower bound on the spectral gap that ensures the prediction risk is parametric. Extensions to higher-order Markov chains and hidden Markov models will be discussed briefly.
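
For concreteness, the result above can be read as a statement about the minimax Kullback-Leibler prediction risk; the following display is a sketch of the standard minimax formulation (the notation $\widehat{M}$ for the predictor and the exact class of chains are as in the linked paper, not spelled out here):

$\inf_{\widehat{M}} \sup_{M} \, \mathbb{E}\Big[ D\big( M(\cdot \mid X_n) \,\big\|\, \widehat{M}(\cdot \mid X_1, \ldots, X_n) \big) \Big] \asymp \frac{k^2}{n} \log \frac{n}{k^2}, \qquad 3 \leq k \leq O(\sqrt{n}),$

where the supremum ranges over stationary $k$-state transition matrices $M$ and the infimum over all predictors $\widehat{M}$ based on the observed trajectory.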

This is based on joint work with Yanjun Han and Soham Jana: https://arxiv.org/abs/2106.13947
