## Optimal prediction of Markov chains without mixing conditions

### With Yihong Wu (Yale University)

# Optimal prediction of Markov chains without mixing conditions

Motivated by practical applications such as autocomplete and text generation, we study the following statistical problem with dependent data: Observing a trajectory of length $n$ from a stationary first-order Markov chain with $k$ states, how to predict (the distribution of) the next state? In contrast to the better-studied parameter estimation problem which relies on assumptions on the mixing time or the minimal probability of the stationary distribution, the prediction problem requires neither. This allows for an assumption-free framework but also results in new technical challenges due to the lack of concentration for sample path statistics.

For $3 leq k leq O(sqrt{n})$, using information-theoretic techniques including, in particular, those rooted in universal compression, we show that the optimal rate of Kullback-Leibler prediction risk is $frac{k^{2}{n}log frac{n}{k}2}$, in contrast to the optimal rate of $frac{log log n}{n}$ for $k=2$ previously shown by Falahatgar et al. These rates, slower than the parametric rate of $O(frac{k^2}{n})$, can be attributed to the memory in the data. To quantify the memory effect, we give a lower bound on the spectral gap that ensures the prediction risk is parametric. Extensions to higher-order Markov chains and Hidden Markov Model will be discussed briefly.

This is based on joint work with Yanjun Han and Soham Jana: https://arxiv.org/abs/2106.13947

- Speaker: Yihong Wu (Yale University)
- Friday 13 May 2022, 16:00–17:00
- Venue: MR12, Centre for Mathematical Sciences.
- Series: Statistics; organiser: Qingyuan Zhao.