Distribution-free inference for regression

With Rina Foygel Barber, University of Chicago

Distribution-free inference for regression: discrete, continuous, and in between

In data analysis problems where we are not able to rely on distributional assumptions, what types of inference guarantees can still be obtained? Many popular methods, such as holdout methods, cross-validation methods, and conformal prediction, are able to provide distribution-free guarantees for predictive inference, but the problem of providing inference for the underlying regression function (for example, inference on the conditional meanE[Y|X]) is more challenging. If X takes only a small number of possible values, then inference on E[Y|X] is trivial to achieve. At the other extreme, if the features X are continuously distributed, we show that any confidence interval for E[Y|X] must have non-vanishing width, even as sample size tends to infinity – this is true regardless of smoothness properties or other desirable features of the underlying distribution. In between these two extremes, we find several distinct regimes – in particular, it is possible for distribution-free confidence intervals to have vanishing width if and only if the effective support size of the distribution ofXis smaller than the square of the sample size.

This work is joint with Yonghoon Lee.

Add to your calendar or Include in your list