Statistics Seminar
Adaptive two-sample testing
With Arthur Gretton (Gatsby Computational Neuroscience Unit, UCL)
I will address the problem of two-sample testing using the Maximum Mean Discrepancy (MMD). The MMD is an integral probability metric defined using a reproducing kernel Hilbert space (RKHS), with properties determined by the choice of kernel. For good test power, the kernel must be chosen in accordance with the properties of the distributions being compared.
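For reference, a standard characterisation of the MMD (included here for completeness; it is not part of the abstract): for a kernel \(k\) with RKHS \(\mathcal{H}\) and mean embeddings \(\mu_P = \mathbb{E}_{X \sim P}\, k(X, \cdot)\) and \(\mu_Q = \mathbb{E}_{Y \sim Q}\, k(Y, \cdot)\),

\[
\mathrm{MMD}(P, Q) \;=\; \sup_{\|f\|_{\mathcal{H}} \le 1} \bigl( \mathbb{E}_{X \sim P} f(X) - \mathbb{E}_{Y \sim Q} f(Y) \bigr) \;=\; \|\mu_P - \mu_Q\|_{\mathcal{H}},
\]

with squared form \(\mathrm{MMD}^2(P, Q) = \mathbb{E}\, k(X, X') - 2\, \mathbb{E}\, k(X, Y) + \mathbb{E}\, k(Y, Y')\) for independent \(X, X' \sim P\) and \(Y, Y' \sim Q\). A kernel is characteristic precisely when \(\mathrm{MMD}(P, Q) = 0\) implies \(P = Q\).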
I will address two cases:
- The distributions being tested have densities, and the difference in densities lies in a Sobolev ball. The MMD test is then minimax optimal with a specific kernel that depends on the smoothness parameter of the Sobolev ball. In practice this parameter is unknown; to overcome this issue, I describe an aggregated test, called MMDAgg, which adapts to the smoothness parameter. Test power is maximised over the collection of kernels used, without requiring held-out data for kernel selection (which would incur a loss of test power). MMDAgg controls the test level non-asymptotically, and achieves the minimax rate over Sobolev balls, up to an iterated logarithmic term. The guarantees hold for any product of one-dimensional translation-invariant characteristic kernels. (A simplified code sketch of the aggregation idea follows this list.)
- The distributions being tested may not have densities, but might be high dimensional (e.g. distributions over images). In this case, I describe a heuristic for training neural-network features for two-sample testing, by maximising a proxy for test power over a held-out data set. This yields state-of-the-art performance on challenging real-world problems, for instance distinguishing between distributions over CIFAR images. (A sketch of this training criterion also follows the list.)
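As a reading aid for the first case, here is a minimal sketch of the aggregation idea only, under assumptions not stated in the abstract: Gaussian kernels over a user-supplied bandwidth grid, a permutation null, and a plain Bonferroni correction across bandwidths. The actual MMDAgg procedure aggregates weighted kernels and calibrates a less conservative correction by simulation.

```python
import numpy as np

def gaussian_kernel(Z, bandwidth):
    """Gaussian kernel matrix k(z, z') = exp(-||z - z'||^2 / (2 bw^2))."""
    sq = np.sum(Z**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2 * Z @ Z.T
    return np.exp(-np.maximum(sq, 0) / (2 * bandwidth**2))

def mmd2_biased(K, n):
    """Biased squared-MMD estimate from the joint kernel matrix of (X, Y)."""
    Kxx, Kyy, Kxy = K[:n, :n], K[n:, n:], K[:n, n:]
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

def mmd_agg_test(X, Y, bandwidths, alpha=0.05, n_perm=500, seed=0):
    """Reject H0: P = Q if any bandwidth's permutation p-value falls
    below the Bonferroni-corrected level alpha / len(bandwidths)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    Z = np.vstack([X, Y])
    for bw in bandwidths:
        K = gaussian_kernel(Z, bw)
        observed = mmd2_biased(K, n)
        null_stats = []
        for _ in range(n_perm):
            perm = rng.permutation(2 * n)  # relabel pooled samples
            null_stats.append(mmd2_biased(K[np.ix_(perm, perm)], n))
        p_value = (1 + np.sum(np.array(null_stats) >= observed)) / (1 + n_perm)
        if p_value <= alpha / len(bandwidths):
            return True
    return False
```

A common choice for the grid is powers of two times the median pairwise distance between the pooled samples.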
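For the second case, the following is a minimal sketch of training features by ascending a test-power proxy, again under illustrative assumptions: a toy MLP featurizer in place of the CNNs used for images, a Gaussian kernel on the learned features, and a crude plug-in estimate of the statistic's standard deviation under the alternative. The exact criterion and kernel parameterisation from the talk differ in detail.

```python
import torch
import torch.nn as nn

class Featurizer(nn.Module):
    """Small MLP feature map phi; image problems would use a CNN."""
    def __init__(self, d_in, d_out=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(),
                                 nn.Linear(64, d_out))
    def forward(self, x):
        return self.net(x)

def gram(A, B, sigma=1.0):
    """Gaussian kernel on the learned feature space."""
    return torch.exp(-torch.cdist(A, B) ** 2 / (2 * sigma ** 2))

def power_proxy(X, Y, phi, sigma=1.0, eps=1e-6):
    """J = MMD^2 / sigma_hat: unbiased squared-MMD estimate divided by a
    simplified estimate of its standard deviation under the alternative.
    Assumes len(X) == len(Y)."""
    fX, fY = phi(X), phi(Y)
    Kxx, Kyy, Kxy = gram(fX, fX, sigma), gram(fY, fY, sigma), gram(fX, fY, sigma)
    n = Kxx.shape[0]
    mmd2 = ((Kxx.sum() - Kxx.diagonal().sum()) / (n * (n - 1))
            + (Kyy.sum() - Kyy.diagonal().sum()) / (n * (n - 1))
            - 2 * Kxy.mean())
    # per-sample contributions; their variance proxies the H1 variance
    h = Kxx.mean(1) + Kyy.mean(1) - Kxy.mean(1) - Kxy.mean(0)
    return mmd2 / (h.var().sqrt() + eps)

# Toy training run on synthetic data; the learned features are then frozen
# and the MMD test itself is carried out on held-out data (e.g. with a
# permutation test, as in the sketch above).
torch.manual_seed(0)
X = torch.randn(256, 10)        # P = N(0, I)
Y = torch.randn(256, 10) + 0.5  # Q = N(0.5, I)
phi = Featurizer(d_in=10)
opt = torch.optim.Adam(phi.parameters(), lr=1e-3)
for step in range(200):
    opt.zero_grad()
    (-power_proxy(X, Y, phi)).backward()  # gradient ascent on the proxy
    opt.step()
```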
- Speaker: Arthur Gretton (Gatsby Computational Neuroscience Unit, UCL)
- Friday 27 October 2023, 14:00–15:00
- Venue: MR12, Centre for Mathematical Sciences.
- Series: Statistics; organiser: Qingyuan Zhao.