## Non-asymptotic control of a kernel 2-sample test

### With Perrine Lacroix (ENS Lyon)


We are interested in statistical tests of the hypothesis H₀: {P = Q} against the alternative H₁: {P ≠ Q}. Our data are multivariate, high-dimensional and exhibit strong dependencies between variables. We propose a two-sample test based on kernel methods: the data are first mapped by a well-chosen feature map into a reproducing kernel Hilbert space (RKHS). Our kernel test statistic is the analogue of Hotelling’s T² test for finite-dimensional multivariate data: it equals the difference of the kernel mean embeddings (the quantity underlying the maximum mean discrepancy, MMD), renormalized by a well-chosen covariance operator.

Classically, these non-parametric tests are calibrated either asymptotically or via test aggregation techniques. Here, we propose to calibrate the test at a given fixed sample size by obtaining non-asymptotic bounds on our test statistic. This requires a regularization, since the covariance operator is approximated by its empirical estimator. Unlike the approaches of Harchaoui et al. (2007) or Hagrass et al. (2023), which use L₂ regularization, we propose spectral truncation. This method reconstructs the covariance operator from its first T eigenfunctions, where the number T is unknown and must be chosen, and it has the additional advantage of enabling data visualization.

Currently, for a fixed T, the test statistic, called the truncated kernel Fisher Discriminant Ratio (KFDA_T), provides a test whose asymptotic calibration is known (Ozier-Lafontaine et al., 2023). In this talk, I will present how to bound, theoretically and non-asymptotically, the p-value of the test associated with KFDA_T. This bound is a first step towards a good calibration of the hyperparameter T.
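As an illustration of what a spectrally truncated statistic of this kind can look like in practice, here is a minimal NumPy sketch. It is not the speaker's implementation: the function names `gaussian_gram` and `truncated_kfda`, the Gaussian kernel and its bandwidth, and the normalization (pooled within-group covariance divided by n, statistic scaled by n₁n₂/n) are all assumptions made for the example. The statistic is computed from the Gram matrix alone, by diagonalizing the within-group centred Gram matrix and keeping the T leading eigen-directions.

```python
import numpy as np


def gaussian_gram(Z, bandwidth):
    """Gram matrix of a Gaussian kernel (the kernel choice is an assumption)."""
    sq = np.sum(Z**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * bandwidth**2))


def truncated_kfda(K, n1, n2, T):
    """Truncated KFDA-type statistic from a pooled Gram matrix K.

    Rows/columns of K are ordered with the n1 points of the first sample
    first, then the n2 points of the second sample. Normalization is an
    assumption of this sketch, not taken from the talk.
    """
    n = n1 + n2
    # Block-diagonal projector that centres each sample around its own mean.
    P = np.zeros((n, n))
    P[:n1, :n1] = np.eye(n1) - 1.0 / n1
    P[n1:, n1:] = np.eye(n2) - 1.0 / n2
    # Weights such that Phi @ omega is the difference of mean embeddings.
    omega = np.concatenate([np.full(n1, 1.0 / n1), np.full(n2, -1.0 / n2)])

    # Non-zero eigenvalues of K_w coincide with those of the empirical
    # within-group covariance operator.
    K_w = P @ K @ P / n
    eigvals, eigvecs = np.linalg.eigh(K_w)
    order = np.argsort(eigvals)[::-1][:T]
    lam, U = eigvals[order], eigvecs[:, order]
    if lam[-1] <= 1e-12:
        raise ValueError("T exceeds the number of positive eigenvalues")

    # Squared projections of the mean difference on the T leading
    # eigenfunctions, each divided by its eigenvalue.
    proj = U.T @ (P @ K @ omega)
    return (n1 * n2 / n) * np.sum(proj**2 / (n * lam**2))


# Toy usage on simulated data with a mean shift (all values are illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
Y = rng.normal(size=(50, 10)) + 0.3
K = gaussian_gram(np.vstack([X, Y]), bandwidth=3.0)
stat = truncated_kfda(K, n1=60, n2=50, T=5)
print(f"truncated KFDA statistic with T=5: {stat:.3f}")
```

With a linear kernel and T equal to the data dimension, this sketch reduces to a Hotelling-type quadratic form in the difference of sample means, which is one way to sanity-check it. Under the asymptotic calibration mentioned above, such a statistic would typically be compared with a chi-square quantile with T degrees of freedom; the non-asymptotic bound discussed in the talk is not reproduced here.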

In applications, this statistical question is essential in genomics, where the two groups consist of single-cell RNA-seq data. The goal is to detect whether the groups exhibit distinct or similar biological behaviour.

Joint work with Bertrand Michel (Université de Nantes, France), Franck Picard (ENS de Lyon, France) and Vincent Rivoirard (Paris-Dauphine, France).

- Speaker: Perrine Lacroix (ENS Lyon)
- Friday 19 January 2024, 14:00–15:00
- Venue: MR12, Centre for Mathematical Sciences.
- Series: Statistics; organiser: Dr Sergio Bacallado.