Adaptation through prior tails and deep neural networks

With Ismael Castillo (Sorbonne Université)

We introduce a scalable Bayesian method to achieve adaptation to structural properties such as smoothness or intrinsic dimension. The main idea is to define a prior distribution that combines 'heavy' tails with small deterministic scaling factors. This produces a 'soft' form of variable selection. A key algorithmic advantage is that there is no need to explicitly design a variable selection prior in the form of a hyper-prior. We illustrate this approach in nonparametric models: regression, density estimation and classification.
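
As a rough illustration of this construction (a minimal sketch, not the exact specification from the talk), one can place independent heavy-tailed variables on the coefficients of a basis expansion and multiply them by small deterministic scalings: the scalings shrink most coefficients towards zero, while the heavy tail lets a few escape, acting as a soft variable selection with no hyper-prior. The dyadic indexing, the Student degrees of freedom and the particular scalings sigma_j below are illustrative assumptions.

import numpy as np
from scipy import stats

# Sketch of a heavy-tailed prior with deterministic scalings (illustrative choices).
# Coefficient j is modelled as theta_j = sigma_j * xi_j, with xi_j iid Student-t
# (heavy, polynomial tails) and sigma_j a small deterministic scaling factor.
rng = np.random.default_rng(0)
n_coeffs = 64
levels = np.floor(np.log2(np.arange(1, n_coeffs + 1))).astype(int)  # assumed dyadic levels
sigma = 2.0 ** (-levels) / np.sqrt(n_coeffs)                        # assumed scaling choice
xi = stats.t(df=3).rvs(size=n_coeffs, random_state=rng)
theta = sigma * xi

# Most coefficients are strongly shrunk, a few remain large: 'soft' variable selection.
print(np.sort(np.abs(theta))[-5:])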

We examine the trade-off between the choice of tails and the choice of scalings in the prior. While polynomial tails (e.g. Student) can lead to full adaptation, we also show that lighter tails (e.g. Laplace or '1/p'-Weibull) still provide improved rates compared to Gaussian tails, and even full adaptation in an appropriate limit of vanishing tail index p.
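
For reference, the tail regimes mentioned above can be compared through the decay of the prior density for large |x|, written here in a generic parametrisation (how the exponent q below maps onto the tail index p of the talk is an assumption, not something stated in the abstract):

\[
\pi(x) \asymp |x|^{-(1+\alpha)} \ \ (\text{Student, polynomial}), \qquad
\pi(x) \asymp e^{-c|x|} \ \ (\text{Laplace}), \qquad
\pi(x) \asymp e^{-c|x|^{q}} \ \ (\text{Weibull-type}), \qquad
\pi(x) \asymp e^{-c x^{2}} \ \ (\text{Gaussian}),
\]

where a smaller exponent q in the Weibull-type case gives heavier tails (q = 1 recovers Laplace and q = 2 the Gaussian regime).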

We then discuss the method in the context of ReLU neural networks. We consider an over-parameterised deterministic network architecture. When using iid Student-type priors on the network weights, the corresponding posterior distribution and its mean-field variational counterpart enjoy convergence rates that are fully adaptive to both smoothness and structure. Finally, we discuss work in progress that uses lighter tails for the priors on the weights and connects to neural network estimators that have previously been implemented in practice but for which theoretical support is still limited.
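
The following is a minimal sketch of the kind of object described in the last paragraph: a fixed, over-parameterised, fully connected ReLU network whose weights and biases are drawn iid from a Student-type prior. The widths, depth and degrees of freedom are illustrative assumptions, and the sketch only shows a single prior draw; the adaptive rates in the talk concern the resulting posterior (or its mean-field variational approximation), and the prior there may also involve the small deterministic scalings from the first paragraph, which are omitted here.

import numpy as np
from scipy import stats

def sample_relu_network(widths, df=2.5, seed=None):
    # One draw from an iid Student-type prior over all weights and biases
    # of a fixed (deterministic) fully connected ReLU architecture.
    rng = np.random.default_rng(seed)
    params = []
    for d_in, d_out in zip(widths[:-1], widths[1:]):
        W = stats.t(df=df).rvs(size=(d_out, d_in), random_state=rng)
        b = stats.t(df=df).rvs(size=d_out, random_state=rng)
        params.append((W, b))

    def f(x):                                   # x: array of shape (n, widths[0])
        h = np.asarray(x, dtype=float)
        for W, b in params[:-1]:
            h = np.maximum(h @ W.T + b, 0.0)    # ReLU hidden layers
        W, b = params[-1]
        return h @ W.T + b                      # linear output layer

    return f

# Example: an over-parameterised network for a one-dimensional regression problem.
f = sample_relu_network([1, 128, 128, 1], seed=0)
x = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
print(f(x).ravel())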

This talk is based on joint works with Sergios Agapiou (Cyprus), Julyan Arbel (INRIA Grenoble) and Paul Egels (Sorbonne).
