Stochastic optimization is widely used in many areas, most recently in large-scale machine learning and data science, but its use in these areas is quite different from its use in more traditional areas of operations research, scientific computing, and statistics. In particular, second-order optimization methods have been ubiquitous historically, but they are rarely used in machine learning and data science, compared to their first-order counterparts. Motivated by well-known problems of first-order methods, however, recent work has begun to experiment with second-order methods for machine learning problems. By exploiting recent results from Randomized Numerical Linear Algebra, we establish improved bounds for algorithms that incorporate sub-sampling as a way to improve computational efficiency, while maintaining the original convergence properties of these algorithms. These results provide quantitative convergence results for variants of Newton's method, where the Hessian and/or the gradient is uniformly or non-uniformly sub-sampled, under much weaker assumptions than prior work; and these results include extensions to trust region and cubic regularization algorithms for non-convex optimization problems. When applied to complex machine learning tasks such as training deep neural networks, empirical results demonstrate that these methods perform quite well, both in ways that one would expect (e.g., leading to improved conditioning in the presence of so-called exploding/vanishing gradients) as well as in ways that are more surprising but more interesting (e.g., using so-called adversarial examples to architect the objective function surface to be more amenable to optimization algorithms).
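To make the sub-sampling idea concrete, here is a minimal sketch of one step of a sub-sampled Newton method on a simple least-squares objective. This is an illustration only, not the algorithm analyzed in the talk: the objective, the uniform sampling scheme, the `reg` damping term, and the function name `subsampled_newton_step` are all assumptions chosen for the example.

```python
import numpy as np

def subsampled_newton_step(A, b, x, sample_size, reg=1e-6, rng=None):
    """One sub-sampled Newton step for the illustrative objective
    f(x) = (1/n) * ||A x - b||^2.

    The full gradient is used, but the Hessian (2/n) A^T A is estimated
    from a uniform random subset of rows, as in the uniform-sampling
    variant the abstract alludes to.
    """
    rng = np.random.default_rng(rng)
    n, d = A.shape
    # Exact gradient: (2/n) A^T (A x - b)
    grad = (2.0 / n) * A.T @ (A @ x - b)
    # Uniformly sub-sample rows and rescale to keep the estimate unbiased
    idx = rng.choice(n, size=sample_size, replace=False)
    A_s = A[idx]
    H_s = (2.0 / sample_size) * A_s.T @ A_s + reg * np.eye(d)
    # Newton direction from the sub-sampled Hessian
    p = np.linalg.solve(H_s, -grad)
    return x + p
```

Solving the `d x d` sub-sampled system costs far less than forming the full Hessian when `sample_size << n`, which is the computational point of sub-sampling; non-uniform (e.g., leverage-score) sampling replaces `rng.choice` with a data-dependent distribution.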
- Speaker: Michael W. Mahoney (ICSI and Department of Statistics, UC Berkeley)
- Wednesday 08 May 2019, 14:00–15:00
- Venue: MR11, Centre for Mathematical Sciences, Wilberforce Road, Cambridge.
- Series: Statistics; organiser: HoD Secretary, DPMMS.