Oja’s algorithm for sparse PCA

With Purnamrita Sarkar, University of Texas, Austin

Oja’s algorithm for streaming Principal Component Analysis (PCA) on n data points in a d-dimensional space achieves the same sin-squared error as the offline algorithm in O(d) space, O(nd) time, and a single pass through the data. Under this computational budget, we consider the problem of sparse PCA, where the principal eigenvector of the population covariance matrix Σ is s-sparse and the effective rank of Σ can be large. In this setting, to our knowledge, there are no known single-pass algorithms that achieve the minimax error bound in O(d) space and O(nd) time without either requiring strong initialisation conditions or assuming further structure of the covariance matrix (e.g., a spiked model). We show that a simple single-pass procedure that thresholds the output of Oja’s algorithm (the Oja vector) can achieve the minimax error bound, under some regularity conditions, within the aforementioned computational budget.
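
The sketch below illustrates the general shape of such a procedure: a single Oja pass over the stream followed by entrywise thresholding of the resulting vector. The step size `eta` and the threshold are illustrative placeholders, not the schedule or threshold analysed in the talk.

```python
import numpy as np

def oja_then_threshold(stream, d, eta=0.01, threshold=0.1, seed=None):
    """Single-pass Oja iteration followed by entrywise thresholding.

    `stream` yields data points x_t in R^d one at a time. The constants
    `eta` and `threshold` are illustrative choices only.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(d)           # random initial vector
    v /= np.linalg.norm(v)
    for x in stream:                      # one pass, O(d) work and memory per point
        v += eta * x * (x @ v)            # Oja update: v <- v + eta * x x^T v
        v /= np.linalg.norm(v)            # renormalise onto the unit sphere
    v[np.abs(v) < threshold] = 0.0        # keep only the large entries (sparsity)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

Each update touches only the current data point and the d-dimensional iterate, which is what keeps the procedure within O(d) space and O(nd) time.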

Our results centre on a nontrivial and novel analysis of the entries of the unnormalised Oja vector, which is obtained by applying a product of independent random matrices to a random initial vector. This differs substantially from previous analyses of Oja’s algorithm and of random matrix products, which apply when the effective rank is relatively small or bounded.
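
For concreteness, the unnormalised Oja vector takes the standard product form shown below; the notation for step sizes and the initial vector is ours, not taken from the talk.

$$
\tilde{v}_n \;=\; \prod_{t=n}^{1}\bigl(I + \eta_t\, x_t x_t^{\top}\bigr)\, v_0,
$$

where $v_0$ is the random initial vector, $x_1,\dots,x_n$ are the streamed data points, and $\eta_t$ are the step sizes. The analysis described above studies the entries of $\tilde{v}_n$ before normalisation.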

This is joint work with Syamantak Kumar.
