Presenter: Aaron Zhengyuan Tian
Faculty Sponsor: Cameron Musco
School: UMass Amherst
Research Area: Computer Science
Session: Poster Session 3, 1:15 PM - 2:00 PM, 163, C15
ABSTRACT
In modern data analysis, it is common to encounter matrices so large that we cannot afford to compute or store them in full. A canonical example comes from kernel methods in machine learning, where a dataset of n points induces an n x n kernel matrix measuring pairwise similarity between the data points. When the dataset is large, forming all n^2 entries is infeasible. Low-rank approximation addresses this bottleneck by replacing the matrix with a smaller, compressed form, leading to significant speedups in practice. For certain classes of matrices, including positive semidefinite (psd) matrices, it is possible to compute a high-quality low-rank approximation by reading only a small number of matrix entries, in sublinear time.
We study sampling methods for sublinear low-rank approximation of psd matrices. Prior work has largely split into two camps: methods that are simple and fast in practice but come with weak error guarantees, and methods with strong error guarantees that are substantially more complex to implement. Recently, Randomly Pivoted Cholesky (RPCholesky) bridged this divide, yielding a simple, empirically fast algorithm with strong trace-norm error guarantees.
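As a concrete illustration of why RPCholesky is considered simple, the standard algorithm can be sketched in a few lines of numpy. This is a minimal sketch, not the implementation studied in this work: it assumes the psd matrix A is available as an in-memory array (in the sublinear setting, one would instead query individual entries or columns on demand), and it builds a rank-k Nystrom-type approximation F F^T while reading only k columns of A plus its diagonal.

```python
import numpy as np

def rp_cholesky(A, k, seed=None):
    """Sketch of Randomly Pivoted Cholesky on a psd matrix A.

    Returns F of shape (n, k) so that F @ F.T approximates A,
    after reading only k columns of A and its diagonal.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    F = np.zeros((n, k))
    # Residual diagonal d_i = (A - F F^T)_{ii}; drives the pivot sampling.
    d = np.array(np.diagonal(A), dtype=float)
    for i in range(k):
        # Sample a pivot column proportional to the residual diagonal.
        p = rng.choice(n, p=d / d.sum())
        # Residual column at the pivot: A[:, p] minus the part already captured.
        col = A[:, p] - F[:, :i] @ F[p, :i]
        F[:, i] = col / np.sqrt(col[p])
        # Update the residual diagonal, clipping tiny negatives from roundoff.
        d = np.clip(d - F[:, i] ** 2, 0.0, None)
    return F
```

After k steps, the trace error trace(A - F F^T) equals the remaining mass d.sum(); sampling pivots proportional to d is what drives the trace-norm guarantee while keeping the method a one-pass, pivoted Cholesky.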
Motivated by this result, we ask whether an analogous approach can attain Frobenius-norm error guarantees. We give an empirical and theoretical analysis of randomized sampling methods for sublinear low-rank approximation. Our preliminary results show that Nyström methods, including RPCholesky, cannot achieve a (1+ε) relative-error guarantee in the Frobenius norm while reading only a small number of matrix entries. However, we hypothesize that Frobenius relative-error guarantees remain attainable by sampling according to approximate column norms.