TY - GEN
T1 - Subspace sampling and relative-error matrix approximation
T2 - 9th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, APPROX 2006 and 10th International Workshop on Randomization and Computation, RANDOM 2006
AU - Drineas, Petros
AU - Mahoney, Michael W.
AU - Muthukrishnan, S.
PY - 2006
Y1 - 2006
N2 - Given an m × n matrix A and an integer k less than the rank of A, the "best" rank k approximation to A that minimizes the error with respect to the Frobenius norm is Ak, which is obtained by projecting A on the top k left singular vectors of A. While Ak is routinely used in data analysis, it is difficult to interpret and understand it in terms of the original data, namely the columns and rows of A. For example, these columns and rows often come from some application domain, whereas the singular vectors are linear combinations of (up to all) the columns or rows of A. We address the problem of obtaining low-rank approximations that are directly interpretable in terms of the original columns or rows of A. Our main results are two polynomial time randomized algorithms that take as input a matrix A and return as output a matrix C, consisting of a "small" (i.e., a low-degree polynomial in k, 1/ε;, and log(1/δ)) number of actual columns of A such that ||A - CC+A||F ≤ (1 + ε) ||A - Ak|| F with probability at least 1 - δ. Our algorithms are simple, and they take time of the order of the time needed to compute the top k right singular vectors of A. In addition, they sample the columns of A via the method of "subspace sampling," so-named since the sampling probabilities depend on the lengths of the rows of the top singular vectors and since they ensure that we capture entirely a certain subspace of interest.
AB - Given an m × n matrix A and an integer k less than the rank of A, the "best" rank k approximation to A that minimizes the error with respect to the Frobenius norm is Ak, which is obtained by projecting A on the top k left singular vectors of A. While Ak is routinely used in data analysis, it is difficult to interpret and understand it in terms of the original data, namely the columns and rows of A. For example, these columns and rows often come from some application domain, whereas the singular vectors are linear combinations of (up to all) the columns or rows of A. We address the problem of obtaining low-rank approximations that are directly interpretable in terms of the original columns or rows of A. Our main results are two polynomial time randomized algorithms that take as input a matrix A and return as output a matrix C, consisting of a "small" (i.e., a low-degree polynomial in k, 1/ε;, and log(1/δ)) number of actual columns of A such that ||A - CC+A||F ≤ (1 + ε) ||A - Ak|| F with probability at least 1 - δ. Our algorithms are simple, and they take time of the order of the time needed to compute the top k right singular vectors of A. In addition, they sample the columns of A via the method of "subspace sampling," so-named since the sampling probabilities depend on the lengths of the rows of the top singular vectors and since they ensure that we capture entirely a certain subspace of interest.
UR - http://www.scopus.com/inward/record.url?scp=33750079844&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33750079844&partnerID=8YFLogxK
U2 - 10.1007/11830924_30
DO - 10.1007/11830924_30
M3 - Conference contribution
AN - SCOPUS:33750079844
SN - 3540380442
SN - 9783540380443
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 316
EP - 326
BT - Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques - 9th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, APPROX 2006 a
PB - Springer Verlag
Y2 - 28 August 2006 through 30 August 2006
ER -