[R] [R-pkgs] softImpute_1.0 uploaded to CRAN
hastie at stanford.edu
Tue Apr 2 20:14:44 CEST 2013
SoftImpute is a new package for matrix completion - i.e. for imputing missing values in matrices.
SoftImpute was written by myself and Rahul Mazumder.
softImpute uses uses squared-error loss with nuclear norm regularization - one can think of it as
the "lasso" for matrix approximation - to find a low-rank approximation to the observed entries in the matrix.
This low-rank approximation is then used to impute the missing entries.
softImpute works in a kind of "EM" fashion. Given a current guess, it fills in the missing entries.
Then it computes a soft-thresholded SVD of this complete matrix, which yields the next guess.
These steps are iterated till convergence to the solution of the convex-optimation problem.
The algorithm can work with large matrices, such as the "netflix" matrix (400K x 20K) by making heavy use
of sparse-matrix methods in the Matrix package. It creates new S4 classes such as "Incomplete" for storing the large
data matrix, and "SparseplusLowRank" for representing the completed matrix. SVD computations are done using
a specially built block-alternating algorithm, svd.als, that exploits these structures and uses warm starts.
Some of the methods used are described in
Rahul Mazumder, Trevor Hastie and Rob Tibshirani:
Spectral Regularization Algorithms for Learning Large Incomplete Matrices.
JMLR 2010 11 2287-2322
Other newer and more efficient methods that inter-weave the alternating block algorithm steps with imputation steps will
be described in a forthcoming article.
Some of the features of softImpute are
* works with large matrices using sparse matrix methods, or smaller matrices using standard svd methods.
* one can control the maximum rank of the solution, to avoid overly expensive operations.
* warm starts can be used to move from one solution to a new solution with a different value for the nuclear-norm regularization parameter lambda (and/or a different rank)
* with lambda=0 and a specified rank, one automatically gets an implementation of "hardImpute" - iterative svd imputation
* softImpute has an option "type" which can be "svd" or "als" (alternating least squares), for specifying which of the two approaches above should be used.
*included in the package is svd.als, an efficient rank-restricted svd algorithm that can exploit sparsity and other special structure, and accept warm starts.
* a function biScale is provided, for centering and scaling both rows and columns of matrix to have means zero and variance 1. The centering and scaling
constants are stored on the object. For sparse matrices with centering, the centered object is stored in "SparseplusLowRank" form to preserve its special structure
* prediction functions impute and complete are provided.
Trevor Hastie hastie at stanford.edu
Professor, Department of Statistics, Stanford University
Phone: (650) 725-2231 Fax: (650) 725-8977
address: room 104, Department of Statistics, Sequoia Hall
390 Serra Mall, Stanford University, CA 94305-4065
R-packages mailing list
R-packages at r-project.org
More information about the R-help