This page will tell you how to delete rows:
https://stat.ethz.ch/pipermail/r-help/2007-November/146220.html
I have used lm() with matrices, however, there maybe a more efficient method. I don't have the need for matrix functions, often. Hopefully, someone can direct you, if there is a more efficient method.
~Nicole Ford
Ph.D. Student
Graduate Assistant/ Instructor
Department of Government and International Affairs
University of South Florida
office: SOC 012M
Sent from my iPhone
On Feb 26, 2013, at 11:41 AM, Jie wrote:
> Hi All,
>
> I have millions of regression lines to fit. So I am looking for the
> most efficient approach in R.
>
> Details:
> I have a large desing matrix X. The dimension is n by p.
> Each time when fitting the model, select rows from this matrix X and
> form a new design matrix, called X_current.
> There is another binary matrix M, with dim m by n, and each row is a
> 1*n vector. It helps to determin X_current.
> For instance, if the first row of M is (1,0,1,0,1,..,1), then I remove
> the 2nd and 4th row from X, and use this submatrix as the design
> matrix to fit the model.
> if the second row is (0,0,0,1,...,1), then I use X[4:n,] as input
> matrix to do the regression,
> etc.
>
> I would like to extract a single element of estimated parameter beta,
> and the SSE from the results.
>
> So far what I know is to use these functions which performs better than lm():
> 1. lsfit() # this works faster than lm(), but have to extract the
> results without using ls.print(print=F), maybe I/O slows down. And not
> better than the other two approaches
> 2. directly use solve(X'X) and the formula from score equation to get
> the results # not sure how well for almost singular or wired matrices
> 3. directly use qr() with LAPACK or qr.solve() and the same formula
> # qr decomposion is reliable but takes a little more time than solve()
>
> Below is a piece of code for demonstration and time comparision. I do
> not know the best way to generate random non-singular design matrix,
> so I add try() to ignore errors.
> I did not include apply/sapply/lapply, seems they can save the memory
> but in my case do not improve. Thank you.
>
> #----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> M = matrix( NA, nrow = 10^4, ncol = 200)
> for (i in 1:10^4)
> {
> M[i,] = sample(c(0,1), size=200, replace = T)
> }
>
> X = cbind( rnorm(200,1,10) ,1:200, sample(1:10^5,200,T))
> Y = rnorm(200)
>
>
>
> # loop version
> resu1=matrix(NA,nrow=10^4, ncol=2)
> resu2=matrix(NA,nrow=10^4, ncol=2)
> resu3=matrix(NA,nrow=10^4, ncol=2)
> resu4=matrix(NA,nrow=10^4, ncol=2)
>
>
> t1 = proc.time()
> for (i in 1:10^4)
> {
> fit.ls = lsfit( x = X[M[i,],], y = Y[M[i,]], intercept=F)
> resu1[i,] = c( fit.ls$coeff[2], sum((fit.ls$res)^2))
> }
> t2 = proc.time()
> for (i in 1:10^4)
> {
> hatM = t(X[M[i,],])%*%X[M[i,],]
> try( beta <- ( solve( hatM ) %*% t( X[M[i,],] )%*%Y[M[i,]] ), silent = T)
> try( varr <- sum((Y[M[i,]]-X[M[i,],]%*%beta)^2), silent = T)
> try( resu2[i, ] <- c( beta[2], varr ), silent = T)
> }
> t3 = proc.time()
>
> for (i in 1:10^4)
> {
> hatM = t(X[M[i,],])%*%X[M[i,],]]
> try( beta <- ( qr.solve( hatM ) %*% t( X[M[i,],] )%*%Y[M[i,]] ), silent = T)
> try( varr <- sum(( Y[M[i,]]-X[M[i,],]%*%beta)^2 ), silent = T)
> try( resu2[i, ] <- c( beta[2], varr )), silent = T)
> }
> t4 = proc.time()
>
> (t2-t1)[3]
> (t3-t2)[3]
> (t4-t3)[3]
>
>
> #----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]