[R] pairwise linear regression between two large datasets

jdnewmil jdnewmil at dcn.org
Tue Apr 3 06:07:27 CEST 2012


If "any idea of a for loop is disastrous," why are you using apply, 
which is basically a for loop?

I think you have framed the question in such a way that loops are 
inevitable.  You are already passing a matrix as the LHS of the formula, 
so each lm call fits every column of y at once; that is the main speedup 
I could think of. However, you can avoid some subscripting and use 
colMeans instead of apply(..., mean) for a bit more speed.
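
To make the matrix-LHS point concrete, here is a minimal sketch. The 
names x and Y, the seed, and the data are made up for illustration and 
are not from your problem. With a matrix response, lm returns one column 
of residuals per response column, and each column should match what a 
separate fit gives.

```r
set.seed(42)                    # arbitrary seed, only for reproducibility
x <- 1:10
Y <- cbind( p = x + rnorm(10), q = 2*x + rnorm(10) )  # hypothetical responses

resAll <- lm( Y ~ x )[["residuals"]]           # one call, 10 x 2 residual matrix
resP   <- lm( Y[ , "p" ] ~ x )[["residuals"]]  # separate fit for column p only

dim( resAll )
all.equal( unname( resAll[ , "p" ] ), unname( resP ) )
```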

# fake data... you didn't provide any
A <- data.frame( a=1:10, b=(0:9)*2 )
B <- data.frame( c=A$a+rnorm(10), d=A$b+rnorm(10), e=A$a+A$b+rnorm(10) )
A <- as.matrix(A)
B <- as.matrix(B)

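linRegUtility2 <- function(x,y){
    # y is a matrix, so lm fits every column of y against x in one call
    regRes <- lm(y~x)[["residuals"]]
    # z-score of the last (k-th) residual in each column
    ( regRes[ nrow( regRes ), ] - colMeans( regRes ) ) / apply( regRes, 2, sd )
    }
# apply loops over the columns of A
C <- apply( A, 2, linRegUtility2, y=B )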

However, this gives only about a 9% speedup on my computer.
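
As a sanity check on the result, here is a minimal sketch that rebuilds 
the fake data as matrices (with an arbitrary seed, which the code above 
does not set), compares one entry of C against a single pairwise fit, 
and looks at the orientation of the result. The names Cmn, r and z are 
introduced here only for illustration.

```r
set.seed(1)   # arbitrary seed so the check is reproducible
A <- cbind( a = 1:10, b = (0:9)*2 )
B <- cbind( c = A[ , "a" ] + rnorm(10), d = A[ , "b" ] + rnorm(10),
            e = A[ , "a" ] + A[ , "b" ] + rnorm(10) )

linRegUtility2 <- function(x,y){
    regRes <- lm(y~x)[["residuals"]]
    ( regRes[ nrow( regRes ), ] - colMeans( regRes ) ) / apply( regRes, 2, sd )
    }
C <- apply( A, 2, linRegUtility2, y=B )

# apply() stores the result for each column of A as a column of C,
# so C is n x m here; transpose to get the m x n layout asked for
Cmn <- t( C )
dim( Cmn )     # 2 x 3 with this fake data

# spot check: regress B[,"d"] on A[,"a"] by itself
r <- residuals( lm( B[ , "d" ] ~ A[ , "a" ] ) )
z <- ( r[ length( r ) ] - mean( r ) ) / sd( r )
all.equal( unname( z ), unname( Cmn[ "a", "d" ] ) )
```

Because lm includes an intercept by default, the residual means are zero 
up to rounding, so the colMeans term does not change the z-scores 
numerically.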

Stathis Metsovitis <stmetsov at gmail.com> wrote:

>Hi all,
>I am trying to perform some analysis on the residuals of pair-wise linear
>regressions between two large sets A with dimensions {k x m}, and B {k x n}.
>So I need to regress every column B[,j] of B on every column A[,i] of A
>and produce a matrix C with dimensions {m x n}, so that C[i,j] contains the
>z-score of the k-th (last) residual of the aforementioned linear regression.
>
>I have tried the following code, but I don't seem to get it to work.
>Moreover, any idea of using a for loop is disastrous since A and B are very
>large matrices. I'd be grateful for any suggestions!
>C <- apply( A[ ,1:dim(A)[2] ], 2, linRegZscore, y=A[ ,1:dim(A)[2]] )
>
>where linRegZscore is the following function
>linRegUtility1 <- function(x,y){
>  regRes <- lm(y~x)$residuals
>  ( regRes[dim(regRes)[1]]-apply(regRes, 2, mean) )/( apply(regRes,2, sd) )
>  }