[R] pairwise linear regression between two large datasets
jdnewmil
jdnewmil at dcn.org
Tue Apr 3 06:07:27 CEST 2012
If "any idea of a for loop is disastrous," why are you using apply,
which is basically a for loop?
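To illustrate the point, here is a small sketch showing that apply over
columns computes the same thing as an explicit loop. The toy matrix M and
the names viaApply and viaLoop are made up for this example.

```r
# Hypothetical toy matrix, for illustration only
M <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)

# Column sums via apply()
viaApply <- apply(M, 2, sum)

# The same computation written as an explicit for loop
viaLoop <- numeric(ncol(M))
for (j in seq_len(ncol(M))) {
  viaLoop[j] <- sum(M[, j])
}

# Both approaches visit each column once and give the same values
all.equal(viaApply, viaLoop)
```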
I think you have framed the question in such a way that loops are
inevitable. You are already using the LHS as a matrix, which is the
main speedup I could think of. However, you can avoid some subscripting
and use colMeans for a bit of speedup.
# fake data... you didn't provide any
A <- data.frame( a=1:10, b=(0:9)*2 )
B <- data.frame( c=A$a+rnorm(10), d=A$b+rnorm(10), e=A$a+A$b+rnorm(10) )
A <- as.matrix(A)
B <- as.matrix(B)
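# x is one column of A; y is all of B, so lm() fits every column of B at once
linRegUtility2 <- function(x,y){
  regRes <- lm(y~x)[["residuals"]]   # k x n matrix of residuals
  # z-score of the last (k-th) residual in each column
  ( regRes[ nrow( regRes ), ] - colMeans( regRes ) ) / apply( regRes, 2, sd )
}
C <- apply( A, 2, linRegUtility2, y=B )   # n x m result; use t(C) for m x n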
However, this is only getting about 9% improvement on my computer.
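A rough sketch of how such a comparison might be timed follows. The matrix
sizes, the seed, and the function names zLastApply and zLastColMeans are
assumptions made for this example, and zLastApply is only my reading of what
the original function was meant to compute (last row selected explicitly,
column means taken with apply). Timings will differ from machine to machine.

```r
set.seed(1)                      # arbitrary seed so the fake data are reproducible
k <- 200; m <- 20; n <- 30       # hypothetical sizes, not from the original post
A <- matrix(rnorm(k * m), nrow = k)
B <- matrix(rnorm(k * n), nrow = k)

# Baseline: assumed intent of the original function
zLastApply <- function(x, y) {
  regRes <- lm(y ~ x)$residuals
  (regRes[dim(regRes)[1], ] - apply(regRes, 2, mean)) / apply(regRes, 2, sd)
}

# Variant using nrow() and colMeans()
zLastColMeans <- function(x, y) {
  regRes <- lm(y ~ x)[["residuals"]]
  (regRes[nrow(regRes), ] - colMeans(regRes)) / apply(regRes, 2, sd)
}

# Check that both variants agree before comparing speed
C1 <- apply(A, 2, zLastApply, y = B)
C2 <- apply(A, 2, zLastColMeans, y = B)
all.equal(C1, C2)

# Elapsed seconds over a few repeated runs of each variant
t1 <- system.time(for (r in 1:5) apply(A, 2, zLastApply, y = B))
t2 <- system.time(for (r in 1:5) apply(A, 2, zLastColMeans, y = B))
c(apply = t1[["elapsed"]], colMeans = t2[["elapsed"]])
```

Most of the time in either variant is likely spent inside lm itself, which
would explain why swapping apply for colMeans gives only a modest gain.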
Stathis Metsovitis <stmetsov at gmail.com> wrote:
>Hi all,
>I am trying to perform some analysis on the residuals of pair-wise linear
>regressions between two large sets A with dimensions {k x m}, and B {k x n}.
>So I need to regress every column B[,j] of B on every column A[,i] of A
>and produce a matrix C with dimensions {m x n}, so that C[i,j] contains the
>z-score of the k-th (last) residual of the aforementioned linear regression.
>
>I have tried the following code, but I don't seem to get it to work.
>Moreover, any idea of using a for loop is disastrous since A and B are very
>large matrices. I'd be grateful for any suggestions!
>C <- apply( A[ ,1:dim(A)[2] ], 2, linRegZscore, y=A[ ,1:dim(A)[2]] )
>
>where linRegZscore is the following function
>linRegUtility1 <- function(x,y){
> regRes <- lm(y~x)$residuals
> ( regRes[dim(regRes)[1]]-apply(regRes, 2, mean) )/( apply(regRes,2, sd) )
> }