[R] lm models over all possible pairwise combinations of the columns of two matrices

Bert Gunter gunter.berton at gene.com
Tue Apr 22 16:00:45 CEST 2014


Well...

If my arithmetic and understanding are correct, that's 3.2 billion
combinations (4000 x 100 column pairs times 8000 responses), which, to
put it politely, is nuts. As all you'll be doing is generating random
numbers anyway, the fastest way to do this is just to use a random
number generator.
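The count can be checked from the dimensions given in the question below (8000 response columns, 4000 columns in m, 100 in g). This is a quick sketch that assumes one interaction F-test per response column for each (m column, g column) pair. Each test needs two fits, full and null.

```r
# Dimensions quoted in the question (every matrix has 700 rows)
n_resp <- 8000  # columns of the response matrix
n_m    <- 4000  # columns of covariate matrix m
n_g    <- 100   # columns of covariate matrix g

n_pairs <- n_m * n_g         # (m column, g column) pairs
n_tests <- n_pairs * n_resp  # one interaction F-test per pair and response
format(c(pairs = n_pairs, tests = n_tests), big.mark = ",", scientific = FALSE)
```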

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
H. Gilbert Welch




On Mon, Apr 21, 2014 at 11:32 PM, Matthew Robinson
<m.robinson11 at uq.edu.au> wrote:
> Dear all,
>
> I am working through a problem at the moment and have got stuck. I have searched around on the help list for assistance but could not find anything, so apologies if I have missed something. A dummy example of my problem is below. I will continue to work on it, but any help would be greatly appreciated.
>
> Thanks in advance for your time.
>
> Best wishes,
> Matt
>
>
> I have a matrix of response variables:
>
> p<-matrix(c(rnorm(120,1),
> rnorm(120,1),
> rnorm(120,1)),
> 120,3)
>
> and two matrices of covariates:
>
> g<-matrix(c(rep(1:3, each=40),
> rep(3:1, each=40),
> rep(1:3, 40)),
> 120,3)
> m<-matrix(c(rep(1:2, 60),
> rep(2:1, 60),
> rep(1:2, each=60)),
> 120,3)
>
> For all combinations of the columns of the covariate matrices g and m I want to run these two models:
>
> test <- function(uniq_m, uniq_g, p) {
>     # no 'p = p' default: that would be a recursive default argument
>     full <- lm(p ~ factor(uniq_m) * factor(uniq_g))
>     null <- lm(p ~ factor(uniq_m) + factor(uniq_g))
>     return(list('f'=full, 'n'=null))
> }
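Because p is a matrix, each lm() call above fits a separate least-squares regression to every response column at once and returns an object of class "mlm". The short check below is a sketch on stand-in data. The names y, a, b, fit and fit1 are hypothetical. It confirms that the coefficients for column 1 match a fit of that column on its own.

```r
set.seed(1)
y <- matrix(rnorm(120 * 3, mean = 1), 120, 3)  # stand-in for p
a <- rep(1:2, 60)                              # like m[, 1]
b <- rep(1:3, each = 40)                       # like g[, 1]

fit <- lm(y ~ factor(a) * factor(b))  # one call, three responses
class(fit)                            # "mlm" "lm"
dim(coef(fit))                        # 6 coefficients x 3 responses

# Fitting column 1 alone gives the same coefficients
fit1 <- lm(y[, 1] ~ factor(a) * factor(b))
all.equal(unname(coef(fit)[, 1]), unname(coef(fit1)))
```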
>
> So I want to test for an interaction between column 1 of m and column 1 of g, then column 2 of m and column 1 of g, ..., then column 1 of m and column 2 of g, and so forth across every pairing of a column of m with a column of g. The response variable is the same each time and is a matrix containing multiple columns.
>
>
> So far, I can do this for a single combination of columns:
>
> test_1 <- test(m[ ,1], g[ ,1], p)
>
> And I can also run the models over all columns of m and one column of g:
>
> test_2 <- apply(m, 2, function(uniq_m) {
> test(uniq_m, g[ ,1], p = p)
> })
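Extending this step from one column of g to all of them only needs a grid of column-index pairs. The sketch below rebuilds the dummy data from the question and reuses test() without the recursive default. It uses expand.grid() and Map(). The names pairs and all_fits are hypothetical. Storing every fitted lm object like this is only practical at dummy-example scale.

```r
set.seed(1)
p <- matrix(rnorm(360, mean = 1), 120, 3)
g <- matrix(c(rep(1:3, each = 40), rep(3:1, each = 40), rep(1:3, 40)), 120, 3)
m <- matrix(c(rep(1:2, 60), rep(2:1, 60), rep(1:2, each = 60)), 120, 3)

test <- function(uniq_m, uniq_g, p) {
  full <- lm(p ~ factor(uniq_m) * factor(uniq_g))
  null <- lm(p ~ factor(uniq_m) + factor(uniq_g))
  list(f = full, n = null)
}

# One row per (m column, g column) pair; the m index varies fastest
pairs <- expand.grid(m_col = seq_len(ncol(m)), g_col = seq_len(ncol(g)))

# all_fits[[k]] holds list(f, n) for pair k
all_fits <- Map(function(i, j) test(m[, i], g[, j], p), pairs$m_col, pairs$g_col)
length(all_fits)
```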
>
>
> I can then get the F statistics for each response variable of each model:
>
> sapply(summary(test_2[[1]]$f), function(x) x$fstatistic)
> sapply(summary(test_2[[1]]$n), function(x) x$fstatistic)
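Each $fstatistic is a named vector (value, numdf, dendf) for the overall regression, meaning each model against an intercept-only fit. It is not the interaction test, which is the nested comparison that follows. The sketch below uses stand-in data and the hypothetical names y, a, b, fs and pvals. It shows the layout of the result and how pf() turns it into p-values.

```r
set.seed(1)
y <- matrix(rnorm(360, mean = 1), 120, 3)
a <- rep(1:2, 60)
b <- rep(1:3, each = 40)
full <- lm(y ~ factor(a) * factor(b))

# Rows: value, numdf, dendf; one column per response
fs <- sapply(summary(full), function(x) x$fstatistic)
fs

# Overall-regression p-values, one per response
pvals <- pf(fs["value", ], fs["numdf", ], fs["dendf", ], lower.tail = FALSE)
pvals
```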
>
> And I can compare models for each response variable using an F-test:
>
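> d1 <- colSums(residuals(test_2[[1]]$n)^2)  # RSS of null (additive) model, per response
> d2 <- colSums(residuals(test_2[[1]]$f)^2)  # RSS of full model, same covariate pair
> df_full <- df.residual(test_2[[1]]$f)            # 114 here
> df_diff <- df.residual(test_2[[1]]$n) - df_full  # 2 interaction terms here
> F_stat <- ((d1 - d2) / df_diff) / (d2 / df_full)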
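For a nested comparison, the F statistic divides the drop in residual sum of squares by the difference in residual degrees of freedom. That difference is 2 for a 2-level by 3-level interaction. The denominator is the full model's residual mean square. The self-contained check below uses stand-in data and the hypothetical names y, a, b, full, null and F_manual. It confirms that the per-response calculation agrees with anova() applied to single-column fits.

```r
set.seed(1)
y <- matrix(rnorm(360, mean = 1), 120, 3)
a <- rep(1:2, 60)
b <- rep(1:3, each = 40)
full <- lm(y ~ factor(a) * factor(b))
null <- lm(y ~ factor(a) + factor(b))

rss_f <- colSums(residuals(full)^2)   # one RSS per response column
rss_n <- colSums(residuals(null)^2)
df_f  <- df.residual(full)            # 120 - 6 = 114
df_d  <- df.residual(null) - df_f     # 116 - 114 = 2
F_manual <- ((rss_n - rss_f) / df_d) / (rss_f / df_f)

# Cross-check response 1 against anova() on single-response fits
a1 <- anova(lm(y[, 1] ~ factor(a) + factor(b)), lm(y[, 1] ~ factor(a) * factor(b)))
all.equal(unname(F_manual[1]), a1$F[2])
```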
>
>
> My question is: how do I run the lm models over all combinations of columns from the m and g matrices, and get the F statistics? While this is a dummy example, the real analysis will have a response matrix that is 700 x 8000, and the covariate matrices will be 700 x 4000 and 700 x 100, so I need something that is as fast as possible.
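One way to approach the speed question is sketched below. It is not an answer given in the thread. The full and null design matrices for a given (m column, g column) pair do not depend on the response, so each can be QR-decomposed once. qr.resid() is then applied to the whole response matrix, which swaps lm()'s per-call overhead for plain QR computations. The helper name pair_interaction_F() is hypothetical. The sketch assumes the covariates are meant as factors and that there are no missing values. A pair whose interaction adds no rank over the additive model is left as NA. In the dummy data, m[, 3] with g[, 1] is one such pair.

```r
# Hypothetical helper: interaction F statistics for every (m column, g column)
# pair, for all response columns of Y at once.
pair_interaction_F <- function(Y, m, g) {
  pairs <- expand.grid(m_col = seq_len(ncol(m)), g_col = seq_len(ncol(g)))
  Fmat <- matrix(NA_real_, nrow(pairs), ncol(Y))
  for (k in seq_len(nrow(pairs))) {
    a <- factor(m[, pairs$m_col[k]])
    b <- factor(g[, pairs$g_col[k]])
    # Design matrices depend only on the covariates: one QR each per pair
    qr_null <- qr(model.matrix(~ a + b))
    qr_full <- qr(model.matrix(~ a * b))
    df_full <- nrow(Y) - qr_full$rank
    df_diff <- qr_full$rank - qr_null$rank
    if (df_diff == 0) next  # interaction not estimable for this pair: leave NA
    rss_null <- colSums(qr.resid(qr_null, Y)^2)  # one RSS per response column
    rss_full <- colSums(qr.resid(qr_full, Y)^2)
    Fmat[k, ] <- ((rss_null - rss_full) / df_diff) / (rss_full / df_full)
  }
  list(pairs = pairs, F = Fmat)
}

# Usage with the dummy data from the question
set.seed(1)
p <- matrix(rnorm(360, mean = 1), 120, 3)
g <- matrix(c(rep(1:3, each = 40), rep(3:1, each = 40), rep(1:3, 40)), 120, 3)
m <- matrix(c(rep(1:2, 60), rep(2:1, 60), rep(1:2, each = 60)), 120, 3)
res <- pair_interaction_F(p, m, g)
cbind(res$pairs, round(res$F, 3))
```

At the real dimensions there are 400,000 pairs. Keeping every statistic would need a 400,000 x 8,000 matrix of doubles, about 25.6 GB. In practice you would probably keep only a summary per pair or write results out in chunks. The loop over pairs is also the natural place to parallelise.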
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.