[R-sig-hpc] Parallel linear model
Martin Morgan
mtmorgan at fhcrc.org
Wed Aug 22 23:21:31 CEST 2012
On 08/22/2012 12:47 AM, Patrik Waldmann wrote:
> Hello,
>
>
> I wonder if someone has experience with efficient ways of implicit parallel execution of (repeated) linear models (as in the non-parallel example below)? Any suggestions on which way to go?
>
> Patrik Waldmann
>
> pval<-c(1:n)
> for (i in 1:n){
> mod <- lm(y ~ x[,i])
> pval[i] <- summary(mod)$coefficients[2,4]
> }
As a different tack, the design matrix has the same structure across all
regressions (an intercept plus one column of x), and if your data are
consistently structured it may pay to re-calculate only the fit. Here's a
loosely-tested version that takes a template from one full lm() fit and
overwrites it with the lm.fit() results for each individual column:
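looselm <- function(y, xi, tmpl)
{
    ## design matrix: intercept plus the single predictor column
    x <- cbind(`(Intercept)`= 1, xi=xi)
    z <- lm.fit(x, y)
    ## overwrite the template's fit components, keep everything else
    tmpl[names(z)] <- z
    tmpl
}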
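This is used in f2 and f3 below; f0 and f1 are the plain lm() versions,
serial and parallel (mclapply() is in the parallel package):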
library(parallel)   # provides mclapply()

f0 <- function(x, y)
    lapply(seq_len(ncol(x)),
           function(i, x, y) summary(lm(y~x[,i]))$coefficients[2, 4],
           x, y)

f1 <- function(x, y, mc.cores=8L)
    mclapply(seq_len(ncol(x)),
             function(i, x, y) summary(lm(y~x[,i]))$coefficients[2, 4],
             x, y, mc.cores=mc.cores)

f2 <- function(x, y) {
    tmpl <- lm(y~x[,1])
    lapply(seq_len(ncol(x)),
           function(i, x, y, tmpl) {
               summary(looselm(y, x[,i], tmpl))$coefficients[2, 4]
           }, x, y, tmpl)
}

f3 <- function(x, y, mc.cores=8L) {
    tmpl <- lm(y~x[,1])
    mclapply(seq_len(ncol(x)),
             function(i, x, y, tmpl) {
                 summary(looselm(y, x[,i], tmpl))$coefficients[2, 4]
             }, x, y, tmpl, mc.cores=mc.cores)
}
with timings (for 1000 x 1000)
> system.time(ans0 <- f0(x, y))
   user  system elapsed
 23.865   1.160  25.120
> system.time(ans1 <- f1(x, y, 8L))
   user  system elapsed
 31.902   6.705   6.708
> system.time(ans2 <- f2(x, y))
   user  system elapsed
  5.285   0.296   5.596
> system.time(ans3 <- f3(x, y, 8L))
   user  system elapsed
 10.256   4.092   2.322
and
> identical(ans0, ans1)
[1] TRUE
> identical(ans0, ans2)
[1] TRUE
> identical(ans0, ans3)
[1] TRUE
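The post does not show how x and y were created, so the following is only a sketch for checking the comparison on a smaller problem. It assumes standard normal draws, a 200 x 200 x rather than the 1000 x 1000 timed above, and it repeats looselm() so the block runs on its own. The names p_lm and p_loose are made up for this illustration.

```r
## Assumption: simulated data; the original x and y are not shown in the post.
set.seed(123)
n <- 200                              # smaller than the 1000 x 1000 timed above
x <- matrix(rnorm(n * n), n)
y <- rnorm(n)

## looselm() as defined in the post, repeated so this block is self-contained
looselm <- function(y, xi, tmpl)
{
    x <- cbind(`(Intercept)`= 1, xi=xi)
    z <- lm.fit(x, y)
    tmpl[names(z)] <- z
    tmpl
}

tmpl <- lm(y ~ x[, 1])                # template from one full fit

## slope p-values from the full lm() route
p_lm <- sapply(seq_len(ncol(x)),
    function(i) summary(lm(y ~ x[, i]))$coefficients[2, 4])

## slope p-values from the template + lm.fit() route
p_loose <- sapply(seq_len(ncol(x)),
    function(i) summary(looselm(y, x[, i], tmpl))$coefficients[2, 4])

all.equal(p_lm, p_loose)
```

all.equal() is used here instead of identical() so the check tolerates small floating-point differences.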
Presumably the full summary() machinery is also not required. Likely
there are significant additional short-cuts.
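One possible short-cut along these lines, as a sketch only: compute the slope's t statistic and p-value directly from the lm.fit() result, without building an lm object or calling summary(). The helper name fastp is hypothetical, and the code assumes a full-rank design with no weights and no missing values.

```r
## Hypothetical helper (not from the original post): slope p-value from
## lm.fit() alone. Assumes full rank, no weights, no NAs.
fastp <- function(y, xi)
{
    X <- cbind(1, xi)                        # intercept + one predictor
    z <- lm.fit(X, y)
    rdf <- z$df.residual
    sigma2 <- sum(z$residuals^2) / rdf       # residual variance estimate
    XtXinv <- chol2inv(qr.R(z$qr))           # (X'X)^{-1} from the R factor
    se <- sqrt(sigma2 * XtXinv[2, 2])        # standard error of the slope
    tval <- z$coefficients[[2]] / se
    2 * pt(abs(tval), rdf, lower.tail = FALSE)   # two-sided p-value
}

## small simulated check against the summary(lm()) route
set.seed(1)
x <- matrix(rnorm(50 * 4), 50)
y <- rnorm(50)
p_fast <- vapply(seq_len(ncol(x)),
    function(i) fastp(y, x[, i]), numeric(1))
p_ref <- vapply(seq_len(ncol(x)),
    function(i) summary(lm(y ~ x[, i]))$coefficients[2, 4], numeric(1))
all.equal(p_fast, p_ref)
```

This has not been timed here; whether it beats the template approach on 1000 x 1000 data would need to be measured.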
Martin
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793