[R-sig-hpc] Parallel linear model
Paul Johnson
pauljohn32 at gmail.com
Thu Aug 23 01:03:36 CEST 2012
Martin:
This is a great example and I would like to use it in class. But I
think I don't understand the implications of the system.time output
you get. I have a question about this below. Would you share your
thoughts?
On Wed, Aug 22, 2012 at 4:21 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> On 08/22/2012 12:47 AM, Patrik Waldmann wrote:
>>
>> Hello,
>>
>>
>> I wonder if anyone has experience with efficient ways of implicitly
>> parallelizing (repeated) linear models (as in the non-parallel example
>> below)? Any suggestions on which way to go?
>>
>> Patrik Waldmann
>>
>> pval <- c(1:n)
>> for (i in 1:n) {
>>     mod <- lm(y ~ x[, i])
>>     pval[i] <- summary(mod)$coefficients[2, 4]
>> }
>
>
> As a different tack, the design matrix has the same shape across all
> regressions, and if your data are consistently structured it may pay to
> recompute only the fit itself. Here's a loosely-tested version that uses a
> template from one full fit, updated with the fit of each individual column
> to the same model
>
> looselm <- function(y, xi, tmpl)
> {
>     x <- cbind(`(Intercept)` = 1, xi = xi)
>     z <- lm.fit(x, y)
>     tmpl[names(z)] <- z
>     tmpl
> }
>
> This is used in f2 and f3 below
>
> library(parallel)  # provides mclapply (R >= 2.14)
>
> f0 <- function(x, y)
>     lapply(seq_len(ncol(x)),
>            function(i, x, y) summary(lm(y ~ x[, i]))$coefficients[2, 4],
>            x, y)
>
> f1 <- function(x, y, mc.cores=8L)
>     mclapply(seq_len(ncol(x)),
>              function(i, x, y) summary(lm(y ~ x[, i]))$coefficients[2, 4],
>              x, y, mc.cores=mc.cores)
>
> f2 <- function(x, y) {
>     tmpl <- lm(y ~ x[, 1])
>     lapply(seq_len(ncol(x)),
>            function(i, x, y, tmpl) {
>                summary(looselm(y, x[, i], tmpl))$coefficients[2, 4]
>            }, x, y, tmpl)
> }
>
> f3 <- function(x, y, mc.cores=8L) {
>     tmpl <- lm(y ~ x[, 1])
>     mclapply(seq_len(ncol(x)),
>              function(i, x, y, tmpl) {
>                  summary(looselm(y, x[, i], tmpl))$coefficients[2, 4]
>              }, x, y, tmpl, mc.cores=mc.cores)
> }
>
> with timings (for 1000 x 1000)
>
>> system.time(ans0 <- f0(x, y))
> user system elapsed
> 23.865 1.160 25.120
>> system.time(ans1 <- f1(x, y, 8L))
> user system elapsed
> 31.902 6.705 6.708
>> system.time(ans2 <- f2(x, y))
> user system elapsed
> 5.285 0.296 5.596
>> system.time(ans3 <- f3(x, y, 8L))
> user system elapsed
> 10.256 4.092 2.322
>
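For anyone who wants to reproduce those numbers: the thread never shows how x and y were built, but a setup consistent with the "1000 x 1000" note might be (the seed and variable names are my assumptions):

```r
# Hypothetical data setup, consistent with the 1000 x 1000 timings above;
# the original posts do not show this step.
set.seed(123)                                  # arbitrary seed, assumed
n <- 1000
x <- matrix(rnorm(n * n), nrow = n, ncol = n)  # one candidate predictor per column
y <- rnorm(n)                                  # single response vector
```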
The ans2 run shows user 5.285 and system 0.296, both much better than ans3.
Ordinarily I'd focus on the "user" figure, and I'd conclude that f2
(ordinary lapply) is much faster.
However, the "elapsed" value for ans3 is less than half that of ans2. How
can elapsed be smaller for ans3? I'm guessing that a larger total amount of
work is being divided among 8 cores?
When the multicore functionality is called, the "user" and "system" numbers
roughly double because _____?
But the total elapsed time is smaller because that larger amount of compute
time is spread across more cores?
On a system with many users logged in at the same time, it seems the
reasonable thing is to tell them to use lapply, as in ans2, because the
aggregate amount of computation it consumes is well under half of what the
multicore version in ans3 uses. I mean, if the machine has a finite amount
of computation it can do, the multicore version requires more than twice as
much aggregate work. That one user benefits from a smaller elapsed time,
but the total amount of work the system does roughly doubles.
Or am I just thinking of this like a Soviet Communist of the 1950s....
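Checking that arithmetic against the quoted timings (taking user + system as a rough proxy for total CPU consumed):

```r
# Aggregate CPU time (user + system) from the system.time() output above.
cpu_f2 <- 5.285 + 0.296    # serial f2: about 5.58 s of CPU
cpu_f3 <- 10.256 + 4.092   # parallel f3: about 14.35 s of CPU across 8 cores
round(cpu_f3 / cpu_f2, 2)  # 2.57 -- the parallel run burns roughly 2.6x the CPU
```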
pj
> Martin
>
--
Paul E. Johnson
Professor, Political Science
Assoc. Director, Center for Research Methods
1541 Lilac Lane, Room 504
University of Kansas
http://pj.freefaculty.org
http://quant.ku.edu