[R-sig-hpc] Embarrassingly parallel computation of models (practically solved; conceptually still unclear)

Simon Urbanek simon.urbanek at r-project.org
Fri Sep 6 16:35:23 CEST 2013


On Sep 6, 2013, at 10:13 AM, Michael Kubovy wrote:

> Unfortunately I get the same failure with
> (1) formulaList[[1]] <- 'y ~ x'
> and
> (2) formulaList[[1]] <- as.formula('y ~ x')
> and
> (3) formulaList[[1]] <- as.formula(y ~ x)
> and 
> (4) formulaList[[1]] <- y ~ x # doesn't make sense to me
> 
> All this is puzzling in light of
> fit <- lm(sr ~ ., data = LifeCycleSavings)
> frml <- as.formula('sr ~ .')
> fit <- lm(frml, data = LifeCycleSavings)
> 
> and
> 
> frml <- vector('list', 4)
> frml[[1]] <- as.formula('sr ~ .')
> (fit <- lm(frml[[1]], data = LifeCycleSavings))
> 
> Call:
> lm(formula = frml[[1]], data = LifeCycleSavings)
> 
> Coefficients:
> (Intercept)        pop15        pop75          dpi         ddpi  
>  28.566087    -0.461193    -1.691498    -0.000337     0.409695  
> 
> so (2) should work. The problem must be how I use the foreach() function. So instead of
> 
> outputList <- foreach(i = 1:4) %dopar% glmer(formulaList, family = binomial, data = selected)
> 
> which failed, I tried
> 
> outputList <- foreach(i = 1:4) %dopar% glmer(formulaList[[i]], family = binomial, data = selected)
> 
> which worked! But that seems to me not to be in the spirit of foreach. So how do I tell it where to look for the list over which i is supposed to range?
> 

It is in the spirit of foreach; you're simply telling it to loop over an index instead of over the list itself. What you probably meant is

foreach(formula = formulaList) %dopar% glmer(formula, family = binomial, data = selected)
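A minimal base-R sketch of why the original call failed and why iterating over the list elements works. It uses lm() and the built-in LifeCycleSavings data (as in the earlier lm() test) as a stand-in for glmer() and the poster's `selected` data, which are not available here; the three formulas are illustrative assumptions, not the poster's models. In current R versions, passing the whole list should fail with the same "invalid formula" message reported in the thread.

```r
# Illustrative formulas (hypothetical; not the glmer() models from the thread)
formulaList <- list(sr ~ pop15, sr ~ pop15 + pop75, sr ~ .)

# Handing the whole list to the model function, as every task did in the
# original foreach() call, is rejected because a list is not a formula
wholeList <- tryCatch(lm(formulaList, data = LifeCycleSavings),
                      error = function(e) e)
conditionMessage(wholeList)

# Iterating over the elements themselves passes one formula per call,
# which is what foreach(formula = formulaList) does for each task
fits <- lapply(formulaList, function(f) lm(f, data = LifeCycleSavings))

# Number of estimated coefficients per model
sapply(fits, function(fit) length(coef(fit)))
```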

It's arguably more R-like (and less typing ;)) to write it with mclapply() from the parallel package:

mclapply(formulaList, glmer, family = binomial, data = selected)
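A runnable sketch of the mclapply() approach, again substituting lm() and LifeCycleSavings for glmer() and `selected` (assumptions for illustration only). It also applies Steve's point by converting the formula strings with as.formula() before fitting. mclapply() passes each list element as the first argument of the model function and forwards the remaining named arguments; because it relies on forking, which Windows does not support, the sketch falls back to one core there (the core count of 2 is an arbitrary choice).

```r
library(parallel)

# Formulas stored as strings, as in the original post, converted to formula objects
formulaStrings <- c("sr ~ pop15",
                    "sr ~ pop15 + pop75",
                    "sr ~ pop15 + pop75 + dpi + ddpi")
formulaList <- lapply(formulaStrings, as.formula)

# Forking is unavailable on Windows, so use a single core there
nCores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each formula becomes lm()'s first argument; 'data' is passed through '...'
fits <- mclapply(formulaList, lm, data = LifeCycleSavings, mc.cores = nCores)

# One coefficient vector per model
coefList <- lapply(fits, coef)
coefList[[3]]
```

If a worker fails, mclapply() returns a "try-error" object in that slot (with a warning) rather than stopping, so it is worth checking the classes of the results before using them.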

Cheers,
Simon


> MK
> 
> On Sep 6, 2013, at 9:32 AM, Stephen Weston <stephen.b.weston at gmail.com> wrote:
> 
>> Hi Michael,
>> 
>> I think the problem is that you're creating a list of strings rather
>> than a list of formulas.  If you don't use single-quotes around the
>> formulas, I think it will work.
>> 
>> - Steve
>> 
>> On Fri, Sep 6, 2013 at 7:34 AM, Michael Kubovy <kubovy at virginia.edu> wrote:
>>> Dear HPC people,
>>> 
>>> I'm trying to learn how to run multiple independent models in parallel. I have successfully run parallel computations on different sets of data (using 12 of my 24 cores). But there's something (probably trivial) about the nature of formulae that is escaping me. Here is what I tried.
>>> 
>>> I store my formulae:
>>> 
>>> formulaList <- vector('list', length = 4)
>>> formulaList[[1]] <- 'resp ~ BAratio +dispDiff + (BAratio + dispDiff | subject)'
>>> formulaList[[2]] <- 'resp ~ BAratio * dispDiff + (BAratio * dispDiff | subject)'
>>> formulaList[[3]] <- 'resp ~ BA3Dratio + dispDiff + (BA3Dratio + dispDiff | subject)'
>>> formulaList[[4]] <- 'resp ~ BA3Dratio * dispDiff + (BA3Dratio * dispDiff | subject)'
>>> 
>>> I get workers:
>>> 
>>> require(foreach)
>>> require(doMC)
>>> registerDoMC()
>>> getDoParWorkers()
>>> 
>>> I try to run:
>>> 
>>> outputList <- foreach(i = 1:4) %dopar% glmer(formulaList, family = binomial, data = selected)
>>> 
>>> I get: Error: task 1 failed - "invalid formula"
>>> 
>>> FYI
>>> 
>>>> sessionInfo()
>>> R version 3.0.1 Patched (2013-09-02 r63805)
>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>> 
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>> 
>>> attached base packages:
>>> [1] splines   datasets  utils     stats     graphics  grDevices methods   base
>>> 
>>> other attached packages:
>>> [1] lme4_0.999999-2   Matrix_1.0-12     lattice_0.20-23   xtable_1.7-1      gamlss_4.2-6      nlme_3.1-111
>>> [7] gamlss.data_4.2-6 gamlss.dist_4.2-0 plyr_1.8          foreach_1.4.1     sos_1.3-7         brew_1.0-6
>>> [13] ggplot2_0.9.3.1   car_2.0-18        nnet_7.3-7        MASS_7.3-29
>>> 
>>> loaded via a namespace (and not attached):
>>> [1] codetools_0.2-8    colorspace_1.2-2   compiler_3.0.1     dichromat_2.0-0    digest_0.6.3       grid_3.0.1
>>> [7] gtable_0.1.2       iterators_1.0.6    labeling_0.2       munsell_0.4.2      proto_0.3-10       RColorBrewer_1.0-5
>>> [13] reshape2_1.2.2     scales_0.2.3       stats4_3.0.1       stringr_0.6.2      survival_2.37-4    tools_3.0.1
> 
> ______________________________________________
> Professor Michael Kubovy
> University of Virginia
> Department of Psychology
> Mail: P.O. Box 400400, Charlottesville, VA 22904-4400, USA
> FedEx or UPS: Gilmer Hall, Room 102, 485 McCormick Road, Charlottesville, VA 22903
> Office: room B011, phone +1-434-982-4729
> Lab: room B019, phone +1-434-982-4751
> WWW: http://www.people.virginia.edu/~mk9y/
> 
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
> 
> 


