[R-sig-hpc] Embarrassingly parallel computation of models (practically solved; conceptually still unclear)
Simon Urbanek
simon.urbanek at r-project.org
Fri Sep 6 16:35:23 CEST 2013
On Sep 6, 2013, at 10:13 AM, Michael Kubovy wrote:
> Unfortunately I get the same failure with
> (1) formulaList[[1]] <- 'y ~ x'
> and
> (2) formulaList[[1]] <- as.formula('y ~ x')
> and
> (3) formulaList[[1]] <- as.formula(y ~ x)
> and
> (4) formulaList[[1]] <- y ~ x # doesn't make sense to me
>
> All this is puzzling in light of
> fit <- lm(sr ~ ., data = LifeCycleSavings)
> frml <- as.formula('sr ~ .')
> fit <- lm(frml, data = LifeCycleSavings)
>
> and
>
> frml <- vector('list', 4)
> frml[[1]] <- as.formula('sr ~ .')
> (fit <- lm(frml[[1]], data = LifeCycleSavings))
>
> Call:
> lm(formula = frml[[1]], data = LifeCycleSavings)
>
> Coefficients:
> (Intercept) pop15 pop75 dpi ddpi
> 28.566087 -0.461193 -1.691498 -0.000337 0.409695
>
> so (2) should work. The problem must be how I use the foreach() function. So instead of
>
> outputList <- foreach(i = 1:4) %dopar% glmer(formulaList, family = binomial, data = selected)
>
> which failed, I tried
>
> outputList <- foreach(i = 1:4) %dopar% glmer(formulaList[[i]], family = binomial, data = selected)
>
> which worked! But that seems to me not to be in the spirit of foreach. So how do I tell it where to look for the list over which i is supposed to range?
>
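It is in the spirit of foreach: you are telling it to loop over an index rather than over the list, so the body has to pick out the element itself with formulaList[[i]]. Your first attempt failed because every task was handed the whole list as its formula. What you probably meant is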
foreach(formula = formulaList) %dopar% glmer(formula, family = binomial, data = selected)
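To make the two loop forms concrete, here is a minimal sketch. It assumes the foreach package is installed, and it substitutes lm() and the built-in LifeCycleSavings data (used earlier in this thread) for glmer() and `selected`, which are not available here. The four formulas are made up for illustration. %do% is used so that no parallel backend is needed; with a registered backend, %dopar% should take the same two forms.

```r
library(foreach)

# Illustrative formulas only; lm() and LifeCycleSavings stand in for
# glmer() and the thread's `selected` data.
formulaList <- list(sr ~ pop15, sr ~ pop15 + pop75, sr ~ dpi, sr ~ .)

# Index form: i ranges over positions, so the body must subset the list.
byIndex <- foreach(i = seq_along(formulaList)) %do%
  lm(formulaList[[i]], data = LifeCycleSavings)

# Element form: f is bound to each formula in turn.
byElement <- foreach(f = formulaList) %do%
  lm(f, data = LifeCycleSavings)

# Both forms fit the same models, so the coefficients should agree.
all.equal(lapply(byIndex, coef), lapply(byElement, coef))
```
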
It's maybe more R-like (and less typing ;)) to write it as
mclapply(formulaList, glmer, family = binomial, data = selected)
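A self-contained sketch of the same pattern, again assuming lm() and the built-in LifeCycleSavings data as stand-ins for glmer() and `selected`. The formulas and the choice of two cores are arbitrary and only for illustration. mclapply() comes from the parallel package that ships with R; it relies on forking, so on Windows it can only run with one core.

```r
library(parallel)

# Illustrative formulas only; lm() stands in for glmer().
formulaList <- list(sr ~ pop15, sr ~ pop15 + pop75, sr ~ dpi, sr ~ .)

# Two cores is an arbitrary choice; forking is unavailable on Windows.
nCores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each list element is passed as the first argument of lm(), i.e. its
# formula; the remaining arguments are forwarded unchanged to every call.
fits <- mclapply(formulaList, lm, data = LifeCycleSavings, mc.cores = nCores)

# Number of estimated coefficients per model.
sapply(fits, function(m) length(coef(m)))
```

This works because the formula is the first argument of the fitting function. If it were not, a small wrapper such as function(f) fitfun(data, f) would be needed, where fitfun is a hypothetical name.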
Cheers,
Simon
> MK
>
> On Sep 6, 2013, at 9:32 AM, Stephen Weston <stephen.b.weston at gmail.com> wrote:
>
>> Hi Michael,
>>
>> I think the problem is that you're creating a list of strings rather
>> than a list of formulas. If you don't use single-quotes around the
>> formulas, I think it will work.
>>
>> - Steve
>>
>> On Fri, Sep 6, 2013 at 7:34 AM, Michael Kubovy <kubovy at virginia.edu> wrote:
>>> Dear HPC people,
>>>
>>> I'm trying to learn how to run multiple independent models in parallel. I have successfully run parallel computations on different sets of data (using 12 of my 24 cores). But there's something (probably trivial) about the nature of formulae that is escaping me. Here is what I tried.
>>>
>>> I store my formulae:
>>>
>>> formulaList <- vector('list', length = 4)
>>> formulaList[[1]] <- 'resp ~ BAratio +dispDiff + (BAratio + dispDiff | subject)'
>>> formulaList[[2]] <- 'resp ~ BAratio * dispDiff + (BAratio * dispDiff | subject)'
>>> formulaList[[3]] <- 'resp ~ BA3Dratio + dispDiff + (BA3Dratio + dispDiff | subject)'
>>> formulaList[[4]] <- 'resp ~ BA3Dratio * dispDiff + (BA3Dratio * dispDiff | subject)'
>>>
>>> I get workers:
>>>
>>> require(foreach)
>>> require(doMC)
>>> registerDoMC()
>>> getDoParWorkers()
>>>
>>> I try to run:
>>>
>>> outputList <- foreach(i = 1:4) %dopar% glmer(formulaList, family = binomial, data = selected)
>>>
>>> I get: Error: task 1 failed - "invalid formula"
>>>
>>> FYI
>>>
>>>> sessionInfo()
>>> R version 3.0.1 Patched (2013-09-02 r63805)
>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] splines datasets utils stats graphics grDevices methods base
>>>
>>> other attached packages:
>>> [1] lme4_0.999999-2 Matrix_1.0-12 lattice_0.20-23 xtable_1.7-1 gamlss_4.2-6 nlme_3.1-111
>>> [7] gamlss.data_4.2-6 gamlss.dist_4.2-0 plyr_1.8 foreach_1.4.1 sos_1.3-7 brew_1.0-6
>>> [13] ggplot2_0.9.3.1 car_2.0-18 nnet_7.3-7 MASS_7.3-29
>>>
>>> loaded via a namespace (and not attached):
>>> [1] codetools_0.2-8 colorspace_1.2-2 compiler_3.0.1 dichromat_2.0-0 digest_0.6.3 grid_3.0.1
>>> [7] gtable_0.1.2 iterators_1.0.6 labeling_0.2 munsell_0.4.2 proto_0.3-10 RColorBrewer_1.0-5
>>> [13] reshape2_1.2.2 scales_0.2.3 stats4_3.0.1 stringr_0.6.2 survival_2.37-4 tools_3.0.1
>
> ______________________________________________
> Professor Michael Kubovy
> University of Virginia
> Department of Psychology
> for mail add: for FedEx or UPS add:
> P.O.Box 400400 Gilmer Hall, Room 102
> Charlottesville, VA 22904-4400 485 McCormick Road
> USA Charlottesville, VA 22903
> room phone
> Office: B011 +1-434-982-4729
> Lab: B019 +1-434-982-4751
> WWW: http://www.people.virginia.edu/~mk9y/
>