[R] parallel bootstrap linear model on multicore mac (re-post)
Anthony Dick
adick at fiu.edu
Thu Feb 24 20:23:52 CET 2011
Hello all,
I am re-posting my previous question with simpler, more transparent,
commented code.
I have been banging my head against this problem, and I wondered if
anyone could lend a hand. I want to parallelize a bootstrap of a
linear mixed model on my 8-core Mac. Below is the process that I want to
run in parallel (namely, the boot.out<-boot(dat.res,boot.fun, R = nboot)
command). It extends the bootstrapping linear models example in
Venables and Ripley to lmer. Please excuse my rather terrible
programming skills; I am always open to suggestions. Below the example I
describe the methods I have tried.
library(boot)
library(lme4)
dat<-read.table("http://www2.fiu.edu/~adick/downloads/toy2.dat", header = TRUE)
nboot<-1000 # number of bootstraps
attach(dat)
x<-dat[,2] # IV number 1
y<-dat[,4] # DV
z<-dat[,3] # IV number 2
subj<-dat[,1] # random factor
boot.fun<-function(data,i) { # function to resample residuals
  d<-data
  # populate new y values based on resampled residuals
  d$y<- d$fitted+d$res[i]
  # update the linear model (fit, defined below) and output the coefficients
  as.numeric(coef(update(fit,data=d))[1][[1]][1,c(1:4)])
}
fit<-lmer(y~x*z + (1|(subj))) # the linear model
# add residuals and fitted values to dat
dat.res<-data.frame(y,x,z,subj, res=resid(fit), fitted=fitted(fit))
# run the bootstrap using boot.fun
boot.out<-boot(dat.res,boot.fun, R = nboot)
boot.out
Methods attempted:
Using the multicore package, I tried
boot.out<-collect(parallel(boot(dat.res,boot.fun, R = nboot))). This
returned a correct result, but did not speed things up. Not sure why...
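One possible reading of why this gave no speed-up: parallel(boot(...)) forks a single child process that runs all nboot replicates itself, so only one core is ever busy. What follows is a minimal sketch of splitting the replicates across cores instead, not a confirmed fix. It assumes a current R with the parallel package (which absorbed multicore) and a fork-capable OS such as macOS or Linux. To run without lme4 or the data file, it uses simulated data with lm() standing in for the lmer() fit; toy, toy.fit, toy.fun, ncores and chunks are hypothetical names.

```r
library(boot)
library(parallel)  # assumption: current R; mclapply() forks like multicore did

# Simulated stand-in for dat.res; lm() replaces lmer() to keep this self-contained
set.seed(1)
n <- 60
toy <- data.frame(x = rnorm(n), z = rnorm(n))
toy$y <- 1 + 2 * toy$x - toy$z + rnorm(n)
toy.fit <- lm(y ~ x * z, data = toy)
toy.res <- data.frame(toy, res = resid(toy.fit), fitted = fitted(toy.fit))

# Same residual-resampling idea as boot.fun
toy.fun <- function(data, i) {
  d <- data
  d$y <- d$fitted + d$res[i]
  as.numeric(coef(update(toy.fit, data = d)))
}

ncores <- 2                              # hypothetical; 8 on the machine described
nboot  <- 200
chunks <- rep(nboot %/% ncores, ncores)  # replicates handled by each child

RNGkind("L'Ecuyer-CMRG")                 # so each forked child gets its own stream
set.seed(123)
pieces <- mclapply(chunks,
                   function(r) boot(toy.res, toy.fun, R = r)$t,
                   mc.cores = ncores)

boot.t <- do.call(rbind, pieces)         # stack the replicate matrices
dim(boot.t)
```

Each child runs its share of the replicates, and only the replicate matrices (the t component) are stacked, so boot.t is a plain matrix rather than a full boot object. Later releases of boot also accept parallel = "multicore" and ncpus arguments in boot() itself, which may be simpler where available.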
I also tried snowfall and snow. While I can create a cluster and run
simple processes (e.g., the examples provided in the literature), I can't get
the bootstrap to run. For example, using snow:
cl <- makeCluster(8)
clusterSetupRNG(cl)
clusterEvalQ(cl,library(boot))
clusterEvalQ(cl,library(lme4))
boot.out<-clusterCall(cl,boot(dat.res,boot.fun, R = nboot))
stopCluster(cl)
returns the following error:
Error in checkForRemoteErrors(lapply(cl, recvResult)) :
8 nodes produced errors; first error: could not find function "fun"
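The error above is consistent with how clusterCall() works: its second argument must be a function, but boot(dat.res,boot.fun, R = nboot) is evaluated on the master first, and its result (a list, not a function) is what gets shipped to the workers as fun. The workers also never receive the fitted model that boot.fun refers to. Here is a sketch of one way around both points, offered under assumptions rather than as a tested fix for the lmer case. It assumes the snow-style interface of the parallel package on a current R, and uses simulated data with lm() in place of lmer() so it runs without lme4 or the data file; toy, toy.fit and toy.fun are hypothetical names.

```r
library(boot)
library(parallel)  # assumption: current R; provides snow-style cluster functions

# Simulated stand-in for dat.res; lm() replaces lmer() to keep this self-contained
set.seed(1)
n <- 60
toy <- data.frame(x = rnorm(n), z = rnorm(n))
toy$y <- 1 + 2 * toy$x - toy$z + rnorm(n)
toy.fit <- lm(y ~ x * z, data = toy)
toy.res <- data.frame(toy, res = resid(toy.fit), fitted = fitted(toy.fit))

toy.fun <- function(data, i) {
  d <- data
  d$y <- d$fitted + d$res[i]
  as.numeric(coef(update(toy.fit, data = d)))
}

cl <- makeCluster(2)                       # 8 on the machine described
clusterSetRNGStream(cl, 123)               # independent random streams per worker
clusterEvalQ(cl, library(boot))
clusterExport(cl, "toy.fit", envir = environment())  # workers need the fitted model

# Pass the function boot itself, then its arguments; each worker runs 100 replicates
pieces <- clusterCall(cl, boot, toy.res, toy.fun, R = 100)
stopCluster(cl)

boot.t <- do.call(rbind, lapply(pieces, function(b) b$t))
dim(boot.t)
```

With two workers at 100 replicates each, boot.t holds 200 rows of bootstrapped coefficients. For the real model, the workers would also need lme4 loaded and the lmer fit exported in the same way.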
I am stuck and at the limit of my programming knowledge, so I am punting
to the R-help list. I need to run this process thousands of times, which
is why I want to make it parallel. Any suggestions are much appreciated.
Anthony
--
Anthony Steven Dick, Ph.D.
Assistant Professor
Department of Psychology
Florida International University
Modesto A. Maidique Campus DM 296B
11200 S.W. 8th Street
Miami, FL 33199
Phone: 305-348-4202
Lab Phone: 305-348-9057 or 305-348-9055 (I am usually here)
Fax: 305-348-3879
Email: adick at fiu.edu
Webpage: http://www.fiu.edu/~adick