[R-sig-hpc] Problems parallelizing glmnet

Patrik Waldmann patrik.waldmann at boku.ac.at
Thu Sep 6 19:15:07 CEST 2012


I would like to avoid foreach since we showed earlier that it is VERY slow.
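
For reference, a foreach-free variant can be sketched with parLapply from the parallel package. In the original attempt, clusterEvalQ evaluates the same expression on every worker, and alphasplit is a list rather than a single number, so no worker ever receives its own alpha. The sketch below hands each alpha to a worker directly. The smaller matrix, the fixed worker count of 2, the alpha grid, and the helper name fitOne are assumptions made for illustration only.

```r
library(parallel)

set.seed(1)
# Smaller than the original 2000 x 10000 example so the sketch runs quickly
x <- matrix(rnorm(200 * 50), ncol = 50)
y <- rnorm(200)

# Hypothetical helper: fit one cross-validated model for a single alpha.
# The data are passed as arguments, so nothing has to be exported by name.
fitOne <- function(a, xmat, yvec) {
  glmnet::cv.glmnet(xmat, yvec, alpha = a)
}

alphas <- seq(0, 1, by = 0.25)

# Assumed worker count; detectCores() could be used instead
cl <- makeCluster(2L)
fits <- tryCatch(
  # Argument names x and fun are avoided here because parLapply
  # uses them internally when it calls clusterApply
  parLapply(cl, alphas, fitOne, xmat = x, yvec = y),
  finally = stopCluster(cl)
)

# Minimum cross-validated error for each alpha
cvErrors <- sapply(fits, function(f) min(f$cvm))
names(cvErrors) <- alphas
```

parLapply splits the alphas into one chunk per worker, so the data are shipped to each worker once, much as with clusterExport.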

Patrik

>>> Zachary Mayer <zach.mayer at gmail.com> 09/06/12 18:09 PM >>>
You could also use the foreach package, if you do not wish to use caret,
e.g.:

tuneAlpha <- function(..., alphas){
  stopifnot(require(foreach))
  # Run models
  modelList <- foreach(alpha=alphas) %dopar% {
    stopifnot(require(glmnet))
    cv.glmnet(..., alpha=alpha)
  }
  # Choose best model
  errors <- unlist(lapply(modelList, function(x) min(sqrt(x$cvm))))
  return(modelList[[which.min(errors)]])
}

x <- matrix(rnorm(2000*100),ncol=100)
y <- matrix(rnorm(2000),ncol=1)
model <- tuneAlpha(x, y, alphas=c(0,1), family="gaussian", nfolds=10,
type.measure="mse")

You should probably pass an explicit "foldid" parameter as well, so each
model uses the same cross-validation folds.
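
A minimal sketch of the foldid suggestion, assuming 10 folds and a small simulated data set: the fold assignments are drawn once and reused for every alpha, so differences in cross-validated error reflect alpha rather than the random split. The loop is sequential here to stay independent of any parallel backend, and the alpha grid is only illustrative.

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(200 * 20), ncol = 20)
y <- rnorm(200)

nfolds <- 10
# Assign every observation to one of nfolds folds, once, up front
foldid <- sample(rep(seq_len(nfolds), length.out = nrow(x)))

# Fit one cross-validated model per alpha on identical folds
alphas <- c(0, 0.5, 1)
fits <- lapply(alphas, function(a) {
  cv.glmnet(x, y, alpha = a, foldid = foldid,
            family = "gaussian", type.measure = "mse")
})

# Compare the minimum cross-validated MSE across alphas
errors <- sapply(fits, function(f) min(f$cvm))
best <- fits[[which.min(errors)]]
```

With tuneAlpha above, the same foldid vector could be passed through `...`, in which case nfolds no longer needs to be given.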

On Thu, Sep 6, 2012 at 11:58 AM, Zachary Mayer <zach.mayer at gmail.com> wrote:

> Hasn't the caret package already solved this problem?
>
> You can pass the tuneGrid parameter to specify your custom alpha and
> lambda sequence, and the trainControl parameter to specify what kind of
> cross-validation you wish to use.
>
> Caret uses foreach, so you can register a parallel backend of your choice.
>
> On Sep 6, 2012, at 11:56 AM, Patrik Waldmann <patrik.waldmann at boku.ac.at>
> wrote:
>
> > I want to run the cv.glmnet function on the same data (y and x) with
> > different values of the alpha parameter, determined by the number of
> > cores, but the result is absurd. What is wrong in the code below?
> >
> > Patrik Waldmann
> >
> > x <- matrix(rnorm(2000*10000),ncol=10000)
> > y <- matrix(rnorm(2000),ncol=1)
> >
> > library(parallel)
> > cvglmnet <- function(...) {
> > library(glmnet)
> > cv.glmnet(x,y,alpha=alphasplit)
> > }
> > system.time(cores<-detectCores())
> > system.time(cl <- makeCluster(cores, methods=FALSE))
> > alpha<-seq(0, 1,by=1/(cores-1))
> > alphasplit<-clusterSplit(cl,alpha)
> > system.time(clusterExport(cl, c("x","y","cvglmnet","alphasplit")))
> > system.time(outbrlist<-clusterEvalQ(cl, cvglmnet(x,y,alphasplit)))
> > system.time(totoutbr<-do.call(cbind,outbrlist))
> > stopCluster(cl)
> >