[R-sig-hpc] Problems parallelizing glmnet

Thu Sep 6 19:45:53 CEST 2012

Have a look at 'Parallel linear model': there, foreach was ten times slower (in that specific example). I don't know the reason.

Patrik

>>> Zachary Mayer <zach.mayer at gmail.com> 09/06/12 19:20 PM >>>
I was not aware of this unfortunate limitation.  Was it specific to a
particular back-end, or just foreach in general?

On Thu, Sep 6, 2012 at 1:15 PM, Patrik Waldmann
<patrik.waldmann at boku.ac.at> wrote:

> I would like to avoid foreach since we showed earlier that it is VERY slow.
>
> Patrik
>
> >>> Zachary Mayer <zach.mayer at gmail.com> 09/06/12 18:09 PM >>>
> You could also use the foreach package, if you do not wish to use caret,
> e.g.:
>
> tuneAlpha <- function(..., alphas){
>   stopifnot(require(foreach))
>   # Run models
>   modelList <- foreach(alpha=alphas) %dopar% {
>     stopifnot(require(glmnet))
>     cv.glmnet(..., alpha=alpha)
>   }
>   # Choose best model
>   errors <- unlist(lapply(modelList, function(x) min(sqrt(x$cvm))))
>   return(modelList[[which.min(errors)]])
> }
>
> x <- matrix(rnorm(2000*100),ncol=100)
> y <- matrix(rnorm(2000),ncol=1)
> model <- tuneAlpha(x, y, alphas=c(0,1), family="gaussian", nfolds=10,
> type.measure="mse")
>
> You should probably pass an explicit "foldid" parameter as well, so each
> model uses the same cross-validation folds.
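
A minimal sketch of the foldid suggestion above, assuming small simulated data and an illustrative alpha grid (the sizes, the seed, and the names foldid, alphas and best_alpha are not from the thread). It uses %do% so it runs without a registered backend; with a backend registered, %dopar% could be swapped in.

```r
library(glmnet)
library(foreach)

set.seed(42)                       # illustrative seed
x <- matrix(rnorm(200 * 20), ncol = 20)
y <- rnorm(200)

nfolds <- 10
# One fold label per observation, drawn once and reused for every alpha,
# so the cross-validated errors are computed on identical splits.
foldid <- sample(rep(seq_len(nfolds), length.out = nrow(x)))

alphas <- c(0, 0.5, 1)             # illustrative grid
modelList <- foreach(alpha = alphas) %do% {
  cv.glmnet(x, y, alpha = alpha, foldid = foldid, type.measure = "mse")
}

# Compare the minimum cross-validated MSE across alphas
errors <- vapply(modelList, function(m) min(m$cvm), numeric(1))
best_alpha <- alphas[which.min(errors)]
```

Because every fit sees the same folds, differences in the minimum error reflect alpha rather than a different random split.
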
>
> On Thu, Sep 6, 2012 at 11:58 AM, Zachary Mayer <zach.mayer at gmail.com>
> wrote:
>
> > Hasn't the caret package already solved this problem?
> >
> > You can pass the tuneGrid parameter to specify your custom alpha and
> > lambda sequence, and the trControl parameter (built with trainControl())
> > to specify what kind of cross-validation you wish to use.
> >
> > Caret uses foreach, so you can register a parallel backend of your
> > choice.
> >
> > Sent from my iPhone
> >
> > On Sep 6, 2012, at 11:56 AM, Patrik Waldmann
> > <patrik.waldmann at boku.ac.at> wrote:
> >
> > > I want to run the cv.glmnet function on the same data (y and x) with
> > > different values of the alpha parameter, determined by the number of
> > > cores, but the result is absurd. What is wrong in the code below?
> > >
> > > Patrik Waldmann
> > >
> > > x <- matrix(rnorm(2000*10000),ncol=10000)
> > > y <- matrix(rnorm(2000),ncol=1)
> > >
> > > library(parallel)
> > > cvglmnet <- function(...) {
> > > library(glmnet)
> > > cv.glmnet(x,y,alpha=alphasplit)
> > > }
> > > system.time(cores<-detectCores())
> > > system.time(cl <- makeCluster(cores, methods=FALSE))
> > > alpha<-seq(0, 1,by=1/(cores-1))
> > > alphasplit<-clusterSplit(cl,alpha)
> > > system.time(clusterExport(cl, c("x","y","cvglmnet","alphasplit")))
> > > system.time(outbrlist<-clusterEvalQ(cl, cvglmnet(x,y,alphasplit)))
> > > system.time(totoutbr<-do.call(cbind,outbrlist))
> > > stopCluster(cl)
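
One likely cause, offered as a reading of the code above rather than a confirmed diagnosis: clusterEvalQ evaluates the same expression on every worker, and each worker receives the whole alphasplit list as alpha instead of a single value. A minimal sketch of one alternative, assuming parLapply is acceptable and using smaller simulated data and a fixed worker count for illustration (the sizes and the names alphas, fits and best are not from the thread):

```r
library(parallel)
library(glmnet)

set.seed(1)                          # illustrative seed
x <- matrix(rnorm(200 * 50), ncol = 50)
y <- rnorm(200)

cores <- 2                           # fixed here; detectCores() in practice
cl <- makeCluster(cores)
alphas <- seq(0, 1, length.out = 3)  # illustrative grid

# Ship the data once; each worker then receives one alpha per task.
clusterExport(cl, c("x", "y"))
fits <- parLapply(cl, alphas, function(a) {
  glmnet::cv.glmnet(x, y, alpha = a)
})
stopCluster(cl)

# fits is a list of cv.glmnet objects; compare them by minimum CV error
errors <- vapply(fits, function(f) min(f$cvm), numeric(1))
best <- fits[[which.min(errors)]]
```

The results stay in a list, since cv.glmnet objects do not combine meaningfully with cbind.
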
> > >
> > > _______________________________________________
> > > R-sig-hpc mailing list
> > > R-sig-hpc at r-project.org
> > > https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
> >
>
>