[Bioc-devel] mclapply and Vector objects

Cook, Malcolm MEC at stowers.org
Tue Sep 25 17:43:28 CEST 2012


> -----Original Message-----
> From: bioc-devel-bounces at r-project.org [mailto:bioc-devel-bounces at r-project.org] On Behalf Of Michael Lawrence
> Sent: Monday, September 24, 2012 1:43 PM
> To: Vincent Carey
> Cc: Michael Lawrence; bioc-devel at r-project.org
> Subject: Re: [Bioc-devel] mclapply and Vector objects
> 
> Good points, Vince. Yes, it would be great to use the more general
> parLapply and expect the client to somehow pass a cluster object that
> controls the parallelization strategy. The mclapply function is a little
> bit special in that it is able to inherit the enclosing environment from
> the parent process. This is not generally feasible. So it seems OK to have
> an mclapply wrapper that makes this assumption clear in a way that is
> easier to read (in my opinion) than a parameterized apply function
> accepting a closure, cluster object, or whatever.
> 
> We actually don't have to do too much copying of the formals to the
> wrapper, because we can simply help mclapply find our as.list generic:
> 
> .mclapplyDefault <- parallel::mclapply
> environment(.mclapplyDefault) <- topenv()
> setMethod("mclapply", "List", .mclapplyDefault)
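The trick relies on R's lexical scoping. A function looks up free variables, such as the as.list() call inside mclapply, in its enclosing environment, and `environment<-` replaces that environment on a copy of the function. In package code, topenv() is the package namespace, so the copy should find that package's as.list generic and its List methods instead of base::as.list. The sketch below uses hypothetical names, with an ordinary environment standing in for the namespace. It assumes that the copied function's other helpers remain reachable from the new environment, which is worth checking for any unexported internals of parallel.

```r
# A function that calls as.list() as a free variable, as mclapply does.
countElements <- function(X) length(as.list(X))

# An environment standing in for a package namespace that defines its own
# as.list(); its parent is the global environment, so length() and other
# base functions are still found.
pkgEnv <- new.env(parent = globalenv())
pkgEnv$as.list <- function(x, ...) list("found in pkgEnv")

# Copy the function and point the copy at the new environment.
countElementsInPkg <- countElements
environment(countElementsInPkg) <- pkgEnv

countElements(1:5)       # 5: base::as.list() splits the vector
countElementsInPkg(1:5)  # 1: pkgEnv's as.list() is found first
```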

Michael, I'm close to presenting something that I think will address all concerns, and am reviewing this small flurry of emails to make sure. 

Would you please give me a quick explanation of what the above 'trick' accomplishes? I'm still knocking around R's OO frameworks.

Also...

> Tricks aside, mclapply has some issues that would be nice to address. The
> main one for me is error handling. I have developed some code that wraps
> the user function in try() and compiles any errors into a special
> "batchCondition" object that provides access to the individual exceptions
> and gives a nice summary of what went wrong. Ideally, that would go into
> parallel, but we could put it in IRanges as a stop gap.
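A minimal sketch of the idea, not the code Michael describes: FUN is wrapped so that each element's error is captured as a condition object instead of aborting the whole apply, and the failures are summarised on the result. It swaps in tryCatch() for try() so the original condition object is kept. The names applyCollectingErrors, oddsOnly and the APPLY argument are hypothetical; passing parallel::mclapply as APPLY would run the jobs on forked processes.

```r
# Hypothetical helper: run FUN over X, capturing each element's error
# instead of letting the first failure abort the whole apply.
applyCollectingErrors <- function(X, FUN, ..., APPLY = lapply) {
  # Return the error condition object in place of a result.
  wrapped <- function(x, ...) tryCatch(FUN(x, ...), error = function(e) e)
  res <- APPLY(X, wrapped, ...)
  failed <- vapply(res, inherits, logical(1), what = "error")
  # Record which jobs failed and a one-line summary for the caller.
  attr(res, "failed") <- which(failed)
  attr(res, "summary") <- sprintf("%d of %d jobs failed",
                                  sum(failed), length(res))
  res
}

oddsOnly <- function(i) if (i %% 2 == 0) stop("even input: ", i) else i
res <- applyCollectingErrors(1:4, oddsOnly)
attr(res, "failed")         # 2 4
attr(res, "summary")        # "2 of 4 jobs failed"
conditionMessage(res[[2]])  # "even input: 2"
```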

mclapply currently returns, as its value, a list holding the individual error messages of any process that fails, and issues a warning. Are you suggesting that, in such cases, mclapply should `stop` with a condition (batchCondition) rather than warn, attaching to that condition, as an additional attribute, the vector of conditions (try-errors) returned by each process?

If so, yes, I like it; it sounds great. It would change mclapply's behaviour, though it is probably an improvement that no one would object to.  

What we have now, like this:

> x <- try(mclapply(1:5, stop, simpleError('Hey!'), mc.silent = TRUE))
Warning message:
In mclapply(1:5, stop, simpleError("Hey!"), mc.silent = TRUE) :
  all scheduled cores encountered errors in user code
> class(x)
[1] "list"
> class(x[[1]])
[1] "character"
> x[[1]]
[1] "Error in lapply(X=S, FUN=FUN,...) : 1 Error: I warned You\n\n"



Instead, we are saying that mclapply should raise a batchError that carries each individual core's simpleError as an attribute, say, 'jobCondition'.

What could be simpler?

It would then look like this:

> x <- try(mclapply(1:5, stop, simpleError('Hey!'), mc.silent = TRUE))
Error in mclapply(1:5, stop, simpleError("Hey!"), mc.silent = TRUE) :
  all scheduled cores encountered errors in user code
> class(x)
 [1] "batchError" "simpleError" "error" "condition"
> x
[1] Error:
In mclapply(1:5, stop, simpleError("Hey!"), mc.silent = TRUE) :
  all scheduled cores encountered errors in user code
attr(,"class")
[1] "batchError" "simpleError" "error" "condition"
attr(,"condition")
<batchError  all scheduled cores encountered errors in user code>
attr(,"jobCondition")
        [,1]   [,2]   [,3]   [,4]   [,5]
jobid   1      2      3      4      5
message Hey!   Hey!   Hey!   Hey!   Hey!
call    NULL   NULL   NULL   NULL   NULL


This would be sweet and simple.
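The proposed condition can be sketched with base R's condition machinery alone. In this hypothetical sketch (not part of parallel), batchError() extends simpleError, the per-job conditions travel in a "jobCondition" attribute as in the mock-up, and a caller selects the condition by class in tryCatch(). A list of simpleError objects stands in for the per-core failures.

```r
# Hypothetical constructor for the proposed condition class.
batchError <- function(message, jobConditions, call = NULL) {
  cond <- simpleError(message, call)
  class(cond) <- c("batchError", class(cond))
  attr(cond, "jobCondition") <- jobConditions  # one condition per job
  cond
}

# Stand-in for the errors returned by five failed jobs.
jobErrors <- lapply(1:5, function(i) simpleError("Hey!"))

# Signal the batch failure and catch it by its class.
err <- tryCatch(
  stop(batchError("all scheduled cores encountered errors in user code",
                  jobErrors)),
  batchError = function(e) e)

class(err)  # "batchError" "simpleError" "error" "condition"
vapply(attr(err, "jobCondition"), conditionMessage, character(1))
# "Hey!" "Hey!" "Hey!" "Hey!" "Hey!"
```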


~Malcolm




> 
> Michael
> 
> On Mon, Sep 24, 2012 at 7:12 AM, Vincent Carey
> <stvjc at channing.harvard.edu>wrote:
> 
> > only caveat is: do we want to commit to mc* in the interface or remain
> > agnostic and allow
> > iterator selection to be dropped in?
> >
> > i looked at the commented out mcseqapply and it seems unfortunate to
> > manually propagate all the
> > mc.* options
> >
> > so what am i suggesting?  i myself had to wonder.
> >
> > interactively, i generally use mclapply(1:N, function(ind) ...) to get
> > multicore processing for
> > general objects, and when i want a higher-level function that allows users
> > to choose for or against multicore
> > iteration, define an applier parameter that defaults to lapply ... if you
> > have to set options it is probably OK to do that through a closure, if you
> > don't want to have all those potentially unstable parameters cluttering
> > your arg list.
> >
> > so my proposal is: whatever we choose, plan for alternative approaches to
> > multicore execution, and keep the code base slim by allowing the
> > alternatives to be chosen through parameter settings as opposed to distinct
> > interfaces
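A minimal sketch of the approach Vince describes: the higher-level function takes the iteration strategy as an argument that defaults to lapply, and multicore options are bound in a closure rather than added to its argument list. squareAll() and multicore() are hypothetical names. parallel::mclapply relies on forking, so mc.cores greater than 1 is not supported on Windows.

```r
library(parallel)

# The caller chooses the iteration strategy; the default is serial lapply().
squareAll <- function(X, APPLY = lapply) {
  unlist(APPLY(X, function(i) i^2))
}

# Bind mc.* options in a closure instead of widening squareAll()'s arguments.
multicore <- function(cores) {
  function(X, FUN, ...) mclapply(X, FUN, ..., mc.cores = cores)
}

squareAll(1:4)                         # 1 4 9 16
squareAll(1:4, APPLY = multicore(2L))  # same values, computed on 2 cores
```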
> >
> >
> > On Mon, Sep 24, 2012 at 9:36 AM, Michael Lawrence <
> > lawrence.michael at gene.com> wrote:
> >
> >>  I should amend this: it would be a method for the List class. Many of the
> >> Vector classes are "atomic" and coercing them to a list is either not
> >> supported or may yield an undesired result. For example, coercing an
> >> IRanges to a list yields a list of integer vectors with the sequence from
> >> start to end. We don't have an lapply,Vector for this reason.
> >>
> >> I actually already made a commented-out mcseqapply. I think I aborted the
> >> mc* stuff back before the parallel package existed, just to avoid adding a
> >> dependency on multicore. With parallel in base R, it's reasonable to add
> >> these methods. If no one else complains, I'll move ahead.
> >>
> >> Michael
> >>
> >> On Mon, Sep 24, 2012 at 6:23 AM, Michael Lawrence <michafla at gene.com>
> >> wrote:
> >>
> >> > It definitely makes sense to have a generic for mclapply that dispatches
> >> > on Vector. Perhaps also for some of the other apply functions in the
> >> > parallel package.
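One way to make mclapply dispatch on a class is to promote it to an S4 generic whose default method is the existing parallel::mclapply. In this sketch, ToyList is a hypothetical stand-in for a List subclass. Its method replaces X with the underlying list and then lets callNextMethod() hand control to the default method, so the default's general as.list() coercion is never reached for this class. The generic's signature is restricted to X. With mc.cores = 1L, mclapply falls back to lapply(), which also works on Windows.

```r
library(methods)
library(parallel)

# Promote mclapply to an S4 generic (created in the global environment);
# the original parallel::mclapply becomes the default method.
setGeneric("mclapply", signature = "X")

# Hypothetical stand-in for a List subclass that stores its elements in a slot.
setClass("ToyList", slots = c(elements = "list"))

setMethod("mclapply", "ToyList", function(X, FUN, ...) {
  # Unwrap the elements, then let the default method do the work;
  # callNextMethod() with no arguments passes on the current value of X.
  X <- X@elements
  callNextMethod()
})

toy <- new("ToyList", elements = list(1, 2, 3))
mclapply(toy, function(x) x * 10, mc.cores = 1L)  # a list holding 10, 20, 30
```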
> >> >
> >> > Michael
> >> >
> >> >
> >> > On Mon, Sep 24, 2012 at 3:58 AM, Hahne, Florian <
> >> > florian.hahne at novartis.com> wrote:
> >> >
> >> >> Would it be possible to make mclapply aware of the Vector class?
> >> >> Currently, the following line causes them to be coerced into a regular
> >> >> list which could be rather expensive for instance in the case of
> >> >> GRangesLists:
> >> >> if (!is.vector(X) || is.object(X))
> >> >>         X <- as.list(X)
> >> >>
> >> >>
> >> >> I guess something like
> >> >> if ((!is.vector(X) || is.object(X)) && !is(X, "Vector"))
> >> >>         X <- as.list(X)
> >> >> would do the trick.
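Every S4 instance, including any Vector subclass, satisfies is.object(), so a Vector exemption only takes effect if it is applied outside the `||`. A sketch of the check as a standalone predicate, with a hypothetical ToyVector class standing in for Vector and a hypothetical needsCoercion() helper:

```r
library(methods)

# Hypothetical stand-in for a Bioconductor Vector subclass.
setClass("ToyVector", slots = c(values = "numeric"))

# TRUE when mclapply-style code would fall back to as.list(X).
needsCoercion <- function(X, exempt = "ToyVector") {
  (!is.vector(X) || is.object(X)) && !is(X, exempt)
}

v <- new("ToyVector", values = c(1, 2, 3))
is.object(v)                # TRUE: every S4 instance is an object
needsCoercion(v)            # FALSE: the exempt class skips as.list()
needsCoercion(1:3)          # FALSE: a plain vector needs no coercion
needsCoercion(factor("a"))  # TRUE: a factor is an object, so it is coerced
```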
> >> >>
> >> >> Or am I missing something obvious here?
> >> >> Cheers,
> >> >> Florian
> >> >>
> >> >> _______________________________________________
> >> >> Bioc-devel at r-project.org mailing list
> >> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >> >>
> >> >
> >> >
> >>
> >>
> >
> >
> 


