[Rd] mccollect with NULL in R 3.6

Gergely Daróczi d@rocz|g @end|ng |rom r@pporter@net
Fri May 3 15:04:18 CEST 2019


On Thu, May 2, 2019 at 7:24 PM Tomas Kalibera <tomas.kalibera using gmail.com> wrote:
>
> On 5/1/19 12:25 AM, Gergely Daróczi wrote:
> > Dear All,
> >
> > I'm running into issues with calling mccollect on a list containing NULL
> > using R 3.6 (this used to work in 3.5.3):
> >
> > jobs <- lapply(
> >      list(NULL, 'foobar'),
> >      function(x) mcparallel(identity(x)))
> > mccollect(jobs, wait = FALSE, timeout = 0)
> > #> Error in names(res) <- pnames[match(s, pids)] :
> > #>   'names' attribute [2] must be the same length as the vector [1]
> >
> > Note that setting a "name" for the jobs does not help, but the above works with
> > "wait=TRUE", and also if I change the order of NULL and "foobar", although
> > in that case the second value (NULL) is omitted. It also works fine with
> > mclapply.
> >
> > Any ideas or suggestions on how to get mccollect to work with the above example?
>
> NULL is not a valid job identification. Perhaps mccollect() could give a
> clearer error message, but I don't see, given its documentation, what
> else than throwing an error it should do. What is the problem you were
> trying to solve?

Thank you very much for looking into this!

What was interesting to me is that it used to work before 3.6 -- I
have a script iterating over a list of data frames to train models,
but it started to fail with the 3.6 release.

The "NULL is not a valid job identification" problem doesn't seem to
apply to my production job, as each list element has a proper name,
but I think I can reproduce the failure with this minimal example as well:

library(parallel)
jobs <- lapply(1:2, function(x) {
    mcparallel(if (x == 1) NULL else x, name = as.character(x))
})
mccollect(jobs, wait = FALSE, timeout = 2)
#> Error in names(res) <- pnames[match(s, pids)] :
#>   'names' attribute [1] must be the same length as the vector [0]

So the jobs have proper names, but the NULL return value is causing
problems. Note that it only causes problems when the NULL value is
the first: e.g. switching 1 and 2 works, as does running this on 1:3
and returning NULL for 2, etc.

Now, I'm aware that 7 months ago a note was added to the docs at
https://github.com/wch/r-source/commit/f0d15be765dcf92e2349429428d49cd5b212abb4
saying that NULL should not be returned, so this seems to be a user
error on my end. Still, mccollect fails only when NULL is the first
returned element, and works OK when NULL is e.g. the 2nd or a later
element (although with side effects, e.g. missed elements).

So, for consistency, maybe mccollect should fail with an explicit
error message whenever it hits a NULL, instead of skipping
"delivered.result <- delivered.result + 1L" when the returned value is
not raw at https://github.com/wch/r-source/commit/f0d15be765dcf92e2349429428d49cd5b212abb4#diff-e634fbaed323aac88667e7826865b160R72
? Or even better (at least for my use case), maybe NULL could be
allowed as a return value, with the warning on line 108 thrown in that case.
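
Until that is settled, one possible workaround is to make sure a job never
returns a bare NULL by wrapping every result in a list and unwrapping it
after collection. This is only a minimal sketch: the "value" field name is
made up for illustration, it uses the default wait = TRUE rather than the
polling call above, and mcparallel/mccollect need a Unix-alike system.

```r
library(parallel)

# Each child returns list(value = ...), so the serialized result is
# never a bare NULL, even when the wrapped value itself is NULL.
jobs <- lapply(1:2, function(x) {
    mcparallel(list(value = if (x == 1) NULL else x),
               name = as.character(x))
})

# Collect all results (wait = TRUE is the default) ...
res <- mccollect(jobs)

# ... and unwrap them; lapply() keeps NULL entries in the output list.
values <- lapply(res, function(r) r[["value"]])
```

The wrapper costs one extra list per job, but it keeps NULL results
distinguishable from jobs that delivered nothing.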

Thanks for considering this,
Gergely


>
> Best
> Tomas
>
> >
> > Thanks,
> > Gergely


