[R] assigning vector or matrix sparsely (for use with mclapply)

Wed Mar 28 04:27:46 CEST 2012

It is (at least for me) really unclear what the problem is, or how
it's related to mclapply.
You say
"this works fine, except that I want to get NAs in the return
 positions that were not recalculated.  then, I can write

  newdata$y <- ifelse ( is.na(olddata$y), mc.byselectrows( olddata,
    is.na(olddata$y), fun.calc.y ), olddata$y )
"
Why?
Are you applying the function twice?  Then why not simply call
v1.1 <- mc.byselectrows( d, loc<1, function(x) x[,2]^2 )
the second time?
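
For what it's worth, if the aim is only to fill in the missing entries, assigning into the selected positions may avoid both the ifelse() and the NA padding. A minimal sketch, assuming the wrapper returns one value per selected row, in row order; calc.selected is a hypothetical sequential stand-in for mc.byselectrows, and the toy olddata is made up for illustration:

```r
# Toy data (hypothetical): `y` has gaps that need recomputing.
olddata <- data.frame(x = 1:6, y = c(1, NA, 9, NA, 25, NA))

# Hypothetical stand-in for mc.byselectrows(olddata, todo, fun.calc.y):
# returns one value per selected row, in row order.
calc.selected <- function(data, sel) data$x[sel]^2

todo <- is.na(olddata$y)      # rows that still need a value
newdata <- olddata
newdata$y[todo] <- calc.selected(olddata, todo)  # fill only the gaps
newdata$y
```

Rows that already had a value are never touched, so nothing needs to come back as NA.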

If the problem is in keeping track of which rows got calculated, why
not put back the row names that rbind omits after mclapply (probably a
good idea anyway):

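library(parallel)  # mclapply(); originally from the multicore package

mc.byselectrows <- function(data.in, recalclist, FUN, ...) {
  data.notdone <- data.in[recalclist, ]
  ## apply FUN to one row at a time; as.matrix() makes the results rbind-able
  FUN.ON.ROWS <- function(.index, ...)
    as.matrix(FUN(data.notdone[.index, ], ...))
  soln <- mclapply(as.list(seq_len(nrow(data.notdone))), FUN.ON.ROWS, ...)
  rv <- do.call("rbind", soln)  ## omits naming.
  ## put the original row names back so each result can be traced to its row
  if (ncol(rv) == 1) { rv <- as.vector(rv); names(rv) <- row.names(data.notdone) }
  else rownames(rv) <- row.names(data.notdone)
  rv
}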
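
Once the result carries the row names, it can be dropped into a full-length vector of NAs. A rough sketch on the toy data from the original post; the named v1 is computed sequentially here as a stand-in for what a wrapper that keeps row names would return, and `full` is just an illustrative name:

```r
# Toy data from the original post.
d   <- data.frame(id = 1:6, val = 11:16)
loc <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)

# Stand-in for the named result of the wrapper (computed sequentially,
# so this sketch does not depend on mclapply).
v1 <- d[loc, 2]^2
names(v1) <- row.names(d)[loc]

# Full-length result: NA everywhere, then fill in by row name.
full <- setNames(rep(NA_real_, nrow(d)), row.names(d))
full[names(v1)] <- v1
full
```

Rows 3 and 5 were not selected, so they stay NA.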

And finally, you don't even need row.names for c(v1,d[loc<1,2])

Or am I missing something here ?

BTW, your code uses cat.stderr (presumably a local function?) instead of
cat, and never loads multicore.
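
I am guessing at what cat.stderr does; a hypothetical definition along these lines would make the posted code self-contained (this is an assumption, not the original poster's actual helper):

```r
# Hypothetical definition: like cat(), but writes to standard error.
cat.stderr <- function(...) cat(..., file = stderr())

# Example call, using the row counts from the toy data in the original post.
cat.stderr("[mc.byselectrows: ", 4, "rows to be recomputed out of", 6, "]\n")
```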

Cheers

On Mon, Mar 26, 2012 at 4:28 PM, ivo welch <ivo.welch at gmail.com> wrote:
> Dear R wizards---
>
> I have a wrapper on mclapply() that makes it a little easier for me to
> do multiprocessing.  (Posting this may make life easier for other
> googlers.)  I pass a data frame, a vector that tells me what rows
> should be recomputed, and the function; and I get back a vector or
> matrix of answers.
>
>   d <- data.frame( id=1:6, val=11:16 )
>   loc <- c(TRUE,TRUE,FALSE,TRUE,FALSE,TRUE)
>   v1 <- mc.byselectrows( d, loc, function(x) x[,2]^2 )
>   v2 <- mc.byselectrows(d, loc, function(x) cbind(x[,2]^2,x[,2]^3))
>
> mc.byselectrows <- function(data.in, recalclist, FUN, ...) {
>
>   data.notdone <- data.in[recalclist,]
>   cat.stderr("[mc.byselectrows: ", nrow(data.notdone),
>              "rows to be recomputed out of", nrow(data.in), "]\n")
>
>   FUN.ON.ROWS <- function(.index, ...)
> as.matrix(FUN(data.notdone[.index,], ...))
>   soln <- mclapply( as.list(1:nrow(data.notdone)) , FUN.ON.ROWS, ... )
>   rv <- do.call("rbind", soln)  ## omits naming.
>   if (ncol(rv)==1) rv <- as.vector(rv)
>   rv
> }
>
> this works fine, except that I want to get NAs in the return
> positions that were not recalculated.  then, I can write
>
>  newdata$y <- ifelse ( is.na(olddata$y), mc.byselectrows( olddata,
> is.na(olddata$y), fun.calc.y ), olddata$y )
>
> I can do this very inelegantly, of course.  I can merge recalclist
> into data.in and then write a loop that substitutes for the do.call to
> rbind.  yikes.  or I could do the recalclist contingency inside the
> FUN.ON.ROWS, but this is costly in terms of execution time.  are there
> obvious solutions?  advice appreciated.
>
> regards,
>
> /iaw
> ----
> Ivo Welch (ivo.welch at gmail.com)
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.