[R] assigning vector or matrix sparsely (for use with mclapply)

ivo welch ivo.welch at gmail.com
Tue Mar 27 00:28:11 CEST 2012


Dear R wizards---

I have a wrapper on mclapply() that makes it a little easier for me to
do multiprocessing.  (Posting this may make life easier for other
googlers.)  I pass a data frame, a vector that tells me what rows
should be recomputed, and the function; and I get back a vector or
matrix of answers.

   d <- data.frame( id=1:6, val=11:16 )
   loc <- c(TRUE,TRUE,FALSE,TRUE,FALSE,TRUE)
   v1 <- mc.byselectrows( d, loc, function(x) x[,2]^2 )
   v2 <- mc.byselectrows(d, loc, function(x) cbind(x[,2]^2,x[,2]^3))

mc.byselectrows <- function(data.in, recalclist, FUN, ...) {

  data.notdone <- data.in[recalclist,]
  cat.stderr("[mc.byselectrows: ", nrow(data.notdone),
             "rows to be recomputed out of", nrow(data.in), "]\n")

  FUN.ON.ROWS <- function(.index, ...)
    as.matrix(FUN(data.notdone[.index,], ...))
  soln <- mclapply( as.list(1:nrow(data.notdone)) , FUN.ON.ROWS, ... )
  rv <- do.call("rbind", soln)  ## omits naming.
  if (ncol(rv)==1) rv <- as.vector(rv)
  rv
}

This works fine, except that I want to get NA's in the return
positions that were not recalculated.  Then I can write

  newdata$y <- ifelse( is.na(olddata$y),
                       mc.byselectrows( olddata, is.na(olddata$y), fun.calc.y ),
                       olddata$y )
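
For comparison, a sketch of an alternative to the ifelse() call: since the
wrapper returns one value per TRUE in the selector, the compact result can be
assigned straight into those positions with logical indexing.  The data and
the squaring step below are made up for illustration; they stand in for
mc.byselectrows( olddata, todo, fun.calc.y ).

```r
## illustrative data only: y is missing in rows 1, 3 and 6
olddata <- data.frame( x=1:6, y=c(NA, 4, NA, 16, 25, NA) )
todo <- is.na(olddata$y)

## stand-in for the wrapper call: one value per TRUE in todo, in row order
compact <- olddata$x[todo]^2

## assign only into the selected positions; the other rows keep their old y
newdata <- olddata
newdata$y[todo] <- compact
```

This assumes the selector contains no NA's and that the compact result has
exactly sum(todo) elements.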

I can get the NA padding very inelegantly, of course.  I can merge
recalclist into data.in and then write a loop that substitutes for the
do.call to rbind.  Yikes.  Or I could check recalclist inside
FUN.ON.ROWS for every row, but this is costly in terms of execution
time.  Are there obvious solutions?  Advice appreciated.
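
A minimal sketch of one way the padding might be done after the rbind step,
without a loop: preallocate an all-NA vector or matrix with one entry (or row)
per input row, then fill only the selected positions by logical indexing.  The
helper name expand.with.na is hypothetical, and the sketch assumes recalclist
is a logical vector without NA's and that at least one row was recomputed.

```r
## hypothetical helper: pad a compact result (one value or row per TRUE in
## recalclist) back out to full length, with NA where nothing was computed
expand.with.na <- function(rv, recalclist) {
  if (is.matrix(rv)) {
    out <- matrix(NA, nrow=length(recalclist), ncol=ncol(rv))
    out[recalclist, ] <- rv
  } else {
    out <- rep(NA, length(recalclist))
    out[recalclist] <- rv
  }
  out
}

loc <- c(TRUE,TRUE,FALSE,TRUE,FALSE,TRUE)

## the values v1 and v2 would hold for val=11:16 and this loc, typed in by hand
v1 <- c(121, 144, 196, 256)
v2 <- cbind(c(121, 144, 196, 256), c(1331, 1728, 2744, 4096))

v1.full <- expand.with.na(v1, loc)   # length 6, NA in positions 3 and 5
v2.full <- expand.with.na(v2, loc)   # 6 x 2, NA in rows 3 and 5
```

Inside mc.byselectrows, the same two assignments could replace the last lines
of the function, so that the return value always lines up with data.in.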

regards,

/iaw
----
Ivo Welch (ivo.welch at gmail.com)


