[R] assigning vector or matrix sparsely (for use with mclapply)
ivo welch
ivo.welch at gmail.com
Tue Mar 27 00:28:11 CEST 2012
Dear R wizards---
I have a wrapper on mclapply() that makes it a little easier for me to
do multiprocessing. (Posting this may make life easier for other
googlers.) I pass a data frame, a vector that tells me what rows
should be recomputed, and the function; and I get back a vector or
matrix of answers.
d <- data.frame( id=1:6, val=11:16 )
loc <- c(TRUE,TRUE,FALSE,TRUE,FALSE,TRUE)
v1 <- mc.byselectrows( d, loc, function(x) x[,2]^2 )
v2 <- mc.byselectrows(d, loc, function(x) cbind(x[,2]^2,x[,2]^3))
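library(parallel)   ## provides mclapply()

mc.byselectrows <- function(data.in, recalclist, FUN, ...) {
  ## keep only the rows flagged for recomputation
  data.notdone <- data.in[recalclist, ]
  ## progress note to stderr (base cat() used here in place of my cat.stderr helper)
  cat("[mc.byselectrows: ", nrow(data.notdone), "rows to be recomputed out of",
      nrow(data.in), "]\n", file = stderr())
  ## apply FUN to one row at a time; as.matrix() so the pieces rbind cleanly
  FUN.ON.ROWS <- function(.index, ...)
    as.matrix(FUN(data.notdone[.index, ], ...))
  ## seq_len() rather than 1:n, so zero selected rows does not iterate over c(1, 0)
  soln <- mclapply( as.list(seq_len(nrow(data.notdone))), FUN.ON.ROWS, ... )
  rv <- do.call("rbind", soln)  ## omits naming.
  if (ncol(rv) == 1) rv <- as.vector(rv)
  rv
}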
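To make the shape of the return value concrete, here is a small serial sketch. Plain lapply() stands in for mclapply() (an assumption made only so the snippet runs anywhere), and it reproduces just the first example: one answer per selected row, so the result is shorter than the data frame.

```r
d   <- data.frame(id = 1:6, val = 11:16)
loc <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)

# Serial stand-in for the parallel call: one result per selected row.
v1 <- unlist(lapply(which(loc), function(i) d[i, 2]^2))

v1           # 121 144 196 256
length(v1)   # 4, whereas nrow(d) is 6
```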
This works fine, except that I want to get NAs in the return
positions that were not recalculated. Then I can write
newdata$y <- ifelse( is.na(olddata$y),
                     mc.byselectrows( olddata, is.na(olddata$y), fun.calc.y ),
                     olddata$y )
I can do this very inelegantly, of course. I could merge recalclist
into data.in and then write a loop that replaces the do.call to
rbind. Yikes. Or I could do the recalclist check inside
FUN.ON.ROWS, but that is costly in terms of execution time. Are there
obvious solutions? Advice appreciated.
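One possible direction, sketched here under assumptions and not tested at scale: preallocate an NA-filled matrix with one row per input row, then assign the rbind-ed results into the selected rows in a single indexed assignment. The name mc.byselectrows.na and the mc.cores argument (defaulting to 1 so the sketch also runs where forking is unavailable) are hypothetical additions. The sketch assumes at least one row is selected.

```r
library(parallel)

# Hypothetical variant: returns one row (or element) per input row,
# with NA wherever recalclist is FALSE.
mc.byselectrows.na <- function(data.in, recalclist, FUN, ..., mc.cores = 1L) {
  idx <- which(recalclist)               # row numbers to recompute
  soln <- mclapply(idx,
                   function(i, ...) as.matrix(FUN(data.in[i, , drop = FALSE], ...)),
                   ..., mc.cores = mc.cores)
  rv <- do.call("rbind", soln)           # compact results, one row per idx
  full <- matrix(NA, nrow = nrow(data.in), ncol = ncol(rv))
  full[idx, ] <- rv                      # sparse assignment; other rows stay NA
  if (ncol(full) == 1) as.vector(full) else full
}

d   <- data.frame(id = 1:6, val = 11:16)
loc <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
v1 <- mc.byselectrows.na(d, loc, function(x) x[, 2]^2)
v2 <- mc.byselectrows.na(d, loc, function(x) cbind(x[, 2]^2, x[, 2]^3))
v1   # 121 144 NA 196 NA 256
```

Because the result now has the same length as the data, it can be used directly as the second argument of the ifelse() call above. The indexed assignment happens once, after the parallel step, so no per-row check is added inside the worker function.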
regards,
/iaw
----
Ivo Welch (ivo.welch at gmail.com)