[R] assigning vector or matrix sparsely (for use with mclapply)
ilai
keren at math.montana.edu
Wed Mar 28 04:27:46 CEST 2012
It is (at least for me) really unclear what the problem is, or how
it's related to mclapply.
You say
" this works fine, except that what I want to get NA's in the return
positions that were not recalculated. then, I can write
>
> newdata$y <- ifelse ( is.na(olddata$y), mc.byselectrows( olddata,
> is.na(olddata$y), fun.calc.y ), olddata$y )
"
Why ???
Are you applying the function twice ? Then why not simply
v1.1 <- mc.byselectrows( d, loc<1, function(x) x[,2]^2 )
the second time ?
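
If that is the intent, here is a minimal sketch of calling the wrapper once per subset and putting the two pieces back in row order. It assumes mclapply comes from the parallel package (the original post does not load anything), uses a trimmed copy of the posted wrapper without the cat.stderr call, and sets mc.cores = 1L only so the sketch also runs where forking is unavailable.

```r
library(parallel)  # assumption: mclapply() is taken from 'parallel'

# Trimmed copy of the posted wrapper (cat.stderr call left out).
mc.byselectrows <- function(data.in, recalclist, FUN, ...) {
  data.notdone <- data.in[recalclist, , drop = FALSE]
  FUN.ON.ROWS <- function(.index, ...)
    as.matrix(FUN(data.notdone[.index, , drop = FALSE], ...))
  # mc.cores = 1L is an assumption made for portability, not part of the post
  soln <- mclapply(seq_len(nrow(data.notdone)), FUN.ON.ROWS, ..., mc.cores = 1L)
  rv <- do.call("rbind", soln)
  if (ncol(rv) == 1) rv <- as.vector(rv)
  rv
}

d   <- data.frame(id = 1:6, val = 11:16)
loc <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)

v1   <- mc.byselectrows(d, loc,  function(x) x[, 2]^2)  # rows 1, 2, 4, 6
v1.1 <- mc.byselectrows(d, !loc, function(x) x[, 2]^2)  # rows 3, 5 (!loc is loc<1)

# Put both pieces back in the original row order.
full <- numeric(nrow(d))
full[loc]  <- v1
full[!loc] <- v1.1
full  # 121 144 169 196 225 256
```

Indexing with loc and !loc is what keeps the positions straight; no row names are needed for that.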
If the problem is keeping track of which rows got calculated, why
not put back the row.names that get dropped after mclapply (probably
a good idea anyway):
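library(parallel)  ## for mclapply
mc.byselectrows <- function(data.in, recalclist, FUN, ...) {
  data.notdone <- data.in[recalclist,]
  FUN.ON.ROWS <- function(.index, ...)
    as.matrix(FUN(data.notdone[.index,], ...))
  soln <- mclapply( as.list(seq_len(nrow(data.notdone))), FUN.ON.ROWS, ... )
  rv <- do.call("rbind", soln)   ## omits naming.
  ## put the row names of the recomputed rows back on the result
  if (ncol(rv)==1) {
    rv <- as.vector(rv) ; names(rv) <- row.names(data.notdone)
  } else rownames(rv) <- row.names(data.notdone)
  rv
}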
And finally, you don't even need row.names for c(v1,d[loc<1,2])
Or am I missing something here ?
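
For the NA-padded return value asked about in the original post, a minimal sketch could look like the following. The helper name expand.with.na is hypothetical (it is not part of the posted code), and it assumes the compact result is numeric and that recalclist is a logical vector with one entry per row of the input data.

```r
# Hypothetical helper: spread a compact result over all rows,
# leaving NA where recalclist is FALSE. Assumes numeric results.
expand.with.na <- function(rv, recalclist) {
  if (is.matrix(rv)) {
    out <- matrix(NA_real_, nrow = length(recalclist), ncol = ncol(rv))
    out[recalclist, ] <- rv   # one result row per selected input row
  } else {
    out <- rep(NA_real_, length(recalclist))
    out[recalclist] <- rv
  }
  out
}

loc <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)

# Stand-ins for what the posted example returns for rows 1, 2, 4, 6.
v1 <- c(11, 12, 14, 16)^2
v2 <- cbind(c(11, 12, 14, 16)^2, c(11, 12, 14, 16)^3)

r1 <- expand.with.na(v1, loc)  # length 6, NA in positions 3 and 5
r2 <- expand.with.na(v2, loc)  # 6 x 2 matrix, NA in rows 3 and 5
```

If the only goal is to fill in missing values, plain indexed assignment such as y[sel] <- rv does the same job without building the padded vector or going through ifelse.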
BTW your code uses cat.stderr (which is local ? ) instead of cat, and
has no call to multicore.
Cheers
On Mon, Mar 26, 2012 at 4:28 PM, ivo welch <ivo.welch at gmail.com> wrote:
> Dear R wizards---
>
> I have a wrapper on mclapply() that makes it a little easier for me to
> do multiprocessing. (Posting this may make life easier for other
> googlers.) I pass a data frame, a vector that tells me what rows
> should be recomputed, and the function; and I get back a vector or
> matrix of answers.
>
> d <- data.frame( id=1:6, val=11:16 )
> loc <- c(TRUE,TRUE,FALSE,TRUE,FALSE,TRUE)
> v1 <- mc.byselectrows( d, loc, function(x) x[,2]^2 )
> v2 <- mc.byselectrows(d, loc, function(x) cbind(x[,2]^2,x[,2]^3))
>
> mc.byselectrows <- function(data.in, recalclist, FUN, ...) {
>
> data.notdone <- data.in[recalclist,]
> cat.stderr("[mc.byselectrows: ", nrow(data.notdone),
>   "rows to be recomputed out of", nrow(data.in), "]\n")
>
> FUN.ON.ROWS <- function(.index, ...)
> as.matrix(FUN(data.notdone[.index,], ...))
> soln <- mclapply( as.list(1:nrow(data.notdone)) , FUN.ON.ROWS, ... )
> rv <- do.call("rbind", soln) ## omits naming.
> if (ncol(rv)==1) rv <- as.vector(rv)
> rv
> }
>
> this works fine, except that what I want to get NA's in the return
> positions that were not recalculated. then, I can write
>
> newdata$y <- ifelse ( is.na(olddata$y), mc.byselectrows( olddata,
> is.na(olddata$y), fun.calc.y ), olddata$y )
>
> I can do this very inelegantly, of course. I can merge recalclist
> into data.in and then write a loop that substitutes for the do.call to
> rbind. yikes. or I could do the recalclist contingency inside the
> FUN.ON.ROWS, but this is costly in terms of execution time. are there
> obvious solutions? advice appreciated.
>
> regards,
>
> /iaw
> ----
> Ivo Welch (ivo.welch at gmail.com)
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.