[R-sig-hpc] Snow, parApply computational times

Elena Grassi grassi.e at gmail.com
Fri Sep 28 13:02:13 CEST 2012


Hello,

I'm trying to use snow and parApply to parallelize a little script of
mine that computes Fisher tests, but I'm puzzled by the timing results.
Maybe I should use a matrix and other functions like
parRapply/parCapply?
On a fairly big data set (413795 tests to be done) the simple version
took 449m33.496s, while the parallel one (with 4 nodes of a socket
cluster) took 486m24.006s.
While running, it begins with 4 processes and after a short time only
one remains active. This process occupies a lot of memory, so I'm
guessing that it's the one taking care of the "reduce" step that puts
together the results, but I would have hoped for better performance.
(On a smaller subset of the data the parallel results were better than
the serial ones.)
I'm now trying with MPI, but it seems to be heading towards the same
results.
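
A minimal sketch of the matrix plus parRapply variant mentioned above,
for illustration only. It assumes the parallel package bundled with R,
whose parRapply has a very similar interface to snow's; the toy counts
and the helper name fisher_p are made up for the example, and nothing
here is measured on the real data.

```r
library(parallel)  # assumption: base R's parallel package instead of snow

# Hypothetical toy data: one row per test, four counts of a 2x2 table.
counts <- matrix(c( 3, 1, 1,  3,
                   10, 2, 3, 15,
                    5, 5, 5,  5),
                 ncol = 4, byrow = TRUE,
                 dimnames = list(c("t1", "t2", "t3"),
                                 c("exon1", "gene1", "exon2", "gene2")))

# Workers return only the p-value; the ids stay on the master as rownames.
fisher_p <- function(row) {
  fisher.test(matrix(row, ncol = 2), alternative = "two.sided")$p.value
}

cl <- makeCluster(2)                      # two local worker processes
pvals <- parRapply(cl, counts, fisher_p)  # rows are split into one chunk per worker
stopCluster(cl)

# Reattach the ids to the p-values on the master.
result <- data.frame(id = rownames(counts), p.value = unname(pvals))
```

Keeping the input numeric and returning a bare number per row means
less data is serialized between the master and the workers; whether
that changes the timings on the full data set would have to be
measured.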

Maybe I'm doing something wrong, or have chosen the wrong data
structure or something like that, since I'm not an experienced R
programmer at all, so I'm pasting the relevant part of the source code
here:

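# Run a two-sided Fisher exact test on one row of the counts table and
# return the row id together with the p-value.
get_fisher <- function(counts){
  # Column names must match those assigned to the input table below.
  mat <- matrix(as.numeric(counts[c("exon1", "gene1", "exon2", "gene2")]), ncol=2)
  colnames(mat) <- c('1', '2')
  rownames(mat) <- c('f', 'g')
  f <- fisher.test(as.table(mat), alternative="two.sided")
  return(c(counts["id"], f$p.value))
}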

if (!is.null(opt$parallel)) {
        library("snow")
        library("Rmpi")
        cl <- makeMPIcluster(opt$parallel)
        #cl <- makeSOCKcluster(rep("localhost", opt$parallel))
}

counts <- read.table("stdin", sep="\t")
colnames(counts) <- c("id", "exon1","gene1", "exon2", "gene2")
if (!is.null(opt$parallel)) {
        fishers <- parApply(cl, counts, 1,  get_fisher)
        stopCluster(cl)
} else {
        fishers <- apply(counts, 1,  get_fisher)
}
df <- as.data.frame(fishers)
write.table(df, file="", sep="\t", quote=FALSE, col.names=FALSE, append=TRUE, row.names=FALSE)
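
A small sketch of what apply() returns for a function that yields two
values per row, since this determines the shape of the data frame that
gets written out. The three-row input is hypothetical and the column
names follow the colnames() call in the script; this is only meant as a
check of the result layout, not as a statement about the timings.

```r
# Hypothetical three-row input with the same columns as the script.
counts <- data.frame(id = c("t1", "t2", "t3"),
                     exon1 = c(3, 10, 5), gene1 = c(1, 2, 5),
                     exon2 = c(1, 3, 5),  gene2 = c(3, 15, 5),
                     stringsAsFactors = FALSE)

get_fisher <- function(row) {
  mat <- matrix(as.numeric(row[c("exon1", "gene1", "exon2", "gene2")]), ncol = 2)
  c(row["id"], fisher.test(mat)$p.value)
}

# Each call returns a vector of length 2, so apply() binds the results
# as columns: the matrix has 2 rows and one column per test.
fishers <- apply(counts, 1, get_fisher)
print(dim(fishers))

# Transposing gives one row per test, which is the usual table layout.
df <- as.data.frame(t(fishers), stringsAsFactors = FALSE)
colnames(df) <- c("id", "p.value")
df$p.value <- as.numeric(df$p.value)
```

With many tests, as.data.frame() on the untransposed matrix would
create one data frame column per test.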

Thank you for your help,
Elena Grassi
-- 
http://www.biocut.unito.it/


