[R] spending most of my time in assignments?
Ross Boylan
ross at biostat.ucsf.edu
Fri Dec 20 00:37:02 CET 2013
My code seems to be spending most of its time in assignment statements,
in some cases simple assignment of a model frame or model matrix.
Can anyone provide any insights into what's going on, or how to speed
things up?
For starters, is it possible that the reports are not accurate, or that
I am misreading them? In R 3.0.1 (running under ESS):
> Rprof(line.profiling=TRUE)
> system.time(r <- totalEffect(dodata[[1]], dodata[[2]], 1:3, 4))
   user  system elapsed
 21.629   0.756  22.469
> Rprof(NULL)
> summaryRprof(lines="both")
$by.self
                           self.time self.pct total.time total.pct
box.R#158                       6.74    29.56      13.06     57.28
simulator.multinomial.R#64      2.92    12.81       2.96     12.98
simulator.multinomial.R#63      2.76    12.11       2.76     12.11
box.R#171                       2.54    11.14       5.08     22.28
simulator.d1.R#70               0.98     4.30       0.98      4.30
simulator.d1.R#71               0.98     4.30       0.98      4.30
densMap.R#42                    0.72     3.16       0.86      3.77
"standardGeneric"               0.52     2.28      11.30     49.56
......
Here's some of the code, with comments marking the profiled line numbers:
box.R:
sp <- merge(sexpartner, data, by="studyidx")
sp$y <- numFactor(sp$pEthnic) #I think y is not used but must be present
data(sims.c1[[k]]) <- sp ###<<<<< line 158
sp0 <- sp
sp <- sim(sims.c1[[k]], i)
ctable[[k]] <- update.c1(ctable[[k]], sp)
if (is.null(i.c1.in)) {
    i.c1.in <- match("pEthnic", colnames(sp0))
    i.c1.out <- match(c("studyidx", "n", "pEthnic"), colnames(sp))
}
sp0 <- merge(sp0[,-i.c1.in], sp[,i.c1.out], by=c("studyidx", "n"))
# d1
sp0 <- sp0[sp0$pIsMale == 1,]
# avoid lots of conversion warnings
sp0$pEthnic <- factor(sp0$pEthnic, levels=partRaceLevels)
data(sims.d1[[k]]) <- sp0 ###<<<<< line 171
sp <- sim(sims.d1[[k]], i)
dtable[[k]] <- update.d1(dtable[[k]], sp)
rngstate[[k]] <- .Random.seed
The timing seems odd: apart from invoking data<-, there doesn't appear
to be anything to do on those 2 lines, and if data<- is slow I would
expect the time to be charged to the data<- method (in a different
file), not to the call.
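For the call-site question: a replacement call such as data(sims.c1[[k]]) <- sp is not a single function call. The R Language Definition describes such calls as an expansion through a temporary `*tmp*`: the element is extracted with [[, the replacement method is dispatched, and the result is written back into the list with [[<-, all from the one source line. The sketch below uses a hypothetical toySim class and a hypothetical payload<- generic in place of the post's classes, and only checks that the sugared and hand-expanded forms give the same result. It does not establish how Rprof divides the time among those steps.

```r
library(methods)

# Hypothetical stand-in for simulator.multinomial, with a single slot
setClass("toySim", representation(data = "data.frame"))

# Hypothetical replacement generic playing the role of the post's data<-
# (a replacement function takes the new value as an argument named `value`)
setGeneric("payload<-", function(obj, value) standardGeneric("payload<-"))
setMethod("payload<-", c("toySim", "data.frame"), function(obj, value) {
  obj@data <- value
  obj
})

sims <- list(new("toySim", data = data.frame(a = 1)))
newData <- data.frame(a = 1:3)

# 1. The sugared form, shaped like box.R line 158
sims1 <- sims
payload(sims1[[1]]) <- newData

# 2. Approximately what R evaluates for that line: take a temporary
#    reference, extract the element, dispatch the replacement method,
#    then write the result back into the list with `[[<-`
sims2 <- sims
`*tmp*` <- sims2
sims2 <- `[[<-`(`*tmp*`, 1, value = `payload<-`(`*tmp*`[[1]], value = newData))
rm(`*tmp*`)

stopifnot(identical(sims1, sims2))
```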
In fact the other big time items are inside the data<- functions.
simulator.multinomial.R:
setMethod("data<-", c("simulator.multinomial", "data.frame"),
          function(obj, value) {
              mf <- model.frame(obj@dataFormula, data=value)
              mf$iCluster <- fromOrig(obj@idmap, as.character(mf$studyidx))
              if (any(is.na(mf$iCluster)))
                  stop("New studyidx--need to draw from meta distn")
              mm <- model.matrix(obj@modelFormula, data=mf)
              obj@data <- mf ##<<< line 63
              obj@mm <- mm ##<<< line 64
              return(obj)
          })
The mm and data slots have type restrictions, but no other validation
tests.
setClass("simulator.multinomial",
         representation(fit="stanfit", idmap="sIDMap",
                        modelFormula="formula",
                        categories="ANY", # could be factor or character
                        # categories should be in the order of their numeric codes in y
                        # cached results
                        coef="list",
                        data="data.frame",
                        dataFormula="formula",
                        mm="matrix"))
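To see what the type restrictions amount to at assignment time, here is a sketch with a hypothetical toySlots class mirroring the typed data and mm slots. In the methods package, @<- checks that the value's class is compatible with the slot's declared class; a validity method, if one existed, runs in validObject() (and in new() when slots are supplied), not on each slot assignment. The always-failing validity method below exists only to make that visible.

```r
library(methods)

# Hypothetical class with the same typed slots as data and mm above
setClass("toySlots", representation(data = "data.frame", mm = "matrix"))

# Hypothetical validity method that always fails, to show when it runs
setValidity("toySlots", function(object) "validity method was called")

obj <- new("toySlots")  # no slots supplied, so new() skips validity

# @<- rejects a value whose class does not match the slot's class ...
typeCheck <- tryCatch({
  obj@mm <- "not a matrix"
  "accepted"
}, error = function(e) "rejected")

# ... but a correctly typed assignment does not run the validity method
obj@data <- data.frame(a = 1:3)

# Only an explicit validObject() call triggers it
validityRun <- tryCatch({
  validObject(obj)
  "valid"
}, error = function(e) "invalid")
```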
Does it matter that, e.g., a model frame is more than a vanilla data frame?
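One way to see what "more than a vanilla data frame" means here, using made-up data and a y ~ x formula in place of the post's dataFormula and modelFormula: a model frame keeps the plain "data.frame" class and carries an extra "terms" attribute, and a model matrix is an ordinary matrix with an "assign" attribute (plus "contrasts" when factors are involved). Whether those extra attributes add any cost to the slot assignment is not something this sketch measures.

```r
library(methods)

# Made-up data; y ~ x stands in for obj@dataFormula and obj@modelFormula
df <- data.frame(y = c(1.5, 2.0, 3.2), x = c(2, 4, 7))

mf <- model.frame(y ~ x, data = df)
mm <- model.matrix(y ~ x, data = mf)

class(mf)              # "data.frame", nothing more
names(attributes(mf))  # includes "terms" alongside the usual attributes
is(mf, "data.frame")   # so it satisfies a slot declared as "data.frame"
attr(mm, "assign")     # model.matrix adds an "assign" attribute
```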
I thought assignment, given R's lazy copying behavior, was essentially
resetting a pointer, and so should be fast.
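R's copying is lazy, but it is copy-on-modify: a plain x <- y does not copy, whereas modifying part of an object that is also referenced elsewhere does. Inside the data<- method, obj is typically still shared with the caller's object when obj@data <- mf runs, so R has to duplicate obj before changing the slot, and that cost lands on the assignment line. The sketch below, with a hypothetical toyCopy class and a hypothetical replaceData() function standing in for the method, shows the caller's object surviving unchanged, which is only possible if a copy was made. How deep that copy is (for instance, whether the large fit slot is copied too) may depend on the R version; tracemem() can report such copies on builds with memory profiling enabled.

```r
library(methods)

# Hypothetical class standing in for simulator.multinomial
setClass("toyCopy", representation(data = "data.frame"))

original <- new("toyCopy", data = data.frame(a = 1:3))

# Hypothetical function mimicking the body of the data<- method:
# modify the argument's slot, then return the modified object
replaceData <- function(obj, value) {
  obj@data <- value  # obj is shared with the caller, so R modifies a copy
  obj
}

updated <- replaceData(original, data.frame(a = 4:6))

# The caller's object is unchanged: the slot assignment could not simply
# repoint a slot inside `original`
original@data$a
updated@data$a
```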
Or maybe the time is going to garbage collecting the previous contents
of the slots?
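One way to check the garbage-collection guess is to compare the cumulative GC time reported by gc.time() before and after the call of interest. In this sketch a throwaway allocation loop stands in for the profiled call; the totalEffect(...) call from the post would go in its place.

```r
# Make sure GC timing is enabled; gc.time() reports cumulative GC time
invisible(gc.time(TRUE))
before <- gc.time()

# Allocation-heavy stand-in for the real call, e.g.
# totalEffect(dodata[[1]], dodata[[2]], 1:3, 4)
work <- lapply(1:200, function(i) rnorm(1e4))

after <- gc.time()
gcSeconds <- (after - before)[3]  # third element: elapsed seconds spent in GC
gcSeconds
```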
Ross Boylan