[R] spending most of my time in assignments?

Ross Boylan ross at biostat.ucsf.edu
Fri Dec 20 00:37:02 CET 2013


My code seems to be spending most of its time in assignment statements,
in some cases simple assignment of a model frame or model matrix.

Can anyone provide any insights into what's going on, or how to speed
things up?

For starters, is it possible that the reports are not accurate, or that
I am misreading them?  In R 3.0.1 (running under ESS):
 > Rprof(line.profiling=TRUE)
 > system.time(r <- totalEffect(dodata[[1]], dodata[[2]], 1:3, 4))
    user  system elapsed
  21.629   0.756  22.469
 > Rprof(NULL)
 > summaryRprof(lines="both")
 $by.self
                            self.time self.pct total.time total.pct
 box.R#158                       6.74    29.56      13.06     57.28
 simulator.multinomial.R#64      2.92    12.81       2.96     12.98
 simulator.multinomial.R#63      2.76    12.11       2.76     12.11
 box.R#171                       2.54    11.14       5.08     22.28
 simulator.d1.R#70               0.98     4.30       0.98      4.30
 simulator.d1.R#71               0.98     4.30       0.98      4.30
 densMap.R#42                    0.72     3.16       0.86      3.77
 "standardGeneric"               0.52     2.28      11.30     49.56
......

Here's some of the code, with comments marking the line numbers reported above.
box.R:
    sp <- merge(sexpartner, data, by="studyidx")
    sp$y <- numFactor(sp$pEthnic)  # I think y is not used but must be present
    data(sims.c1[[k]]) <- sp    ###<<<<< line 158
    sp0 <- sp
    sp <- sim(sims.c1[[k]], i)
    ctable[[k]] <- update.c1(ctable[[k]], sp)
    if (is.null(i.c1.in)) {
        i.c1.in <- match("pEthnic", colnames(sp0))
        i.c1.out <- match(c("studyidx", "n", "pEthnic"), colnames(sp))
    }
    sp0 <- merge(sp0[,-i.c1.in], sp[,i.c1.out], by=c("studyidx", "n"))
    # d1
    sp0 <- sp0[sp0$pIsMale == 1,]
    # avoid lots of conversion warnings
    sp0$pEthnic <- factor(sp0$pEthnic, levels=partRaceLevels)
    data(sims.d1[[k]]) <- sp0    ###<<<<< line 171
    sp <- sim(sims.d1[[k]], i)
    dtable[[k]] <- update.d1(dtable[[k]], sp)
    rngstate[[k]] <- .Random.seed
The timing seems odd, since there doesn't appear to be anything to do at
those two lines except invoke data<-. If that's slow, I would expect the
time to be charged to the data<- function (in a different file), not to
the call.
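
One thing worth noting about those two lines: per the R language
definition, a nested replacement call such as data(sims.c1[[k]]) <- sp
is evaluated as an extraction, a call to the replacement function, and a
second replacement (`[[<-`) on the enclosing list. The sketch below
illustrates that expansion with a hypothetical toy function payload<- and
toy data (neither comes from the code above). Whether the extra steps
account for the time charged to lines 158 and 171 is an assumption, not
something the profile proves.

```r
# Hypothetical stand-in for the S4 generic `data<-`; works on plain lists.
`payload<-` <- function(obj, value) {
  obj$payload <- value
  obj
}

# Toy data: a list of two "simulators" and a small data frame.
sims <- list(list(payload = NULL), list(payload = NULL))
k <- 2
sp <- data.frame(studyidx = 1:3)

# Compact form, shaped like line 158 of box.R.
a <- sims
payload(a[[k]]) <- sp

# The same steps written out by hand: extract element k, call the
# replacement function on it, then store the result back with `[[<-`.
b <- sims
tmp <- b
b <- `[[<-`(tmp, k, value = `payload<-`(tmp[[k]], value = sp))

same <- identical(a, b)
```

Here same holds the result of comparing the two forms, so the compact
line and the hand-written expansion can be checked against each other.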

In fact the other big time items are inside the data<- functions.
simulator.multinomial.R:

setMethod("data<-", c("simulator.multinomial", "data.frame"),
          function(obj, value) {
    mf <- model.frame(obj@dataFormula, data=value)
    mf$iCluster <- fromOrig(obj@idmap, as.character(mf$studyidx))
    if (any(is.na(mf$iCluster)))
        stop("New studyidx--need to draw from meta distn")
    mm <- model.matrix(obj@modelFormula, data=mf)
    obj@data <- mf  ##<<< line 63
    obj@mm <- mm    ##<<< line 64
    return(obj)
})

The mm and data slots have type restrictions, but no other validation
tests.
setClass("simulator.multinomial",
         representation(fit="stanfit", idmap="sIDMap",
                        modelFormula="formula",
                        categories="ANY",  # could be factor or character
                        # categories should be in the order of their numeric codes in y
                        # cached results
                        coef="list",
                        data="data.frame",
                        dataFormula="formula",
                        mm="matrix"))
Does it matter that, e.g., a model frame is more than a vanilla data frame?
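
A small way to probe this, as a sketch only: the toy class toySim and the
toy data below are hypothetical and much simpler than
simulator.multinomial. It checks that a model frame still passes the
"data.frame" slot restriction, and that slot assignment does verify the
class of the value being assigned. It says nothing about how long that
check takes on large objects.

```r
library(methods)

# Hypothetical toy class with the same two slot types as the real one.
setClass("toySim", representation(data = "data.frame", mm = "matrix"))

df <- data.frame(y = c(1, 2, 3), x = c(0.5, 1.5, 2.5))
mf <- model.frame(y ~ x, data = df)   # a data.frame carrying extra attributes
mm <- model.matrix(y ~ x, data = mf)

obj <- new("toySim")
obj@data <- mf   # accepted: a model frame is a data.frame
obj@mm <- mm     # accepted: a model matrix is a matrix

# Assigning a value of the wrong class is rejected at assignment time.
rejected <- tryCatch({ obj@mm <- df; FALSE }, error = function(e) TRUE)
```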

I thought assignment, given R's lazy copying behavior, was essentially
resetting a pointer, and so should be fast.
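
One way to test that assumption directly is tracemem(), which reports
when an object's memory is duplicated. The sketch below uses a toy data
frame, not the real model frame, and only traces when the running R build
has memory profiling compiled in (checked with capabilities("profmem")).
A plain assignment should print nothing, while the later modification
should report a duplication.

```r
x <- data.frame(a = numeric(1e5))

# tracemem() is only available when R was built with memory profiling.
traced <- isTRUE(unname(capabilities("profmem")))
if (traced) invisible(tracemem(x))

y <- x        # plain assignment: a new binding, no copy expected yet
y$a[1] <- 1   # modification: this is where a duplication should be reported

if (traced) untracemem(x)
```

If a replacement function or a slot assignment triggers tracemem output
on every call, the assignment is doing more than resetting a pointer.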

Or maybe the time is going to garbage collecting the previous contents
of the slots?
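
A rough way to check the garbage-collection hypothesis is to compare
gc.time() before and after the call. In the sketch below, the helper
gc_share and the toy workload are hypothetical; in the real case the
expression would be the totalEffect() call.

```r
# Report user CPU time for an expression and the part spent in the GC.
gc_share <- function(expr) {
  gc.time(TRUE)                               # make sure GC timing is enabled
  before <- gc.time()
  total <- system.time(expr, gcFirst = FALSE) # avoid an extra forced gc()
  spent <- gc.time() - before
  c(total_user = total[["user.self"]], gc_user = spent[[1]])
}

# Toy workload that allocates and discards many data frames.
res <- gc_share(for (i in 1:50) tmp_df <- data.frame(a = runif(1e4)))
```

If gc_user is a small fraction of total_user, garbage collection is
probably not where the time is going.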

Ross Boylan


