[R] Puzzled by what Rprof is telling me

Luke Tierney luke at stat.umn.edu
Tue Jun 11 22:35:25 CEST 2002


Don't know if it is an issue in this case but lazy evaluation can
sometimes lead to difficulties in interpreting Rprof ouptut since the
time used to evaluate arguments is attributed by the profiler to the
function that forces the argument evaluation.

luke

On Tue, Jun 11, 2002 at 10:00:37AM -0700, Don MacQueen wrote:
> I am using Rprof() to help find ways to improve performance.
> 
> I found a function whose total seconds and self seconds were large. I 
> replaced it with something else. The something else had a small 
> number of total seconds and self seconds. But the total time did not 
> decrease.
> 
> I don't understand how that could be, and would appreciate any suggestions.
> 
> Thanks
> -Don
> 
> 
> Details, unfortunately lengthy, are here:
> 
> The snippet of code I'm profiling is given below. 'inmat' is a 
> numeric matrix with 128347 rows; I need to apply a function, sfun(), 
> to each of 4191 subsets of rows, indexed by indat$grpi. sfun() is 
> fairly complex, and requires information from all of the columns of 
> inmat, so simple uses of, for example, aggregate() don't do the job.
> 
> Rprof('Rpf3n')
> for (i in unique(indat$grpi)) {
>   ssnew[i,] <- sfun(inmat[indat$grpi==i,,drop=FALSE])
> }
> Rprof(NULL)
> 
> 
> ----------------- Rprof output, v1 ----------------
> Each sample represents 0.02 seconds.
> Total run time: 404.999999999922 seconds.
> 
> Total seconds: time spent in function and callees.
> Self seconds: time spent in function alone.
> 
>     %       total       %       self
>   total    seconds     self    seconds    name
> 100.00    405.00      0.00      0.00     "source"
> 100.00    405.00      0.11      0.46     "eval.with.vis"
>   99.88    404.50      4.70     19.04     "sfun"
>   83.98    340.10     41.25    167.06     "nrow"
>   42.68    172.84     42.68    172.84     "=="
>    3.08     12.48      0.50      2.04     "lm.fit"
>    1.72      6.96      0.23      0.94     "median"
> (remainder omitted)
> 
> nrow() has 340 total and 41 self seconds, a large portion of the 
> total run time.
> 
> There is only one call to nrow() in my code,
>    ntot <- nrow(dati)
> where dati is a subset of the numeric matrix, indat.
> 
> I tried replacing nrow(dati) with length(dati[,1])
> 
> The profile is then:
> 
> ----------------- Rprof output, v2 ----------------
> Each sample represents 0.02 seconds.
> Total run time: 403.699999999923 seconds.
> 
> Total seconds: time spent in function and callees.
> Self seconds: time spent in function alone.
> 
>     %       total       %       self
>   total    seconds     self    seconds    name
> 100.00    403.70      0.11      0.46     "eval.with.vis"
> 100.00    403.70      0.00      0.00     "source"
>   99.88    403.20     46.03    185.82     "sfun"
>   42.37    171.06     42.37    171.06     "=="
>    3.00     12.10      0.59      2.40     "lm.fit"
>    1.70      6.88      0.22      0.90     "median"
> (skip many lines until)
>    0.01      0.04      0.01      0.04     "length"
> (remainder omitted)
> 
> This is where I'm puzzled.
> nrow() has disappeared as expected
> length() has not taken its place
> Yet the total run time is about the same.
> 
> 
> As an aside, I had previously been using sapply() and split(), and 
> that was far slower. Rprof() helped me find out why it was slow. 
> Also, the matrix ssnew is created before the loop, with the right 
> dimensions, so it's not being grown during the loop.
> 
> -- 
> --------------------------------------
> Don MacQueen
> Environmental Protection Department
> Lawrence Livermore National Laboratory
> Livermore, CA, USA
> --------------------------------------
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

-- 
Luke Tierney
University of Minnesota                      Phone:           612-625-7843
School of Statistics                         Fax:             612-624-8868
313 Ford Hall, 224 Church St. S.E.           email:      luke at stat.umn.edu
Minneapolis, MN 55455 USA                    WWW:  http://www.stat.umn.edu
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._



More information about the R-help mailing list