[R] Puzzled by what Rprof is telling me
Don MacQueen
macq at llnl.gov
Tue Jun 11 19:00:37 CEST 2002
I am using Rprof() to help find ways to improve performance.
I found a function whose total seconds and self seconds were large. I
replaced it with something else. The something else had a small
number of total seconds and self seconds. But the total time did not
decrease.
I don't understand how that could be, and would appreciate any suggestions.
Thanks
-Don
Details, unfortunately lengthy, are here:
The snippet of code I'm profiling is given below. 'inmat' is a
numeric matrix with 128347 rows; I need to apply a function, sfun(),
to each of 4191 subsets of rows, indexed by indat$grpi. sfun() is
fairly complex, and requires information from all of the columns of
inmat, so simple uses of, for example, aggregate() don't do the job.
Rprof('Rpf3n')
for (i in unique(indat$grpi)) {
ssnew[i,] <- sfun(inmat[indat$grpi==i,,drop=FALSE])
}
Rprof(NULL)
----------------- Rprof output, v1 ----------------
Each sample represents 0.02 seconds.
Total run time: 404.999999999922 seconds.
Total seconds: time spent in function and callees.
Self seconds: time spent in function alone.
% total % self
total seconds self seconds name
100.00 405.00 0.00 0.00 "source"
100.00 405.00 0.11 0.46 "eval.with.vis"
99.88 404.50 4.70 19.04 "sfun"
83.98 340.10 41.25 167.06 "nrow"
42.68 172.84 42.68 172.84 "=="
3.08 12.48 0.50 2.04 "lm.fit"
1.72 6.96 0.23 0.94 "median"
(remainder omitted)
nrow() has 340 total and 41 self seconds, a large portion of the
total run time.
There is only one call to nrow() in my code,
ntot <- nrow(dati)
where dati is a subset of the numeric matrix, indat.
I tried replacing nrow(dati) with length(dati[,1])
The profile is then:
----------------- Rprof output, v2 ----------------
Each sample represents 0.02 seconds.
Total run time: 403.699999999923 seconds.
Total seconds: time spent in function and callees.
Self seconds: time spent in function alone.
% total % self
total seconds self seconds name
100.00 403.70 0.11 0.46 "eval.with.vis"
100.00 403.70 0.00 0.00 "source"
99.88 403.20 46.03 185.82 "sfun"
42.37 171.06 42.37 171.06 "=="
3.00 12.10 0.59 2.40 "lm.fit"
1.70 6.88 0.22 0.90 "median"
(skip many lines until)
0.01 0.04 0.01 0.04 "length"
(remainder omitted)
This is where I'm puzzled.
nrow() has disappeared as expected
length() has not taken its place
Yet the total run time is about the same.
As an aside, I had previously been using sapply() and split(), and
that was far slower. Rprof() helped me find out why it was slow.
Also, the matrix ssnew is created before the loop, with the right
dimensions, so it's not being grown during the loop.
--
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
--------------------------------------
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
More information about the R-help
mailing list