[R] interpreting profiling output

Tue Jul 27 05:23:07 CEST 2004

I know that we ESS-help'ers are brilliant at everything, and emacs can
and will solve every problem known to man, but perhaps this message
would be more illuminating if directed to r-devel?

best,
-tony

Kasper Daniel Hansen <k.hansen at biostat.ku.dk> writes:

> Luke Tierney <luke at stat.uiowa.edu> writes:
>
>> On Sun, 25 Jul 2004, Peter Dalgaard wrote:
>>
>>> Kasper Daniel Hansen <k.hansen at biostat.ku.dk> writes:
>>> 
>>> > I have some trouble interpreting the output from profiling. I have
>>> > read the help pages Rprof, summaryRprof and consult the R extensions
>>> > manual, but I still have problems understanding the output.
>>> > 
>>> > Basically the output consist of self.time and total.time. I have the
>>> > understanding that total.time is the time spent in a given function
>>> > including any subcalls or child functions or whatever the technical
>>> > term for that activity is. In contrasts self.time is the time spent in
>>> > the function excluding subcalls.
>>> > 
>>> > Now, in my understanding basically everything in R is functions. I
>>> > would then guess that for almost any function (except the "atomic
>>> > ones") the self.time would be very small as it would spend most of its
>>> > time calling other functions (again, since almost everything is a
>>> > function). So how do R determine when a subfunction is called?
>>> ...brutal snippage...
>>> > I guess some of the answers would be clear if I had a firm grasp of
>>> > the inner workings of R :)
>>> 
>>> In a word, yes... 
>>> 
>>> The profiling keeps track of R's *context stack* which is not quite
>>> the same as function calls. Essentially, it only counts R functions that
>>> are actually written in R, but not .Internal, .Primitive, etc. So
>>> "self" counts the amount of time that a function was at the top of the
>>> stack. This is done by a periodic poll which dumps out the context
>>> stack at regular intervals. Seeing cases where the self percentages
>>> don't add to 100% is, I believe, simply due to truncation of the tails
>>> -- that is, there is a large number of different functions which each
>>> are counted a few times, and these are not shown in the summary output.
>>> 
>>> [Sorry about the late reply, but I was out of town, and noone seems to
>>> have answered this.]
>>
>> Just one additional clarification: There are two flavors of
>> .Primitive, 'builtin' and 'special'.  You can tell which is which
>> using typeof().  Builtins are things like arithmetic operations and
>> specials include control constructs like `for`.  At present profiling
>> does record builtins but not specials.  Recording specials is
>> something we would like to do eventually but it requires more changes
>> to the call stack (and dealing with more interections with those
>> changes) so it hasn't happened yet. .Internal is a special, so no
>> .Internal calls are recorded.
>>
>> luke
>>
>> -- 
>> Luke Tierney
>> University of Iowa                  Phone:             319-335-3386
>> Department of Statistics and        Fax:               319-335-3017
>>    Actuarial Science
>> 241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
>> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
>
> Thanks to both of you for considering my question. I still have some
> problems however. Consider the nnet.default function which uses the .C
> interface. As 
>> typeof(.C)
> [1] "builtin"
> I would guess - from my current understanding - that the time spent in
> the C functions of nn.default are added to its self.time. Now look at
> the output
>
>> prof02.out$by.self[1:10,]
>                   self.time self.pct total.time total.pct
> nnet.default          33.32     11.2    2186.48      92.1
>
> So nnet.default only used 33.3 on its C calls, but 2186 in total
> time. From this I would guess that nnet.default - in my case - spends
> its majority of time setting up the fit and postprocessing it (as this
> is done by R functions which does not add to its self time). The
> actual fit (which is done by the C functions) only takes very little
> time. 
>
> Looking further down the table I see
>
>                   self.time self.pct total.time total.pct
> as.integer            22.28      7.5      30.92       1.3
>
> as.interger is basically a .Internal call - which ought not to be
> recorded. But why does it then have such a "high" self.time?
>
> And finally, if a program (such as mine) have a high discrepancy
> between self.time and sampling.time, does this imply that the far
> majority of time is spent in .Primitives which are of type special?
> -- 
> Kasper Daniel Hansen, Research Assistant
> Department of Biostatistics, University of Copenhagen
>
> ______________________________________________
> ESS-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/ess-help
>

-- 
Anthony Rossini			    Research Associate Professor
rossini at u.washington.edu            http://www.analytics.washington.edu/ 
Biomedical and Health Informatics   University of Washington
Biostatistics, SCHARP/HVTN          Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC  (M/W): 206-667-7025 FAX=206-667-4812 | use Email

CONFIDENTIALITY NOTICE: This e-mail message and any attachme...{{dropped}}