[Rd] profiler and loops
Romain Francois
romain.francois at dbmail.com
Tue Mar 3 13:33:14 CET 2009
Hello,
(This is follow up from this thread:
http://www.nabble.com/execution-time-of-.packages-td22304833.html but
with a different focus)
I am often confused by the result of the profiler, when a loop is
involved. Consider these two scripts:
script1:
Rprof( )
x <- numeric( )
for( i in 1:10000){
x <- c( x, rnorm(10) )
}
Rprof( NULL )
print( summaryRprof( ) )
script2:
Rprof( )
ffor <- function(){
x <- numeric( )
for( i in 1:10000){
x <- c( x, rnorm(10) )
}
}
ffor()
Rprof( NULL )
print( summaryRprof( ) )
[]$ time Rscript --vanilla script1.R
$by.self
self.time self.pct total.time total.pct
"rnorm" 0.22 100 0.22 100
$by.total
total.time total.pct self.time self.pct
"rnorm" 0.22 100 0.22 100
$sampling.time
[1] 0.22
real 0m7.786s
user 0m5.192s
sys 0m0.735s
[]$$ time Rscript --vanilla script2.R
$by.self
self.time self.pct total.time total.pct
"ffor" 4.94 92.5 5.34 100.0
"rnorm" 0.40 7.5 0.40 7.5
$by.total
total.time total.pct self.time self.pct
"ffor" 5.34 100.0 4.94 92.5
"rnorm" 0.40 7.5 0.40 7.5
$sampling.time
[1] 5.34
real 0m7.841s
user 0m5.152s
sys 0m0.712s
In the first one, I call a for loop from the top level and in the second
one, the loop is wrapped in a function call. This shows the inability of
the profiler to point loops as responsible for bottlenecks. The coder of
script1 would not know what to do to improve on the script.
I have had a quick look in the code, and here are a few thoughts:
in the function "doprof" in eval.c, this loop write the call stack on
the profiler file:
for (cptr = R_GlobalContext; cptr; cptr = cptr->nextcontext) {
if ((cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN))
&& TYPEOF(cptr->call) == LANGSXP) {
SEXP fun = CAR(cptr->call);
if (!newline) newline = 1;
fprintf(R_ProfileOutfile, "\"%s\" ",
TYPEOF(fun) == SYMSXP ? CHAR(PRINTNAME(fun)) :
"<Anonymous>");
}
}
so we can see it only cares about context CTXT_FUNCTION and
CTXT_BUILTIN, when for loops play with CTXT_LOOP (this is again in
eval.c within the do_for function)
begincontext(&cntxt, CTXT_LOOP, R_NilValue, rho, R_BaseEnv, R_NilValue,
R_NilValue);
which as the name implies, begins the context of the for loop. The
begincontext function looks like this :
void begincontext(RCNTXT * cptr, int flags,
SEXP syscall, SEXP env, SEXP sysp,
SEXP promargs, SEXP callfun)
{
cptr->nextcontext = R_GlobalContext;
cptr->cstacktop = R_PPStackTop;
cptr->evaldepth = R_EvalDepth;
cptr->callflag = flags;
cptr->call = syscall;
cptr->cloenv = env;
cptr->sysparent = sysp;
cptr->conexit = R_NilValue;
cptr->cend = NULL;
cptr->promargs = promargs;
cptr->callfun = callfun;
cptr->vmax = vmaxget();
cptr->intsusp = R_interrupts_suspended;
cptr->handlerstack = R_HandlerStack;
cptr->restartstack = R_RestartStack;
cptr->prstack = R_PendingPromises;
#ifdef BYTECODE
cptr->nodestack = R_BCNodeStackTop;
# ifdef BC_INT_STACK
cptr->intstack = R_BCIntStackTop;
# endif
#endif
R_GlobalContext = cptr;
}
So it could be possible to set the last argument of the begincontext
function to "for" and use this code in the doprof function:
for (cptr = R_GlobalContext; cptr; cptr = cptr->nextcontext) {
if ( ( cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN ) )
&& TYPEOF(cptr->call) == LANGSXP) {
SEXP fun = CAR(cptr->call);
if (!newline) newline = 1;
fprintf(R_ProfileOutfile, "\"%s\" ",
TYPEOF(fun) == SYMSXP ? CHAR(PRINTNAME(fun)) :
"<Anonymous>");
} else if( cptr->callflag & CTXT_LOOP){
SEXP fun = CAR(cptr->syscall);
if (!newline) newline = 1;
fprintf(R_ProfileOutfile, "\"%s\" ", CHAR(PRINTNAME(fun)) );
}
}
so that we see for in the list of "functions" that appear in the
profiler file.
Obviously I am taking some shortcuts here, because of the other loops,
but I would like to make a formal patch with this. Before I do that, I'd
like to know :
- is this has a chance of breaking something else (does the CTXT_LOOP
being R_NilValue is used elsewhere)
- would this feature be welcome.
- Should I differentiate real functions with loops in the output file,
maybe I can write "[for]" instead of for to emphacize this is not a
function.
Romain
--
Romain Francois
Independent R Consultant
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
--
Romain Francois
Independent R Consultant
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
More information about the R-devel
mailing list