[R] performance gap between R 1.7.1 and 1.8.0
Liaw, Andy
andy_liaw at merck.com
Thu Dec 4 19:20:06 CET 2003
> From: Martin Maechler [mailto:maechler at stat.math.ethz.ch]
[...]
> A very first step of diagnosis might be to activate
> trace(read.dcf)
> trace(library)
> options(verbose = TRUE)
>
> A step further might be to patch read.dcf such that it prints
> info
> if(getOption("verbose")) { <<print what I am trying to read>> }
>
> Martin
To refresh people's memory, the problem is that the identical code took more
than twice as long in R-1.8.x than in R-1.7.x. I've managed to strip the
code down to the following and still see the same problem. It basically
bootstraps mixture models to groups of data. It uses boot() from the boot
package and flexmix() with intercept only, in the flexmix package.
=============================================
fitmix <- function(x, verbosefit=FALSE) {
bpunlist <- function(x) {
unlisted <- numeric()
for(i in 1:length(x at components)) {
unlisted <- c(unlisted, unlist(parameters(x, component=i)))
}
unlisted
}
unifit <- flexmix(x ~ 1, k = 1, control=list(verbose=0))
unlist(unlist(list(G = 1, proportion = unifit at prior,
param=matrix(bpunlist(unifit), nrow=2),
bic=BIC(unifit), loglik=unifit at logLik)))
}
bootmix <- function(data, i) {
d <- data
d.grps <- split.data.frame(d[i,], d[i, "id"])
comps <- vector("list", length=length(d.grps))
for (j in seq(along=comps)) {
thefit <- try(fitmix(d.grps[[j]][,1], verbosefit=FALSE))
comps[[j]] <- if (inherits(thefit, "try-error")) rep(NA, 48) else thefit
}
return(unlist(comps))
}
## The actual commands:
require(flexmix)
require(boot)
set.seed(76421)
x <- rnorm(3e3)
id <- gl(10, 300)
mydf <- data.frame(x, id)
Rprof(filename=paste("Rprof.out", system("hostname -s", int=TRUE), sep="."))
system.time(res <- boot(mydf, bootmix, strata=mydf[,"id"], R=5))
Rprof(NULL)
summaryRprof(filename=paste("Rprof.out", system("hostname -s", int=TRUE),
sep="."))$by.self[1:20,]
=============================================
I ran this on a Xeon 2.4GHz running Mandrake 9.0, R compiled from source.
This took 2 seconds in R-1.7.1, with the output:
self.time self.pct total.time total.pct
"FUN" 0.18 9 0.24 12
"names" 0.16 8 0.16 8
"initialize" 0.10 5 0.80 40
"apply" 0.08 4 0.30 15
"[.data.frame" 0.06 3 0.22 11
"inherits" 0.06 3 0.18 9
"is.null" 0.06 3 0.06 3
"match" 0.06 3 0.22 11
"seq.default" 0.06 3 0.06 3
"structure" 0.06 3 0.08 4
"any" 0.04 2 0.14 7
"el" 0.04 2 0.04 2
"FLXfit" 0.04 2 1.18 59
"lapply" 0.04 2 0.22 11
"lm.wfit" 0.04 2 0.16 8
"names<-.default" 0.04 2 0.04 2
"slot<-" 0.04 2 0.18 9
"<" 0.02 1 0.02 1
"|" 0.02 1 0.02 1
":" 0.02 1 0.02 1
... and took over 7 seconds in R-1.8.1, with the output:
self.time self.pct total.time total.pct
"paste" 0.52 6.7 1.10 14.1
"read.dcf" 0.38 4.9 0.96 12.3
"exists" 0.36 4.6 0.50 6.4
"lapply" 0.36 4.6 1.36 17.4
"names" 0.34 4.4 0.40 5.1
"names<-" 0.32 4.1 0.44 5.6
"inherits" 0.22 2.8 0.34 4.4
"==" 0.20 2.6 0.20 2.6
".Call" 0.20 2.6 0.20 2.6
"seq" 0.18 2.3 0.50 6.4
"seq.default" 0.18 2.3 0.28 3.6
"dynGet" 0.16 2.1 0.46 5.9
"topenv" 0.16 2.1 0.18 2.3
"unique" 0.16 2.1 1.00 12.8
"apply" 0.14 1.8 0.20 2.6
"as.list" 0.14 1.8 0.18 2.3
".find.package" 0.14 1.8 1.88 24.1
"sapply" 0.14 1.8 1.54 19.7
"any" 0.12 1.5 1.68 21.5
"initialize" 0.12 1.5 4.98 63.8
I then did the following:
> trace(read.dcf, recover)
[1] "read.dcf"
> res <- boot(mydf, bootmix, strata=mydf[,"id"], R=1)
Tracing read.dcf(file.path(package.lib, package, "DESCRIPTION"), fields =
"Namespace") on entry
Enter a frame number, or 0 to exit
1:boot(mydf, bootmix, strata = mydf[, "id"], R = 1)
2:statistic(data, original, ...)
3:try(fitmix(d.grps[[j]][, 1], verbosefit = FALSE))
4:fitmix(d.grps[[j]][, 1], verbosefit = FALSE)
5:flexmix(x ~ 1, k = 1, control = list(verbose = 0))
6:flexmix(x ~ 1, k = 1, control = list(verbose = 0))
7:flexmix(formula = formula, data = data, k = k, cluster = cluster, model =
list(F
8:FLXglm()
9:new("FLXmodel", weighted = TRUE, formula = formula, name = paste("FLXglm",
famil
10:initialize(value, ...)
11:initialize(value, ...)
12:getClass(Class)
13:.requirePackage(package)
14:trySilent(loadNamespace(package))
15:eval.parent(call)
16:eval(expr, p)
17:eval(expr, envir, enclos)
18:try(loadNamespace(package))
19:loadNamespace(package)
20:packageHasNamespace(package, package.lib)
21:read.dcf(file.path(package.lib, package, "DESCRIPTION"), fields =
"Namespace")
I'm really out of ideas at this point. Can anyone see the problem given the
above, or at least tell me what to try next?
Best,
Andy
More information about the R-help
mailing list