[R] A performance anomaly
Ajay Narottam Shah
ajayshah at mayin.org
Mon Jun 6 07:31:37 CEST 2005
I wrote a simple log likelihood (for the ordinary least squares (OLS)
model), in two ways. The first works out the likelihood. The second
merely calls the first, but after transforming the variance parameter,
so as to allow an unconstrained maximisation. So the second suffers a
slight cost for one exp() and then it pays the cost of calling the first.
I did performance measurement. One would expect the second version to
be slightly (very slightly) slower than the first. But I am finding
this is not the case! The second version is slightly faster. How can
this be?
On my machine (ibook @ 1.2 GHz, OS X "panther", R 2.1), I get:
> measurement(ols.lf1)
[1] 7.45
> measurement(ols.lf1.inlogs)
[1] 6.9
Here is the self-contained bug-demonstration code:
ols.lf1 <- function(theta, y, X) {
beta <- theta[-1]
sigma2 <- theta[1]
if (sigma2 <= 0) return(NA)
n <- nrow(X)
e <- y - X%*%beta
logl <- ((-n/2)*log(2*pi)) - ((n/2)*log(sigma2)) - ((t(e)%*%e)/(2*sigma2))
return(-logl) # since optim() does minimisation by default.
}
# A variant: Shift to logs for sigma2 so I can do unconstrained maximisation --
ols.lf1.inlogs <- function(truetheta, y, X) {
ols.lf1(c(exp(truetheta[1]), truetheta[-1]), y, X)
}
# We embark on one numerical experiment
set.seed(101)
X <- cbind(1, runif(20000))
theta.true <- c(2,4,6) # error variance = 2, intercept = 4, slope = 6.
y <- X %*% theta.true[-1] + sqrt(theta.true[1]) * rnorm(20000)
measurement <- function(f) {
N <- 200
cost <- system.time(
for (i in 1:N) {
j <- f(c(2,3,4), y, X)
}
)
return(cost[1]*1000/N) # cost per lf in milliseconds
}
measurement(ols.lf1)
measurement(ols.lf1.inlogs)
--
Ajay Shah Consultant
ajayshah at mayin.org Department of Economic Affairs
http://www.mayin.org/ajayshah Ministry of Finance, New Delhi
More information about the R-help
mailing list