[R] hard to believe speed difference
Thomas Lumley
tlumley at u.washington.edu
Tue Aug 6 22:56:00 CEST 2002
On Tue, 6 Aug 2002, Jason Liao wrote:
> First, I love R and am grateful to be using this free and extremely
> high quality software.
>
> Recently I have been comparing two algorithms and naturally I
> programmed in R first. It is so slow that I can almost feel its pain.
> So I decided to do a comparison with Java. To draw 500,0000 truncated
> normal, Java program takes 2 second and R takes 72 seconds. Not a
> computer science major, I find it hard to understand how R can be so
> slow.
>
> R code:
>
> normal.truncated <- function(mean, sd, lower, upper)
> {
> repeat
> {
> deviate = round( rnorm(1, mean, sd) );
> if(deviate<=upper && deviate>=lower) break;
> }
> temp <- c(deviate+.5, deviate-.5, upper+.5, lower-.5);
> result <- pnorm(temp, mean, sd);
> prob <- (result[1] - result[2])/(result[3] - result[4]);
>
> return(c(deviate, prob));
> }
>
> print(date());
>
> n = 500000;
> result = numeric(n);
>
> for(i in 1:n) result[i] = normal.truncated(0,1, -10, 10)[1]
> print(date());
>
Largely because you aren't using vectorised code. Loops in R are slow (to
some extent true with all interpreters). The way around this is to use
R's ability to work on vectors or matrices instead of looping over their
elements.
Eg
rtruncnorm<-function(n,mean,sd,lower,upper){
rval<-numeric(n)
done<-rep(FALSE,n)
while(m<-sum(!done)){
rval[!done] <- round(rnorm(m,mean,sd))
done <- rval>=lower & rval<=upper
}
prob<-(pnorm(rval+0.5,mean,sd)-pnorm(rval-0.5,mean,sd))/(pnorm(upper+0.5,mean,sd)-pnorm(lower-0.5))
cbind(rval,prob)
}
rtruncnorm2<-function(n,mean,sd,lower,upper){
uval<-runif(n,pnorm(lower-0.5,mean,sd),pnorm(upper+0.5,mean,sd))
rval<-round(qnorm(uval,mean,sd))
prob<-(pnorm(rval+0.5,mean,sd)-pnorm(rval-0.5,mean,sd))/(pnorm(upper+0.5,mean,sd)-pnorm(lower-0.5))
cbind(rval,prob)
}
are two improved versions of your code that run more than 20 times
faster on my computer.
The first uses the same rejection sampling algorithm that you use, but
does it for the whole vector at once.
The second generates truncated uniforms (which is trivial) and transforms
them to Normal.
In addition to being a lot faster than your code I think they will handle
the case where mean, sd, lower or upper are also vectors.
Even faster, if you only want whole numbers and always the same
parameters, is to compute the probabilities once and then sample from the
resulting discrete distribution.
rtruncnorm3<-function(n,mean,sd,lower,upper){
p<-diff(pnorm((lower:(1+upper))-0.5,mean,sd))
sample(lower:upper,n,replace=TRUE,prob=p)
}
This is about 400 times faster than your version (and the speed advantage
would increase if the truncation points were less extreme), apparently
faster than the Java program.
By the way, there's no point truncating standard Normals to [-10,10]. The
probability of getting anything outside [-10,10] in 500000 samples is
about 10^-17. Even for [-6, 6] it's less than 0.001.
-thomas
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
More information about the R-help
mailing list