[R] help speeding up simple Theil regression function
Berend Hasselman
bhh at xs4all.nl
Sun Oct 21 20:59:00 CEST 2012
On 21-10-2012, at 20:06, Brad Schneid wrote:
> Hello,
>
> I am working on a simple non-parametric (Theil) regression function and and
> am following Hollander and Wolfe 1999 text. I would like some help making
> my function faster. I have compared with pre-packaged version from "MBLM",
> which isnt very fast either, but it appears mine is faster with N = 1000
> (see results below). I plan on running this function repeatedly, and I
> generally have data lengths of ~ N = 6000 or more.
>
> # My function following Hollander and Wolfe text, Chapter 9
> np.lm <-function(dat, X, Y, ...){
> # Ch 9.2: Slope est. (X) for Thiel statistic
> combos <- combn(nrow(dat), 2)
> i.s <- combos[1,]
> j.s <- combos[2,]
> num <- vector("list", length=length(i.s))
> dom <- vector("list", length=length(i.s))
>
> for(i in 1:length(i.s)){
> num[[i]] <- dat[j.s[i],Y] - dat[i.s[i],Y]
> dom[[i]] <- dat[j.s[i],X] - dat[i.s[i],X]
> }
>
> X <- median( sort( do.call(c, num) / do.call(c, dom) ) )
> # Ch 9.4: Intercept est. for Thiel statistic
> Intercept <- median(dat[,"Y"] - X*dat[,"X"])
> out <- data.frame(Intercept, X)
> return(out)
> } # usage: np.lm(dat, X=1, Y=2)
> ################################################################
>
> library("mblm") # I will compare to mblm() function
>
> X <- rnorm(1000)
> Y <- rnorm(1000)
> dat <- data.frame(X, Y)
>
> system.time(np.lm(dat, X=1, Y=2) )
> user system elapsed
> 118.610 0.130 119.144
> 109.000 0.040 109.416 # ran it twice
> 86.190 0.100 86.589 # 3rd time
Alternative function without your i loop (it isn't needed and can be vectorized):
np.lm.alt <-function(dat, X, Y, ...){
# Ch 9.2: Slope est. (X) for Thiel statistic ==> (Pedantic comment: it is Theil (swap the i and e)
combos <- combn(nrow(dat), 2)
i.s <- combos[1,]
j.s <- combos[2,]
Y.num <- dat[j.s,Y] - dat[i.s,Y]
X.dom <- dat[j.s,X] - dat[i.s,X]
X <- median( Y.num / X.dom)
# Ch 9.4: Intercept est. for Thiel statistic ==> (Pedantic comment: it is Theil (swap the i and e)
Intercept <- median(dat[,"Y"] - X*dat[,"X"])
out <- data.frame(Intercept, X)
return(out)
} # usage: np.lm(dat, X=1, Y=2)
Try the compiler package on you original function:
library(compiler)
np.lm.c <- cmpfun(np.lm)
Test speed and correct results:
X <- rnorm(500)
Y <- rnorm(500)
dat <- data.frame(X, Y)
system.time(npout.c <- np.lm.c(dat, X=1, Y=2) )
system.time(npout.1 <- np.lm(dat, X=1, Y=2) )
system.time(npout.a <- np.lm.alt(dat, X=1, Y=2) )
identical(npout.1,npout.c)
identical(npout.1,npout.a)
Results:
> system.time(npout.c <- np.lm.c(dat, X=1, Y=2) )
user system elapsed
21.442 0.066 21.517
> system.time(npout.1 <- np.lm(dat, X=1, Y=2) )
user system elapsed
21.068 0.073 21.161
> system.time(npout.a <- np.lm.alt(dat, X=1, Y=2) )
user system elapsed
0.303 0.010 0.313
> identical(npout.1,npout.c)
[1] TRUE
> identical(npout.1,npout.a)
[1] TRUE
You may try and test this with larger data lengths.
Berend
More information about the R-help
mailing list