[R] help speeding up simple Theil regression function
Brad Schneid
bps0002 at auburn.edu
Sun Oct 21 21:17:22 CEST 2012
I understand, thank you. I think I wanted the output to look similar to
that from mblm for quick comparison; that is obviously a problem.
William Dunlap wrote
> My comments have nothing to do with speed of your code,
> but with the correctness.
>
>> > np.lm <-function(dat, X, Y, ...){
>> > # Ch 9.2: Slope est. (X) for Thiel statistic
>> > ...
>> > num[[i]] <- dat[j.s[i],Y] - dat[i.s[i],Y]
>> > dom[[i]] <- dat[j.s[i],X] - dat[i.s[i],X]
>
> Up to here is looks like X and Y are used to indicate which columns
> of dat are the predictor and response, respectively.
>
>> > ..
>> > X <- median( sort( do.call(c, num) / do.call(c, dom) ) )
>
> Now X is used to mean the slope of the regression.
>
>> > # Ch 9.4: Intercept est. for Thiel statistic
>> > Intercept <- median(dat[,"Y"] - X*dat[,"X"])
>
> Now the predictor column must be named "X" and the response "Y",
> no matter what your input X and Y were. Hence the function will fail
> to give the correct answer in most cases.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
>> -----Original Message-----
>> From:
> r-help-bounces@
> [mailto:
> r-help-bounces@
> ] On Behalf
>> Of Berend Hasselman
>> Sent: Sunday, October 21, 2012 11:59 AM
>> To: Brad Schneid
>> Cc:
> r-help@
>> Subject: Re: [R] help speeding up simple Theil regression function
>>
>>
>> On 21-10-2012, at 20:06, Brad Schneid wrote:
>>
>> > Hello,
>> >
>> > I am working on a simple non-parametric (Theil) regression function and
>> and
>> > am following Hollander and Wolfe 1999 text. I would like some help
>> making
>> > my function faster. I have compared with pre-packaged version from
>> "MBLM",
>> > which isnt very fast either, but it appears mine is faster with N =
>> 1000
>> > (see results below). I plan on running this function repeatedly, and I
>> > generally have data lengths of ~ N = 6000 or more.
>> >
>> > # My function following Hollander and Wolfe text, Chapter 9
>> > np.lm <-function(dat, X, Y, ...){
>> > # Ch 9.2: Slope est. (X) for Thiel statistic
>> > combos <- combn(nrow(dat), 2)
>> > i.s <- combos[1,]
>> > j.s <- combos[2,]
>> > num <- vector("list", length=length(i.s))
>> > dom <- vector("list", length=length(i.s))
>> >
>> > for(i in 1:length(i.s)){
>> > num[[i]] <- dat[j.s[i],Y] - dat[i.s[i],Y]
>> > dom[[i]] <- dat[j.s[i],X] - dat[i.s[i],X]
>> > }
>> >
>> > X <- median( sort( do.call(c, num) / do.call(c, dom) ) )
>> > # Ch 9.4: Intercept est. for Thiel statistic
>> > Intercept <- median(dat[,"Y"] - X*dat[,"X"])
>> > out <- data.frame(Intercept, X)
>> > return(out)
>> > } # usage: np.lm(dat, X=1, Y=2)
>> > ################################################################
>> >
>> > library("mblm") # I will compare to mblm() function
>> >
>> > X <- rnorm(1000)
>> > Y <- rnorm(1000)
>> > dat <- data.frame(X, Y)
>> >
>> > system.time(np.lm(dat, X=1, Y=2) )
>> > user system elapsed
>> > 118.610 0.130 119.144
>> > 109.000 0.040 109.416 # ran it twice
>> > 86.190 0.100 86.589 # 3rd time
>>
>> Alternative function without your i loop (it isn't needed and can be
>> vectorized):
>>
>> np.lm.alt <-function(dat, X, Y, ...){
>> # Ch 9.2: Slope est. (X) for Thiel statistic ==> (Pedantic comment: it
>> is Theil (swap
>> the i and e)
>> combos <- combn(nrow(dat), 2)
>> i.s <- combos[1,]
>> j.s <- combos[2,]
>>
>> Y.num <- dat[j.s,Y] - dat[i.s,Y]
>> X.dom <- dat[j.s,X] - dat[i.s,X]
>> X <- median( Y.num / X.dom)
>> # Ch 9.4: Intercept est. for Thiel statistic ==> (Pedantic comment: it
>> is Theil (swap
>> the i and e)
>> Intercept <- median(dat[,"Y"] - X*dat[,"X"])
>> out <- data.frame(Intercept, X)
>> return(out)
>> } # usage: np.lm(dat, X=1, Y=2)
>>
>>
>> Try the compiler package on you original function:
>>
>> library(compiler)
>> np.lm.c <- cmpfun(np.lm)
>>
>> Test speed and correct results:
>>
>> X <- rnorm(500)
>> Y <- rnorm(500)
>> dat <- data.frame(X, Y)
>>
>> system.time(npout.c <- np.lm.c(dat, X=1, Y=2) )
>> system.time(npout.1 <- np.lm(dat, X=1, Y=2) )
>> system.time(npout.a <- np.lm.alt(dat, X=1, Y=2) )
>> identical(npout.1,npout.c)
>> identical(npout.1,npout.a)
>>
>> Results:
>>
>> > system.time(npout.c <- np.lm.c(dat, X=1, Y=2) )
>> user system elapsed
>> 21.442 0.066 21.517
>> > system.time(npout.1 <- np.lm(dat, X=1, Y=2) )
>> user system elapsed
>> 21.068 0.073 21.161
>> > system.time(npout.a <- np.lm.alt(dat, X=1, Y=2) )
>> user system elapsed
>> 0.303 0.010 0.313
>> > identical(npout.1,npout.c)
>> [1] TRUE
>> > identical(npout.1,npout.a)
>> [1] TRUE
>>
>> You may try and test this with larger data lengths.
>>
>>
>> Berend
>>
>> ______________________________________________
>>
> R-help@
> mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help@
> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
View this message in context: http://r.789695.n4.nabble.com/help-speeding-up-simple-Theil-regression-function-tp4646923p4646935.html
Sent from the R help mailing list archive at Nabble.com.
More information about the R-help
mailing list