[R-SIG-Finance] repeating regression

Richard Herron richard.c.herron at gmail.com
Wed Nov 23 05:08:10 CET 2011


The beta = cov(x, y) / var(x) that was proposed assumes an intercept, but it
sounds like you want to run the regression through the origin, which
is beta = sum(x*y) / sum(x*x). Computing it with rowSums() also gets you
quite a speed boost, because the sums are vectorized over all rows instead
of calling a function once per row with apply(). Below is some code.
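For reference, here is the standard least-squares algebra behind the two formulas (added for illustration, not part of the original post). Minimizing the squared residuals without an intercept and setting the derivative in b to zero gives the through-origin slope. With an intercept, x and y are first centered on their means, which is where cov(x, y) / var(x) comes from (the n - 1 factors in cov and var cancel):

```latex
\frac{d}{db}\sum_{i=1}^{n} (y_i - b x_i)^2 = -2\sum_{i=1}^{n} x_i (y_i - b x_i) = 0
\;\Longrightarrow\;
\hat\beta_{\text{origin}} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2},
\qquad
\hat\beta_{\text{intercept}} = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2}
= \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}
```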

That said, with only five observations per row I imagine you won't be able
to statistically distinguish between the betas. I suggest rolling
regressions instead, using differenced cumulative sums to build the cov(x, y)
and var(x) terms. HTH.
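A minimal sketch of the cumulative-sum idea, assuming a single pair of series x and y and a window length w; the function name rolling_beta() is hypothetical and not from any package. Each window sum is the difference of two entries of a cumulative sum, so all windows together cost O(n) instead of one regression per window:

```r
# Hypothetical helper: rolling OLS slope over windows of length w, built from
# differenced cumulative sums instead of refitting a regression per window.
rolling_beta <- function(x, y, w, intercept = TRUE) {
  stopifnot(length(x) == length(y), w >= 2, w <= length(x))
  # sum over the window ending at position i is cs[i + 1] - cs[i + 1 - w]
  win_sum <- function(v) {
    cs <- c(0, cumsum(v))
    cs[(w + 1):length(cs)] - cs[1:(length(cs) - w)]
  }
  s_x  <- win_sum(x)
  s_y  <- win_sum(y)
  s_xy <- win_sum(x * y)
  s_xx <- win_sum(x * x)
  if (intercept) {
    # slope = cov(x, y) / var(x); the (w - 1) denominators cancel
    (s_xy - s_x * s_y / w) / (s_xx - s_x^2 / w)
  } else {
    # regression through the origin: sum(x*y) / sum(x*x)
    s_xy / s_xx
  }
}

# usage: simulated series with slope 3
set.seed(1)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100, sd = 0.1)
b <- rolling_beta(x, y, w = 20)  # one slope per window, length(x) - w + 1 in all
# spot-check the first window against lm()
coef(lm(y[1:20] ~ x[1:20]))[2]
b[1]
```

Differencing large cumulative sums can lose floating-point precision on very long or badly scaled series, so spot-checking a few windows against lm() is a cheap safeguard.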

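# simulated data: 2e4 rows, each with 5 x values and 5 y values
# (note that the true model has intercept 5 and slope 5)
set.seed(1)
mat_x <- matrix(5 + rnorm(5*2e4), ncol = 5)
mat_epsilon <- matrix(rnorm(5*2e4, mean = 0, sd = 0.1), ncol = 5)
mat_y <- 5 + 5*mat_x + mat_epsilon
# columns 1:5 hold x, columns 6:10 hold y
mat_xy <- cbind(mat_x, mat_y)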

# doing the regression with cov/var assumes an intercept
fun_beta_cov <- function(x) {
    cov(x[1:5], x[6:10]) / var(x[1:5])
}
system.time({
    beta_1 <- apply(mat_xy, 1, FUN = fun_beta_cov)
})

# doing the regression through the origin with summations (vectorized over rows)
system.time({
    beta_2 <- rowSums(mat_xy[, 1:5] * mat_xy[, 6:10]) /
        rowSums(mat_xy[, 1:5] * mat_xy[, 1:5])
})

# doing the regression with `lm` without intercept
fun_beta_lm <- function(x) {
    lm(x[6:10] ~ x[1:5] - 1)$coefficients[1]
}
system.time({
    beta_3 <- apply(mat_xy, 1, FUN = fun_beta_lm)
})

# doing the regression with `lm` with intercept
fun_beta_lm_int <- function(x) {
    lm(x[6:10] ~ x[1:5])$coefficients[2]
}
system.time({
    beta_3_int <- apply(mat_xy, 1, FUN = fun_beta_lm_int)
})

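# results: beta_1 should match beta_3_int (with intercept),
# and beta_2 should match beta_3 (through the origin)
head(beta_1)
head(beta_2)
head(beta_3)
head(beta_3_int)
all.equal(beta_1, beta_3_int, check.attributes = FALSE)
all.equal(beta_2, beta_3, check.attributes = FALSE)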
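The same row-wise trick also covers the with-intercept case: centering each row's x and y on their row means turns cov/var into two rowSums() calls, so apply() is not needed at all. This is a sketch assuming the layout above (first k columns x, next k columns y); row_beta_intercept() is a hypothetical name:

```r
# Hypothetical helper: per-row OLS slope WITH an intercept, fully vectorized.
# Assumes the first k columns of m are x and the next k columns are y.
row_beta_intercept <- function(m, k = 5) {
  x <- m[, 1:k, drop = FALSE]
  y <- m[, (k + 1):(2 * k), drop = FALSE]
  # subtracting a length-nrow vector recycles down the columns,
  # so element [i, j] has the mean of row i removed
  xc <- x - rowMeans(x)
  yc <- y - rowMeans(y)
  # cov(x, y) / var(x) per row; the (k - 1) factors cancel
  rowSums(xc * yc) / rowSums(xc * xc)
}

# usage on a small random matrix laid out like mat_xy
set.seed(42)
m <- matrix(rnorm(50 * 10), ncol = 10)
b_int <- row_beta_intercept(m)
head(b_int)
```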

On Tue, Nov 22, 2011 at 10:34, G See <gsee000 at gmail.com> wrote:
> You may be interested in the fastLm function from the RcppArmadillo package
>
> On Mon, Nov 21, 2011 at 9:52 PM, Robert A'gata <rhelpacc at gmail.com> wrote:
>
>> Hi,
>>
>> I think my problem is a bit mundane but it's quite intriguing. Imagine
>> I have a matrix with 2 million rows and 10 columns. The first 5 columns
>> are x and the last 5 are y values. I have to regress y on x (assume 0
>> intercept) for each row to observe a time series of the slope. I am
>> wondering if there is any way to speed this calculation up? I tried
>> apply, but it is still slow. Is there any trick I should know? Thank you.
>>
>> Robert
>>
>> _______________________________________________
>> R-SIG-Finance at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>> -- Subscriber-posting only. If you want to post, subscribe first.
>> -- Also note that this is not the r-help list where general R questions
>> should go.
>>
>


