[R] A doubt about "lm" and the meaning of its summary
Moshe Olshansky
m_olshansky at yahoo.com
Tue Aug 19 03:41:40 CEST 2008
Hi Alberto,
In your second case the linear model
y = a*x + b + error
does not hold.
--- On Tue, 19/8/08, Alberto Monteiro <albmont at centroin.com.br> wrote:
> From: Alberto Monteiro <albmont at centroin.com.br>
> Subject: [R] A doubt about "lm" and the meaning of its summary
> To: r-help at r-project.org
> Received: Tuesday, 19 August, 2008, 4:31 AM
> I have a conceptual problem (sort of). Maybe there's a
> simple solution,
> maybe not.
>
> First, let me explain the test case that goes ok.
>
> Let x be a (fixed) n-dimensional vector. I simulate a lot
> of linear models,
> y = m x + c + error, then I do a lot of regressions. As
> expected, the
> estimated values of m (let's call them m.bar) are
> distributed according to a
> Student's t distribution.
>
> More precisely (test case, it takes a few minutes to run):
> #
> # start with fixed numbers
> #
> m <- sample(c(-1, -0.1, 0.1, 1), 1)
> c <- sample(c(-1, -0.1, 0, 0.1, 1), 1)
> sigma <- sample(c(0.1, 0.2, 0.5, 1), 1)
> n <- sample(c(4,5,10,1000), 1)
> x <- 1:n # it's possible to use other x
> NN <- sample(c(3000, 4000, 5000), 1)
> m.bar <- m.std.error <- 0 # these vectors will
> hold the simulation output
>
> #
> # Now let's repeat NN times:
> # simulate y
> # regress y ~ x
> # store m.bar and its error
> #
> for (i in 1:NN) {
> y <- m * x + c + sigma * rnorm(n)
> reg <- lm(y ~ x)
> m.bar[i] <- summary(reg)$coefficients['x',
> 'Estimate']
> m.std.error[i] <-
> summary(reg)$coefficients['x', 'Std. Error']
> }
> #
> # Finally, let's analyse it
> # The distribution of (m.bar - m) / m.std.error is
> # a Student's t with n - 2 degrees of freedom.
> # Is it? Let's test!
> #
> print(ks.test((m.bar - m) / m.std.error, rt(NN, n - 2)))
>
> Now, this goes ok, with ks.test returning a number
> uniformly distributed in
> the 0-1 interval.
>
> Next, I did a slightly different model. This model is a
> reversion model,
> where the simulated random variable lp goes along the
> equation:
> lp[t + 1] <- (1 + m) * lp[t] + c + error
>
> I tried to use the same method as above, defining
> x = lp[1:n], y = lp[2:(n+1}] - lp[1:n], with equation
> y <- m * x + c + error
>
> And now it breaks. Test case (it takes some minutes to
> run):
>
> #
> # start with fixed numbers
> #
> m <- runif(1, -1, 0) # m must be something in the
> (-1,0) interval
> c <- sample(c(-1, -0.1, 0, 0.1, 1), 1)
> sigma <- sample(c(0.1, 0.2, 0.5, 1), 1)
> n <- sample(c(4,5,10,1000), 1)
> NN <- sample(c(3000, 4000, 5000), 1)
> m.bar <- m.std.error <- 0 # these vectors will
> hold the simulation output
>
> #
> # Now let's repeat NN times:
> # simulate y
> # regress y ~ x
> # store m.bar and its error
> #
> for (i in 1:NN) {
> lp <- 0
> lp[1] <- 0
> for (j in 1:n) {
> lp[j+1] = (1 + m) * lp[j] + c + sigma * rnorm(1)
> }
> x <- lp[1:n]
> y <- lp[-1] - x
> reg <- lm(y ~ x)
> m.bar[i] <- summary(reg)$coefficients['x',
> 'Estimate']
> m.std.error[i] <-
> summary(reg)$coefficients['x', 'Std. Error']
> }
> #
> # Finally, let's analyse it
> # The distribution of (m.bar - m) / m.std.error should be
> # a Student's t with n - 2 degrees of freedom.
> # Is it? Let's test!
> #
> print(ks.test((m.bar - m) / m.std.error, rt(NN, n - 2)))
>
> ... and now it's not.
>
> What's wrong with this? I suspect that the model y ~ x
> does only give
> meaningful values when x is deterministic; in this case x
> is stochastic. Is
> there any correct way to estimate this model giving
> meaningful outputs?
>
> Alberto Monteiro
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.
More information about the R-help
mailing list