[R] problem with generalized least square, and lm.gls() bug
Naoki Takebayashi
ntakebay at bio.indiana.edu
Thu Jun 21 19:24:05 CEST 2001
I'm very confused about generalized least square method (GLS), so
please help me! I'm trying to use GLS with R to estimate an intercept and
slope; vector, b = (intercept, slope).
My understanding of GLS is like this:
y = X %*% b + e, where Var(e) is symmetric dispersion matrix
(covariance-variance matrix); V (which is theoretically known in my
case). We can decompose V into t(R) %*% R, where R is an upper
triangular, square-root matrix of V, and t(R) is tranpose of R. Then
if we call T = Inverse(t(R)), we can multply both side of equation to
make the problem as a ordinary least square.
T %*% y = T %*% X %*% b + T * e
To find T, "solve(t(chol(V))" should do it. However, when I looked at
lm.gls() from MASS library, it uses other method of obtaining T. From
my understanding, when I give V (my dispersion matrix) to the argument
"W=" of lm.gls, I should also give "inverse = TRUE" (can anyone
confirm that I should be using this option?). Then it uses the
following method to obtain T.
eW <- eigen(V, TRUE)
T <- diag(eW$values ^ (-0.5)) %*% t(eW$vector)
I was checking if these two methods give same answer with a small
matrix; so I tried this:
> tempSqRt <- matrix(c(7,0,0, 19, 76, 0, 31, 94, 51), c(3,3))
> temp <- t(tempSqRt) %*% tempSqRt # this is a symmetric dispersion
matrix
> temp.e <- eigen(temp, TRUE)
> temp.T <- diag(temp.e$values ^ (-0.5)) %*% t(temp.e$vector)
> temp.T
[,1] [,2] [,3]
[1,] 0.0001090669 0.0042107823 0.006247486
[2,] 0.0004108533 -0.0272660081 0.018370016
[3,] 0.1487442294 0.0003382315 -0.002824703
> solve(t(chol(temp)))
[,1] [,2] [,3]
[1,] 0.14285714 -5.569319e-18 3.101497e-17
[2,] -0.03571429 1.315789e-02 -6.217046e-18
[3,] -0.02100840 -2.425181e-02 1.960784e-02
So these two methods do NOT give the same answer. I guess I'm
missunderstanding something. Could someone point out what's wrong?
The reason I started to compare the two methods is that when I
estimate the slope and intercept with lm.gls(), the estimate was way
off. A linear regression with OLS gives that slope of 26 and
intercept of 5, and the regression line on scatter plot fits well, but
with lm.gls(), the estimate is slope of -5 and intercept of 0.89. So
the regression line doesn't even touch the cloud of data points in the
scatter plot. I was giving dispersion matrix (var-cov matrix) for
"W", so I used "inverse=T". But when I use "inverse=F" (which should
be wrong) using the same dispersion matrix without inversing, I get
more decent estimates of slope and intercept. This is why I started
to compare the two methods of obtaining T.
By the way, I got MASS library from VR_6.2-5.tar.gz, and lm.gls() has
a bug. In the 2nd half of the code, there is
fit <- lm.fit(A %*% X, A %*% Y, method, ...)
It needs to be changed to (the 3rd arg of lm.fit isn't method):
fit <- lm.fit(A %*% X, A %*% Y, method=method, ...)
Also I had to change
class(fit) <- c("lm.gls", class(fit))
to
class(fit) <- c("lm", class(fit))
Without this change I couldn't use summary() (I still can't use
anova() on the lm.gls() output). But I'm not sure if this is a correct
way.
Thank you,
Naoki
Naoki Takebayashi <ntakebay at bio.indiana.edu>
--- Dept. of Biology, Box 90338, Duke University, Durham, NC 27708-0338
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
More information about the R-help
mailing list