[R] solution to a regression with multiple independent variable
Charles C. Berry
cberry at tajo.ucsd.edu
Sun Nov 5 19:22:12 CET 2006
On Sun, 5 Nov 2006, John Sorkin wrote:
> Please forgive a statistics question.
> I know that a simple bivariate linear regression, y=f(x) or in R
> parlance lm(y~x) can be solved using the variance-covariance matrix:
> beta(x)=covariance(x,y)/variance(x). I also know that a linear
> regression with multiple independent variables, for example y=f(x,z)
> can also be solved using the variance-covariance matrix, but I don't
> know how to do this. Can someone help me go from the variance-covariance
> matrix to the solution of a regression with multiple independent
> variables? It is not clear how one applies the matrix solution b=
> (x'x)-1*x'y to the elements of the variance-covariance matrix, i.e. how
> one gets the required values from the variance-covariance matrix.
> Any help, or suggestions would be appreciated.
>
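The "x"s you use above have differing meanings, which is a possible source
of confusion. In "covariance(x,y)/variance(x)", "x" is the vector of values
of the independent variable. The "x" in "(x'x)-1*x'y" (that is, the inverse
of x'x times x'y) is the design matrix, which in the case of a simple linear
regression (not "bivariate", BTW) contains a column of ones and a column of
values of the independent variable.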
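As a small sketch of what "design matrix" means here: model.matrix() returns the matrix that lm() builds from a formula, and for y ~ x it should match a hand-built cbind(1, x). The data and the seed are illustrative assumptions, not part of the original question.

```r
# Illustrative data (assumption: any seed will do; it only makes the run repeatable).
set.seed(1)
x <- 1:10
y <- rnorm(10) + x

# Design matrix as lm() would construct it for y ~ x:
# first column all ones (intercept), second column the values of x.
X_lm <- model.matrix(~ x)

# The same matrix built by hand.
X_hand <- cbind(1, x)

print(X_lm)

# Compare the numbers only, ignoring dimnames and other attributes.
same_design <- isTRUE(all.equal(as.vector(X_hand), as.vector(X_lm)))
print(same_design)
```

The column of ones is what lets the matrix formula estimate the intercept along with the slope.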
I suggest you review the chapter in Draper and Smith's Applied Regression
Analysis where the transition to the matrix algebraic formulation of
regression is laid out. IIRC, it is done first for simple linear
regression.
In concert with this, carry out the computation "longhand" (with the help
of R) for the simple linear regression using both formulae: the
covariance/variance ratio and the matrix formula.
Then do it again using a centered version of 'x'.
Here is one version:
> x <- 1:10
> y <- rnorm(10)+x
> cov(x,y)
[1] 10.17249
> var(x)
[1] 9.166667
> X <- cbind(1,x)
> t(X) %*% X
        x
   10  55
x  55 385
> t(X)%*%y
       [,1]
   57.63155
x 408.52594
> cov(x,y)/var(x)
[1] 1.109727
> lm(y~x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
    -0.3403       1.1097

> solve( t(X) %*% X ) %*% t(X) %*% y
        [,1]
  -0.3403414
x  1.1097265
> X2 <- cbind( 1, x- mean(x) )
> t(X2) %*% X2
     [,1] [,2]
[1,]   10  0.0
[2,]    0 82.5
> 82.5/9 ### have you seen this before?
[1] 9.166667
> t(X2) %*% y
         [,1]
[1,] 57.63155
[2,] 91.55244
> 91.55244/9 ### or this??
[1] 10.17249
> solve( t(X2) %*% X2 ) %*% t(X2) %*% y
         [,1]
[1,] 5.763155
[2,] 1.109727
> mean(y)
[1] 5.763155
>
Try it again using a centered version of y.
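One possible way to set that up is sketched below. The seed and the names xc, yc and Xc are my own choices for illustration, so the numbers will not match the session above.

```r
# Illustrative data (assumption: seed chosen only for reproducibility).
set.seed(1)
x <- 1:10
y <- rnorm(10) + x

# Center both variables.
xc <- x - mean(x)
yc <- y - mean(y)

# Design matrix with a column of ones and the centered x.
Xc <- cbind(1, xc)

XtX <- t(Xc) %*% Xc   # diagonal: n and sum(xc^2)
Xty <- t(Xc) %*% yc   # first entry is sum(yc), i.e. zero up to rounding

# Dividing by n - 1 turns these sums into var(x) and cov(x, y).
n <- length(x)
print(XtX[2, 2] / (n - 1))
print(Xty[2] / (n - 1))

# Solve the normal equations: intercept is (numerically) zero,
# slope equals cov(x, y) / var(x).
b <- solve(XtX, Xty)
print(b)
```

With both variables centered, the intercept drops out and the whole regression is carried by the sums of squares and cross-products, which are the variance and covariance up to the factor n - 1.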
Does this help?
To really get a handle on this, you need to dig into the matrix algebra a
bit. Rao's Linear Statistical Inference and Its Applications does this
nicely and shows how matrix operations are carried out on the
variance-covariance matrices (sorry I don't have the page refs handy, but
IIRC it is in a later chapter pertaining to multivariate analysis).
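For the two-predictor case y=f(x,z) in the original question, a minimal sketch of working directly from the variance-covariance matrix follows. It uses the standard partitioned form: slopes = inverse(S_xx) times S_xy, where S_xx is the block of the covariance matrix for the predictors and S_xy holds their covariances with y. The intercept then comes from the means. The simulated data, the seed and the names S, S_xx and S_xy are assumptions for illustration, and this is not taken from Rao's text.

```r
# Simulated data (assumption: coefficients 1, 2, -3 are arbitrary).
set.seed(1)
n <- 50
x <- rnorm(n)
z <- rnorm(n)
y <- 1 + 2 * x - 3 * z + rnorm(n)

# Full variance-covariance matrix of (y, x, z).
S <- cov(cbind(y, x, z))

# Partition it: predictors-by-predictors block, and predictors-by-response column.
S_xx <- S[c("x", "z"), c("x", "z")]
S_xy <- S[c("x", "z"), "y"]

# Slopes from the covariance matrix alone.
slopes <- solve(S_xx, S_xy)

# Intercept needs the means as well.
intercept <- mean(y) - sum(slopes * c(mean(x), mean(z)))

from_cov <- c(intercept, slopes)
from_lm  <- coef(lm(y ~ x + z))

print(from_cov)
print(from_lm)
```

With a single predictor, S_xx is just var(x) and S_xy is cov(x,y), so this reduces to cov(x,y)/var(x).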
Chuck
Comment: "solve( t(X) %*% X ) %*% t(X) %*% y" is NOT the way production
code for regression problems would be written. If you want to see how
production code should be written look at the Fortran source for "dqrls"
in the R source code distribution.
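As a rough illustration of the difference, the sketch below compares the textbook formula with a QR-based solution using base R's qr() and qr.coef(), and with lm.fit(), the fitting routine behind lm(). To my understanding lm.fit() hands the work to compiled code built around dqrls, but treat that detail as an assumption and check the source. The data and seed are illustrative.

```r
# Illustrative data (assumption: seed chosen only for reproducibility).
set.seed(1)
x <- 1:10
y <- rnorm(10) + x
X <- cbind(1, x)

# Teaching formula: explicitly forms and inverts X'X.
b_normal <- solve(t(X) %*% X) %*% t(X) %*% y

# QR-based least squares: never forms X'X.
b_qr <- qr.coef(qr(X), y)

# The low-level fitter used by lm().
b_fit <- lm.fit(X, y)$coefficients

print(as.vector(b_normal))
print(as.vector(b_qr))
print(as.vector(b_fit))
```

On a well-conditioned problem like this one all three agree. The reason to avoid the first form is that building X'X squares the condition number of the problem, so it loses accuracy much sooner when the columns of X are nearly collinear.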
> Thanks,
> John
>
> John Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> Baltimore VA Medical Center GRECC,
> University of Maryland School of Medicine Claude D. Pepper OAIC,
> University of Maryland Clinical Nutrition Research Unit, and
> Baltimore VA Center Stroke of Excellence
>
> University of Maryland School of Medicine
> Division of Gerontology
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
>
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
> jsorkin at grecc.umaryland.edu
>
Charles C. Berry                        (858) 534-2098
                                        Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu        UC San Diego
http://biostat.ucsd.edu/~cberry/        La Jolla, San Diego 92093-0717