[R] Multiple regression (with interactions) by hand

Berend Hasselman bhh at xs4all.nl
Wed Sep 4 14:35:49 CEST 2013




On Tue, Sep 3, 2013 at 2:51 AM, Christoph Scherber
<Christoph.Scherber at agr.uni-goettingen.de> wrote:
> Dear all,
> 
> I´ve played around with the "airquality" dataset, trying to solve the matrix equations of a simple
> multiple regression by hand; however, my matrix multiplications don´t lead to the estimates returned
> by coef(). What have I done wrong here?
> 
> ##
> m1=lm(Ozone~Solar.R*Wind,airquality)
> 
> # remove NA´s:
> airquality2=airquality[complete.cases(airquality$Ozone)&
> complete.cases(airquality$Solar.R)&
> complete.cases(airquality$Wind),]
> 
> # create the model matrix by hand:
> X=cbind("(Intercept)"=1,Solar.R=airquality2$Solar.R,Wind=airquality2$Wind,"Solar.R:Wind"=airquality2$Solar.R*airquality2$Wind)
> # is the same as:
> model.matrix(m1)
> 
> # create the response vector by hand:
> Y=airquality2$Ozone
> # is the same as:
> m1$model$Ozone
> 
> # Now solve for the parameter estimates:
> library(MASS)
> ginv(t(X)%*%X)%*%t(X)%*%Y
> 
> # is not the same as:
> coef(m1)
> 
> ##
> Now why is my result (line ginv(...)) not the same as the one returned by coef(m1)?

Have a look at the help of ginv. It mentions the tol argument.
If you do

ginv(crossprod(X),tol=1e-12) %*% crossprod(X,Y)

you'll see that all is well. It's up to you to play with tol.

Berend



More information about the R-help mailing list