[R-sig-teaching] R code to perform OLS by hand

Douglas Bates bates at stat.wisc.edu
Tue Nov 6 15:29:21 CET 2007


On 11/6/07, Antonio P. Ramos <ramos.grad.student at gmail.com> wrote:
> Hi all.

> Does anyone have "smart" code for performing bivariate and multivariable
> regression in R by hand? This is supposed to help students without a sound
> knowledge of R understand what the computer is doing when we write
> lm(x~y).

Why not approach the explanation the other way and describe what R did
to get the results?  There are many extractor functions that can be
used to "deconstruct" a fitted model.  You can use these to show
exactly what calculations were performed to fit the model.

Running

example(trees)

provides several plots of the data on the volume of timber in a sample
of 31 felled black cherry trees as a function of the girth (the
circumference of the tree at a specified distance from the ground) and
the overall height of the tree.  It also provides two fitted models,
fm1 and fm2.  Try some of the following

formula(fm1)                      # the model formula
model.matrix(fm1)                 # the model (design) matrix X
model.frame(fm1)                  # the data actually used in the fit
model.response(model.frame(fm1))  # the response vector y
fitted(fm1)                       # the fitted values
resid(fm1)                        # the residuals
all.equal(model.response(model.frame(fm1)) - fitted(fm1), resid(fm1))
coef(fm1)                         # the estimated coefficients
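
To tie these extractors together, here is a small sketch (assuming fm1 from
example(trees) is still in the workspace) checking the basic identities: the
fitted values are the model matrix times the coefficient vector, and the
residuals are the response minus the fitted values.

X <- model.matrix(fm1)                 # design matrix
y <- model.response(model.frame(fm1))  # response vector
b <- coef(fm1)                         # estimated coefficients
all.equal(unname(fitted(fm1)), unname(drop(X %*% b)))     # fitted = X b
all.equal(unname(resid(fm1)), unname(y - drop(X %*% b)))  # residuals = y - X b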

I personally am loath to present the famous formula
$\hat{\beta}=(X'X)^{-1}X'y$ because I know that the calculation is
never done that way.  However, I do want to show an equivalent
relationship, that the residual at the parameter estimates is
orthogonal to the columns of the model matrix.  That is,

t(model.matrix(fm1)) %*% resid(fm1)

should be very close to zero.
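
For comparison, here is a small sketch (again assuming fm1 is available) that
puts the textbook normal-equations formula next to a QR-based solution, which
is much closer to what lm() actually does internally; both should agree with
coef(fm1) up to numerical tolerance.

X <- model.matrix(fm1)
y <- model.response(model.frame(fm1))
# textbook formula: solve the normal equations X'X b = X'y
# (shown only for illustration; it is not how lm() computes the fit)
b_normal <- drop(solve(crossprod(X), crossprod(X, y)))
# QR decomposition of X, the approach lm() itself relies on
b_qr <- qr.coef(qr(X), y)
all.equal(coef(fm1), b_normal)
all.equal(coef(fm1), b_qr)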

Repeat this for fm2.



