[R] faraway tutorial: cryptic command to newbie
andy_liaw at merck.com
Mon Feb 24 20:23:23 CET 2003
I'm no expert in these matters, but I'll toss in my $0.02 anyway.
My recollection from reading Golub & Van Loan a few years ago is that
there's quite a bit of controversy as to the "best" approach to least
squares. Just recently I've read Monahan's "Numerical Methods in
Statistics", which has three relevant chapters (including one titled
"Regression Computations"). In it, several approaches were presented:
QR-Householder, QR-Givens, SVD, MCD, sweep, etc. The conclusion drawn was
that no single method is the best for all problems, and the task of writing
a regression routine is best avoided unless the workhorse routines in stat
packages are not satisfactory (in terms of speed/storage requirement/etc.).
My impression is that, with the glaring exception of SAS (which uses sweep,
if I'm not mistaken), most stat packages use QR, as a good compromise
between stability and speed.
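
To make the conditioning point concrete, here is a minimal base-R sketch (the design matrix and response are made up for illustration) contrasting the normal equations with the QR route that lm() uses internally:

```r
## Made-up example: nearly collinear polynomial columns
set.seed(1)
n <- 100
x <- seq(1, 2, length.out = n)
X <- cbind(1, x, x^2)
y <- X %*% c(1, 2, 3) + rnorm(n, sd = 0.01)

## Normal equations: solve (X'X) b = X'y -- squares the condition number
b_ne <- solve(crossprod(X), crossprod(X, y))

## QR decomposition, as R's lm()/lm.fit() use internally
b_qr <- qr.coef(qr(X), y)

## kappa(X'X) is roughly kappa(X)^2, which is the stability concern
kappa(crossprod(X))
kappa(X)
```

On a mildly ill-conditioned problem like this one the two answers agree; the gap opens up as kappa(X) grows.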
> -----Original Message-----
> From: Chong Gu [mailto:chong at stat.purdue.edu]
> Sent: Monday, February 24, 2003 1:51 PM
> To: Julian Faraway
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] faraway tutorial: cryptic command to newbie
> Not only is it unfair criticism, it's probably also imprecise.
> For a detailed discussion of the precision of regression estimates
> through QR decomposition and normal equations, one may consult Golub
> and Van Loan's book Matrix Computations (1989, Section 5.3.9 on page
> 230). QR takes twice as much computation, requires more memory, but
> does NOT necessarily provide better precision.
> The above said, I am not questioning the adequacy of the QR approach
> to regression calculation as implemented in R.
> > That's an unfair criticism. That discussion was never intended as
> > a recommendation for how to compute a regression. Of course, SVD or
> > QR decompositions are the preferred method but many newbies
> don't want to
> > digest all that right from the start. These are just
> obscure details to
> > the beginner.
> > One of the strengths of R in teaching is that students can directly
> > implement the formulae from the theory. This reinforces the
> > connection between theory and practice. Implementing the normal
> equations directly
> > is a quick early illustration of this connection.
> Explaining the precise
> > details of how to fit a regression model is something that can be
> > deferred.
> > Julian Faraway
> > >> I am just about working through Faraway's excellent
> tutorial "practical
> > >> regression and ANOVA using R"
> > >
> > >I assume this is a reference to the PDF version available
> via CRAN. I am
> > >afraid that is *not* a good discussion of how to do regression,
> > especially
> > >not using R. That page is seriously misleading: good ways
> to compute
> > >regressions are QR decompositions with pivoting (which R
> uses) or an SVD.
> > >Solving the normal equations is well known to square the condition
> > number,
> > >and is close to the worst possible way. (If you must use normal
> > >equations, do at least centre the columns, and preferably do some
> > >scaling.)
> > >
> > >> on page 24 he makes the x matrix:
> > >> x <- cbind(1,gala[,-c(1,2)])
> > >>
> > >> how can I understand this gala[,-c(1,2)])... I couldn't find an
> > >> explanation of such "c-like" abbreviations anywhere.
> > >
> > >Well, it is in all good books (as they say) including `An
> Introduction to
> > >R'. (It's even on page 210 of that book!)
> > >
> > >-c(1,2) is (try it)
> > >
> > >> -c(1,2)
> > > -1 -2
> > >
> > >so this drops columns 1 and 2. It then adds in front a
> column made up of
> > >ones, which is usually a sign of someone not really
> understanding how
> > >R's linear models work.
> > >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > http://www.stat.math.ethz.ch/mailman/listinfo/r-help
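
P.S. For the original question: a negative index in R means "drop", so gala[, -c(1, 2)] keeps all rows and all columns except the first two. A toy illustration (the data frame here is made up, standing in for the gala data):

```r
## Stand-in for the gala data frame
d <- data.frame(a = 1:3, b = 4:6, c = 7:9, e = 10:12)

## Negative column index drops columns 1 and 2, keeping 'c' and 'e'
d[, -c(1, 2)]

## Prepending a column of ones, as on p. 24 of the tutorial
x <- cbind(1, d[, -c(1, 2)])
```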