[R] R versus SAS: lm performance

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue May 11 14:59:17 CEST 2004


On 11 May 2004, Peter Dalgaard wrote:

> "Liaw, Andy" <andy_liaw at merck.com> writes:
> 
> > I tried the following on an Opteron 248, R-1.9.0 w/Goto's BLAS:
> > 
> > > y <- matrix(rnorm(14000*1344), 1344)
> > > x <- matrix(runif(1344*503),1344)
> > > system.time(fit <- lm(y~x))
> > [1] 106.00  55.60 265.32   0.00   0.00
> > 
> > The resulting fit object is over 600MB.  (The coefficient component is a 504
> > x 14000 matrix.)
> > 
> > If I'm not mistaken, SAS sweeps on the extended cross product matrix to fit
> > regression models.  That, I believe, is usually faster than doing QR
> > decomposition on the model matrix itself, but there are trade-offs.  

Roughly twice as fast but the price is accuracy.
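
To make the trade-off concrete, here is a small sketch. It is not SAS's sweep operator, just a plain solve of the normal equations, which starts from the same cross-product matrix. The design matrix is a made-up, nearly collinear one (an assumption for illustration), and the response is noise-free so that the true coefficients are known exactly.

```r
set.seed(1)
n <- 200
z <- rnorm(n)
# Hypothetical design: intercept plus five nearly collinear columns
X <- cbind(1, sapply(1:5, function(j) z + 1e-4 * rnorm(n)))
beta <- rep(1, ncol(X))
y <- drop(X %*% beta)                 # noise-free, so the true answer is beta

# Route 1: solve the normal equations built from the cross-product matrix
b_xpx <- drop(solve(crossprod(X), crossprod(X, y)))
# Route 2: QR decomposition of the model matrix, the approach lm() takes
b_qr <- qr.coef(qr(X), y)

# Worst-case coefficient error for each route
err_xpx <- max(abs(b_xpx - beta))
err_qr  <- max(abs(b_qr - beta))
print(c(cross_product = err_xpx, qr = err_qr))
```

Forming X'X squares the condition number of the problem, so on a design like this the cross-product route should lose noticeably more digits than QR.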

> You
> > could try what Prof. Bates suggested.
> 
> Hmm. Shouldn't be all that much faster, but it will produce the Type I
> SS as you go along, whereas R probably wants to fit the 15 different
> models. 

Nope, R can read off the Type I sums of squares from the QR decomposition, so 
only one fit is done.  (Effectively you remove the effect of one column at a 
time, and you get the change in residual/regression sum of squares as a side 
effect. Take a look at anova.lm, which just aggregates squared effects over terms.)
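
A minimal sketch of that aggregation, on simulated data (the data frame and the model formula are made up for illustration). It mirrors the idea in anova.lm rather than reproducing its code: square the first 'rank' effects from a single fit and sum them within each term.

```r
set.seed(42)
# Hypothetical data: two factors and one covariate
d <- data.frame(a = gl(3, 20), b = gl(2, 10, 60), x = rnorm(60))
d$y <- rnorm(60) + as.numeric(d$a) + 0.5 * d$x

fit <- lm(y ~ a + b + x, data = d)    # one fit only

p    <- fit$rank
eff  <- unclass(fit$effects)[seq_len(p)]      # one effect per estimated column
asgn <- fit$assign[fit$qr$pivot[seq_len(p)]]  # term index of each column (0 = intercept)

# Sum squared effects within each term, then drop the intercept
ss <- tapply(eff^2, asgn, sum)
ss_terms <- as.vector(ss[names(ss) != "0"])

# Sequential (Type I) sums of squares as reported by anova()
ss_anova <- anova(fit)[["Sum Sq"]][seq_along(ss_terms)]
print(cbind(from_effects = ss_terms, from_anova = ss_anova))
```

The two columns should agree to rounding error, without refitting any sub-model.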

> I'm still surprised that R/S-PLUS manages to use a full 15 minutes on
> a single response variable. It might be due to the singularities --
> the SAS code indicated that there was a nesting issue with the "A"
> factor in the last 4-factor interaction. If so, a reformulation of the
> model might help. 

I think we need to understand this better.  My guess (but only a guess) is 
that the model matrix has very many columns and is highly singular.  If 
the singularity is by design, a reformulation will help.
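
As an illustration of singularity by design (not the original poster's model, which is not shown here), the sketch below uses a made-up factor that is nested in another. Crossing the two gives a model matrix with many more columns than its rank; writing the nesting explicitly gives a full-rank matrix with the same fitted values. All variable names are hypothetical.

```r
set.seed(7)
A  <- gl(3, 8)                          # 3 groups, 8 observations each
Bw <- gl(2, 4, 24)                      # sub-group label within each level of A
Bu <- interaction(A, Bw, drop = TRUE)   # globally unique labels, so Bu is nested in A
y  <- rnorm(24)

# Crossing a nested factor: many aliased columns
X_sing <- model.matrix(~ A * Bu)
# Reformulated with explicit nesting
X_ref  <- model.matrix(~ A / Bw)

print(c(columns = ncol(X_sing), rank = qr(X_sing)$rank))
print(c(columns = ncol(X_ref),  rank = qr(X_ref)$rank))

fit_sing <- lm(y ~ A * Bu)
fit_ref  <- lm(y ~ A / Bw)
sum(is.na(coef(fit_sing)))              # number of aliased coefficients
all.equal(fitted(fit_sing), fitted(fit_ref))
```

With a large design, the aliased columns still have to be carried through the decomposition, which is where a reformulation can save time.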

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



