[R] Reply: Re: Reply: Buying more computer for GLM
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Sep 1 11:12:48 CEST 2006
On Fri, 1 Sep 2006, g.russell at eos-finance.com wrote:
> Peter Dalgaard wrote:
> > Is this floating point bound? (When you say 30 factors, does that mean
> > 30 parameters, or factors representing a much larger number of groups?)
> > If it is integer bound, I don't think you can do much better than
> > increase CPU speed and - note - memory bandwidth (look for large-cache
> > systems and a fast front-side bus). To increase floating point
> > performance, you might consider using an optimized BLAS such as ATLAS
> > (see the Windows FAQ 8.2 and/or the "R Installation and
> > Administration" manual); this in turn may be multithreaded
> > and make use of multiple CPUs or multi-core CPUs.
>
> By "factors" I mean "parameters". I apologise for the confusion.
>
> This is floating point bound, so ATLAS might be a good idea.
>
> Before I put a lot of work into investigating multiple processors, I
> need to know: is the bottleneck with GLM going to be BLAS?
Probably not, but you can profile the code in R (see ?Rprof) and find out.
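A minimal sketch of that profiling step, using base R's Rprof() and summaryRprof(). The data are simulated to roughly match the size described in this thread (20,000 observations, 30 numeric predictors); the binary response and binomial family are assumptions, not the poster's actual model. The fit is repeated a few times so the sampling profiler (default interval 0.02 s) collects enough samples; the by.self table then shows which functions the time is actually spent in.

```r
# Hypothetical simulated data: 20,000 rows, 30 numeric predictors, binary response
set.seed(1)
n <- 20000; p <- 30
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("x", 1:p)
y <- rbinom(n, 1, plogis(drop(X %*% rnorm(p, sd = 0.1))))
dat <- data.frame(y = y, X)

# Start the sampling profiler, writing to a temporary file
prof_file <- tempfile()
Rprof(prof_file)
for (i in 1:10) {
  fit <- glm(y ~ ., family = binomial, data = dat)
}
Rprof(NULL)  # stop profiling

# Time attributed to each function itself (excluding its callees)
prof <- summaryRprof(prof_file)
head(prof$by.self, 10)
```

If the top entries are model-frame construction or R-level code rather than the numerical fitting routines, a faster BLAS will make little difference.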
Some more comments:
1) The Fortran code underlying glm is that of lm.fit, which uses only
level-1 BLAS and so is not going to be helped greatly by an
optimized BLAS.
2) As far as I know, no one has succeeded in making a multithreaded
Rblas.dll for Windows. On systems using pthreads, experience with
multithreaded BLAS is very mixed: in some problems it results in a
dramatic slowdown.
3) As I recall, you were doing model selection via AIC on 20,000
observations. You might want to think hard about that, since AIC is
designed for good prediction. I would do model exploration on a much
smaller representative subset, and if I had 20,000 observations and 30
parameters and was interested in prediction, I would not do subset
selection at all.
4) glm() allows you to specify starting values for the parameters
(argument start), which you could find from a subsample. Very likely
only 1 or 2 iterations would then be needed.
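Point 4 might be sketched as follows, again on simulated data; the data, the binomial family and the 10% subsample size are illustrative assumptions. Coefficients from a fit to a random subsample are passed through glm()'s start argument, and the iter component of each fitted object reports how many IRLS iterations were used, so the warm and cold starts can be compared directly.

```r
# Hypothetical simulated data of the size discussed in the thread
set.seed(1)
n <- 20000; p <- 30
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("x", 1:p)
y <- rbinom(n, 1, plogis(drop(X %*% rnorm(p, sd = 0.1))))
dat <- data.frame(y = y, X)

# Fit the same model to a random 10% subsample (size chosen arbitrarily)
sub <- dat[sample(n, 2000), ]
fit_sub <- glm(y ~ ., family = binomial, data = sub)

# Warm start: use the subsample coefficients as starting values
fit_full <- glm(y ~ ., family = binomial, data = dat, start = coef(fit_sub))

# Cold start with glm()'s default initialisation, for comparison
fit_cold <- glm(y ~ ., family = binomial, data = dat)

# Number of IRLS iterations used by each fit
c(warm = fit_full$iter, cold = fit_cold$iter)
```

Both fits should converge to essentially the same estimates; the saving is only in the number of iterations, which matters most when the model is refitted many times, as in a subset search.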
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595