[R] Re: Buying more computer for GLM
g.russell at eos-finance.com
Fri Sep 1 14:34:07 CEST 2006
Prof Brian Ripley wrote
> Probably not, but you have the ability to profile in R and find out.
Thanks. This is certainly something I could check, and I shall do so.
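
A minimal sketch of such a profiling run, using R's built-in Rprof() and
summaryRprof(). The data are simulated and the binomial family is only an
assumption for illustration; the sizes (20,000 observations, 30
predictors) echo the figures in point 3 below.

```r
# Simulated stand-in data (hypothetical; replace with the real data set).
set.seed(1)
n <- 20000; p <- 30
X <- matrix(rnorm(n * p), n, p)
dat <- data.frame(y = rbinom(n, 1, plogis(X[, 1] - 0.5 * X[, 2])), X)

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)   # start the sampling profiler
fit <- glm(y ~ ., data = dat, family = binomial)
Rprof(NULL)                         # stop profiling

# Time per function, sorted by time spent in the function itself
prof <- summaryRprof(prof_file)
head(prof$by.self)
```

If most of the self time falls in the fitting routine rather than in
model-frame construction, a faster BLAS is unlikely to help, per point 1.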
>
>
> Some more comments;
>
> 1) The Fortran code that underlies glm is that of lm.fit that only makes
> use of level-1 BLAS and so is not going to be helped greatly by an
> optimized BLAS.
I was afraid it might be something like that.
>
> 2) No one has as far as I know succeeded in making a multithreaded
> Rblas.dll for Windows. And under systems using pthreads, the success
> with multithreaded BLAS is very mixed, with it resulting in a dramatic
> slowdown in some problems.
I was afraid of that too. Oh well.
>
> 3) As I recall, you were doing model selection via AIC on 20,000
> observations. You might want to think hard about that, since AIC is
> designed for good prediction. I would do model exploration on a much
> smaller representative subset, and if I had 20,000 observations and 30
> parameters and was interested in prediction, not do subset selection
> at all.
One problem is that some of the parameters in the learning set can be
very highly correlated (I have no control over the observations), and
I'm worried that if I don't prune away parameters which don't improve
the log likelihood, my predictions will be busted by inputs which do not
exhibit the same linear relationships as those of most of the learning
set. Of course in such a case you'd have to worry about the accuracy of
the predictions anyway, but in my job we just have to make the best
predictions we can, even if they aren't perfect.
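
For illustration, a rough sketch of pruning by AIC on a smaller subset, as
suggested in point 3, with two nearly collinear predictors. The data, the
variable names, the subset size of 2,000 and the binomial family are all
assumptions made up for this example. step() is the standard stepwise
AIC selection function in the stats package.

```r
# Simulated data in which x2 is almost a copy of x1 (hypothetical).
set.seed(2)
n <- 20000
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)
x3 <- rnorm(n)
dat <- data.frame(y = rbinom(n, 1, plogis(0.8 * x1 + 0.5 * x3)),
                  x1, x2, x3)

# Explore on a random subset rather than on all 20,000 rows.
sub <- dat[sample(n, 2000), ]
cor(sub$x1, sub$x2)                 # how strongly the two inputs move together

full   <- glm(y ~ x1 + x2 + x3, data = sub, family = binomial)
pruned <- step(full, direction = "backward", trace = 0)
formula(pruned)                     # terms that survive backward selection
```

Which of the correlated terms is dropped can change from one subsample to
the next, so it may be worth repeating the selection on several subsets.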
>
> 4) glm() allows you to specify starting parameters, which you could
> find from a subsample. Very likely only 1 or 2 iterations would be
> needed.
This sounds like a good idea, but what in fact I do now is build a model
using simple linear regression (lm), which is very fast, in the hope
that that will pick out the important parameters, which I can then feed
to glm.
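
A small sketch of the subsample idea from point 4, using the start
argument of glm(). The simulated data, the subsample size of 2,000 and
the binomial family are assumptions for illustration only.

```r
# Simulated data (hypothetical; five predictors, two of them irrelevant).
set.seed(3)
n <- 20000; p <- 5
X <- matrix(rnorm(n * p), n, p)
eta <- drop(X %*% c(1, -0.5, 0.25, 0, 0))
dat <- data.frame(y = rbinom(n, 1, plogis(eta)), X)

# Fit on a subsample first, then use its coefficients as starting values.
sub_fit <- glm(y ~ ., data = dat[sample(n, 2000), ], family = binomial)
warm <- glm(y ~ ., data = dat, family = binomial, start = coef(sub_fit))

# The same model fitted from the default starting values, for comparison.
cold <- glm(y ~ ., data = dat, family = binomial)
c(warm = warm$iter, cold = cold$iter)   # IRLS iteration counts
```

Both fits should arrive at essentially the same coefficients. Whether the
saved iterations outweigh the cost of the extra subsample fit depends on
the data.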
Many thanks again!
George Russell