[R] Re: Buying more computer for GLM

Charles C. Berry cberry at tajo.ucsd.edu
Thu Aug 31 18:13:10 CEST 2006


George,

Logistic regression with ONLY factors?

In principle you can recast this as a log-linear model of counts and fit
it by iterative proportional fitting.

For sparse data like yours (i.e. a table with 20000 observations spread
over >= 2^31 cells), you will need a method that does not operate
explicitly on the full table of counts, as loglin() does. I would guess
that rake() in the survey package would handle this, but I have not looked
at the code it uses.
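
As a rough illustration of why the full table is out of reach, here is a
back-of-the-envelope calculation. It assumes, as a lower bound, that each of
the 30 factors has only two levels and that the response is binary; the
variable names are illustrative only.

```r
n_factors  <- 30                  # number of factors, from the question
n_obs      <- 20000               # number of records, from the question
cells      <- 2^(n_factors + 1)   # assumed: 30 two-level factors x binary response
gib        <- cells * 8 / 2^30    # memory for the table stored as doubles, in GiB
max_filled <- n_obs / cells       # upper bound on the fraction of non-empty cells

c(cells = cells, GiB = gib, max_fraction_nonempty = max_filled)
```

Under these assumptions the dense table alone needs 16 GiB, while at most
20000 of its cells can be non-empty, so a method that stores only the
observed combinations is the natural fit.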

If you are only using a fraction of the factors, then loglm() (in MASS) or
loglin() may suffice.
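
A minimal sketch of the small-table case, using simulated data (the
variables a, b and y and the coefficients are made up for illustration).
It fits a main-effects logistic regression with glm() and the
corresponding log-linear model with loglin(), which uses iterative
proportional fitting, and compares the fitted probabilities.

```r
set.seed(1)
n <- 2000
d <- data.frame(
  a = factor(sample(c("a1", "a2", "a3"), n, replace = TRUE)),
  b = factor(sample(c("b1", "b2"), n, replace = TRUE))
)
# Simulated binary response; the coefficients are arbitrary.
eta <- -0.5 + 0.8 * (d$a == "a2") - 0.4 * (d$a == "a3") + 0.6 * (d$b == "b2")
d$y <- factor(rbinom(n, 1, plogis(eta)), levels = c(0, 1))

# Logistic regression with main effects of a and b.
fit_glm <- glm(y ~ a + b, family = binomial, data = d)

# Corresponding log-linear model on the a x b x y table: the a:b margin is
# fitted exactly (the predictors are treated as fixed), and y is associated
# with a and with b, but there is no three-way term.
tab <- table(a = d$a, b = d$b, y = d$y)
fit_ll <- loglin(tab, margin = list(c(1, 2), c(1, 3), c(2, 3)),
                 fit = TRUE, eps = 1e-8, iter = 100, print = FALSE)

# P(y = 1 | a, b) implied by the fitted counts of the log-linear model.
p_ll <- fit_ll$fit[, , "1"] / (fit_ll$fit[, , "0"] + fit_ll$fit[, , "1"])

# The same probabilities from the logistic fit; expand.grid() varies a
# fastest, which matches the column-major order of p_ll.
grid  <- expand.grid(a = levels(d$a), b = levels(d$b))
p_glm <- predict(fit_glm, newdata = grid, type = "response")

max_diff <- max(abs(as.vector(p_ll) - p_glm))
max_diff   # should be close to zero
```

The same model should also be expressible with the formula interface of
loglm() in MASS, e.g. loglm(~ a*b + a*y + b*y, data = tab). Either way the
full table is held in memory, so this only helps when few factors are
involved.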

HTH,

Chuck

On Wed, 30 Aug 2006, g.russell at eos-finance.com wrote:

> Hello,
>
> at the moment I am doing quite a lot of regression, especially
> logistic regression, on 20000 or more records with 30 or more
> factors, using the "step" function to search for the model with the
> smallest AIC. This takes a lot of time on this 1.8 GHz Pentium
> box. Memory does not seem to be such a big problem; not much
> swapping is going on and CPU usage is at or close to 100%. What
> would be the most cost-effective way to speed this up? The
> obvious way would be to get a machine with a faster processor (3 GHz
> plus), but I wonder whether it might instead be better to run a
> dual-processor machine or something like that; this looks at least like a
> problem R should be able to parallelise, though I don't know whether it
> does.
>
> Thanks for your help,
>
> George Russell

Charles C. Berry                        (858) 534-2098
                                          Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	         UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0717