[R] lean and mean lm/glm?
Thomas Lumley
tlumley at u.washington.edu
Tue Aug 22 16:22:18 CEST 2006
On Mon, 21 Aug 2006, Damien Moore wrote:
>
>> For very large regression problems there is the biglm package (put your
>> data into a database, read in 500,000 rows at a time, and keep updating
>> the fit).
>
> Thanks. I took a look at biglm and it seems pretty easy to use and,
> looking at the source, avoids much of the redundancy of lm. Correct me
> if I'm wrong, but I think it would be virtually impossible to extend to
> glm, because of the non-linearity in glm models.
No, it is quite straightforward if you are willing to make multiple passes
through the data. It is hard with a single pass and may not be possible
unless the data are in random order.
Fisher scoring for glms is just an iterative weighted least squares
calculation using a set of 'working' weights and 'working' response. These
can be defined chunk by chunk and fed to biglm. Three iterations should
be sufficient.
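
As a rough illustration of the multiple-pass idea, here is a minimal sketch
in Python (standard library only) for a logistic regression. It is not
biglm's code: biglm updates a QR decomposition incrementally, whereas this
sketch accumulates the weighted normal equations X'WX and X'Wz, which is
simpler to show but numerically less stable. The function names, the
simulated data, and the chunk size are all assumptions made for the example.

```python
import math
import random


def simulate(n, beta, seed=1):
    """Hypothetical test data: rows of ((1, x), y) from a logistic model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = (1.0, rng.gauss(0.0, 1.0))
        eta = sum(b * xi for b, xi in zip(beta, x))
        y = 1.0 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0.0
        rows.append((x, y))
    return rows


def chunks(rows, size):
    """Yield successive chunks, standing in for batched reads from a database."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def solve(a, b):
    """Solve the small p x p system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def irls_pass(rows, beta, chunk_size):
    """One full pass over the data: one Fisher scoring step."""
    p = len(beta)
    xtwx = [[0.0] * p for _ in range(p)]
    xtwz = [0.0] * p
    for chunk in chunks(rows, chunk_size):
        for x, y in chunk:
            eta = sum(b * xi for b, xi in zip(beta, x))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)      # working weight for the logit link
            z = eta + (y - mu) / w   # working response
            for i in range(p):
                xtwz[i] += x[i] * w * z
                for j in range(p):
                    xtwx[i][j] += x[i] * w * x[j]
    # Weighted least squares of z on x, using only the accumulated sums.
    return solve(xtwx, xtwz)


def fit_logistic(rows, p, chunk_size=1000, passes=3):
    """Start from beta = 0 and make a fixed number of passes over the data."""
    beta = [0.0] * p
    for _ in range(passes):
        beta = irls_pass(rows, beta, chunk_size)
    return beta
```

Calling fit_logistic(simulate(5000, (-0.5, 1.0)), p=2) makes three passes
over the simulated rows. Only the p x p sums are kept between chunks, so
memory use does not grow with the number of rows. The sketch does not guard
against fitted probabilities of exactly 0 or 1, where the working weight
would be zero.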
-thomas