[R] lean and mean lm/glm?
Thomas Lumley
tlumley at u.washington.edu
Wed Aug 23 19:15:29 CEST 2006
On Wed, 23 Aug 2006, Damien Moore wrote:
>
> Thomas Lumley <tlumley at u.washington.edu> wrote:
>
>> I have written most of a bigglm() function where the data= argument is a
>> function with a single argument 'reset'. When called with reset=FALSE the
>> function should return another chunk of data, or NULL if no data are
>> available, and when called with reset=TRUE it should go back to the
>> beginning of the data. I don't think this is too inelegant.
>
> yes, that does sound like a pretty elegant solution. It would be even
> more so if you could offer a default implementation of the data_function
> that simply passes chunks of large X and y matrices held in memory.
I have done that for data frames.
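For concreteness, here is a minimal sketch of what such a chunk-supplying
function might look like when closed over an in-memory data frame. The
function name, chunk size, and details are only illustrative, not the actual
biglm code:

make_data_fun <- function(df, chunksize = 1000) {
  pos <- 0
  function(reset = FALSE) {
    if (reset) {                         # go back to the beginning of the data
      pos <<- 0
      return(NULL)
    }
    if (pos >= nrow(df)) return(NULL)    # no more data available
    rows <- seq(pos + 1, min(pos + chunksize, nrow(df)))
    pos <<- pos + length(rows)
    df[rows, , drop = FALSE]             # hand back the next chunk
  }
}

## e.g. dfun <- make_data_fun(trees, chunksize = 10); dfun() returns the first
## 10 rows, repeated calls return later chunks, and dfun(reset = TRUE) rewinds.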
> (ideally you would just initialize the data_function to reference the X
> and y data to avoid duplicating it; I don't know if that's possible in R.)
The part that is extracted is a copy. The whole thing isn't copied,
though.
The chunk would have to be a copy if it were an R matrix because matrices
are stored in contiguous column-major format and a chunk won't be
contiguous. I think an implementation that uses precomputed design
matrices would want to be written in C and call the incremental QR
decomposition routines row by row. The reason for working in chunks in R
is to allow model.frame and model.matrix to work reasonably efficiently,
and they aren't needed if you already have the design matrix.
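A toy illustration of the layout point, using nothing beyond base R:

X <- matrix(1:6, nrow = 3)   # stored column by column as 1,2,3,4,5,6
X
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
X[1:2, ]   # elements 1,2,4,5 are not adjacent in memory, so this
           # row chunk has to be allocated as a new matrix

The same applies to every chunk the data function hands back, which is part of
why a precomputed-design-matrix version is better handled row by row in C, as
described above.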
> how long before it's ready? :)
Depends on how many more urgent things intervene.
-thomas
Thomas Lumley                  Assoc. Professor, Biostatistics
tlumley at u.washington.edu    University of Washington, Seattle