[R] Re: Large database help
Thomas Lumley
tlumley at u.washington.edu
Tue May 16 23:40:06 CEST 2006
On Tue, 16 May 2006, roger koenker wrote:
> In ancient times, 1999 or so, Alvaro Novo and I experimented with an
> interface to mysql that brought chunks of data into R and accumulated
> results.
> This is still described and available on the web in its original form at
>
> http://www.econ.uiuc.edu/~roger/research/rq/LM.html
>
> Despite claims of "future developments" nothing emerged, so anyone
> considering further explorations with it may need training in
> Rchaeology.
A few hours ago I submitted to CRAN a package "biglm" that does large
linear regression models using a similar strategy (it uses incremental QR
decomposition rather than accumulating the crossproduct matrix). It also
computes the Huber/White sandwich variance estimate in the same single
pass over the data.
Assuming I haven't messed up the package checking, it will appear
on CRAN in the next couple of days. The syntax looks like
a <- biglm(log(Volume) ~ log(Girth) + log(Height), chunk1)
a <- update(a, chunk2)
a <- update(a, chunk3)
summary(a)
where chunk1, chunk2, chunk3 are chunks of the data.
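In practice the chunks would usually be read incrementally from a file or
database rather than held in memory at once. Here is a minimal sketch of
that loop, assuming a CSV file "trees.csv" (hypothetical name) with columns
Volume, Girth and Height; the chunk size of 10000 is arbitrary:

con <- file("trees.csv", open = "r")
## first chunk carries the header, so read it normally
chunk <- read.csv(con, nrows = 10000)
nms <- names(chunk)
fit <- biglm(log(Volume) ~ log(Girth) + log(Height), chunk)
repeat {
    ## subsequent reads continue from where the connection left off
    chunk <- try(read.csv(con, header = FALSE, col.names = nms,
                          nrows = 10000), silent = TRUE)
    if (inherits(chunk, "try-error") || nrow(chunk) == 0) break
    fit <- update(fit, chunk)
}
close(con)
summary(fit)

Because read.csv on an open connection resumes at the current position,
each pass through the loop pulls in the next block of rows; only one
chunk is ever in memory.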
-thomas