[R] large object disorientation

Thomas Lumley thomas at biostat.washington.edu
Tue Nov 21 22:09:59 CET 2000


On Tue, 21 Nov 2000, Roger Koenker wrote:

> This is an inquiry for all those who have been working on external
> database applications.  I sent an inquiry (below) to snews about
> this sort of thing a couple of years ago and eventually decided that
> I would wait to see what external database developments occurred and
> then revisit the problem.  I hope that foundations are now better.
> 
> Suppose for the sake of concreteness you have a large dataframe-like
> object stored in some compressed format (e.g., I have a 48 MB Stata
> dataset with about 2.5 million observations on about 40 variables)
> and you would like to do lm() fitting.  That is, you would like to
> specify that the data frame is somehow external, and using the formula
> specification in lm() generate a sequence of queries that would return
> chunks of rows of the dataframe, accumulate X'X and X'y, do Major
> Cholesky's solve, and return.  All with a modest memory requirement
> and in the blink of the cpu's eye.  I realize that it sounds a bit
> retrograde to be doing least squares computations like this, but if
> there were a good way to do this, then there would be good ways to
> do lots of other more interesting things too, I believe. 
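
The chunked scheme described in the quoted message (accumulate X'X and
X'y over blocks of rows, then do a Cholesky solve) can be sketched
roughly as follows. This is only an illustration in plain Python, not
R's lm() and not any existing database interface: read_chunks() is a
hypothetical stand-in for whatever query returns successive blocks of
rows, and the toy data are made up. Memory use depends on the number of
columns p and the chunk size, not on the total number of rows.

```python
import math


def read_chunks(rows, chunk_size):
    """Hypothetical stand-in for a query that returns successive blocks of rows."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def accumulate_normal_equations(chunks, p):
    """Accumulate X'X (p x p) and X'y (length p) one chunk at a time."""
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for chunk in chunks:
        for x, y in chunk:
            for i in range(p):
                xty[i] += x[i] * y
                for j in range(i + 1):          # lower triangle only
                    xtx[i][j] += x[i] * x[j]
    for i in range(p):                          # mirror to the upper triangle
        for j in range(i):
            xtx[j][i] = xtx[i][j]
    return xtx, xty


def cholesky_solve(a, b):
    """Solve a @ beta = b for symmetric positive definite a via a = L L'."""
    p = len(b)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("X'X is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    z = [0.0] * p                               # forward solve: L z = b
    for i in range(p):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    beta = [0.0] * p                            # back solve: L' beta = z
    for i in reversed(range(p)):
        beta[i] = (z[i] - sum(L[k][i] * beta[k] for k in range(i + 1, p))) / L[i][i]
    return beta


# Toy data (made up): y = 1 + 2*x exactly, with an intercept column.
rows = [((1.0, float(x)), 1.0 + 2.0 * x) for x in range(10)]
xtx, xty = accumulate_normal_equations(read_chunks(rows, chunk_size=3), p=2)
beta = cholesky_solve(xtx, xty)
print(beta)
```

Forming X'X squares the condition number of the problem, so this
approach can lose accuracy when the columns of X are nearly collinear.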

I don't know if this is relevant or useful, but the "leaps" package includes
Fortran code that does linear regression in bounded memory using a QR
decomposition (less retrograde than forming X'X).
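
To make the bounded-memory QR idea concrete, here is a minimal sketch
that updates the triangular factor R and the rotated response one row
at a time using Givens rotations. It is not the Fortran code shipped
with "leaps", and the function names are hypothetical; it only shows
that a QR-based fit needs O(p^2) storage regardless of the number of
rows.

```python
import math


def qr_update(R, qty, rss, x, y):
    """Rotate one observation (x, y) into upper-triangular R and Q'y.

    R and qty are modified in place; the updated residual sum of
    squares is returned.
    """
    x = list(x)                                 # working copy, zeroed as we go
    p = len(x)
    for i in range(p):
        if x[i] == 0.0:
            continue
        r = math.hypot(R[i][i], x[i])
        c, s = R[i][i] / r, x[i] / r
        for j in range(i, p):
            t = R[i][j]
            R[i][j] = c * t + s * x[j]
            x[j] = -s * t + c * x[j]
        t = qty[i]
        qty[i] = c * t + s * y
        y = -s * t + c * y
    return rss + y * y                          # what is left of y is residual


def back_solve(R, qty):
    """Solve the upper-triangular system R beta = qty."""
    p = len(qty)
    beta = [0.0] * p
    for i in reversed(range(p)):
        if R[i][i] == 0.0:
            raise ValueError("design matrix is rank deficient")
        tail = sum(R[i][k] * beta[k] for k in range(i + 1, p))
        beta[i] = (qty[i] - tail) / R[i][i]
    return beta


# Toy data (made up): y = 1 + 2*x exactly, with an intercept column.
p = 2
R = [[0.0] * p for _ in range(p)]
qty = [0.0] * p
rss = 0.0
for x in range(10):
    rss = qr_update(R, qty, rss, (1.0, float(x)), 1.0 + 2.0 * x)
beta = back_solve(R, qty)
print(beta, rss)
```

Only R, Q'y and the running residual sum of squares are kept between
rows, so the rows themselves can be streamed from disk or a database.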

	-thomas
