[R] update on large object disorientation (fwd)

Roger Koenker roger at ysidro.econ.uiuc.edu
Wed Jan 3 22:52:07 CET 2001


A few weeks ago I sent a note to R-help inquiring about strategies for
implementing lm fitting with large datasets using one or another of
the new database schemes.  Having received some encouragement, but
no very concrete suggestions, I decided to proceed with a rather
naive implementation using RMySQL.  The objective was to produce
a version of lm(), say LM(), that would be able to estimate a
model of infant birthweight based on a sample of 2.4 million observations
on 18 variables from the 1997 U.S. Natality Survey, using only
a modest amount of memory, say 150 MB.  Obviously, trying to work with
the entire dataset as one object in a case like this would require vastly
more memory, hence the "object disorientation" orientation of our approach.
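
For readers who have not seen the idea, one standard way to fit least
squares without holding the whole dataset in memory is to accumulate the
cross-products X'X and X'y block by block and solve the normal equations
at the end.  The sketch below illustrates this on simulated data; it is
not taken from LM(), and the names chunked_ls and make_chunker are
hypothetical, with make_chunker standing in for a database cursor that
returns one block of rows per call.

```r
# Hypothetical stand-in for a database cursor: each call returns the
# next `size` rows as list(X = matrix, y = vector), or NULL when done.
make_chunker <- function(X, y, size) {
  pos <- 0
  function() {
    if (pos >= nrow(X)) return(NULL)
    idx <- (pos + 1):min(pos + size, nrow(X))
    pos <<- pos + size
    list(X = X[idx, , drop = FALSE], y = y[idx])
  }
}

# Hypothetical helper: least squares from accumulated cross-products.
# Only a p x p matrix and a length-p vector persist between blocks.
chunked_ls <- function(next_chunk, p) {
  XtX <- matrix(0, p, p)
  Xty <- numeric(p)
  repeat {
    chunk <- next_chunk()
    if (is.null(chunk)) break
    XtX <- XtX + crossprod(chunk$X)                 # adds X_k' X_k
    Xty <- Xty + drop(crossprod(chunk$X, chunk$y))  # adds X_k' y_k
  }
  solve(XtX, Xty)                                   # normal equations
}

# Simulated data (an assumption for illustration, not the natality data)
set.seed(1)
n <- 1000
X <- cbind(1, matrix(rnorm(n * 2), n, 2))
y <- drop(X %*% c(3, -1, 0.5)) + rnorm(n)

beta <- chunked_ls(make_chunker(X, y, size = 100), p = 3)
```

Memory use here depends on the block size rather than on the number of
observations.  Solving the normal equations is less numerically stable
than the QR decomposition used by lm(), so this is best read as an
illustration of the memory pattern only.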

Alvaro Novo and I now have a working version of such an LM() function
that satisfies our original objectives.  It is available and described 
in detail at:

	http://www.econ.uiuc.edu/~roger/research/rq/LM.html

The function provides fairly complete lm() functionality: formulae,
weights, subsets, etc.  Unfortunately, the interaction with MySQL is
not as quick as we had hoped, so the above test case takes about 10 minutes
on one of our Linux boxes.  This is about equivalent to what would
be required to read the data in ASCII using scan().  We would
greatly appreciate hearing from anyone who might have suggestions
about alternative strategies, particularly if they might be expected
to yield significant efficiency gains in getting data from MySQL into R.

In fact, the LM() implementation is really just a stalking horse for a
parallel development of a quantile regression function of this type.
And for this, an efficient way of passing through the data to check
signs of residuals is essential.
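
As a rough illustration of what such a pass might look like, the sketch
below tallies the signs of the residuals for a candidate coefficient
vector, one block of rows at a time.  It is an assumption about the
general shape of the computation, not the quantile regression code
itself; residual_signs and make_chunker are hypothetical names, and
make_chunker again stands in for a database cursor.

```r
# Hypothetical stand-in for a database cursor: each call returns the
# next `size` rows as list(X = matrix, y = vector), or NULL when done.
make_chunker <- function(X, y, size) {
  pos <- 0
  function() {
    if (pos >= nrow(X)) return(NULL)
    idx <- (pos + 1):min(pos + size, nrow(X))
    pos <<- pos + size
    list(X = X[idx, , drop = FALSE], y = y[idx])
  }
}

# Hypothetical helper: one pass over the blocks, counting negative,
# zero, and positive residuals y - X %*% beta.
residual_signs <- function(next_chunk, beta) {
  counts <- c(negative = 0, zero = 0, positive = 0)
  repeat {
    chunk <- next_chunk()
    if (is.null(chunk)) break
    s <- sign(chunk$y - drop(chunk$X %*% beta))
    counts <- counts + c(sum(s < 0), sum(s == 0), sum(s > 0))
  }
  counts
}

# Tiny made-up example: the fitted values are 1, 2, 3, 4
X <- cbind(1, c(1, 2, 3, 4))
y <- c(0, 3, 3, 5)
counts <- residual_signs(make_chunker(X, y, size = 2), beta = c(0, 1))
```

With real data, exact zeros would normally be detected with a small
tolerance rather than with s == 0.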

url:	http://www.econ.uiuc.edu		Roger Koenker	
email:	roger at ysidro.econ.uiuc.edu		Department of Economics
vox: 	217-333-4558				University of Illinois
fax:   	217-244-6678				Champaign, IL 61820
