[R] How to do linear regression with errors in x and y?

Sat Jun 3 17:32:13 CEST 2000

QUESTION: how should I do a linear regression in which there are
errors in x as well as y?

SUPPLEMENT: I've seen folks approach this problem by computing
eigenvectors of the covariance matrix, and that makes sense to me.
But I'm wondering whether this has a "pedigree", i.e. whether it makes
sense to folks on this list and whether it has been published, so that
I can cite it.

BACKGROUND: (I'm providing this for the interest of readers, since I
personally find such ancillary comments on this list quite
intriguing.)  My problem is one that comes up all the time in
physics (in this case, fluid mechanics).  I have measured variables,
let's call them X and Y, and dimensional analysis suggests that these
be scaled by Lx and Ly, say, so the Buckingham Pi theorem says that we
must have

	Y/Ly = f(X/Lx, ...)

where the ... is a list of nondimensional parameters of the problem.
(As an aside, X is the depth below the ocean surface, Lx is the RMS
height of waves on the surface, Y is a measure of the turbulence in
the ocean, and Ly is related to the wind stress on the water surface.
The ... is a list of parameters that includes how long the wind has
been blowing; sailors will know that waves take a while to build up.)

A power-law dependence, i.e.

	Y/Ly = (X/Lx)^alpha

seems justified by theory, but the value of alpha is contentious and
we seek to determine it empirically.  (Engineers reading this will
recognize that alpha=-1 is the so-called "law of the wall" for the
decay of turbulence away from a frictional wall.)

Thus, my approach is to try to fit a line like

	log(Y/Ly) ~ log(X/Lx)
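For concreteness, here is a minimal pure-Python sketch of that log-log fit done by ordinary least squares. The function name `fit_power_law_ols` is hypothetical, and it assumes Lx and Ly are supplied per observation. Note that ordinary least squares treats log(X/Lx) as exact, which is precisely the assumption in question. The log base does not affect the fitted alpha.

```python
import math

def fit_power_law_ols(x, y, lx, ly):
    """Ordinary least-squares fit of log(y/ly) = intercept + alpha * log(x/lx).

    x, y, lx, ly are equal-length sequences of positive values (one scale
    per observation, an assumption). OLS treats the log(x/lx) values as
    error-free; only log(y/ly) is assumed noisy.
    """
    u = [math.log(a / b) for a, b in zip(x, lx)]
    v = [math.log(a / b) for a, b in zip(y, ly)]
    n = len(u)
    ubar = sum(u) / n
    vbar = sum(v) / n
    suu = sum((a - ubar) ** 2 for a in u)
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    alpha = suv / suu
    intercept = vbar - alpha * ubar
    return alpha, intercept
```

For example, `fit_power_law_ols([1, 2, 4, 8], [3, 1.5, 0.75, 0.375], [1] * 4, [3] * 4)` recovers alpha = -1 and an intercept of 0 (up to rounding), because these made-up points lie exactly on Y/Ly = (X/Lx)^-1.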

but since there are errors in (X,Y,Lx,Ly) (all of which rely on
measurement), we emphatically have errors in both the dependent and
independent variables.  If our scaling is correct, X/Lx and Y/Ly are
roughly of order unity.  The data suggest that log(X/Lx) and log(Y/Ly)
have roughly comparable scatter.
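A small simulation shows why this matters. The sketch below uses made-up parameters (true alpha = -1, unit spread in the true log(X/Lx), and equal Gaussian noise of standard deviation 0.5 in both logs), and the helper names are hypothetical. With noise in x, regressing y on x flattens the slope by roughly var(true x)/(var(true x) + var(noise)) = 1/1.25 = 0.8. Regressing x on y and inverting steepens it, here by the same factor, so the two ordinary fits bracket the true value.

```python
import random

def ols_slope(u, v):
    """Least-squares slope of v regressed on u."""
    n = len(u)
    ubar = sum(u) / n
    vbar = sum(v) / n
    suu = sum((a - ubar) ** 2 for a in u)
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    return suv / suu

def simulate(alpha=-1.0, n=5000, noise_sd=0.5, seed=1):
    """Synthetic log-log data in which both observed variables carry equal noise."""
    rng = random.Random(seed)
    true_u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u = [t + rng.gauss(0.0, noise_sd) for t in true_u]
    v = [alpha * t + rng.gauss(0.0, noise_sd) for t in true_u]
    return u, v

u, v = simulate()
y_on_x = ols_slope(u, v)        # roughly -0.8: attenuated toward zero
x_on_y = 1.0 / ols_slope(v, u)  # roughly -1.25: exaggerated
print(f"y-on-x slope {y_on_x:.3f}, inverted x-on-y slope {x_on_y:.3f}")
```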

Thus, I'd be happy to state that the errors in the dependent and
independent variables are comparable.  And so my question becomes, on
this assumption, how to fit a line through data in which both "x" and
"y" have (equal) uncertainty.  I'm thinking the eigenvector approach
is fine.  Comments?
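The eigenvector approach is what is usually called orthogonal regression (also total least squares, or Deming regression with an error-variance ratio of 1). The fitted line passes through the centroid along the principal eigenvector of the 2x2 sample covariance matrix, which minimizes the sum of squared perpendicular distances from the points to the line. Below is a minimal pure-Python sketch using the closed-form eigenvector of a symmetric 2x2 matrix. The name `orthogonal_fit` is hypothetical, and the method is only appropriate when the error variances of the two variables are equal in the units being fitted (here, log units).

```python
import math

def orthogonal_fit(u, v):
    """Fit v = intercept + slope * u by orthogonal (total least squares) regression.

    The line passes through the centroid along the principal eigenvector of
    the 2x2 sample covariance matrix. Assumes equal error variance in u and v.
    """
    n = len(u)
    ubar = sum(u) / n
    vbar = sum(v) / n
    suu = sum((a - ubar) ** 2 for a in u) / (n - 1)
    svv = sum((b - vbar) ** 2 for b in v) / (n - 1)
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v)) / (n - 1)
    if suv == 0:
        raise ValueError("zero covariance: principal axis is horizontal or vertical")
    # Largest eigenvalue of [[suu, suv], [suv, svv]]; its eigenvector is (suv, lam - suu).
    lam = 0.5 * (suu + svv) + math.sqrt(0.25 * (suu - svv) ** 2 + suv ** 2)
    slope = (lam - suu) / suv
    intercept = vbar - slope * ubar
    return slope, intercept
```

Applied to log(X/Lx) and log(Y/Ly), the returned slope is the estimate of alpha. Unlike ordinary least squares, the fit is symmetric: swapping u and v gives the reciprocal slope. It is not invariant to rescaling one axis, though, so if the scatter in one variable were clearly larger, one would rescale first or use Deming regression with the corresponding variance ratio.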

-- 
Dan E. Kelley                                         phone:(902)494-1694
Oceanography Department, Dalhousie University           fax:(902)494-2885
Halifax, Nova Scotia                             mailto:Dan.Kelley at Dal.CA 
Canada B3H 4J1       http://www.phys.ocean.dal.ca/~kelley/Kelley_Dan.html

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._