[R] How to do linear regression with errors in x and y?

Jan de Leeuw deleeuw at stat.ucla.edu
Sat Jun 3 19:49:33 CEST 2000

Distinguished pedigree.

Karl Pearson
On Lines and Planes of Closest Fit to Systems of Points in Space
Phil Mag. 2, 1901, 559-572.

Goes back even further (to Adcock, around 1875).
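
For readers who want to see the eigenvector approach from the question below in concrete form, here is a minimal sketch, not a definitive implementation. It assumes the errors in x and y have equal variance (the assumption stated in the question), and it fits the line along the leading eigenvector of the 2x2 scatter matrix, which minimizes the sum of squared perpendicular distances. The function name orthogonal_fit and the sample numbers are hypothetical, and Python is used only for illustration.

```python
import math


def orthogonal_fit(x, y):
    """Fit y = intercept + slope * x by minimizing perpendicular distances.

    Assumes equal error variance in x and y (an assumption, not a given).
    Returns (slope, intercept).
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n

    # Entries of the centered scatter matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; the slope is not determined")

    # Largest eigenvalue of the symmetric 2x2 matrix, in closed form.
    lam = 0.5 * (sxx + syy + math.sqrt((sxx - syy) ** 2 + 4.0 * sxy ** 2))

    # The eigenvector (v1, v2) satisfies sxx*v1 + sxy*v2 = lam*v1,
    # so the slope of the fitted line is v2 / v1 = (lam - sxx) / sxy.
    slope = (lam - sxx) / sxy
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical, noise-free data obeying Y/Ly = (X/Lx)^alpha with alpha = -1.
x_scaled = [0.5, 1.0, 2.0, 4.0]
y_scaled = [xi ** -1.0 for xi in x_scaled]

# Fit in log-log space, so the slope estimates alpha.
log_x = [math.log(v) for v in x_scaled]
log_y = [math.log(v) for v in y_scaled]
alpha, offset = orthogonal_fit(log_x, log_y)
print(alpha, offset)
```

Unlike ordinary least squares, this fit treats x and y symmetrically: swapping the two variables gives the reciprocal slope. In R the same calculation can start from eigen(cov(cbind(x, y))).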

At 12:32 -0300 06/03/2000, Dan E. Kelley wrote:
>QUESTION: how should I do a linear regression in which there are
>errors in x as well as y?
>SUPPLEMENT: I've seen folks approach this problem by computing
>eigenvectors of the covariance matrix, and that makes sense to me.
>But I'm wondering if this has a "pedigree" (i.e. if it makes sense to
>folks on this list, and if it's something that has been published, so
>I can refer to it.)
>BACKGROUND: (I'm providing this for interest of readers, since I
>personally find such ancillary comments on this list to be quite
>intriguing.)  My problem is something that comes up all the time in
>physics (in this case, fluid mechanics).  I have measured variables,
>let's call them X and Y, and dimensional analysis suggests that these
>be scaled by Lx and Ly, say, so the Buckingham Pi theorem says that we
>must have
>	Y/Ly = f(X/Lx, ...)
>where the ... is a list of nondimensional parameters of the problem.
>(As an aside, the X is depth below the ocean surface, Lx is the RMS
>height of waves on the surface, Y is a measure of the turbulence in
>the ocean, and Ly is related to the wind stress on the water surface.
>The ... is a list of parameters that includes how long the wind has
>been blowing; sailors will know that waves take a while to build up.)
>A power-law dependence, i.e.
>	Y/Ly = (X/Lx)^alpha
>seems justified by theory, but the value of alpha is contentious and
>we seek to determine it empirically.  (Engineers reading this will
>recognize that alpha=-1 is the so-called "law of the wall" for the
>decay of turbulence away from a frictional wall.)
>Thus, my approach is to try to fit a line like
>	log(Y/Ly) ~ log(X/Lx)
>but since there are errors in (X,Y,Lx,Ly) (all of which rely on
>measurement), we emphatically have errors in both the dependent and
>independent variables.  If our scaling is correct, X/Lx and Y/Ly are
>roughly of order unity.  The data suggest log(X/Lx) and log(Y/Ly) have
>roughly comparable scatter.
>Thus, I'd be happy to state that the errors in the dependent and
>independent variables are comparable.  And so my question becomes, on
>this assumption, how to fit a line through data in which both "x" and
>"y" have (equal) uncertainty.  I'm thinking the eigenvector approach
>is fine.  Comments?
>Dan E. Kelley                                         phone:(902)494-1694
>Oceanography Department, Dalhousie University           fax:(902)494-2885
>Halifax, Nova Scotia                             mailto:Dan.Kelley at Dal.CA
>Canada B3H 4J1       http://www.phys.ocean.dal.ca/~kelley/Kelley_Dan.html
>r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
>Send "info", "help", or "[un]subscribe"
>(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch

Jan de Leeuw; Professor and Chair, UCLA Department of Statistics;
US mail: 8142 Math Sciences Bldg, Box 951554, Los Angeles, CA 90095-1554
phone (310)-825-9550;  fax (310)-206-5658;  email: deleeuw at stat.ucla.edu
    http://www.stat.ucla.edu/~deleeuw and http://home1.gte.net/datamine/
          No matter where you go, there you are. --- Buckaroo Banzai
