[R] Fitting with error on data

Michael Bedward michael.bedward at gmail.com
Fri Oct 1 16:29:14 CEST 2010


There is the lmodel2 package...
http://cran.r-project.org/web/packages/lmodel2/vignettes/mod2user.pdf
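
That package's lmodel2() reports ordinary least squares (OLS),
major axis (MA) and standardised major axis (SMA) fits; the SMA
line is the geometric mean regression. A minimal sketch of its
use (assuming measured vectors X and Y):

  library(lmodel2)
  fit <- lmodel2(Y ~ X)
  fit   # prints the OLS, MA and SMA estimates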

Geometric mean regression has been discussed on this list in the past,
for example:
https://stat.ethz.ch/pipermail/r-help/2005-June/072927.html
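
If you just want the point estimate, the geometric mean (reduced
major axis) slope is also easy to compute by hand -- a sketch,
assuming measured vectors X and Y:

  b <- sign(cor(X, Y)) * sd(Y) / sd(X)  # GM / RMA slope
  a <- mean(Y) - b * mean(X)            # line passes through the means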

I've used that approach with nls (rightly or wrongly I'm not sure :)
http://lastresortsoftware.blogspot.com/2010/08/meeting-in-middle-or-fudging-model-ii.html

Hope this helps
Michael

On 1 October 2010 23:11, Ted Harding <ted.harding at wlandres.net> wrote:
> On 27-Sep-10 08:55:13, Maayt wrote:
>> As this forum proved to be very helpful, I got another question...
>> I'd like to fit data points on which I have an error, dx and dy,
>> on each x and y. What would be the common procedure to fit this
>> data by a linear model taking into account uncertainty on each point?
>> Would weighting each point by 1/sqrt(dx^2+dy^2) (and taking dx and dy
>> as relative errors) in a lm() fit do the job? I would like to
>> propagate uncertainty of the points into the uncertainty of the fit,
>> would that be the case?
>>
>> Thanks for all the help
>> --
>
> It would seem that there has been no response yet to this query.
>
> This type of problem falls under various headers, typically
>
> [A] Fitting a linear functional relationship
> [B] Regression with errors in both variables
>
> For [A], it is envisaged that x and y are, in the real world,
> related by an exact linear equation
>
>  y = a + b*x  or  x = a' + b'*y  or  A*x + B*y = C
>
> and that data (X1,X2,...), (Y1,Y2,...) are obtained by simultaneously
> measuring the exact values (x1,x2,...), (y1,y2,...) where measurement
> errors result in:
>
>  Xi = xi + e.Xi   Yi = yi + e.Yi
>
> where, for each i, e.Xi is (say) distributed as N(0,s.X^2) and
> e.Yi as N(0,s.Y^2), with s.X and s.Y the standard deviations
> of the errors of measurement in X and Y.
>
> Then it is a question of estimating a and b from the data.
> This can be done by Maximum Likelihood, which requires taking
> as parameters not only a and b, and s.X and s.Y, but also the
> unknown (only observed with error) exact values (x1,x2,...) and
> (y1,y2,...).
>
> This case will not fit into the standard lm() method of fitting.
>
> For [B], whereas in standard regression the observed X values
> are used as they stand (i.e. taken as fixed), here it is accepted
> that they too are subject to error (similarly to [A]). So, whereas
> (for given values of {Xi}, {Yi}) a standard lm(Y ~ X) will give
> an answer, the X-values on which the result depends are themselves
> uncertain, and this uncertainty has to be taken into account: it
> is uncertain what values of x the Y values are being regressed on.
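>
> A small simulation (my own, for illustration) makes the point:
> error in X attenuates the lm() slope towards zero by the
> reliability ratio var(x)/(var(x) + s.X^2), so ignoring the
> error in X systematically underestimates the slope.
>
>   set.seed(42)
>   x <- rnorm(1000)                    # true x values, var(x) = 1
>   X <- x + rnorm(1000, sd = 0.5)      # observed with error, s.X = 0.5
>   Y <- 2 + x + rnorm(1000, sd = 0.5)  # true slope is 1
>   coef(lm(Y ~ X))["X"]                # about 1/(1 + 0.25) = 0.8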
>
> The conceptual difference between [A] and [B] is that, in [A],
> there is no "directional" aspect: x and y are simply being
> considered as related by y = a + b*x, or x = a' + b'*y, with
> no preference between either way of expressing it. The linear
> relationship can be used for any appropriate purpose.
>
> However, in [B] we are looking at a regression problem: y is
> being regressed on x: lm(Y ~ X), and the primary purpose is
> to predict the value of y that would result from a given value
> of x. So it is "directional": x --> y. If we were interested
> in predicting x from y, then we would do it the other way round:
> lm(X ~ Y), so Y --> X, and the respective coefficients of the two
> different regression equations cannot be deduced from each other.
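>
> A quick numerical check (again my own illustration): the product
> of the two fitted slopes equals cor(X,Y)^2, not 1, so neither
> line can be recovered by algebraically inverting the other.
>
>   set.seed(1)
>   X <- rnorm(100)
>   Y <- 0.5*X + rnorm(100)
>   b.yx <- coef(lm(Y ~ X))[2]   # slope of the Y-on-X regression
>   b.xy <- coef(lm(X ~ Y))[2]   # slope of the X-on-Y regression
>   b.yx * b.xy                  # equals cor(X, Y)^2, not 1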
>
> So, in choosing between approach [A] and approach [B], you would
> need to consider what you want to use the results for.
>
> I think the Maximum Likelihood approach to [A] was first properly
> considered by D.V. Lindley in 1947:
>
>  D. V. Lindley.
>  Regression lines and the linear functional relationship.
>  Suppl. J. Roy. Statist. Soc., 9:218-244, 1947.
>
> For this to work properly (i.e. be "consistent" in the technical
> sense), you need to know the ratio of the two error standard
> deviations (lambda = s.Y/s.X). From your statement of your
> problem, it looks as though you would know this ratio.
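>
> For concreteness, here is a minimal sketch (mine, not code from
> Lindley's paper) of that ML estimate for known lambda; the closed
> form is what is nowadays usually called Deming regression:
>
>   deming <- function(X, Y, lambda = 1) {
>     d   <- lambda^2                   # ratio of error variances
>     Sxx <- var(X); Syy <- var(Y); Sxy <- cov(X, Y)
>     b   <- (Syy - d*Sxx +
>             sqrt((Syy - d*Sxx)^2 + 4*d*Sxy^2)) / (2*Sxy)
>     a   <- mean(Y) - b*mean(X)
>     c(intercept = a, slope = b)
>   }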
>
> The study of [B], regression with errors in both variables, goes
> back a very long way, and many approaches have been considered.
> These include several studies by J.B. Copas.
>
> Neither [A] nor [B] is, in general, a straightforward problem!
>
> A useful overview of approaches to both [A] and [B] can be found
> in the freely downloadable:
>
>  An historical overview of regression with errors in both variables.
>  J.W. Gillard (Cardiff University)
>
> http://www.cardiff.ac.uk/maths/resources/Gillard_Tech_Report.pdf
>
> Now, as to what may be available in R:
>
> I was a bit surprised to find that a full R site search on either of
>
>  "linear functional relationship"
>  "errors in both variables"
>
> yielded nothing relevant. It may be that using different search
> terms would find appropriate methods (such as considered by Gillard,
> or the Lindley approach for [A]), but I'm having difficulty
> thinking what such might be!
>
> So I hope that R-help readers who have used R for this category
> of problem can help!
>
> Ted.
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <ted.harding at wlandres.net>
> Fax-to-email: +44 (0)870 094 0861
> Date: 01-Oct-10                                       Time: 14:10:54
> ------------------------------ XFMail ------------------------------
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


