[R] add a point to regression line and cook's distance

Murray Jorgensen maj at stats.waikato.ac.nz
Thu Dec 4 03:07:59 CET 2003


I suspect that the only way that adding a point at (0,0) would 'improve 
the fit' is by giving R^2 a boost. But this would be a spurious measure 
of fit, including as it does the invented point. The residual sum of 
squares calculated over the actual data would be increased, probably 
only by a modest amount, though.
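To make this concrete, here is a minimal R sketch on simulated data. The data, sample size and noise level are invented for illustration and are not Jonathan's. It fits lm() with and without an appended (0,0) point and compares the R^2 each fit reports. It also computes the residual sum of squares over the actual data only, and uses cooks.distance() to show how influential the invented point is.

```r
# Simulated, hypothetical data: a tight cluster far from the origin.
# The true line (intercept 2, slope 0.5) does NOT pass through (0,0).
set.seed(1)
n <- 30
d <- data.frame(x = runif(n, min = 8, max = 10))
d$y <- 2 + 0.5 * d$x + rnorm(n, sd = 0.3)

# Ordinary fit to the actual data
fit_real <- lm(y ~ x, data = d)

# Fit after appending the invented point (0, 0)
d_aug <- rbind(d, data.frame(x = 0, y = 0))
fit_aug <- lm(y ~ x, data = d_aug)

# R^2 as each fit reports it; the augmented R^2 counts the invented point
r2_real <- summary(fit_real)$r.squared
r2_aug  <- summary(fit_aug)$r.squared

# Residual sum of squares over the actual data only
rss_real <- sum(residuals(fit_real)^2)
rss_aug  <- sum((d$y - predict(fit_aug, newdata = d))^2)

# Cook's distance of the invented point (the last row of d_aug)
cd_origin <- cooks.distance(fit_aug)[nrow(d_aug)]

print(c(r2_real = r2_real, r2_aug = r2_aug))
print(c(rss_real = rss_real, rss_aug = rss_aug))
print(cd_origin)
```

Least squares on the actual data already minimises the residual sum of squares over those points, so rss_aug can never be smaller than rss_real. The higher R^2 of the augmented fit comes largely from the invented point adding to the total sum of squares.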

You say "Outside the range, the data are very scarce and have a high
level of noise too." Does this mean that you think that the error in
these points is likely to be larger than in the others? You might try a
weighted regression in which you downweight these points while still
leaving them with relatively high leverage. Another thing to consider
might be fitting a function like y = ax + bx^2, i.e. y ~ x + I(x^2) - 1.
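Both suggestions map onto lm() arguments. The following is a minimal sketch on simulated data. It assumes that the per-point noise standard deviations are known; here they are simply set larger for the sparse outer points. Under that assumption, weights = 1 / sd^2 gives weighted least squares. Dropping the intercept with "- 1" forces the fitted curve through (0,0). The model y ~ x - 1 is the straight-line version suggested earlier in the thread.

```r
# Simulated, hypothetical data: a dense low-noise cluster plus a few
# sparse, noisier points farther out. The curved "truth" is an assumption.
set.seed(2)
x <- c(runif(40, min = 8, max = 10), 2, 4, 14, 16)
sd_noise <- c(rep(0.3, 40), rep(1.5, 4))  # assumed known noise levels
y <- 0.8 * x - 0.02 * x^2 + rnorm(length(x), sd = sd_noise)
d <- data.frame(x = x, y = y)

# Weighted least squares: weight each point by 1 / (noise variance),
# so the noisy outer points count for less than the cluster.
w <- 1 / sd_noise^2

# Straight line through the origin (no intercept)
fit_lin0 <- lm(y ~ x - 1, data = d, weights = w)

# Quadratic through the origin: y = a*x + b*x^2
fit_quad0 <- lm(y ~ x + I(x^2) - 1, data = d, weights = w)

print(coef(fit_lin0))
print(coef(fit_quad0))

# A no-intercept model predicts exactly 0 at x = 0
print(predict(fit_quad0, newdata = data.frame(x = 0)))
```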

All of this is ad hoc, though, and a bit of understanding about the 
science underlying the data and the likely functional form of the 
regression function would let you get much further, possibly using a 
nonlinear regression approach.
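As one illustration of the nonlinear route, nls() can fit a curve that passes through the origin by construction. The saturating form y = a(1 - exp(-b x)) used here is purely a hypothetical choice; the appropriate form should come from the science behind the data. The simulated data and the starting values are also assumptions.

```r
# Simulated, hypothetical data generated from an assumed saturating curve
# that passes through (0,0); most points sit in a small cluster.
set.seed(3)
x <- c(runif(40, min = 8, max = 10), 1, 3, 5, 13, 15)
y <- 5 * (1 - exp(-0.2 * x)) + rnorm(length(x), sd = 0.2)
d <- data.frame(x = x, y = y)

# Nonlinear least squares; the model equals 0 at x = 0 for any a, b.
# Starting values are rough guesses and may need tuning on real data.
fit_nls <- nls(y ~ a * (1 - exp(-b * x)), data = d,
               start = list(a = 4, b = 0.15))

print(coef(fit_nls))
print(predict(fit_nls, newdata = data.frame(x = 0)))
```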

Murray

jonathan_li at agilent.com wrote:

> It is likely that the "true" relationship is nonlinear. There is no a priori knowledge about linearity. In the small range where we do have enough data, the relationship
> looks linear. Outside the range, the data are very scarce and have a high level of noise too.
> This is why adding (0,0) to the data can potentially improve the fit a great deal. But at the
> same time, I have never heard of people doing it this way.
> 
> Jonathan
> 
> -----Original Message-----
> From: Murray Jorgensen [mailto:maj at stats.waikato.ac.nz]
> Sent: Wednesday, December 03, 2003 5:18 PM
> To: Wiener, Matthew
> Cc: jonathan_li at agilent.com; r-help at stat.math.ethz.ch
> Subject: Re: [R] add a point to regression line and cook's distance
> 
> 
> Not a good idea, unless the regression function is *known* to be linear. 
> More likely it is only approximately linear over small ranges.
> 
> Murray Jorgensen
> 
> Wiener, Matthew wrote:
> 
> 
>>If you know that the line should pass through (0,0), would it make sense to
>>do a regression without an intercept?  You can do that by putting "-1" in
>>the formula, like:  lm(y ~ x - 1).
>>
>>Hope this helps,
>>
>>Matt
>>
>>Matthew Wiener
>>RY84-202
>>Applied Computer Science & Mathematics Dept.
>>Merck Research Labs
>>126 E. Lincoln Ave.
>>Rahway, NJ 07065
>>732-594-5303 
>>
>>
>>-----Original Message-----
>>From: r-help-bounces at stat.math.ethz.ch
>>[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Spencer Graves
>>Sent: Wednesday, December 03, 2003 5:51 PM
>>To: jonathan_li at agilent.com
>>Cc: r-help at stat.math.ethz.ch
>>Subject: Re: [R] add a point to regression line and cook's distance
>>
>>
>>      What is the context?  What do the "outliers" represent?  If you 
>>think carefully about the context, you may find the answer. 
>>
>>      hope this helps.  spencer graves
>>p.s.  I know statisticians who worked for HP before the split and who 
>>still work for either HP or Agilent; I'm not certain which.  If you want 
>>to contact me off-line, I can give you a couple of names if that might 
>>help. 
>>
>>jonathan_li at agilent.com wrote:
>>
>>
>>
>>>Hi,
>>>
>>>This is more a statistics question than an R question. But I thought
>>>people on this list might have some pointers.
>>>
>>>My question is as follows:
>>>I would like to have a robust regression line. The data I have are mostly
>>>clustered around a small range, so the regression line tends to be
>>>influenced strongly by outlier points (with large Cook's distance). From
>>>the application's background, I know that the line should pass through
>>>(0,0), which is far away from the data cloud. I would like to add this
>>>point to get a more robust line. The question is: does it make sense to do
>>>this? What are the negative impacts, if any?
>>>
>>>thanks,
>>>jonathan
>>>
>>>______________________________________________
>>>R-help at stat.math.ethz.ch mailing list
>>>https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>>>
>>>
>>
>>
>>
>>
> 
> 

-- 
Dr Murray Jorgensen      http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: maj at waikato.ac.nz                                Fax 7 838 4155
Phone  +64 7 838 4773 wk    +64 7 849 6486 home    Mobile 021 1395 862



