[R] Weird LM behaviour

Sat May 20 01:16:54 CEST 2006

I see what you mean.   Thanks for the correction.

-jason

----- Original Message ----- 
From: "Thomas Lumley" <tlumley at u.washington.edu>
To: "Jason Barnhart" <jasoncbarnhart at msn.com>
Cc: <R-help at stat.math.ethz.ch>; "Rense Nieuwenhuis" <r.nieuwenhuis at student.ru.nl>
Sent: Friday, May 19, 2006 2:39 PM
Subject: Re: [R] Weird LM behaviour

> On Fri, 19 May 2006, Jason Barnhart wrote:
>
>> No, not weird.
>>
>> Think of it this way.  As you move point (0,2) to (1,2), the slope, which
>> was 0, is moving towards infinity.  Eventually the 3 points are perfectly
>> vertical and so must have infinite slope.
>>
>> Your delta-x is not sufficiently granular to show the slope change for
>> x-values very close to 1 but not yet 1, like 0.999999999.  Note that lm
>> returns NA when x=1.
>
> This turns out not to be the case. Worked to infinite precision, the mean
> of y is 2 both at x and at 1, so the infinite-precision slope is exactly
> zero for all x != 1 and undefined for x = 1.
>
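To make the infinite-precision claim concrete, here is a minimal sketch in Python using exact rational arithmetic (`fractions.Fraction`), so no rounding can occur. The data are hypothetical, chosen only to match the description above: two fixed points at x = 1 with y = 1 and y = 3 (mean 2), plus a moving point (x, 2). The slope is the ordinary least-squares Sxy / Sxx; the function name `ols_slope_exact` is made up for this illustration.

```python
from fractions import Fraction


def ols_slope_exact(xs, ys):
    """Least-squares slope Sxy / Sxx in exact rational arithmetic.

    Returns None when every x is identical (Sxx == 0), where the slope is
    undefined; lm() reports NA in that situation.
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    if sxx == 0:
        return None
    return sxy / sxx


# Hypothetical data: (1, 1) and (1, 3) fixed, (x, 2) moving toward x = 1.
for x in [Fraction(0), Fraction(1, 2), Fraction(999999999, 10**9), Fraction(1)]:
    xs = [Fraction(1), Fraction(1), x]
    ys = [Fraction(1), Fraction(3), Fraction(2)]
    print(f"x = {x}: slope = {ols_slope_exact(xs, ys)}")
```

Every x below 1 gives a slope of exactly 0, because the two fixed points have equal x-deviations and y-deviations of -1 and +1 that cancel in Sxy; at x = 1 all x values coincide, Sxx is 0, and the function returns None.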
> Now, we are working to finite precision and the slope is obtained by 
> solving a linear system that gets increasingly poorly conditioned as x 
> approaches 1. This means that for x not close to 1 the answer should be 0 
> to within a small multiple of machine epsilon (and it is) and that for x 
> close to 1 the answer should be zero to within an increasingly large 
> multiple of machine epsilon (and it is).
>
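A floating-point version of the fit shows the finite-precision side. R's lm() works through a QR decomposition of the design matrix; the sketch below instead uses a plain, unpivoted Householder QR written in pure Python, so it illustrates the approach but is not R's routine, and its last digits will not match R's output. The data are hypothetical (fixed points (1, 1) and (1, 3), moving point (x, 2)), and `householder_lstsq` is a name invented here.

```python
import sys


def householder_lstsq(X, y):
    """Least-squares coefficients for X b ~ y via an unpivoted Householder QR.

    X is a list of rows (m x n, m >= n), y a list of length m.  Pure Python,
    for illustration only: it does none of the rank detection or column
    pivoting that a production routine such as R's performs, and only
    rejects an exactly zero remaining column.
    """
    m, n = len(X), len(X[0])
    A = [list(map(float, row)) for row in X]  # overwritten with R
    b = [float(v) for v in y]                 # overwritten with Q^T y
    for k in range(n):
        # Householder vector v that maps A[k:, k] onto a multiple of e_k.
        norm = sum(A[i][k] ** 2 for i in range(k, m)) ** 0.5
        if norm == 0.0:
            raise ValueError("design matrix is exactly rank-deficient")
        alpha = -norm if A[k][k] >= 0 else norm  # sign chosen to avoid cancellation
        v = [0.0] * m
        v[k] = A[k][k] - alpha
        for i in range(k + 1, m):
            v[i] = A[i][k]
        vv = sum(v[i] * v[i] for i in range(k, m))
        # Apply H = I - 2 v v^T / (v^T v) to columns k..n-1 of A and to b.
        for j in range(k, n):
            s = 2.0 * sum(v[i] * A[i][j] for i in range(k, m)) / vv
            for i in range(k, m):
                A[i][j] -= s * v[i]
        s = 2.0 * sum(v[i] * b[i] for i in range(k, m)) / vv
        for i in range(k, m):
            b[i] -= s * v[i]
    # Solve the upper-triangular system R coef = (Q^T y)[:n] by back-substitution.
    coef = [0.0] * n
    for k in range(n - 1, -1, -1):
        coef[k] = (b[k] - sum(A[k][j] * coef[j] for j in range(k + 1, n))) / A[k][k]
    return coef


eps = sys.float_info.epsilon
# Hypothetical data: rows of X are (1, x_i); (1, 1) and (1, 3) fixed, (x, 2) moving.
for x in [0.0, 0.9, 0.999, 0.999999, 0.999999999]:
    X = [[1.0, 1.0], [1.0, 1.0], [1.0, x]]
    intercept, slope = householder_lstsq(X, [1.0, 3.0, 2.0])
    print(f"x = {x}: slope = {slope:.3e} ({abs(slope) / eps:.1f} eps)")
```

For x far from 1 the printed slope should be within a few eps of 0 (possibly exactly 0); as x approaches 1 the deviation may grow, and whether it trends or oscillates is hard to predict without analysing the exact sequence of floating-point operations.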
> Without a detailed error analysis of the actual algorithm being used, you 
> can't really predict whether the answer will follow a more-or-less 
> consistent trend or oscillate violently.  You can estimate a bound for the 
> error: it should be a small multiple of the condition number of the design 
> matrix times machine epsilon.
>
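The size of that bound can be estimated for the same kind of hypothetical data (fixed points (1, 1) and (1, 3), moving point (x, 2)). This sketch assumes "condition number" means the 2-norm condition number of the n x 2 design matrix X, the ratio of its largest to smallest singular value, computed from the eigenvalues of X^T X. Forming X^T X in exact arithmetic is a choice made here so that the estimate is not itself spoiled by cancellation; `cond2_two_columns` is a hypothetical helper name.

```python
import math
import sys
from fractions import Fraction


def cond2_two_columns(X):
    """2-norm condition number of an n x 2 matrix X, via the eigenvalues of X^T X.

    X^T X is formed exactly with Fraction, so its determinant is not destroyed
    by rounding when the columns are nearly collinear.
    """
    F = [[Fraction(v) for v in row] for row in X]
    a = sum(r[0] * r[0] for r in F)
    b = sum(r[0] * r[1] for r in F)
    c = sum(r[1] * r[1] for r in F)
    det = a * c - b * b  # exact; zero iff the two columns are collinear
    if det == 0:
        return math.inf
    half_trace = (a + c) / 2
    disc = ((a - c) / 2) ** 2 + b * b
    lam_max = float(half_trace) + math.sqrt(float(disc))
    lam_min = float(det) / lam_max  # product of the two eigenvalues equals det
    return math.sqrt(lam_max / lam_min)


eps = sys.float_info.epsilon
# Hypothetical data: rows of X are (1, x_i); (1, 1) and (1, 3) fixed, (x, 2) moving.
for x in [0.0, 0.9, 0.999, 0.999999, 0.999999999]:
    X = [[1.0, 1.0], [1.0, 1.0], [1.0, x]]
    k = cond2_two_columns(X)
    print(f"x = {x}: cond(X) = {k:.3e}, cond(X) * eps = {k * eps:.3e}")
```

For this design, cond(X) grows roughly like sqrt(18) / (1 - x) as x approaches 1, so cond(X) * eps rises from below 1e-15 at x = 0 to roughly 1e-6 at x = 0.999999999. That is an order-of-magnitude guide only; the actual error carries an algorithm-dependent constant.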
> As an example of how hard it is to predict exactly what answer you get: if
> R used the textbook formula for linear regression, the bound would be a lot
> worse, but in this example the answer is slightly closer to zero when done
> that way.
>
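For comparison, here is a sketch of the "textbook formula", taken here to mean solving the normal equations (X^T X) b = X^T y directly; that reading is an assumption. In the 2-norm, cond(X^T X) = cond(X)^2 for a full-rank X, so the error bound for this route scales with the square of the condition number, which is why it is a lot worse. A worse bound still does not guarantee a worse answer at any particular x. The data are hypothetical (fixed points (1, 1) and (1, 3), moving point (x, 2)) and `normal_equations_fit` is an invented name.

```python
def normal_equations_fit(X, y):
    """Fit y ~ b0 + b1 * x by solving (X^T X) b = X^T y with Cramer's rule.

    X is a list of rows (1, x_i).  Forming X^T X squares the condition
    number, so this route is far more sensitive to rounding near collinearity.
    """
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    c = sum(r[1] * r[1] for r in X)
    u = sum(r[0] * yi for r, yi in zip(X, y))
    v = sum(r[1] * yi for r, yi in zip(X, y))
    det = a * c - b * b  # cancels catastrophically as the columns become collinear
    if det == 0.0:
        raise ValueError("X^T X is singular in floating point")
    intercept = (c * u - b * v) / det
    slope = (a * v - b * u) / det
    return intercept, slope


# Hypothetical data: (1, 1) and (1, 3) fixed, (x, 2) moving; near x = 1 the
# computed det may even round to exactly 0.
for x in [0.0, 0.9, 0.999, 0.999999, 0.999999999]:
    X = [[1.0, 1.0], [1.0, 1.0], [1.0, x]]
    try:
        intercept, slope = normal_equations_fit(X, [1.0, 3.0, 2.0])
        print(f"x = {x}: slope = {slope:.3e}")
    except ValueError as err:
        print(f"x = {x}: {err}")
```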
> Unless you really need to know, trying to understand why the fourteenth 
> decimal place of a result has the value it does is not worth the effort.
>
>
>  -thomas
>