[R] Bad points in regression
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Mar 16 14:28:00 CET 2007
On Fri, 16 Mar 2007, Alberto Monteiro wrote:
> Ted Harding wrote:
>>
>>> alpha <- 0.3
>>> beta <- 0.4
>>> sigma <- 0.5
>>> err <- rnorm(100)
>>> err[15] <- 5; err[25] <- -4; err[50] <- 10
>>> x <- 1:100
>>> y <- alpha + beta * x + sigma * err
>>> ll <- lm(y ~ x)
>>> plot(ll)
>>
>> ll is the output of a linear model fitted by lm(), and so has
>> several components (see ?lm in the section "Value"), one of
>> which is "residuals" (which can be abbreviated to "res").
>>
>> So, in the case of your example,
>>
>> which(abs(ll$res)>2)
>> 15 25 50
>>
>> extracts the information you want (and the ">2" was inspired by
>> looking at the "residuals" plot from your "plot(ll)").
>>
> Ok, but how can I grab those points _in general_? What is the
> criterion that plot used to mark those points as bad points?
>
> names(ll)
>
> gives:
>
> [1] "coefficients" "residuals" "effects" "rank"
> [5] "fitted.values" "assign" "qr" "df.residual"
> [9] "xlevels" "call" "terms" "model"
>
> None of them include information about those bad points.
But it is the plot method that you are using, not the object ll. If you
examine stats::plot.lm you will see what it does: it labels the points with
the 'id.n' largest (in absolute value) residuals (standardized residuals
for plot types 2 and 3).
And ?plot.lm also tells you that.
BTW, 'bad points' seems to be your own description: it does not appear in
the R documentation.
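
For example, a minimal sketch of that labelling rule, assuming the simulated
x and y from the quoted code and plot.lm's default id.n = 3:

ll   <- lm(y ~ x)                 # the fit from the quoted example
id.n <- 3                         # plot.lm's default number of labelled points

r  <- residuals(ll)               # ordinary residuals (used for plot type 1)
rs <- rstandard(ll)               # standardized residuals (types 2 and 3)

sort(order(abs(r),  decreasing = TRUE)[1:id.n])   # points labelled on plot 1
sort(order(abs(rs), decreasing = TRUE)[1:id.n])   # points labelled on plots 2, 3

For data simulated as above, with the inflated errors at positions 15, 25 and
50, both calls should return those three indices.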
--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595