[R] Bad points in regression
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Mar 16 14:28:00 CET 2007
On Fri, 16 Mar 2007, Alberto Monteiro wrote:
> Ted Harding wrote:
>>
>>> alpha <- 0.3
>>> beta <- 0.4
>>> sigma <- 0.5
>>> err <- rnorm(100)
>>> err[15] <- 5; err[25] <- -4; err[50] <- 10
>>> x <- 1:100
>>> y <- alpha + beta * x + sigma * err
>>> ll <- lm(y ~ x)
>>> plot(ll)
>>
>> ll is the output of a linear model fitted by lm(), and so has
>> several components (see ?lm in the section "Value"), one of
>> which is "residuals" (which can be abbreviated to "res").
>>
>> So, in the case of your example,
>>
>> which(abs(ll$res)>2)
>> 15 25 50
>>
>> extracts the information you want (and the ">2" was inspired by
>> looking at the "residuals" plot from your "plot(ll)").
>>
> Ok, but how can I grab those points _in general_? What is the
> criterion that plot used to mark those points as bad points?
>
> names(ll)
>
> gives:
>
> [1] "coefficients" "residuals" "effects" "rank"
> [5] "fitted.values" "assign" "qr" "df.residual"
> [9] "xlevels" "call" "terms" "model"
>
> None of them include information about those bad points.
But it is the plot method that you are using, not the object ll. If you
examine stats::plot.lm you will see what it does: it labels the points with
the 'id.n' largest (in absolute value) residuals (standardized residuals
for plot types 2 and 3).
And ?plot.lm also tells you that.
BTW, 'bad points' seems to be your own description: it does not appear in
the R documentation.
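
For example, a minimal sketch of that labelling rule, assuming the simulated
x and y from the quoted code and plot.lm's default id.n = 3:

ll   <- lm(y ~ x)                 # the fit from the quoted example
id.n <- 3                         # plot.lm's default number of labelled points

r  <- residuals(ll)               # ordinary residuals (used for plot type 1)
rs <- rstandard(ll)               # standardized residuals (types 2 and 3)

sort(order(abs(r),  decreasing = TRUE)[1:id.n])   # points labelled on plot 1
sort(order(abs(rs), decreasing = TRUE)[1:id.n])   # points labelled on plots 2, 3

For data simulated as above, with the inflated errors at positions 15, 25 and
50, both calls should return those three indices.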
--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595