[R] Interpreting lm Residuals...

David Riebel driebel at pha.jhu.edu
Mon Jun 21 16:27:41 CEST 2010


I am using the lm function in R to fit several linear models to a
fair-sized dataset (~160 collections of ~1000 data points each).  My
data have intrinsic, systematic uncertainty much greater than the
measurement errors on any individual point.  My thought is to use the
residuals of my linear fits to quantify this intrinsic uncertainty, but
I am puzzled over the correct interpretation of R's output.

I have attached plots of the fit and the residuals for one of my
sub-groups, for illustration.  By eye, the overwhelming majority of the
residuals are within +/- 0.4, and I would therefore expect the standard
deviation of the residuals to be ~0.2.  However, the output from lm does
not show this:

> summary(ofit)

Call:
lm(formula = omag ~ oper, weights = (1/oerr))

Residuals:
     Min       1Q   Median       3Q      Max 
-3.32185 -0.41181  0.03983  0.40041  2.52971 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 19.52847    0.03979   490.8   <2e-16 ***
oper        -4.25297    0.02101  -202.4   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.6705 on 2287 degrees of freedom 
Multiple R-squared: 0.9471, Adjusted R-squared: 0.9471 
F-statistic: 4.097e+04 on 1 and 2287 DF,  p-value: < 2.2e-16

The plot thickens when I examine the residuals themselves:
> summary(resid(ofit))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-0.611800 -0.095720  0.010200  0.005954  0.101100  0.680700 
> sd(resid(ofit))
[1] 0.1533568

These numbers are much closer to what I see by eye.  There really aren't
any residuals beyond about +/- 0.6, and certainly nothing as large as
3.3!  The help page for lm tells me that the residuals are "the
residuals, that is response minus fitted values."  That is exactly what
I would expect.  As an astronomer, my knowledge of statistics is rather
"workman-like," if you will.  To me, "Residual standard error" means
"the standard deviation of the residuals," yet the lm output doesn't
seem to agree with this.
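
One possible source of the mismatch is the weights argument.  As far as
I can tell, summary() on a weighted fit reports residuals scaled by
sqrt(weights) and computes the residual standard error from those, while
resid() returns the unscaled response-minus-fitted values by default.
The sketch below uses simulated data, not the real dataset: the variable
names are borrowed from the call above, but every value in it is made up
for illustration.

```r
# Simulated stand-in for the real data; all values here are hypothetical.
set.seed(1)
n    <- 500
oper <- runif(n, 0.5, 3)
oerr <- runif(n, 0.01, 0.1)                  # made-up per-point errors
omag <- 19.5 - 4.25 * oper + rnorm(n, sd = 0.15)

fit <- lm(omag ~ oper, weights = 1 / oerr)

raw <- resid(fit)                            # response minus fitted values
wtd <- sqrt(weights(fit)) * raw              # residuals scaled by sqrt(weights)

sd(raw)                                      # scatter as seen in a residual plot
summary(fit)$sigma                           # the "Residual standard error"
sqrt(sum(wtd^2) / df.residual(fit))          # the same quantity, by hand
summary(summary(fit)$residuals)              # quantiles of the scaled residuals
```

If this reading is right, the two sets of numbers describe different
quantities: sd(raw) is in the units of omag, while the summary() figures
are in units of omag times sqrt(weight), so they need not agree.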

I'd appreciate it if someone could clarify what's being output by the
summary function acting on an lm object.

Replies by e-mail preferred.

Thanks,


David Riebel
Graduate Research Assistant
Johns Hopkins University
Department of Physics and Astronomy



-------------- next part --------------
A non-text attachment was scrubbed...
Name: o_seq2_fit.ps
Type: application/postscript
Size: 58665 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20100621/e6a8fb56/attachment.ps>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: o_seq2_resid.ps
Type: application/postscript
Size: 58730 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20100621/e6a8fb56/attachment-0001.ps>

