[R] Interesting behavior of lm() with small, problematic data sets

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Tue Sep 5 18:05:09 CEST 2017

Why does an unreliable fit have to provide "reasonable" results?

More specifically, p-values arise from observed distributions. If your slopes are "in the noise" (here the slope and the residuals are all around 1e-16, i.e. floating-point roundoff), then the slope estimate could land anywhere relative to the center and spread of that very narrow distribution, leading to, ah, what was it... oh, right... "unreliable" results.
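
A rough sketch of that point, under assumptions of my own: the noise vector e, the seed, and the helper slope_summary() below are hypothetical and not part of the original example. The t statistic is the slope estimate divided by its standard error, and both shrink together with the noise, so the p-value (and R-squared) say nothing about how close the slope is to zero.

```r
x <- c(1, 2, 3)
set.seed(1)        # arbitrary seed, for reproducibility only
e <- rnorm(3)      # hypothetical noise pattern, not from the original data

# Fit y ~ x and return the slope, its p-value, and R-squared.
slope_summary <- function(y) {
  s <- suppressWarnings(summary(lm(y ~ x)))
  c(slope     = unname(coef(s)["x", "Estimate"]),
    p_value   = unname(coef(s)["x", "Pr(>|t|)"]),
    r_squared = s$r.squared)
}

# Same noise pattern at three magnitudes around a constant response of 1.
scales <- c(1, 1e-4, 1e-8)
res <- t(sapply(scales, function(k) slope_summary(1 + k * e)))
rownames(res) <- format(scales)
print(res)
```

The slope column shrinks with the noise while the p-value and R-squared columns stay essentially the same. In the quoted example the "noise" is whatever roundoff pattern the fitting arithmetic happens to leave behind, so those two numbers describe the roundoff, not the data.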
Sent from my phone. Please excuse my brevity.

On September 5, 2017 6:24:30 AM PDT, "Glover, Tim" <Tim.Glover at amecfw.com> wrote:
>I've recently come across the following results reported from the lm()
>function when applied to a particular type of admittedly difficult
>data.  When working with small data sets (for instance 3 points) that
>have the same response for different values of the predictor, the
>resulting slope estimate is a reasonable approximation of the expected
>0.0, but the p-value of that slope estimate is a surprising value.  A
>reproducible example is included below, along with the output of the
>summary of results.
>######### example code
>x <- c(1,2,3)
>y <- c(1,1,1)
># the above gives the data set {(1,1), (2,1), (3,1)} to regress
>new.rez <- lm(y ~ x) # regress constant y on changing x
>summary(new.rez) # display results of regression
>######## end of example code
>Call:
>lm(formula = y ~ x)
>
>Residuals:
>         1          2          3
> 5.906e-17 -1.181e-16  5.906e-17
>
>Coefficients:
>              Estimate Std. Error    t value Pr(>|t|)
>(Intercept)  1.000e+00  2.210e-16  4.525e+15   <2e-16 ***
>x           -1.772e-16  1.023e-16 -1.732e+00    0.333
>---
>Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
>Residual standard error: 1.447e-16 on 1 degrees of freedom
>Multiple R-squared:  0.7794,    Adjusted R-squared:  0.5589
>F-statistic: 3.534 on 1 and 1 DF,  p-value: 0.3112
>Warning message:
>In summary.lm(new.rez) : essentially perfect fit: summary may be unreliable
>There is a warning that the summary may be unreliable due to the
>essentially perfect fit, but a p-value of 0.3112 doesn’t seem
>As a side note, the various r^2 values seem odd too.
>Tim Glover
>Senior Scientist II (Geochemistry, Statistics), Americas - Environment
>& Infrastructure, Amec Foster Wheeler