[R] summary.lm() for zero variance response

Andrews, Chris chrisaa at med.umich.edu
Wed Mar 12 13:51:22 CET 2014


I'm on 64-bit vs. your 32-bit.  And in case you haven't received this from other R-helpers already, here it is:  R FAQ 7.31.  Machine precision is producing numbers that are very close to zero but not exactly zero, so dividing one of them by another is practically a random number generator.  Also, I'm certain that t and F are computed separately (i.e., not by computing t and then squaring), so the relationship t^2 = F fails, again because of machine-precision limits in the intermediate calculations.
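
The point about t^2 and F can be checked with a small sketch. It relies only on summary(); the helper name t2_and_F and the seed are hypothetical choices made for illustration, not anything from the thread. With a noisy response the two numbers agree; with a constant response both are built from rounding noise (or are NaN), so no agreement should be expected.

```r
# Sketch: compare the squared slope t-statistic with the F-statistic
# reported by summary.lm(). `t2_and_F` is a hypothetical helper name.
t2_and_F <- function(y, x) {
  s <- summary(lm(y ~ x))
  c(t_squared = unname(s$coefficients[2, "t value"])^2,  # slope row
    F         = unname(s$fstatistic["value"]))
}

set.seed(1)                        # arbitrary seed, for reproducibility only
n <- 10
x <- rnorm(n)

# Ordinary case: response has real variance, so t^2 and F match.
noisy <- t2_and_F(1 + 2 * x + rnorm(n), x)
print(noisy)
print(isTRUE(all.equal(noisy[["t_squared"]], noisy[["F"]])))

# Zero-variance response: numerators and denominators are all ~0,
# so the values depend on the platform and may be NaN.
constant <- t2_and_F(rep(1, n), x)
print(constant)
```

Depending on the R version, summary() may also emit a warning for the constant response; the sketch does not depend on it.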

-----Original Message-----
From: Vito M. R. Muggeo [mailto:vito.muggeo at unipa.it] 
Sent: Wednesday, March 12, 2014 8:37 AM
To: Andrews, Chris; r-help at r-project.org
Subject: Re: [R] summary.lm() for zero variance response

Hi Chris,
Here is my output (I have not yet installed R 3.0.3):

 > n=10;k=1;summary(lm(rep(k,n)~rnorm(n)))

Call:
lm(formula = rep(k, n) ~ rnorm(n))

Residuals:
        Min         1Q     Median         3Q        Max
-1.465e-16  1.564e-18  1.764e-17  2.147e-17  3.492e-17

Coefficients:
               Estimate Std. Error    t value Pr(>|t|)
(Intercept)  1.000e+00  2.021e-17  4.949e+16   <2e-16 ***
rnorm(n)    -1.620e-17  2.236e-17 -7.240e-01    0.489
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.637e-17 on 8 degrees of freedom
Multiple R-squared:  0.6598,    Adjusted R-squared:  0.6173
F-statistic: 15.52 on 1 and 8 DF,  p-value: 0.004301

 > sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)
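
The residuals in the output above are of order 1e-17 for a response equal to 1, i.e. below .Machine$double.eps in relative terms. A minimal sketch of a check for that situation follows; the function name is_rounding_noise and the tolerance of 100 * .Machine$double.eps are assumptions made for illustration, not part of lm() or summary.lm().

```r
# Sketch: flag a fit whose residuals are at rounding-error scale
# relative to the response. Name and tolerance are hypothetical.
is_rounding_noise <- function(fit, tol = 100 * .Machine$double.eps) {
  res <- residuals(fit)
  y   <- fitted(fit) + res          # reconstruct the response
  scale <- max(abs(y))
  if (scale == 0) return(TRUE)      # all-zero response: nothing to fit
  max(abs(res)) <= tol * scale
}

set.seed(1)                         # arbitrary seed, for reproducibility only
n <- 10; k <- 1
x <- rnorm(n)
y_noisy <- 1 + 2 * x + rnorm(n)

flag_constant <- is_rounding_noise(lm(rep(k, n) ~ x))  # constant response
flag_noisy    <- is_rounding_noise(lm(y_noisy ~ x))    # ordinary response
print(c(constant = flag_constant, noisy = flag_noisy))
```

When the flag is TRUE, the t, F and R-squared values in the summary are ratios of rounding errors and are best not interpreted.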




On 12/03/2014 13:25, Andrews, Chris wrote:
> I get what I would expect.  The t-statistic for the slope and the F-statistic are both undefined (0/0), as are their p-values.
>
>> n=10;k=1;summary(lm(rep(k,n)~rnorm(n)))
>
> Call:
> lm(formula = rep(k, n) ~ rnorm(n))
>
> Residuals:
>     Min     1Q Median     3Q    Max
>       0      0      0      0      0
>
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept)        1          0     Inf   <2e-16 ***
> rnorm(n)           0          0      NA       NA
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 0 on 8 degrees of freedom
> Multiple R-squared:    NaN,	Adjusted R-squared:    NaN
> F-statistic:   NaN on 1 and 8 DF,  p-value: NA
>
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
>
> -----Original Message-----
> From: Vito M. R. Muggeo [mailto:vito.muggeo at unipa.it]
> Sent: Wednesday, March 12, 2014 6:27 AM
> To: r-help at r-project.org
> Subject: [R] summary.lm() for zero variance response
>
> dear all,
> a student of mine brought to my attention the following, somewhat odd,
> behaviour of summary.lm() when the response variance is zero (yes,
> possibly meaningless from a practical viewpoint). Namely something like
>
> n=10;k=1;summary(lm(rep(k,n)~rnorm(n)))
>
> The values of k, n and the covariate do not matter.
>
> Two awkward points are
> 1) the F stat is different from t squared
> 2) more importantly, p-values from the F-stat are far smaller (and
> "significant" at usual levels 0.05/0.01) than the p-values coming from
> summary(..)$coef[,"Pr(>|t|)"] (i.e. the usual Wald test). Differences
> are dramatic for n>1000 where p(tstat)\approx0.8 and p(Fstat)< 2.2e-16.
>
> I looked for "lm zero variance" or "lm deterministic data", or "lm zero
> residuals" but without success. Also ?lm does not include any warning
> about using it for zero variance data (as reported for instance in ?nls)
>
> Am I missing anything?
> thanks,
> vito
>
>

-- 
==============================================
Vito M.R. Muggeo
Dip.to Sc Statist e Matem `Vianelli'
Università di Palermo
viale delle Scienze, edificio 13
90128 Palermo - ITALY
tel: 091 23895240
fax: 091 485726
http://dssm.unipa.it/vmuggeo

28th IWSM
International Workshop on Statistical Modelling
July 8-12, 2013, Palermo
http://iwsm2013.unipa.it
===============================================