[R] stats::lm has inconsistent output when adding constant to dependent variable

Mark Leeds m@rk|eed@2 @end|ng |rom gm@||@com
Fri Sep 27 20:35:20 CEST 2019


Hi: In your example, you made the response zero in every case, which
is going to cause problems: the fit is exact, so there is no residual
variation left to estimate the standard errors from. In GLMs, I think they
call it the Hauck-Donner effect. I'm not sure what it's called in OLS;
probably a lack of identifiability. Note that you probably shouldn't be
using 0's and 1's as the response in a linear regression anyway.
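A minimal sketch of why the all-zero response breaks `summary()`. When every y is 0, the least-squares fit is exact and every residual is exactly 0. The residual standard error is therefore 0, so every coefficient standard error is 0 too, and each t value becomes 0/0 = NaN (the p-value inherits the NaN). The design vector `x` is a hypothetical fixed stand-in for the `rbinom()` draw in the original post, chosen so the example does not depend on the random number generator.

```r
# Hypothetical stand-in for rbinom(30, 1, prob = 0.91): mostly 1s, a few 0s
x <- c(0, 0, 0, rep(1, 27))
y <- rep(0, 30)                 # constant (all-zero) response, as in the post

fit0 <- lm(y ~ x)
s0 <- summary(fit0)

# The fit is exact: every residual is exactly zero ...
print(max(abs(residuals(fit0))))
# ... so the residual standard error is zero, and with it every Std. Error
print(s0$sigma)
# t value = Estimate / Std. Error = 0 / 0 = NaN; Pr(>|t|) is NaN as well
print(coef(s0))
```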

If you change the response to the one below, you get what you'd expect:

y <- c(rep(0, 15), rep(1,15))
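With a response that actually varies, the original intuition holds: a constant shift is absorbed by the intercept. A minimal sketch, reusing the suggested `y` with a hypothetical fixed `x`. The slope's estimate, standard error, t value and p-value from `lm(y ~ x)` and `lm(1 + y ~ x)` agree to within floating-point tolerance, and only the intercept moves by 1.

```r
x <- c(0, 0, 0, rep(1, 27))          # hypothetical fixed design vector
y <- c(rep(0, 15), rep(1, 15))       # non-constant response, as suggested

fit_y  <- lm(y ~ x)
fit_y1 <- lm(1 + y ~ x)              # same response shifted by a constant

# Slope row of the coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
slope_y  <- coef(summary(fit_y))["x", ]
slope_y1 <- coef(summary(fit_y1))["x", ]

# Both rows should match; the shift only changes the intercept
print(rbind(slope_y, slope_y1))
print(coef(fit_y1)[["(Intercept)"]] - coef(fit_y)[["(Intercept)"]])
```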

On Fri, Sep 27, 2019 at 1:48 PM David J. Birke <djbirke using berkeley.edu> wrote:

> Dear R community,
>
> I just stumbled upon the following behavior in R version 3.6.0:
>
> set.seed(42)
> y <- rep(0, 30)
> x <- rbinom(30, 1, prob = 0.91)
> # The following will not show any t-statistic or p-value
> summary(lm(y~x))
> # The following will show a t-statistic and p-value
> summary(lm(1+y~x))
>
> I expected the first case to report a t-statistic and p-value as well.
> My intuition might be tricking me, but I think that a constant shift of
> the response should be fully absorbed by the intercept and not affect
> inference about the slope.
>
> Is this a bug or is there a reason why there should be a discrepancy
> between the two outputs?
>
> Best,
> David
>
>
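For comparison, a sketch of the shifted model `lm(1 + y ~ x)` from the original post, again with a hypothetical fixed `x`. In exact arithmetic this fit is just as perfect as `lm(y ~ x)` (intercept 1, slope 0, residuals 0). In floating point, however, the QR-based fit typically leaves residuals on the order of machine precision rather than exact zeros. Any finite standard errors and t values reported are then ratios of rounding errors and carry no information about the slope. That would plausibly explain the discrepancy without a bug in `lm()`.

```r
# Hypothetical stand-in for rbinom(30, 1, prob = 0.91)
x <- c(0, 0, 0, rep(1, 27))
y <- rep(0, 30)

fit1 <- lm(1 + y ~ x)           # response is the constant 1
s1 <- summary(fit1)

# Coefficients are 1 and 0 up to rounding error
print(coef(fit1))
# Residuals are at most rounding noise (not necessarily exactly 0)
print(max(abs(residuals(fit1))))
# Any finite t value printed here is noise divided by noise
print(coef(s1))
```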




More information about the R-help mailing list