[R] lm(y ~ x1) vs. lm(y ~ x0 + x1 - 1) with x0 <- rep(1, length(y))

Sat Jan 22 05:33:06 CET 2011

Thank you all (including Dennis), this was elucidating. 

I would have (maybe naively) anticipated that in this somewhat pathological case of fitting without an intercept and re-introducing it via constant x1, R might check whether the design matrix includes a column of ones, and adjust the degrees of freedom accordingly. But now I can see that by explicitly requesting via the formula interface not to fit a constant, I am implicitly stating my hypothesis that y==0, even if I re-introduce my suspicion that y==mu via x1 <- 1. If I understood correctly, x1 is treated as a variable in the latter case, right?

On Jan 22, 2011, at 5:18 , Bert Gunter wrote:

> Well ... as x1 is continuous(numeric), it has no levels. So ...??
> 
> Note that the fits are identical for both models. The issue is only
> what is the Null that you are testing in the two cases. In the first
> case, it is just y = constant, so you are testing the 1 df for x1. In
> the second, it is y = 0 (which rarely makes any sense) and you are
> testing the 2 df for the two terms (x0 and x1). Etc. etc.
> 
> -- Bert
> 
> On Fri, Jan 21, 2011 at 7:48 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>> 
>> On Jan 21, 2011, at 9:03 PM, jochen laubrock wrote:
>> 
>>> Dear list,
>>> 
>>> the following came up in an introductory class. Please help me understand
>>> the -1 (or 0+) syntax in formulae: Why do the enumerator dfs, F-statisics
>>> etc. differ between the models lm(y ~ x1) and lm(y ~ x0 + x1 - 1), if x0 is
>>> a vector containing simply ones?
>> 
>> You are testing something different. In the first case you are testing the
>> difference between the baseline and the second level of x1 (so there is only
>> one d.f.), while in the second case you are testing for both of the
>> coefficients being zero (so the numerator has 2 d.f.). It would be easier to
>> see if you did print() on the fit object. The first model would give you an
>> estimate for an "Intercept", which is really an estimate for the first level
>> of x1.  Having been taught to think of anova as just a special case of
>> regression is helpful here. Look at the model first  and only then look at
>> the anova table.
>> 
>> 
>>> 
>>> Example:
>>> 
>>> N  <- 40
>>> x0 <- rep(1,N)
>>> x1 <- 1:N
>>> vare <- N/8
>>> set.seed(4)
>>> e <- rnorm(N, 0, vare^2)
>>> 
>>> X <- cbind(x0, x1)
>>> beta <- c(.4, 1)
>>> y <- X %*% beta + e
>>> 
>>> summary(lm(y ~ x1))
>>> # [...]
>>> # Residual standard error: 20.92 on 38 degrees of freedom
>>> # Multiple R-squared: 0.1151,   Adjusted R-squared: 0.09182
>>> # F-statistic: 4.943 on 1 and 38 DF,  p-value: 0.03222
>>> 
>>> summary(lm(y ~ x0 + x1 - 1))        # or summary(lm(y ~ 0 + x0 + x1))
>>> # [...]
>>> # Residual standard error: 20.92 on 38 degrees of freedom
>>> # Multiple R-squared: 0.6888,   Adjusted R-squared: 0.6724
>>> # F-statistic: 42.05 on 2 and 38 DF,  p-value: 2.338e-10
>>> 
>>> 
>>> Thanks in advance,
>>> Jochen
>>> 
>>> 
>>> ----
>>> Jochen Laubrock, Dept. of Psychology, University of Potsdam,
>>> Karl-Liebknecht-Strasse 24-25, 14476 Potsdam, Germany
>>> phone: +49-331-977-2346, fax: +49-331-977-2793
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> David Winsemius, MD
>> West Hartford, CT
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
> 
> 
> 
> -- 
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 467-7374
> http://devo.gene.com/groups/devo/depts/ncb/home.shtml

----
Jochen Laubrock, Dept. of Psychology, University of Potsdam,
Karl-Liebknecht-Strasse 24-25, 14476 Potsdam, Germany
phone: +49-331-977-2346, fax: +49-331-977-2793