[R] How to test omitted level from a multiple level factor against overall mean in regression models?

Rolf Turner rolf.turner at xtra.co.nz
Mon Mar 26 02:55:56 CEST 2012


The test you are requesting is ***MEANINGLESS***.  The ``effect value''
of a single level is ill-defined (or, in the more usual parlance, "not
estimable").  The dummy.coef() procedure suggested by Gabor gives you
point estimates *subject to the constraints* imposed by the contrasts
used.  The choice of contrasts is arbitrary, essentially a matter of
aesthetics/taste/convenience.  The values returned by dummy.coef() have,
in and of themselves, no meaning at all.
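
To see the arbitrariness concretely, here is a minimal sketch (not part of
the original exchange) that re-uses the toy data from Jürgen's message
quoted below: the same data fitted under two different contrast choices
yields two different sets of "per-level" numbers from dummy.coef(), yet
exactly the same fitted values.

    x <- as.factor(c(1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3))
    y <- c(1.1,1.15,1.2,1.1,1.1,1.1,1.2,1.2,1.2,2.1,2.2,2.3,2.4,
           2.5,2.6,2.7,2.8,2.9,3,3.1)

    fit_sum <- lm(y ~ x, contrasts = list(x = "contr.sum"))        # effects constrained to sum to zero
    fit_trt <- lm(y ~ x, contrasts = list(x = "contr.treatment"))  # effect of first level set to zero

    dummy.coef(fit_sum)   # one set of "level effects" ...
    dummy.coef(fit_trt)   # ... and a different set, from the same data

    all.equal(fitted(fit_sum), fitted(fit_trt))   # TRUE: it is exactly the same model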

You can meaningfully estimate, and test for the "significance" of,
*differences* between the "effect values" of factor levels.  For the
individual levels, no can do.
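
As a sketch (again not from the original thread, same toy data, and
assuming R's default treatment contrasts, i.e. options("contrasts") has
not been changed): with treatment contrasts each factor coefficient *is*
a difference between two levels, and summary() supplies the corresponding
t-test; releveling gives the remaining pairwise comparison, and
pairwise.t.test() does all of them with a multiplicity adjustment.

    x <- as.factor(c(1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3))
    y <- c(1.1,1.15,1.2,1.1,1.1,1.1,1.2,1.2,1.2,2.1,2.2,2.3,2.4,
           2.5,2.6,2.7,2.8,2.9,3,3.1)

    fit <- lm(y ~ x)                      # default treatment contrasts assumed
    summary(fit)$coefficients             # "x2": level 2 - level 1, "x3": level 3 - level 1

    x2 <- relevel(x, ref = "2")           # make level 2 the baseline
    summary(lm(y ~ x2))$coefficients      # "x23": level 3 - level 2

    pairwise.t.test(y, x, p.adjust.method = "holm")   # all pairwise differences, adjusted p-values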

E.g.  Y = mu + alpha_i + E when the observation is at level i of the
factor (and "E" means "random error").  In this setting mu = 0,
alpha_1 = 1, alpha_2 = 2 and alpha_3 = 3 is ***EXACTLY THE SAME MODEL***
as mu = 1, alpha_1 = 0, alpha_2 = 1 and alpha_3 = 2.

It makes no sense to ask (or to test) whether alpha_1 differs from 0.
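
The same toy data makes the point (a sketch, not from the original thread):
under sum-to-zero contrasts the coefficient printed for level 1 estimates
its deviation from the grand mean, whereas under treatment contrasts level
1 is the baseline and has no coefficient at all, so the question "is
alpha_1 different from 0?" changes meaning (or disappears) with the choice
of constraint.

    x <- as.factor(c(1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3))
    y <- c(1.1,1.15,1.2,1.1,1.1,1.1,1.2,1.2,1.2,2.1,2.2,2.3,2.4,
           2.5,2.6,2.7,2.8,2.9,3,3.1)

    coef(summary(lm(y ~ x, contrasts = list(x = "contr.sum"))))        # level 1 vs grand mean
    coef(summary(lm(y ~ x, contrasts = list(x = "contr.treatment"))))  # no row for level 1 at all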

     cheers,

         Rolf Turner

On 26/03/12 02:08, "Biedermann, Jürgen" wrote:
> Hi Gabor,
>
> Thanks a lot for the answer.
> However, I'm not so much focusing on the point estimate of the effect for the omitted factor level, but rather on a statistical test of
> whether it differs significantly from 0.
> Do you know a way to do this as well?
>
> Greetings Jürgen
> ________________________________________
> From: Gabor Grothendieck [ggrothendieck at gmail.com]
> Sent: Sunday, 25 March 2012 14:11
> To: Biedermann, Jürgen
> Cc: r-help at R-project.org
> Subject: Re: [R] How to test omitted level from a multiple level factor against overall mean in regression models?
>
> 2012/3/25 "Biedermann, Jürgen"<Juergen.Biedermann at charite.de>:
>> Hi there,
>>
>> I have a linear model with one factor having three levels.
>> I want to check if the different levels significantly differ from the overall mean (using contr.sum).
>> However one level (the last) is omitted in the standard procedure.
>>
>> To illustrate this:
>>
>> x<- as.factor(c(1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3))
>> y<- c(1.1,1.15,1.2,1.1,1.1,1.1,1.2,1.2,1.2,2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8,2.9,3,3.1)
>> test<- data.frame(x,y)
>> reg1<- lm(y~C(x,contr.sum),data=test)
>> summary(reg1)
>>
>> Coefficients:
>>                  Estimate Std. Error t value Pr(>|t|)
>> (Intercept)       1.63333    0.06577  24.834 8.48e-15 ***
>> C(x, contr.sum)1 -0.48333    0.10792  -4.479  0.00033 ***
>> C(x, contr.sum)2 -0.48333    0.08936  -5.409 4.70e-05 ***
>>
>> Is it possible to get the effect for the third level (against the overall mean) in the table too?
>>
>> I figured out:
>>
>> reg2<- lm(y~C(relevel(x,3),contr.sum),data=test)
>> summary(reg2)
>>
>> C(relevel(x, 3), contr.sum)1  0.96667    0.07951  12.158 8.24e-10 ***
>> C(relevel(x, 3), contr.sum)2 -0.48333    0.10792  -4.479  0.00033 ***
>>
>>
>> The first row now tests the third level against the overall mean, but I find this approach not very convenient.
>> Moreover, I wonder whether it is meaningful at all, given the accumulation of alpha error. Would a Bonferroni correction be sensible?
>>
> Try this:
>
>> options(contrasts = c("contr.sum", "contr.poly"))
>> reg1<- lm(y~x,data=test)
>> dummy.coef(reg1)
> Full coefficients are
>
> (Intercept):      1.633333
> x:                       1          2          3
>                  -0.4833333 -0.4833333  0.9666667
>
> --
> Statistics&  Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>


