[R-sig-ME] Factor collapsing method

Ben Bolker bbolker at gmail.com
Sat Sep 24 23:02:47 CEST 2011


Iker Vaquero Alba <karraspito at ...> writes:

> After several analyses, I get a significant effect in 
> a factor with 7 levels. As I am doing a stepwise
> simplification, I get my p-values from the anovas comparing
>  pairs of models. But that way, I obviously get
> only one p-value for the factor, but I can't know which of the 
> levels are the most important ones in
> determining that effect. I have an idea from the p-values of 
> the "summary" table, and I can also plot the
> data to see the direction of the effect. However, I have read in a
>  paper that there is a method to collapse
> factor levels to obtain information about which factor levels 
> differ from one another, that is used when an
> explanatory variable has a significant effect in the 
> minimal model and contains more than two factor
> levels. I have looked for it in the Crawley book and 
> in the web, but I actually cannot find anything, and I
> don't know which terms are more appropriate for searching. 

  I'm sorry to critique rather than answer your question, but:

The glht function from the multcomp package appears to work for
"mer" objects (i.e. lmer/glmer fits) -- at least, it gives
sensible-looking results in one very quick trial.
That will give you the classical post hoc multiple comparisons
results, subject to caveats about how approximately they apply
in the mixed model context.
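
For concreteness, here is a rough sketch of what I mean (untested
beyond the quick trial mentioned above; 'y', 'f', 'grp', and 'mydata'
are hypothetical names standing in for your response, your 7-level
factor, your grouping variable, and your data frame):

```r
## Sketch: Tukey-style all-pairwise comparisons of a factor's levels
## in a mixed model, via multcomp::glht.
library(lme4)
library(multcomp)

## hypothetical model: response y, 7-level fixed factor f,
## random intercept for grp
fit <- lmer(y ~ f + (1 | grp), data = mydata)

## all pairwise comparisons among the levels of f,
## with a multiplicity adjustment
summary(glht(fit, linfct = mcp(f = "Tukey")))
```

These are Wald-based tests, so the usual caveats about
denominator-degrees-of-freedom approximations in mixed models apply.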

If you really want the term-by-term p-values for the null hypotheses
that the individual contrasts are equal to zero, and you need
a more accurate result than the Wald test, you can try expanding
your model out into the model matrix form and dropping the columns
one at a time, or using drop1 -- this will give you likelihood
ratio tests (better than Wald, but still assuming large N).
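
A sketch of both routes (again with hypothetical names 'y', 'f',
'grp', 'mydata'; note the models should be fitted with ML rather
than REML for likelihood ratio comparisons):

```r
## Route 1: drop1() with a likelihood ratio test, if your lme4
## version supports it. This tests the whole factor at once.
library(lme4)
fit_ml <- lmer(y ~ f + (1 | grp), data = mydata, REML = FALSE)
drop1(fit_ml, test = "Chisq")

## Route 2: expand the factor into its model-matrix columns and
## drop one contrast column at a time, comparing fits by LRT.
X <- model.matrix(~ f, data = mydata)
full    <- lmer(y ~ X - 1 + (1 | grp), data = mydata, REML = FALSE)
reduced <- lmer(y ~ X[, -2] - 1 + (1 | grp),
                data = mydata, REML = FALSE)
anova(reduced, full)  ## LRT for the dropped contrast
```

Route 2 gives a separate test per contrast column (repeat with
X[, -3], X[, -4], ...), which is what "term-by-term" means here.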

  Testing hypotheses about individual factor levels is fine (as long
as you either correct for multiple comparisons or specify sensible
[hopefully orthogonal] _a priori_ contrasts), but I start to get
worried about the prospect of lumping levels.  Why are you doing
this?  Doing it carefully, via some form of shrinkage, might make
sense in a data mining/data exploration context (see the reference
below, which I just grabbed off Google Scholar without reading),
especially if you had _lots_ of levels, but if you're just trying to
squeeze out a few extra degrees of freedom, my suggestion would be:
"don't".

Bondell, Howard D., and Brian J. Reich. 2009. "Simultaneous Factor
Selection and Collapsing Levels in ANOVA." Biometrics 65 (1):
169-177. doi:10.1111/j.1541-0420.2008.01061.x.
http://onlinelibrary.wiley.com/doi/10.1111/j.1541-0420.2008.01061.x/abstract.
