[R-sig-ME] GLMM Q: Categorical fixed effects with more than 2 levels

Sun Jul 22 17:23:38 CEST 2012

On 2012-07-22 01:19, Paul Johnson wrote:

> On Sat, Jul 21, 2012 at 9:24 AM, Julie Kern <juliekern27 at googlemail.com> wrote:

>> I’m running a lmm (with lme) to investigate the influence of several
>> fixed effects on call rate. The model also includes the random terms
>> of group & individual identity. Several of my fixed effects are
>> categorical & some have more than 2 categories (up to 5). E.g. The
>> fixed effect habitat contains the categories ‘open’, ‘medium’ &
>> ‘dense’. The model has shown habitat to be a significant predictor of
>> call rate, but I would now like to test the differences between these.
>> E.g. Are call rates in open, medium & dense habitats different from
>> each other? I am unsure how best to go about this as I still have to
>> take into account repeated measures from the same individuals & groups
>> so don’t think I can simply use a Wilcoxon signed-rank test.
>
> I don't think there is an 'honestly significant difference' test that
> can check whether the 3 are different from each other.  However, I

Assuming a factor X with three levels called "A", "B" and "C".  Isn't 
that just the simultaneous test of

H0_A: A = (B + C) / 2  <=>  A - (B + C) / 2 = 0
H0_B: B = (A + C) / 2  <=>  B - (A + C) / 2 = 0
H0_C: C = (A + B) / 2  <=>  C - (A + B) / 2 = 0

?  If so, that could easily be set-up using the 'multcomp' package:

 > library("nlme")
 > library("multcomp")
Loading required package: mvtnorm
Loading required package: survival
Loading required package: splines
 > fm <- lme(yield ~ 0 + Variety, random = ~ nitro | Block, data = Oats)
 > K <- rbind("GR vs (M, V)" = c(   1, -1/2, -1/2),
+            "M vs (GR, V)" = c(-1/2,    1, -1/2),
+            "V vs (GR, M)" = c(-1/2, -1/2,    1))
 > fm.glht <- glht(fm, linfct = mcp(Variety = K))
 > set.seed(123) # for reproducibility
 > summary(fm.glht, test = adjusted("free")) # see ?adjusted

          Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: User-defined Contrasts

Fit: lme.formula(fixed = yield ~ 0 + Variety, data = Oats, random = 
~nitro |
     Block)

Linear Hypotheses:
                   Estimate Std. Error z value Pr(>|z|)
GR vs (M, V) == 0   0.7917     3.9024   0.203   0.8392
M vs (GR, V) == 0   8.7292     3.9024   2.237   0.0470 *
V vs (GR, M) == 0  -9.5208     3.9024  -2.440   0.0391 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- free method)

So, "Golden Rain" does not differ significantly from the mean of the 
others, "Marvellous" has significantly higher yield than mean of the 
others, and "Victory" has significantly lower yield than mean of the 
others.  This seems consistent with

 > boxplot(yield ~ Variety, data = Oats)

> think you could ask a more specific comparison, such as "does it make
> a difference if I assume the effects of open and medium are the same
> as the effect of dense"?
>
> You can create a new factor variable that recodes open and medium to
> be the same thing.  I find the procedure to do that is difficult to
> describe to my students.  To make that easier, I have a little package
> of regression tools called "rockchalk".  It has in there a function
> combineLevels.  you could run (if dat is the data frame)
>
> dat$hab2 <- combineLevels(dat$habitat, levs = c("open", "medium"),
> newLabel = c("oOrMed"))
>
> if you run that model again, replacing habitat by hab2, then the anova
> function will calculate a likelihood ratio test to check if you have
> lost explanatory power by fixing the coefficients of open and medium
> to be the same.
>
> now, if you have a 5 level variable, say
> c("lo","lom","medium","mhi","hi"),  and you want to reduce the model
> to a comparison that combines "lo" and "lom" versus
> "medium","mhi","hi", you have to run combineLevels twice, first
> putting together "lo" and "lom" and then "medium", "mhi", "hi".
>
> You aren't required to do that with rockchalk, but otherwise you have
> a somewhat conceptually difficult series of steps.
>
> 1. Take the levels of the existing variable,
> 2. Add the new level for the combined category
> 3. Create a new factor variable that uses the new levels object
> 4. Reassign the cases to the new label.

Huh?

 > a <- gl(5, 2, labels = c("lo2", "lo1", "mid", "hi1", "hi2"))
 > a
  [1] lo2 lo2 lo1 lo1 mid mid hi1 hi1 hi2 hi2
Levels: lo2 lo1 mid hi1 hi2
 > levels(a) <- c("lo", "lo", "mid+hi", "mid+hi", "mid+hi")
 > a
  [1] lo     lo     lo     lo     mid+hi mid+hi mid+hi mid+hi mid+hi mid+hi
Levels: lo mid+hi
 >

This seems pretty straightforward.  Honestly, what's "conceptually 
difficult" here?

HTH,
Henric

>
> If the factor is ordinal, then this is a little spicy to splice the
> new one into the levels list at the right spot.  But you can see how
> if you read the code for combineLevels.
>
>
>> Any advice would be greatly appreciated, I'm new to R & have spent a
>> long time searching for this on the web but am going round in ever
>> more confusing circles!
>>
>> Thank you!
>>
>> Julie
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>
>