[R-sig-ME] sum coding, omnibus tests for GLMMs, and pvalues for LMMs

Ramon Diaz-Uriarte rdiaz02 at gmail.com
Fri Apr 18 13:25:24 CEST 2014


Dear David,

A long and rambling answer to your first and second questions:

David Sears <david.sears01 at gmail.com> wrote:
> Hello list,
>
>  
>
> I have a GLMM with a within-subjects fixed factor (cadcats = 5 levels) and a
> between-subjects fixed factor (trainings = 2 levels) and a maximal random
> effects structure. I would like to run an "ANOVA-style" GLMM, and Barr
> (2013) has said that one should use deviation coding to do this. I have 4
> questions (my R code and model summary is below the questions).
>
>  
>
> 1.      I have used sum coding (contr.sum), but deviation coding appears to
> be slightly different. Moreover, sum coding results in the loss of the last
> level of my factor (because it is a contrast, so n-1 groups means I can only
> have 4 contrasts). How do I deviation code my factors, and will it also
> result in the loss of the last level of my factor? I would prefer a method
> that allows me to retain all levels in the summary.


contr.sum and deviation coding are essentially the same thing, though
deviation coding as used by Barr et al. (and other authors) uses -.5 and .5
instead of 1 and -1.

These links (the first by Barr) might help here (note that the first does
differentiate between sum and deviation, whereas the second does not)

http://talklab.psy.gla.ac.uk/tvw/catpred/
http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm


Note that the first link sets A1 to -1 (or -.5), whereas R, by default, will
set A1 to 1 and A2 to -1 with contr.sum.
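
For instance, with a made-up two-level factor (the level and contrast
labels below are arbitrary, just to show the two coding matrices):

## R's default sum coding: first level -> 1, second level -> -1
contr.sum(2)

## Deviation coding as in Barr et al.: -.5 and .5 instead of 1 and -1
## (for a two-level factor the coefficient is then twice the contr.sum one,
## with the sign depending on which level gets the positive value)
f <- factor(c("A1", "A2"))
contrasts(f) <- matrix(c(-0.5, 0.5), ncol = 1,
                       dimnames = list(levels(f), "dev"))
contrasts(f)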



You might want to understand how to recover, say, a cell mean from the
coefficients (whether the reference is a particular group or the grand
mean), as well as the consequences of the coding scheme for interpreting
effects, for instance the meaning of the footnote on p. 259 of Barr et al.:
"For higher-order designs, treatment and deviation coding schemes will lead
to different interpretations for lower-order effects (simple effects for
contrast coding and main effects for deviation coding)".

I find the explanation in McCullagh and Nelder's "Generalized Linear
Models" (2nd ed.), section 3.5.2 (pp. 65 to 68), extremely clear. The book
by Fox and Weisberg, "An R Companion to Applied Regression" (2nd ed.),
section 4.6, pp. 213 and ff. (note that they, too, call this "deviation
coding", even though they use 1 and -1), also explains the use of
contr.sum.


Whether you use deviation or sum coding, in both cases you "lose" the last
level (since it is redundant). You can get its estimate as minus the sum of
the remaining coefficients (e.g., with contr.sum and a merMod fit, something
like -sum(fixef(model)[for.this.factor]), where for.this.factor picks out
the main-effect coefficients of that factor), or you can simply refit the
model after changing the ordering of the factor levels :-).
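
For instance (an untested sketch, using the object name cadsum.acc from
your code; for a merMod fit the fixed effects live in fixef()):

fe <- fixef(cadsum.acc)

## the omitted (5th) level of cadcats is minus the sum of the other four
## main-effect coefficients (the regexp deliberately skips the interactions)
-sum(fe[grep("^cadcats[0-9]+$", names(fe))])

## or refit after reordering the levels so that a different one comes last:
## acc$cadcats <- factor(acc$cadcat, levels = rev(levels(acc$cadcat)))
## contrasts(acc$cadcats) <- contr.sum(5)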


By the way, having mentioned Fox's book: when using deviation/sum coding I
always use his "contr.Sum" (from the car package) rather than contr.sum,
because of the labeling of the contrasts (and the flexibility of getting
things labeled with parentheses, brackets, or whatever --- see
?decorate.contrasts and ?decorate.contr.Sum).
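
Something along these lines (assuming car is installed; the level names
below are made up):

library(car)   # provides contr.Sum, contr.Treatment, ...

## base R: contrast columns are anonymous
contr.sum(3)

## car: contrast columns carry the level names, so the coefficients in
## summary() are labelled by level rather than by number
contr.Sum(c("low", "mid", "high"))

## how the labels are decorated (brackets, parentheses, ...) is controlled
## by the options documented in ?decorate.contrasts and ?decorate.contr.Sum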



>
> 2.      How do I interpret the summary for variables that have been sum
> coded? Florian Jaeger has said that "for sum coding" the coefficients
> correspond to main effects. In my output does this mean that a significant
> effect of cadcat1 means that the estimated mean for cadcat1 is significantly
> different from the grand mean, which is represented by the intercept? 
>

I think there are two issues here:

a) whether cadcat (cadcat as "a whole", not just cadcat1) should or should
not be in the model, together with its interaction with trainings.

b) interpreting cadcat1.


a) is not obvious, and you'll probably want to think it through carefully
(as explained in Barr, in the GLMM FAQ --- http://glmm.wikidot.com/faq ---,
in Ben Bolker's notes at http://ms.mcmaster.ca/~bolker/classes/s4c03/, or
...). Note that there are four comparisons of marginal means (cadcats1 to
cadcats4) and two of those have fairly large p-values. Likewise, the
interaction does not seem very large.

So I wonder whether a model without cadcats (AND, of course, without the
interaction) would be good enough.
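
If you want a formal check of that (which also touches on your question 3),
one common option is a likelihood-ratio test between nested glmer fits. An
untested sketch with your formula --- whether to simplify the random
effects at the same time is a separate decision (see the references above):

full <- glmer(correct ~ cadcats * trainings +
                (cadcats | subject) + (trainings | stimulus),
              data = acc, family = binomial,
              control = glmerControl(optCtrl = list(maxfun = 100000)))
no.cadcats <- glmer(correct ~ trainings +
                      (cadcats | subject) + (trainings | stimulus),
                    data = acc, family = binomial,
                    control = glmerControl(optCtrl = list(maxfun = 100000)))
anova(no.cadcats, full)  # LR (chi-square) test for cadcats and its interaction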



b) Provided cadcat is kept in the model (and the interaction is kept, too),
then the results tell you that the marginal mean of cadcat1 is different
from the overall mean.

However, even though contr.sum lets you estimate the marginal effect, it is
up to you to decide whether that is something you want to interpret when
there are large interactions, especially if there are reversals of effects
(what counts as "very", "large", etc., can be highly context dependent).

The coefficient cadcats1:trainings1 is -0.71485, which means that a subject
with trainings1 and cadcats1 deviates by -0.715 from the value we would
predict based on the sum of the intercept (2.64), the cadcats1 effect
(1.015), and the trainings1 effect (0.79).


Alternatively, you can look at it as a cell means model:

- the mean for a subject with cadcats1 and trainings1 is

2.64 + 1.015 + 0.79 - 0.715 ~ 3.7

- the mean for a subject with cadcats1 and trainings2 is

2.64 + 1.015 - 0.79 + 0.715 ~ 3.6

- cadcats2 and trainings1:

2.64 + 0.59 + 0.79 - 0.48 ~ 3.5

- cadcats2 and trainings2:

2.64 + 0.59 - 0.79 + 0.48 ~ 2.9

etc.
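
The same numbers can be read off the fitted object instead of adding by
hand (again an untested sketch with your model name, cadsum.acc); remember
they are on the scale of the linear predictor:

fe <- fixef(cadsum.acc)

## cadcats1 and trainings1 (both coded +1 under contr.sum)
fe["(Intercept)"] + fe["cadcats1"] + fe["trainings1"] + fe["cadcats1:trainings1"]

## cadcats1 and trainings2 (trainings2 is coded -1, so the trainings and
## interaction terms enter with a minus sign)
fe["(Intercept)"] + fe["cadcats1"] - fe["trainings1"] - fe["cadcats1:trainings1"]

## plogis() would turn these logits into predicted probabilities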


Best,


R.

P.S. Just before hitting send I noticed you are using a binomial model, so
where I say "mean" you should understand the logit, etc.




> 3.      Is there a way to do omnibus tests for GLMMs (like pamer.fnc does
> for LMMs)?
>
> 4.      My last question is perhaps the most frustrating for the experts on
> this list to answer, but I would greatly appreciate your help. I would also
> like to report significance levels in the tables that summarize my LMMs
> (it's a priming study, with GLMMs for accuracy and LMMs for logRT), but of
> course lmer does not provide pvalues in the model summary. In spite of the
> authors discontinuing mcmcsamp, most of the priming studies from the last 2
> years appearing in the Journal of Memory and Language still report pvalues
> for the model estimates (see, for example, Table 5 from Nooteboom & Quené,
> 2013), but I don't know how they did this. I would omit the pvalues, but I
> am publishing in music psychology, and these models are only just beginning
> to appear in our field, so any help or code people would be willing to
> provide would be much appreciated.
>
>  
>
> David Sears
>
> PhD Candidate, Music Theory
>
> Music Perception and Cognition Lab (MPCL)
>
> Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT)
> Schulich School of Music, McGill University
>
>  
>
>  
>
> CODE 
>
>  
>
> # Create factor variables.
>
> acc$subject <- factor(acc$subject)
>
> acc$training <- factor(acc$training)
>
> acc$correct <- factor(acc$correct)
>
> acc$cadcat <- factor(acc$cadcat)
>
>  
>
> # Create IVs with sum coding.
>
> acc$trainings <- acc$training
>
> contrasts(acc$trainings)=contr.sum(2)
>
> acc$cadcats <- acc$cadcat
>
> contrasts(acc$cadcats)=contr.sum(5)
>
>  
>
> # Maximal model.
>
> cadsum.acc <-
> glmer(correct~cadcats*trainings+(cadcats|subject)+(trainings|stimulus),data=
> acc,family="binomial",control=glmerControl(optCtrl=list(maxfun=100000)))
>
>  
>
>  
>
>> summary(cadsum.acc)
>
> Generalized linear mixed model fit by maximum likelihood ['glmerMod']
>
> Family: binomial ( logit )
>
> Formula: correct ~ cadcats * trainings + (cadcats | subject) + (trainings |
> stimulus) 
>
>    Data: acc 
>
>  
>
>       AIC       BIC    logLik  deviance 
>
>  789.5965  931.7901 -366.7983  733.5965 
>
>  
>
> Random effects:
>
> Groups   Name        Variance Std.Dev. Corr                   
>
>  stimulus (Intercept) 0.84853  0.9212                          
>
>           trainings1  0.01080  0.1039   -0.85                  
>
>  subject  (Intercept) 0.43439  0.6591                          
>
>           cadcats1    0.55165  0.7427   -0.34                  
>
>           cadcats2    0.76990  0.8774    0.31  0.52            
>
>           cadcats3    1.32294  1.1502   -0.09 -0.23 -0.85      
>
>           cadcats4    0.06946  0.2635   -0.42 -0.69 -0.64  0.14
>
> Number of obs: 1186, groups: stimulus, 40; subject, 30
>
>  
>
> Fixed effects:
>
>                     Estimate Std. Error z value Pr(>|z|)    
>
> (Intercept)          2.63867    0.22993  11.476  < 2e-16 ***
>
> cadcats1             1.01518    0.44687   2.272   0.0231 *  
>
> cadcats2             0.59321    0.42280   1.403   0.1606    
>
> cadcats3             0.54040    0.44940   1.202   0.2292    
>
> cadcats4            -0.69632    0.36178  -1.925   0.0543 .  
>
> trainings1           0.79023    0.17559   4.500 6.78e-06 ***
>
> cadcats1:trainings1 -0.71485    0.32224  -2.218   0.0265 *  
>
> cadcats2:trainings1 -0.48321    0.30379  -1.591   0.1117    
>
> cadcats3:trainings1 -0.22673    0.33094  -0.685   0.4933    
>
> cadcats4:trainings1  0.08375    0.21199   0.395   0.6928    
>
> ---
>
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>  
>
> Correlation of Fixed Effects:
>
>             (Intr) cdcts1 cdcts2 cdcts3 cdcts4 trnng1 cdc1:1 cdc2:1 cdc3:1
>
> cadcats1     0.047                                                        
>
> cadcats2     0.078 -0.187                                                 
>
> cadcats3    -0.006 -0.284 -0.368                                          
>
> cadcats4    -0.123 -0.282 -0.239 -0.187                                   
>
> trainings1   0.063 -0.035 -0.004  0.021 -0.004                            
>
> cdcts1:trn1 -0.037  0.006  0.011 -0.010  0.004  0.040                     
>
> cdcts2:trn1 -0.005  0.012  0.054 -0.029 -0.015  0.151 -0.116              
>
> cdcts3:trn1  0.022 -0.009 -0.028  0.100 -0.031 -0.035 -0.291 -0.467       
>
> cdcts4:trn1 -0.006  0.006 -0.018 -0.039  0.103 -0.256 -0.317 -0.244 -0.110
>
>  
>
>  
>
>

-- 
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina
Universidad Autónoma de Madrid 
Arzobispo Morcillo, 4
28029 Madrid
Spain

Phone: +34-91-497-2412

Email: rdiaz02 at gmail.com
       ramon.diaz at iib.uam.es

http://ligarto.org/rdiaz


