[R-sig-eco] reporting results from binomial glm with categorical variable

Scott Foster scott.foster at csiro.au
Fri Dec 2 00:29:36 CET 2011


Hi Matt,

This is only obliquely an R question.  Here is an answer nonetheless.

If you have G levels of the categorical factor then there are exactly G 
means to estimate (irrespective of the outcome type).  This means that 
you cannot estimate an overall grand mean *and* the individual level 
means, as there would then be G+1 parameters for G means and the 
estimates would be non-unique...  I suspect that you already knew this 
though.
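
To see the non-uniqueness concretely, here is a small sketch (the
three-level factor myFac is made up purely for illustration).  A design
matrix with an intercept column plus one indicator column per level has
G+1 columns but only rank G:

```r
# Hypothetical factor with G = 3 levels, two observations per level
myFac <- factor(rep(c("a", "b", "c"), each = 2))
G <- nlevels(myFac)

# One indicator column per level (no constraint applied)
indicators <- model.matrix(~ 0 + myFac)

# Add an intercept column: G + 1 columns in total
X <- cbind(Intercept = 1, indicators)

ncol(X)      # number of parameters we are trying to estimate (G + 1)
qr(X)$rank   # number that can actually be identified (G)
```

The intercept column is the sum of the indicator columns, which is why
one constraint is needed.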

The way around this is to impose some sort of constraint on the overall 
mean and the level means.  Commonly this is done by fixing one of the 
level `deviations' at zero -- this is called a corner-point 
constraint, and it is what R's default (contr.treatment) does.  Another 
type is sum-to-zero, where there is an overall mean (actually the 
unweighted mean of the G level means on the link scale, not the mean of 
the raw data) and G deviations that are constrained to sum to zero.  
This is the constraint that you mentioned.  There are others, of course, 
but they are less common.  One that I find very useful is to omit the 
overall mean and just estimate the G factor level means.  Generally 
though, the choice of constraint does not change the fitted model, only 
the meaning of the coefficients, and corner-point constraints can 
sometimes be easier to interpret.
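
For what it is worth, the three constraints can be inspected directly in
R.  This is only a sketch: G = 3 and the factor myFac are invented for
the example.

```r
G <- 3

# Corner-point (treatment) constraint: the first level's deviation is zero
contr.treatment(G)

# Sum-to-zero constraint: each column sums to zero, so the last level's
# deviation is minus the sum of the others
contr.sum(G)

# No overall mean: one column per level, so each coefficient is a
# level mean (on the link scale)
myFac <- factor(c("a", "b", "c"))   # hypothetical factor
model.matrix(~ 0 + myFac)
```

Each row of a contrast matrix corresponds to a factor level and each
column to an estimated coefficient.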

If you do want to use sum-to-zero constraints then all you need to do is 
alter the `contrasts' attribute of your categorical variable.  This can 
be done in R using the C() function (note capitalisation), giving it the 
contrast function contr.sum.  Your glm() call would use a formula like 
cbind(nsuccess, nfailure) ~ 1 + C(myFac, contr.sum).
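
As a sketch of such a call, on made-up data (the data frame dat and its
counts are invented; only the variable names follow the formula above).
Here the contrast is handed to C() as the function contr.sum, and the
same fit is obtained by setting contrasts() on the factor beforehand:

```r
# Made-up data: successes and failures for a three-level factor
dat <- data.frame(
  myFac    = factor(c("a", "a", "b", "b", "c", "c")),
  nsuccess = c(12, 15, 20, 22, 5, 8),
  nfailure = c(18, 15, 10, 8, 25, 22)
)

# Sum-to-zero contrasts set inside the formula with C()
fit_sum <- glm(cbind(nsuccess, nfailure) ~ 1 + C(myFac, contr.sum),
               family = binomial, data = dat)

# Same thing, setting the contrasts on a copy of the factor first
dat$myFac2 <- dat$myFac
contrasts(dat$myFac2) <- contr.sum(nlevels(dat$myFac2))
fit_sum2 <- glm(cbind(nsuccess, nfailure) ~ myFac2,
                family = binomial, data = dat)

# Intercept: unweighted mean of the level means on the logit scale,
# followed by G - 1 deviations (the last one is implied by the constraint)
coef(fit_sum)
```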

How to report the results?  Good question...  For me, it depends 
strongly on what information I want to convey.  Typically, for this kind 
of analysis, that would be the means of the factor levels (unless there 
is more to this than we are seeing).  This is most easily done using R's 
inbuilt prediction functions (see ?predict.glm for example).  Call 
predict() with a newdata argument that is a G-row data frame, one row 
for each level of the factor, and with type="response" if you want the 
means as probabilities rather than on the logit scale.  Note that it 
does not matter which contrasts the model was fitted with -- the 
predicted means are identical, because every contrast choice is just a 
different parameterisation of the same G means.
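
A minimal sketch of that reporting step, again on invented data (dat,
newdat and report are names I have made up, and the 1.96 multiplier
gives approximate Wald intervals, which is just one possible choice):

```r
# Made-up data: successes and failures for a three-level factor
dat <- data.frame(
  myFac    = factor(c("a", "a", "b", "b", "c", "c")),
  nsuccess = c(12, 15, 20, 22, 5, 8),
  nfailure = c(18, 15, 10, 8, 25, 22)
)

# Same model under corner-point (default) and sum-to-zero contrasts
fit_trt <- glm(cbind(nsuccess, nfailure) ~ myFac,
               family = binomial, data = dat)
fit_sum <- glm(cbind(nsuccess, nfailure) ~ myFac,
               family = binomial, data = dat,
               contrasts = list(myFac = "contr.sum"))

# One row per factor level
newdat <- data.frame(myFac = factor(levels(dat$myFac),
                                    levels = levels(dat$myFac)))

# Predict on the logit scale with standard errors
pred_trt <- predict(fit_trt, newdata = newdat, type = "link", se.fit = TRUE)
pred_sum <- predict(fit_sum, newdata = newdat, type = "link", se.fit = TRUE)

# Back-transform to probabilities with approximate 95% intervals
report <- data.frame(
  level = newdat$myFac,
  prob  = plogis(pred_trt$fit),
  lower = plogis(pred_trt$fit - 1.96 * pred_trt$se.fit),
  upper = plogis(pred_trt$fit + 1.96 * pred_trt$se.fit)
)
report

# The two parameterisations give the same predicted means
all.equal(pred_trt$fit, pred_sum$fit)
```

Building the intervals on the link scale and then back-transforming
keeps them inside (0, 1).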

I hope this helped (it is certainly long enough),

Scott

PS  A couple of good references (oldies but goodies) for topics related 
to this are
Lane and Nelder (1982) Analysis of covariance and standardisation as 
instances of prediction.  Biometrics, 38, 613-621
Nelder (1994) The statistics of linear models: back to basics.  
Statistics and Computing, 4, 221-234

On 02/12/11 09:54, Matthew Forister wrote:
> Dear All,
>
> I have two questions about reporting results from a binomial GLM (logit
> link) that includes a categorical variable.  I understand how dummy coding
> works.  My two questions are about interpretation and presentation:
>
> 1) The default in R seems to be to use the first level of a categorical
> variable as the reference.  It makes more sense to me to use the grand mean
> as the reference -- I found a webpage that describes this as "deviation
> coding".  This seems so commonsensical, that I'm surprised that I don't see
> more people using it instead of the default comparison to the first level.
> Am I missing something here? Is this deviation coding a reasonable way to
> go?  This is the website where I found that:
> http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm
>
> 2) Whether I use the default in R or switch to deviation coding, I will get
> multiple coefficients associated with the different levels of my predictor
> variable.  What is the convention for reporting the information associated
> with the "dummy coded" levels of a categorical variable?  I had assumed
> that I would report details associated with each of the dummy coded levels,
> but I can't seem to find an example where someone has done that...
>
> thanks for your help,
> Matt

-- 
Scott Foster
CSIRO Mathematics, Informatics and Statistics
GPO Box 1538
Castray Esplanade
Hobart 7001
Tasmania
Australia

Phone:     (03) 6232 5178
Fax:       (03) 6232 5000
Email:     scott.foster at csiro.au


