[R] simplifying a GLM-removing categorical variables

Ben Bolker bolker at ufl.edu
Tue Mar 4 15:21:41 CET 2008


mariannej <marianne.james <at> abdn.ac.uk> writes:

> I have created a GLM (using the quasipoisson family) and am now trying to
> simplify it.  One of my explanatory variables is categorical (vegetation
> type, with 6 different levels).  In the model, 5 of the 6 levels are
> significant and one is not. 
> 
> How should I simplify my model?  Do I need to take out the whole category
> (i.e. all of vegetation type), or just the level that is not significant
> (but how would I explain this biologically?)
> 
> Please spell out any answers simply; I am new to R.
> 
   This is really a statistical rather than an R question,
but the short answer is: you probably shouldn't try to
remove the "non-significant" level.  The details depend on
your model, but the "significance" of the parameters,
which I assume you're gleaning from summary(), refers
to the difference of each level from the baseline (first)
level, not to whether that level matters on its own.  If 5
out of the 6 levels are significantly different from the
baseline, then the factor belongs in the model, and the
"non-significant" level is simply one that you can't
distinguish from the baseline.
(You could _conceivably_ try to lump the "non-significant"
level together with the baseline level, but this really
goes in the direction of data-dredging.)
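
   To make the distinction concrete, here is a minimal sketch
on simulated data.  The data frame, the level names and the
means are all made up for illustration; I'm assuming a model
with vegetation type as the only predictor, which is surely
simpler than yours.  It contrasts the per-level comparisons in
summary() with a single test of the factor as a whole.

```r
set.seed(1)

# Hypothetical data: overdispersed counts in six vegetation types.
# Level names and means are invented for this example.
veg <- factor(rep(c("bog", "grass", "heath", "marsh", "scrub", "wood"),
                  each = 20))
mu  <- c(bog = 5, grass = 5.5, heath = 10, marsh = 12, scrub = 15, wood = 20)
dat <- data.frame(veg = veg,
                  count = rnbinom(120, mu = mu[as.character(veg)], size = 3))

full <- glm(count ~ veg, family = quasipoisson, data = dat)

# Each "veg..." row compares that level with the baseline ("bog",
# the first level alphabetically), nothing more.
summary(full)

# Test of the factor as a whole: compare models with and without it.
# For quasi- families an F test is the usual choice.
null <- glm(count ~ 1, family = quasipoisson, data = dat)
anova(null, full, test = "F")
drop1(full, test = "F")   # same idea, useful with several predictors

# Changing the baseline changes which rows look "significant",
# but the fitted model is the same.
dat$veg2 <- relevel(dat$veg, ref = "wood")
full2 <- glm(count ~ veg2, family = quasipoisson, data = dat)
all.equal(deviance(full), deviance(full2))
```

   The point of the last step is that the pattern of stars in
summary() depends on an arbitrary choice of baseline, whereas
the whole-factor test does not.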

   I would strongly recommend that you consult a good
general text on generalized linear models for strategies
of model simplification and interpretation -- to repeat,
this is really a statistical question and not an
R-specific one ...

  good luck,
    Ben Bolker


