[R-sig-ME] grasshoppers and mixed models

Wed Jan 28 20:26:26 CET 2009

Frank Dziock wrote:
> Hi there!
> 
> our response variable is the number of grasshoppers (abundance) on ski
> slopes/non ski study plots in alpine ecosystems.
> 
> We are interested in the fixed effects of skiing, fertilization,
> management intensity and other properties of the study plots on
> grasshopper abundance.
> 
> We have 82 study plots arranged as pairs (one plot on a ski slope, the
> other away from the slope), we call this a BLOCK. We studied these plots
> in three different study areas. This lead us to believe we have a nested
> design with random effect= 1~AREA/BLOCK
> 
> data table arranged as follows:
> 
> AREA	BLOCK	grasshopper	fertilizer	management	skiing
> 1	0	8		1		2		0
> 1	1	4		1		1		1
> 1	0	16		0		3		0
> 1	1	3		1		0		1
> 2	0	4		0		2		0
> 2	1	5		1		2		1
> 
> etc.
> 
> 
> The number of grasshoppers recorded is a count variable, so we used
> Poisson-distribution.
> 
> model1 <- glmmPQL(abundance~fertilizer+management,random=~1| AREA/BLOCK,
> family="poisson")

  If your means are fairly low (values <5 per sample much of the time)
then PQL is biased -- you should probably bite the bullet and use
(g)lmer in the lme4 package (or glmmML, or glmmADMB ...)

> I was wondering, whether a GLMM like this would be appropriate to decide
> which predictor variables have distinct effects on grasshopper numbers.
> We are especially unsure, whether the random effect has been properly
> defined, because one of the predictor variables (skiing) is also our
> BLOCK factor.

  That should be OK.  "skiing" (which is missing in your model above)
describes the overall effect of skiing on expected grasshoppers, while
BLOCK describes the variation within AREAs.  However, you may have
a technical problem in that you seem to have a single observation per
block -- that means that you can't separate "residual" variation from
"within-AREA" variation.  In principle this might still be OK since
you are using a GLMM where the expected variation is fixed (i.e., it
should be equal to the expected mean, anything extra must be
within-AREA/between-BLOCK variation), but you may run into fitting
troubles.
> 
> We have 15 predictor variables in total and in order not to overfit our
> model we did the following:
> 
> 1. Calculate separate models for single predictor variables (linear,
> quadratic and logarithmic terms)
> 2. We included only the most significant terms of each variable in the
> initial model
> 3. Model was simplified by stepwise removing variables according to
> their p-values
> 4. This was done until all remaining varibles had p-values below 0.05
> 
> As I followed the former discussions in your group on p-values, you will
> probably want to slaughter me for that approach. But I am an absolute
> beginner in mixed models and it would be very helpful for me to receive
> a hint, how I could decide which variables play a major role in
> determining my grasshoppers abundance while taking into account the
> nested study design structure.

   The nested design and the overfitting/model selection problems are
separate issues. I'd recommend Frank Harrell's book on model reduction
strategies.  Your model selection is definitely problematic -- expecting
to sort out 3 response shapes (lin/quad/logarithmic) * 15 variables
from a data set with 82 total samples seems highly problematic.

  Ecological analysis problems that combine multivariate structure
with non-normality and block structure are quite challenging.
Non-normal+block = GLMM, moderately challenging;
Multivariate = ordination techniques (see vegan)
Multivariate+non-normal = ordination techniques with randomization to
   establish confidence bounds/significance
Multivariate+non-normal+blocks = ? (constrained randomization)?

  I haven't yet looked at Zuur et al's books ( http://www.highstat.com/
) but probably should.  Although I don't know if I will agree with them
or not.

   cheers
    Ben Bolker

-- 
Ben Bolker
Associate professor, Biology Dep't, Univ. of Florida
bolker at ufl.edu / www.zoology.ufl.edu/bolker
GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc