[R-sig-ME] specifying/interpreting random effects with near-zero variance in glmer()

Sat Jan 14 01:49:38 CET 2012

Margaret Metz <mrmetz at ...> writes:

[snip]

> I am using glmer() and a logit link for the survival model,
> including fixed factors of 3 topographic models ("topo1", "topo2",
> and "topo3" for simplicity) and starting height ("ht").  I have 130+
> species ("sp") found at 200 census stations ("station").  Not all
> species are found at all stations, and the sample size per species
> ranges from 10 - 1200 individuals (and I could restrict these
> further to ones with a sample size greater than some threshold).

  The topo variables are continuous, right?

  You probably don't need to restrict the species by sample size --
this is one of the strengths of the mixed modeling approach.

>  I would like to know whether the topographic variables are
> significant predictors of mortality while including the random
> factors of census station to account for non-independence of
> seedlings at the same location (which have the same topo
> measurements) and species to allow for variation in species'
> responses.  I expect that both the slope and intercept of species'
> responses to each variable could be quite different.  To allow for
> different slopes/intercepts among species, I have centered the
> continuous variables and specified the model as:

> glmer(survival ~ topo1 + topo2 + topo3 + ht + 
> (0 + topo1 | sp) + (0 + topo2 | sp) + (0 + topo3 | sp) + (1 | sp) + 
> (1 | station), data=seedlingdata, family=binomial)

  This looks reasonable; you might want to check for overdispersion.
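
  One common rough check (an assumption about what is meant here, not
the only option) is the ratio of the sum of squared Pearson residuals to
the residual degrees of freedom. The helper name overdisp_ratio and the
toy data below are made up for illustration; the same function should
also accept a glmer() fit, since lme4 provides residuals() and
df.residual() methods. With one 0/1 outcome per seedling the ratio tends
to be uninformative, so it is more useful on grouped counts.

```r
# Rough overdispersion ratio: squared Pearson residuals / residual df.
# Values well above 1 hint at overdispersion.
overdisp_ratio <- function(model) {
  rp <- residuals(model, type = "pearson")
  sum(rp^2) / df.residual(model)
}

# Toy illustration on grouped binomial counts (simulated, hypothetical data).
set.seed(1)
toy <- data.frame(x = rnorm(50), n = 20)
toy$y <- rbinom(50, size = toy$n, prob = plogis(0.5 * toy$x))
toy_fit <- glm(cbind(y, n - y) ~ x, family = binomial, data = toy)

ratio <- overdisp_ratio(toy_fit)
print(ratio)
```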

> Questions: When I do this, there is a random intercept for station,
> a random intercept for species, and then random slopes among species
> for the relationship with the topographic variables as follows in
> the model output.  I believe this is allowing for the variation
> among species that I intend, but would like confirmation of this
> specification vs. something like (topo1 | sp) or (1 + topo1 | sp) as
> someone else has suggested to me.

(topo1 | sp) is equivalent to (1 + topo1 | sp) (as
(0 + topo1 | sp) is equivalent to (topo1 - 1 | sp))
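
  A small base-R sketch of why these spellings match, assuming the
left-hand side of the bar follows the ordinary R formula rules for the
intercept. The toy data frame is made up; only the formula terms come
from the model above.

```r
# Hypothetical centered covariate values, just to build design matrices.
toy <- data.frame(topo1 = c(-1.2, 0.3, 0.9, 0.0))

# Implicit vs. explicit intercept: both give columns (Intercept), topo1.
with_default  <- model.matrix(~ topo1, toy)
with_explicit <- model.matrix(~ 1 + topo1, toy)

# Two ways of dropping the intercept: both give the single column topo1.
no_int_zero  <- model.matrix(~ 0 + topo1, toy)
no_int_minus <- model.matrix(~ topo1 - 1, toy)

same_with    <- isTRUE(all.equal(with_default, with_explicit))
same_without <- isTRUE(all.equal(no_int_zero, no_int_minus))
print(c(same_with, same_without))
```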

  If you have enough data you could try

(topo1 + topo2 + topo3 | sp ) 

which allows for correlation among the effects of the topographic
variables -- although you can run out of data pretty quickly in
some cases, and it sounds from stuff below as though you're running
low on signal anyway.  (This model has (n+1)*(n+2)/2 = 10 parameters --
4 variances (topo[1-3] plus intercept) and 6 covariances -- as opposed
to the 4 variances of the model you are using.) (I'm not counting
the station variable in these totals.)
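
  The count above can be checked with a few lines of arithmetic (the
variable names are just for illustration):

```r
# Number of random slopes per species (topo1, topo2, topo3).
n_slopes <- 3

# Terms in the correlated block (topo1 + topo2 + topo3 | sp):
# the slopes plus the intercept.
n_terms <- n_slopes + 1

n_var <- n_terms             # one variance per term
n_cov <- choose(n_terms, 2)  # one covariance per pair of terms

n_correlated   <- n_var + n_cov  # full variance-covariance matrix
n_uncorrelated <- n_terms        # separate (0 + topoX | sp) and (1 | sp) terms

print(c(variances = n_var, covariances = n_cov, total = n_correlated))
```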

> Any version of these models that I have run results in significant
> fixed factors and zero or near-zero variances for the random
> effects.  I interpret this to mean that the topographic variables
> are important predictors of seedling mortality, but that the
> relationship does not vary among species groups nor census
> locations.  Is this your interpretation too or need I worry about
> model specification or the sample size or variance structure of my
> variables?

   This is a reasonable interpretation.  However, be aware that this
is signal-to-noise / sample-size dependent.  There could be (is, by
definition, in an ecological system) some among-species and
among-station variance that you just can't detect with this data set.
(In a classical model with a balanced, nested, etc. design you would
probably just find a small (non-significant) variance in this case,
rather than a practically-zero one -- on the other hand, there are
other classical models where you would actually estimate a *negative*
variance.)

>  A suggestion was made to confirm a lack of spatial autocorrelation
> in the residuals of this model, but I am not sure that is
> appropriate given the inclusion of the random effect of census
> station and the fixed effects of topography, which are shared by
> seedlings at the same station.  Can anyone suggest an appropriate
> reference to support or refute this suggestion?

  I don't have a reference but I would suggest that checking for
spatial autocorrelation might be worthwhile. Spatial autocorrelation
would detect the effects of _unmeasured_ covariates that were more
similar among nearby stations.
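
  One way to look at this is Moran's I on a station-level summary of
the residuals. The sketch below is a minimal base-R version that
assumes inverse-distance weights and distinct station coordinates; the
function name morans_i and the toy grid are made up, and in practice
the values might be the estimated station effects, e.g.
ranef(fit)$station[["(Intercept)"]], matched to station coordinates
that are not part of the model above.

```r
# Moran's I with inverse-distance weights (assumes no duplicated coordinates).
# x: one value per station; coords: matrix or data frame of station x/y.
morans_i <- function(x, coords) {
  d <- as.matrix(dist(coords))  # pairwise distances between stations
  w <- 1 / d                    # inverse-distance weights
  diag(w) <- 0                  # a station is not its own neighbour
  z <- x - mean(x)
  n <- length(x)
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Toy check: a smooth east-west gradient on a 5 x 5 grid of stations.
grid <- expand.grid(east = 1:5, north = 1:5)
i_gradient <- morans_i(grid$east, grid)

# Under no spatial structure the expected value is -1 / (n - 1).
i_null <- -1 / (nrow(grid) - 1)
print(c(observed = i_gradient, expected_null = i_null))
```

  Values clearly above the null expectation suggest that nearby
stations are more alike than distant ones; a permutation test (shuffling
x over stations) is one way to judge how far above is far enough.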

> Finally, if the response to topography DID significantly vary among
> species, where in this model would I see it?  In a large variance
> for the species slopes or intercept?  

  Exactly (variance among species in responses to topo1, topo2, topo3)

> Or would I need to include
> species as a fixed factor crossed with the topographic variables?

  (topo1 | sp) is effectively crossing topo with species.

  I would consider looking (at least graphically) for evidence
of nonlinearity in the responses to the continuous variables ...
you could fit a GAM without *too* much extra effort, and with
this size dataset it might produce interesting results.
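
  A minimal sketch of such a fit with the mgcv package, on simulated
data rather than the seedling data: the data frame sim, its effect
sizes, and the choice of default smooths are all assumptions made for
illustration. Species and station effects are left out here to keep it
short; mgcv can represent simple random intercepts with terms like
s(sp, bs = "re") when sp is a factor.

```r
library(mgcv)

# Simulated stand-in data: survival depends nonlinearly on topo1.
set.seed(42)
n <- 1000
sim <- data.frame(topo1 = rnorm(n), ht = rnorm(n))
sim$survival <- rbinom(n, size = 1,
                       prob = plogis(0.5 - 0.8 * sim$topo1^2 + 0.3 * sim$ht))

# Smooth terms let the data decide how curved each response is.
gam_fit <- gam(survival ~ s(topo1) + s(ht), family = binomial, data = sim)

# Effective degrees of freedom near 1 indicate an approximately linear term.
print(summary(gam_fit)$edf)

# plot(gam_fit, pages = 1)  # graphical look at the fitted smooths
```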