[R-sig-ME] specifying/interpreting random effects with near-zero variance in glmer()
Douglas Bates
bates at stat.wisc.edu
Sat Jan 14 17:27:41 CET 2012
On Fri, Jan 13, 2012 at 6:49 PM, Ben Bolker <bbolker at gmail.com> wrote:
> Margaret Metz <mrmetz at ...> writes:
>
> [snip]
>
>> I am using glmer() and a logit link for the survival model,
>> including fixed factors of 3 topographic models ("topo1", "topo2",
>> and "topo3" for simplicity), and starting height ("ht"). I have 130+
>> species ("sp") found at 200 census stations ("station"). Not all
>> species are found at all stations, and the sample size per species
>> ranges from 10 - 1200 individuals (and I could restrict these
>> further to ones with a sample size greater than some threshold).
>
> The topo variables are continuous, right?
>
> You probably don't need to restrict the sample size -- this is one
> of the strengths of the mixed modeling approach.
>
>> I would like to know whether the topographic variables are
>> significant predictors of mortality while including the random
>> factors of census station to account for non-independence of
>> seedlings at the same location (which have the same topo
>> measurements) and species to allow for variation in species'
>> responses. I expect that both the slope and intercept of species'
>> responses to each variable could be quite different. To allow for
>> different slopes/intercepts among species, I have centered the
>> continuous variables and specified the model as:
>
>
>> glmer(survival ~ topo1 + topo2 + topo3 + ht +
>> (0 + topo1 | sp) + (0 + topo2 | sp) + (0 + topo3 | sp) + (1 | sp) +
>> (1 | station), data=seedlingdata, family=binomial)
>
> This looks reasonable; you might want to check for overdispersion.
>
>> Questions: When I do this, there is a random intercept for station,
>> a random intercept for species, and then random slopes among species
>> for the relationship with the topographic variables as follows in
>> the model output. I believe this is allowing for the variation
>> among species that I intend, but would like confirmation of this
>> specification vs. something like (topo1 | sp) or (1 + topo1 | sp) as
>> someone else has suggested to me.
>
> (topo1 | sp) is equivalent to (1 | topo1 | sp) (as
> (0 + topo1 | sp) is equivalent to (topo1 - 1 | sp))

To forestall future confusion, I think you meant that (topo1 | sp) is
equivalent to (1 + topo1 | sp).

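
One way to check these equivalences is to compare the design matrices that the left-hand side of each random-effects term produces. The sketch below uses only base R's model.matrix() on a made-up data frame (`d` and its values are hypothetical, not from the seedling data); it illustrates how the formulas expand and is not a fit of the actual model.

```r
# Hypothetical toy data; topo1 stands in for a centered continuous covariate.
d <- data.frame(topo1 = c(-1.2, -0.3, 0.4, 1.1))

# Implicit vs. explicit intercept: both give an intercept column plus topo1.
X_implicit <- model.matrix(~ topo1, d)
X_explicit <- model.matrix(~ 1 + topo1, d)

# Two ways of suppressing the intercept: both give the topo1 column only.
X_zero  <- model.matrix(~ 0 + topo1, d)
X_minus <- model.matrix(~ topo1 - 1, d)

print(colnames(X_implicit))  # intercept and slope columns
print(colnames(X_zero))      # slope column only
```

So (topo1 | sp) carries a random intercept and a random slope (with their correlation), while (0 + topo1 | sp) carries the slope alone.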
> If you have enough data you could try
>
> (topo1 + topo2 + topo3 | sp )
>
> which allows for correlation among the effects of the topographic
> variables -- although you can run out of data pretty quickly in
> some cases, and it sounds from stuff below as though you're running
> low on signal anyway. (This model has (n+1)*(n+2)/2 = 10 parameters --
> 4 variances (topo[1-3] plus intercept) and 6 covariances -- as opposed
> to the 4 variances of the model you are using.) (I'm not counting
> the station variable in these totals.)
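
The parameter counts quoted above can be reproduced with a few lines of arithmetic. The helper names below are made up for illustration; they only count variance-covariance parameters for a single grouping factor and say nothing about whether a given data set can support them.

```r
# Number of variance-covariance parameters for one grouping factor.
# n_slopes = number of random slopes; the random intercept adds one more term.
n_params_correlated <- function(n_slopes) {
  k <- n_slopes + 1      # intercept + slopes
  k * (k + 1) / 2        # k variances + k*(k-1)/2 covariances
}

# Independent terms, e.g. (1 | sp) + (0 + topo1 | sp) + ...: variances only.
n_params_independent <- function(n_slopes) n_slopes + 1

n_params_correlated(3)   # 10 = 4 variances + 6 covariances
n_params_independent(3)  # 4 variances
```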
>
>> Any version of these models that I have run results in significant
>> fixed factors and zero or near-zero variances for the random
>> effects. I interpret this to mean that the topographic variables
>> are important predictors of seedling mortality, but that the
>> relationship does not vary among species groups nor census
>> locations. Is this your interpretation too or need I worry about
>> model specification or the sample size or variance structure of my
>> variables?
>
> This is a reasonable interpretation. However, be aware that this
> is signal-to-noise / sample-size dependent. There could be (is, by
> definition, in an ecological system) some among-species and
> among-station variance that you just can't detect with this data set.
> (In a classical model with a balanced, nested, etc. design you would
> probably just find a small (non-significant) variance in this case,
> rather than a practically-zero one -- on the other hand, there are
> other classical models where you would actually estimate a *negative*
> variance.)
>
>> A suggestion was made to confirm a lack of spatial autocorrelation
>> in the residuals of this model, but I am not sure that is
>> appropriate given the inclusion of the random effect of census
>> station and the fixed effects of topography, which are shared by
>> seedlings at the same station. Can anyone suggest an appropriate
>> reference to support or refute this suggestion?
>
> I don't have a reference but I would suggest that checking for
> spatial autocorrelation might be worthwhile. Spatial autocorrelation
> would detect the effects of _unmeasured_ covariates that were more
> similar among nearby stations.
>
>
>> Finally, if the response to topography DID significantly vary among
>> species, where in this model would I see it? In a large variance
>> for the species slopes or intercept?
>
> Exactly (variance among species in responses to topo1, topo2, topo3)
>
>> Or would I need to include
>> species as a fixed factor crossed with the topographic variables?
>
> (topo1 | sp) is effectively crossing topo with species.
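
As a rough illustration of where among-species variation would show up, here is a minimal sketch on simulated data, assuming lme4 is installed. The number of species, the sample sizes, and the effect sizes are all invented, and only one topographic variable is used; none of this comes from the seedling data set.

```r
library(lme4)

set.seed(1)
n_sp  <- 20   # hypothetical number of species
n_per <- 50   # hypothetical individuals per species

d <- data.frame(sp    = factor(rep(seq_len(n_sp), each = n_per)),
                topo1 = rnorm(n_sp * n_per))   # already centered

# Simulate species-specific intercepts and slopes on the logit scale.
b0  <- rnorm(n_sp, sd = 0.5)   # among-species intercept deviations
b1  <- rnorm(n_sp, sd = 0.8)   # among-species slope deviations
idx <- as.integer(d$sp)
eta <- 0.5 + b0[idx] + (0.3 + b1[idx]) * d$topo1
d$survival <- rbinom(nrow(d), size = 1, prob = plogis(eta))

# Independent random intercept and random slope for species.
fit <- glmer(survival ~ topo1 + (1 | sp) + (0 + topo1 | sp),
             data = d, family = binomial)

# One row per variance component; the slope variance is the topo1 row.
vc <- as.data.frame(VarCorr(fit))
print(vc[, c("grp", "var1", "vcov", "sdcor")])
```

A slope variance well away from zero in this table is what among-species differences in the response to topo1 would look like; an estimate at or near zero corresponds to the situation described in the original question.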
>
> I would consider looking (at least graphically) for evidence
> of nonlinearity in the responses to the continuous variables ...
> you could fit a GAM without *too* much extra effort, and with
> this size dataset it might produce interesting results.
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models