[R-sig-ME] GLMM, interpretation and set up of random/fixed effects in a longitudinal/hierarchical dataset

Mon Oct 24 23:41:13 CEST 2011

Dear r-sig mixed models list,

My idea of writing to the lme4 specialized group, is to hopefully get a
feedback on the methods I am using, on the data I am using and on how to
interpret the output I am getting (for that I am specifying some
questions below). I am trying to fit a GLMM on the following set up:

Longitudinal data set consisting of 5 census (10 years) of vascular
epiphytes, which are plants that use trees (host trees) for support (not
parasites). In the first census we inspected a certain number of
individuals of a particular species of host tree in a defined area for
abundance of epiphytes and followed the fate of the individuals found in
the following census and included new individuals. The database “db.e.o”
looks like this:

'data.frame':   36935 obs. of   17 variables:

  $ census         : Factor w/ 5 levels "1999-03-01","2002-07-01",..: 1 1 1 1 1 1 1 1 1 1 ...
  $ tree           : Factor w/ 113 levels "35357","35373",..: 1 1 1 1 1 1 1 1 1 1 ...
  $ family         : Factor w/ 16 levels "Araceae","Aspleniaceae",..: 3 16 16 1 1 1 1 1 1 1 ...
  $ genera         : Factor w/ 47 levels "Aechmea","Ananthacorus",..: 1 2 3 5 5 5 5 5 5 5 ...
  $ spp            : Factor w/ 89 levels "AECTIL","ANAANG",..: 1 2 3 4 5 6 7 8 9 10 ...
  $ inhabited.t    : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
  $ d.census       : Date, format: "1999-03-01" "1999-03-01" ...
  $ abundance      : int   0 2 0 0 0 0 0 0 0 0 ...
  $ abs.pres       : Factor w/ 2 levels "0","1": 1 2 1 1 1 1 1 1 1 1 ...
  $ abs.pres.prev : Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ...
  $ abundance.1    : int   0 2 0 0 0 0 0 0 0 0 ...
  $ abundance.prev: int   NA NA NA NA NA NA NA NA NA NA ...
  $ dbh            : num   12 12 12 12 12 12 12 12 12 12 ...
  $ heightest.t    : num   11.8 11.8 11.8 11.8 11.8 ...
  $ height         : num   NA NA NA NA NA NA NA NA NA NA ...
  $ avail.surface : num   4.44 4.44 4.44 4.44 4.44 ...
  $ c.census       : num   10651 10651 10651 10651 10651 ...

Just to clarify, I have set up the data in a extended format, therefore
the large amount of rows (36935), normally each year/census is a matrix.
In this format, each observation belongs to one species per tree per
year (the array of species is the same, trees are mostly the same in
time, but some died off and some are included)

*I wanted to see what is the most accountable factor for colonization?*.
So fitted the following model:

glmm1<-glmer(abs.pres~c.census+avail.surface*abs.pres.prev +
(1|tree)+(1|spp), data=db.e_St, family=binomial(link=logit))

I took as explanatory variable absence/presence, which is a binary
response and as explanatory variables: -*/fixed factors/*: time (census,
is continuous), available surface (continuous) and previous presence or
absence of species (binary); -*/random factors/*: tree and species
(spp), trees have a unique ID per individual and they could be seeing as
plots, for which we don’t have control, because their abundance varies
within the area we work at. Trees were repeatedly inspected for epiphyte
species per each census.

The data shown above is not standardized, but I did standardized 2 of
the fixed factors (time and available.surface) to be able to compare all
variables and to improve convergence. Before fitting/choosing the last
GLMmodel I followed Bates (2008) recommendations, as well as Zuur et al
(2009). I did the graphical check, fitted a GLM, inspected for
homogeneity and normality. My output is the following:

Generalized linear mixed model fit by the Laplace approximation

Formula: abs.pres ~ c.census + avail.surface * abs.pres.prev + (1 | tree) +       (1 | spp)

    Data: db.e_St

   AIC   BIC logLik deviance
  4316 4373   -2151      4302

Random effects:

  Groups Name         Variance Std.Dev.
  tree    (Intercept) 0.45637   0.67555
  spp     (Intercept) 1.22015   1.10460
Number of obs: 23407, groups: tree, 89; spp, 89
Fixed effects:
                              Estimate Std. Error z value Pr(>|z|)

(Intercept)                   -5.13168     0.16365   -31.36<   2e-16 ***
c.census                       0.32997     0.06455     5.11 3.19e-07 ***
avail.surface                  0.56631     0.06460     8.77<   2e-16 ***
abs.pres.prev1                 4.91306     0.13569    36.21<   2e-16 ***
avail.surface:abs.pres.prev1 -0.52553     0.08968    -5.86 4.63e-09 ***

---
Signif. codes:   0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:

             (Intr) c.cnss avl.sr abs..1

c.census     -0.214
avail.surfc -0.150 -0.034
abs.prs.pr1 -0.192   0.103   0.239
avl.srf:..1   0.095 -0.016 -0.373 -0.562

For this, I have some questions that hope somebody could give me an
insight on:

1.Since my data is longitudinal, I wanted to control for time
correlation, thus I included “time” (in a previous model no presented
here) as a term of interaction. All interactions came out non
significant (-would that allow me to say that the effect that the
explanatory variables have on the dependent variable doesnot change in
time?).

As the interactions were not significant, I removed the interaction term
and that is reflected in the model shown above (glmm1). My question is,
if I removed that term, can I still control for time correlation, only
through setting up “trees” (as plots) as a random factor?, or that would
only control for structural correlation? or should I do as follow?:

Fitting the same model, but changing the random variables, instead of
simply (1+TREE) and (1+CENSUS) (as factor), I fitted the following
random variables: (census|tree) + (1|spp), see the outcome below. The
AIC value of the second model is more than 10 units less than the first
model. The estimates of fixed effects are quiet similar. Values of
variance for the random effects are 4 instead of 5, because no data on
previous absence/presence for the first year is available. I am having a
hard time on wrapping my head around what the variance values of the
random effects mean for my model?

>   glmm1TIMERE

Generalized linear mixed model fit by the Laplace approximation

Formula: abs.pres ~ c.census + avail.surface * abs.pres.prev + (census| tree) + (1 | spp)

    Data: db.e_St

   AIC   BIC logLik deviance
  4301 4430   -2135      4269

Random effects:

  Groups Name              Variance Std.Dev. Corr
  tree    (Intercept)       0.70332   0.83864
         census2004-04-01 0.20673   0.45467   -0.158
         census2007-02-01 0.66210   0.81370   -0.741   0.554
         census2009-12-01 0.70563   0.84002   -0.554   0.865   0.622
  spp     (Intercept)       1.22597   1.10723

Number of obs: 23407, groups: tree, 89; spp, 89

Fixed effects:

                              Estimate Std. Error z value Pr(>|z|)

(Intercept)                   -5.26341     0.16851   -31.23<   2e-16 ***
c.census                       0.36097     0.08464     4.26 2.00e-05 ***
avail.surface                  0.61724     0.07121     8.67<   2e-16 ***
abs.pres.prev1                 5.08338     0.14000    36.31<   2e-16 ***
avail.surface:abs.pres.prev1 -0.54146     0.09116    -5.94 2.86e-09 ***
--
Signif. codes:   0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:

             (Intr) c.cnss avl.sr abs..1
c.census     -0.316
avail.surfc -0.189 -0.034
abs.prs.pr1 -0.201   0.089   0.179
avl.srf:..1   0.100 -0.024 -0.322 -0.562

2.My second problem is at the interpretation (of the fixed effects of
any of the models, since they are quiet similar), since I have a binary
response variable. In principle, would I be allow to conclude from this
mode the following?:

a.that “c.census”, “avail.surface”, “abs.pres.prev1"have a positive and
significant effect on “abs.pres”? and if so, that the
“abs.pres.prev1"explains most of the variance in the data? If so, Isn’t
variance also partitioned among random variables? If so, how can I
differentiate that and calculate how much variance each fixed factor
explains, and how much is taking into account by the random factors?

b.about the interaction term, the only one that came out significant in
the model was: "available.surface*previous absence.presence" and is
negative. Could I say that one of the variables counteracts the positive
effect of the other one?

I know that this is a rather long email and that some of my questions
could be crossed referenced with other threads, but I think my dataset,
is somewhat different, because of the dimensionality of it. I
have looked at different threads which have helped me to understand
GLMMs and walk me through some of the steps. Any comments, help or answer
would be greatly appreciated.

Thanks in advance to anyone that would take their time to help me out,

Glenda Mendieta-Leiva
PhD student
Functional Ecology, University of Oldemburg
Smithsonian Tropical Ecology