[R-sig-ME] Poisson intercept-only multilevel model doesn't appear to return the correct population mean

Wed Mar 9 01:18:23 CET 2022

I have a data set consisting of counts of a particular plant species in
quadrats. The data
were collected using a two-stage sampling design, whereby 75 macroplots
were randomly positioned
in the area to be sampled and five quadrats (actually belt transects) were
sampled in each
macroplot. All of the macroplots are the same size, 25m x 25m, as are the
quadrats (each quadrat
is 0.25m x 25m).

The data file is available here:
https://www.dropbox.com/s/hpptsqap0oysje8/stratum3.csv?dl=0

There are five variables in the data file: plot_id is a unique number
assigned to each macroplot
(there are 75 unique numbers but these are not all consecutive);
belt_number is the number (1-5)
of each belt transect (quadrat) in each macroplot; counts contains the
number of individuals
counted in each belt, weights1 contains sampling weights for the macroplots
(stage 1) and weights2
contains sampling weights for the quadrats (stage 2).

The mean number of plants per quadrat for this dataset is 44.24. Several
covariates were measured
in each of the quadrats, but I'm ignoring these here and they are not
included in the data file.
Instead, I'd like to focus on the estimated population mean I get from
running several different
regressions below. Also, the data are overdispersed and a negative binomial
model seems appropriate,
but I use the Poisson here because it's available in the svyglm() function.

First I present the results from calculating the mean and standard error
from the survey package.
I set the following survey design:

cnt_design3 <- svydesign(weights = ~ weights1 + weights2,
                       id = ~plot_id + belt_number,
                       strata = ~stratum_num,
                       data = stratum3,
                       nest = TRUE)

(Sampling weights aren't really required here since the weights are equal
in each of the two stages.
We'd get the same results if we left them out.)

Running svymean() using this design I get:
svymean(~counts, design = cnt_design3)
       mean     SE
counts 44.24 9.8146

If I use glmmTMB and run an intercept-only multilevel model assuming
(incorrectly) a gaussian
distribution, I get:

glmm3 <- glmmTMB(counts ~ 1 + (1|plot_id), data = stratum3)
Conditional model:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   44.240      9.749   4.538 5.68e-06 ***

The mean is exactly the same as that from svymean() and the SE is very
close.

My question arises when I use a Poisson regression on the data set. Here I
focus only on the
estimated population mean and not the standard error since the standard
error can't be
correctly exponentiated.

If I use svyglm() from the survey package and a quasipoisson distribution I
get:

svyglm2 <- svyglm(formula = counts ~ 1, design = cnt_design3, family =
quasipoisson)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.7896     0.2218   17.08   <2e-16 ***
> exp(coef(svyglm2))
(Intercept)
      44.24

This is the same mean as returned by the other analyses above.

But if I use glmmTMB to run a multilevel model with family = poisson I get
the following:
glmm3.2 <- glmmTMB(counts ~ 1 + (1|plot_id), data = stratum3,
                  family = poisson)
> fixef(glmm3.2)

Conditional model:
(Intercept)
      2.424
which when exponentiated:
> exp(fixef(glmm3.2)[[1]])
(Intercept)
   11.29252

So my question is why don't I get the same value as I get from the other
analyses, particularly
the svyglm() function using quasipoisson? I don't think it's an issue with
geometric vs arithmetic
means. If I ignore the two-stage design and pretend that all 375 quadrats
were randomly positioned
in the population and then run an intercept-only model with family =
poisson I get the correct mean
value of 44.24:

glmm3_no_random <- glmmTMB(counts ~ 1, data = stratum3,
                          family = poisson)
> fixef(glmm3_no_random)

Conditional model:
(Intercept)
       3.79
which when exponentiated:
> exp(fixef(glmm3_no_random)[[1]])
(Intercept)
      44.24

Why would including random intercepts for plot change the grand mean
(exponentiated) from
44.24 to 11.29? Note that this only happens with the multilevel Poisson
regression and not the
multilevel Gaussian. I suspect I'm missing something, but I'll be darned if
I can figure out what.

Thanks for any help you can provide.

John Willoughby

	[[alternative HTML version deleted]]