[R-sig-ME] Poisson intercept-only multilevel model doesn't appear to return the correct population mean

Wed Mar 9 15:15:13 CET 2022

This happens precisely because there is a random effect in the model 'glmm3.2'. We have a model that basically says that log Y is normally distributed with an estimate mean of 2.424. That implies that Y is lognormal, and therefore 

    E[Y] = exp(mu + sigma^2/2)

If you use VarCorr(glmm3.2) to get the estimate of sigma^2, and add half of that before you exponentiate, you'll get a lot closer to the estimate you expect.

Russ Lenth

-----Original Message-----

Date: Tue, 8 Mar 2022 16:18:23 -0800
From: John Willoughby <johnwillec using gmail.com>
To: r-sig-mixed-models using r-project.org
Subject: [R-sig-ME] Poisson intercept-only multilevel model doesn't
	appear to return the correct population mean
Message-ID:
	<CAKk2L3J7JMZ+N2bNGpZfKZ-Rp+cFoEYU++XnJN6=WZeQMTdXYQ using mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I have a data set consisting of counts of a particular plant species in quadrats. The data were collected using a two-stage sampling design, whereby 75 macroplots were randomly positioned in the area to be sampled and five quadrats (actually belt transects) were sampled in each macroplot. All of the macroplots are the same size, 25m x 25m, as are the quadrats (each quadrat is 0.25m x 25m).

The data file is available here:
https://www.dropbox.com/s/hpptsqap0oysje8/stratum3.csv?dl=0

There are five variables in the data file: plot_id is a unique number assigned to each macroplot (there are 75 unique numbers but these are not all consecutive); belt_number is the number (1-5) of each belt transect (quadrat) in each macroplot; counts contains the number of individuals counted in each belt, weights1 contains sampling weights for the macroplots (stage 1) and weights2 contains sampling weights for the quadrats (stage 2).

The mean number of plants per quadrat for this dataset is 44.24. Several covariates were measured in each of the quadrats, but I'm ignoring these here and they are not included in the data file.
Instead, I'd like to focus on the estimated population mean I get from running several different regressions below. Also, the data are overdispersed and a negative binomial model seems appropriate, but I use the Poisson here because it's available in the svyglm() function.

First I present the results from calculating the mean and standard error from the survey package.
I set the following survey design:

cnt_design3 <- svydesign(weights = ~ weights1 + weights2,
                       id = ~plot_id + belt_number,
                       strata = ~stratum_num,
                       data = stratum3,
                       nest = TRUE)

(Sampling weights aren't really required here since the weights are equal in each of the two stages.
We'd get the same results if we left them out.)

Running svymean() using this design I get:
svymean(~counts, design = cnt_design3)
       mean     SE
counts 44.24 9.8146

If I use glmmTMB and run an intercept-only multilevel model assuming
(incorrectly) a gaussian
distribution, I get:

glmm3 <- glmmTMB(counts ~ 1 + (1|plot_id), data = stratum3) Conditional model:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   44.240      9.749   4.538 5.68e-06 ***

The mean is exactly the same as that from svymean() and the SE is very close.

My question arises when I use a Poisson regression on the data set. Here I focus only on the estimated population mean and not the standard error since the standard error can't be correctly exponentiated.

If I use svyglm() from the survey package and a quasipoisson distribution I
get:

svyglm2 <- svyglm(formula = counts ~ 1, design = cnt_design3, family =
quasipoisson)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.7896     0.2218   17.08   <2e-16 ***
> exp(coef(svyglm2))
(Intercept)
      44.24

This is the same mean as returned by the other analyses above.

But if I use glmmTMB to run a multilevel model with family = poisson I get the following:
glmm3.2 <- glmmTMB(counts ~ 1 + (1|plot_id), data = stratum3,
                  family = poisson)
> fixef(glmm3.2)

Conditional model:
(Intercept)
      2.424
which when exponentiated:
> exp(fixef(glmm3.2)[[1]])
(Intercept)
   11.29252

So my question is why don't I get the same value as I get from the other analyses, particularly the svyglm() function using quasipoisson? I don't think it's an issue with geometric vs arithmetic means. If I ignore the two-stage design and pretend that all 375 quadrats were randomly positioned in the population and then run an intercept-only model with family = poisson I get the correct mean value of 44.24:

glmm3_no_random <- glmmTMB(counts ~ 1, data = stratum3,
                          family = poisson)
> fixef(glmm3_no_random)

Conditional model:
(Intercept)
       3.79
which when exponentiated:
> exp(fixef(glmm3_no_random)[[1]])
(Intercept)
      44.24

Why would including random intercepts for plot change the grand mean
(exponentiated) from
44.24 to 11.29? Note that this only happens with the multilevel Poisson regression and not the multilevel Gaussian. I suspect I'm missing something, but I'll be darned if I can figure out what.

Thanks for any help you can provide.

John Willoughby