[R-sig-ME] Question on glmer.nb in power simulation

Tue Apr 14 23:13:01 CEST 2020

Hi all,

I would like to ask for solution or advice on strange result that glmer.nb from lme4 generated when simulating using simR package. I’m working on longitudinal gut microbiome abundance data (23 patients, 2 cases (equal to time points), 482 bacteria), and they’re all count data. Here I will focus on only 2 bacteria among them, bacteria A and B. 

Bacteria_A <- structure(list(Individual = c(rep(c(26, 64, 1, 35, 33, 30, 3, 24, 55, 46, 39, 34, 16, 49, 61, 52, 28, 65, 62, 68, 74, 37, 67), each = 2)), Case = c(3, 2, 3, 2, 2, 3, 3, 2, 3, 2, 2, 3, 2, 3, 3, 2, 3, 2, 3, 2, 3, 2, 2, 3, 2, 3, 2, 3, 2, 3, 3, 2, 3, 2, 3, 2, 2, 3, 2, 3, 2, 3, 2, 3, 3, 2), Abundance_value = c(18, 4, 10, 2, 0, 0, 0, 0, 16, 1, 0, 0, 4, 16, 10, 18, 0, 0, 8, 7, 35, 16, 2, 22, 1, 6, 16, 9, 7, 12, 38, 32, 22, 4, 17, 13, 19, 20, 0, 6, 7, 13, 1, 22, 0, 0)),  class = "data.frame", row.names = c(NA, 46L), .Names = c("Individual", "Case", "Abundance_value"))

Bacteria_B <- structure(list(Individual = c(rep(c(26, 64, 1, 35, 33, 30, 3, 24, 55, 46, 39, 34, 16, 49, 61, 52, 28, 65, 62, 68, 74, 37, 67), each = 2)), Case = c(3, 2, 3, 2, 2, 3, 3, 2, 3, 2, 2, 3, 2, 3, 3, 2, 3, 2, 3, 2, 3, 2, 2, 3, 2, 3, 2, 3, 2, 3, 3, 2, 3, 2, 3, 2, 2, 3, 2, 3, 2, 3, 2, 3, 3, 2), Abundance_value = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)),  class = "data.frame", row.names = c(NA, 46L), .Names = c("Individual", "Case", "Abundance_value"))

I tried both lmer and glmer.nb, and the code is as below.

lmer(formula = rank(Abundance_value) ~ Case + (1| Individual)) # I use rank() here because abundance values aren’t normal-distributed
glmer.nb(formula = Abundance_value ~ Case + (1| Individual)) # I also tried negative binomial regression
car::Anova(model, type=c("II"),  test.statistic=c("Chisq”)) # Followed by Anova to get p value of “Case"

I found that compared to using lmer, glmer.nb gave much lower p values, and thus resulting in more significance of bacterial abundance changing over time. However, among the bacteria that showed significance, there were many having really low effect size (paired Cliff’s delta) between 2 cases. I’m wondering if glmer.nb is so sensitive that it can detect such low effect size changes. I did simulation by using the powerCurve function from simR package to check the power of glmer.nb.

library(simR)
m_NB <- glmer.nb(formula = Abundance_value ~ Case + (1| Individual), REML = F)  # Fit the negative-binomial regression model
m_NB_ext <- extend(m_NB, n=1000, along = "Individual”)    # Use the “extend" function in simR to increases sample size
powerCurve(m_NB_ext, along = "Individual", nsim = 1000, alpha = 0.1/N,  test = simr::fixed("Case", method = "chisq”)) # Plot the simulated power at different sample size. The tested target here is whether “Case” (which is time point) can explain the variation of abundance value.

I plotted simulated power curve for different bacteria with Cliff’s delta between 2 cases being 0.609 (bacteria A) and 0.0435 (bacteria B). The power curve of 0.609 is smooth and looks normal, but the curve of 0.0435 looks really strange. It rose fast at the beginning, peaked at n=50, and then dropped all along the way to n = 600.  

Bacteria A power curve: https://drive.google.com/open?id=1B95_2WrNiSL4w9C2O315OXLKr4o_uq3c <https://drive.google.com/open?id=1B95_2WrNiSL4w9C2O315OXLKr4o_uq3c>
Bacteria B power curve:https://drive.google.com/open?id=1dSN0TA4-BxQVNZdWKsT5LT3VSMBlO6xg <https://drive.google.com/open?id=1dSN0TA4-BxQVNZdWKsT5LT3VSMBlO6xg>

The power curve of bacteria B is really weird; nevertheless, it did reflect the truth that I got many low-effect size yet significant bacteria at n = 23 in my data set. I thought of some possible reasons and would like to ask for your  advices.

1. It is due to sparsity. Most of my data are zero-inflated, and for bacteria A there were only 24% that the abundance equals to zero, while for bacteria B, there were 66%. Could the weird power curve being resulted from the high extent of zero-inflation? Is there any assumptions of data distribution for negative binomial regression using glmer.nb (or glmmTMB) that I wasn’t aware of? 
2. Maybe there is something wrong with my simR syntax?

Thank you very much for reading through this mail. Any advices or comments are appreciated! Thank you in advance.

Best regards,
Chia-Yu Chen
	[[alternative HTML version deleted]]