[R-sig-ME] Fitting a heteroscedastic glmm for binomial responses

Mon Jan 12 21:18:46 CET 2015

 I have data from the following experimental design: my observations are counts of the numbers of successes (K) out of corresponding number of trials (N), measured for two groups each comprised of I individuals, from T treatments, where in each such factor combination there are R replicates. Hence, altogether I have 2 * I * T * R K's and corresponding N's.
I'm interested in estimating the effects that group and treatment have on the dispersion/variance of the success probabilities (i.e., K/N), and I believe id needs to be specified as a random effect. Therefore I'm looking for an appropriate glmm in which in addition to modelling the expected value of the response the variance of the response is also modeled.
Clearly, the variance of a binomial success probability is affected by the number of trials and the underlying success probability (the higher the number of trials is and the more extreme the underlying success probability is (i.e., near 0 or 1), the lower the variance of the success probability), so that has to be accounted for.
Here's an R code for simulated example data:library(MASS)
set.seed(1)
I <- 20 # individuals in each group
G <- 2  # groups
T <- 3  # treatments
R <- 30 # replicates of each individual, in each group, in each treatment

groups <- letters[1:G]
ids <- c(sapply(groups, function(g){ paste(rep(g, I), 1:I, sep=".") }))
treatments <- paste(rep("t", T), 1:T, sep=".")
 # create random mean number of trials for each individual and 
 #  dispersion values to simulate trials from a negative binomial:
mean.trials <- rlnorm(length(ids), meanlog=10, sdlog=1)
thetas <- 10^6/mean.trials
 # create the underlying success probability for each individual:
p.vec <- runif(length(ids), min=0, max=1)
 # create a dispersion factor for each success probability, where the 
 #  individuals of group 2 have higher dispersion thus creating a group effect:
dispersion.vec <- c(runif(length(ids)/2, min=0, max=0.1),
                   runif(length(ids)/2, min=0, max=0.2))
 # create empty an data.frame:
data.df <- data.frame(id=rep(sapply(ids, function(i){ rep(i, R) }), T),
                     group=rep(sapply(groups, function(g){ rep(g, I*R) }), T),
                     treatment=c(sapply(treatments, 
                                        function(t){ rep(t, length(ids)*R) })),
                     N=rep(NA, length(ids)*T*R), 
                     K=rep(NA, length(ids)*T*R) )
 # fill N's and K's - trials and successes. Trials are assumed to be overdisperesed and hence N's are drawn from a negative binomial
for(i in 1:length(ids)){
  N <- rnegbin(T*R, mu=mean.trials[i], theta=thetas[i])
  probs <- runif(T*R, min=max((1-dispersion.vec[i])*p.vec[i],0),
                max=min((1+dispersion.vec)*p.vec[i],1))
  K <- rbinom(T*R, N, probs)
  data.df$N[which(as.character(data.df$id) == ids[i])] <- N
  data.df$K[which(as.character(data.df$id) == ids[i])] <- K
}Attached are the following plots (group a are in orange and group b are in black; treatment t.1 are in squares, treatment t.2 are in circles, and treatment t.3 are in triangle):1. Sample variance vs. sample mean of the estimated success probability (denoted as p hat = K/N), which illustrates that extreme success probabilities have lower variance.2. If I apply the arcsin square root transformation to K/N (denoted as asin(sqrt(p.hat))), this relationship is largely eliminated.3. Sample variance of asin(sqrt(p.hat))  vs. the mean N across replicates for each individual shows the expected negative relationship.4. Sample variance of asin(sqrt(p.hat)) for the two groups shows that group b has slightly higher variances, which is how I simulated the data.5. Sample variance of asin(sqrt(p.hat)) for the three treatments shows no difference between treatments, which is how I simulated the data.
Is there any form of a glmm with which I can quantify the group and treatment effects on the variance of the success probabilities? Perhaps a heteroscedastic glmm (i.e. which estimates the effect of regressors on the variance of the response)?
Help would be most appreciated.Thanks a lot,Dan
P.S.I also posted this on cross validated but it wasn't answered.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: var.p.hat.vs.mean.p.hat.png
Type: image/png
Size: 3919 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20150112/50f635d4/attachment-0005.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: var.asin.sqrt.p.hat.vs.mean.asin.sqrt.p.hat.png
Type: image/png
Size: 4162 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20150112/50f635d4/attachment-0006.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: var.asin.sqrt.p.hat.vs.mean.N.png
Type: image/png
Size: 4195 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20150112/50f635d4/attachment-0007.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: var.asin.sqrt.p.hat.vs.group.png
Type: image/png
Size: 3806 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20150112/50f635d4/attachment-0008.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: var.asin.sqrt.p.hat.vs.treatment.png
Type: image/png
Size: 3991 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20150112/50f635d4/attachment-0009.png>