[R-sig-ME] A mixed effects model for variances of binomialprobability as a responsee

Ramon Diaz-Uriarte rdiaz02 at gmail.com
Thu Mar 27 09:39:09 CET 2014


Dear Nimrod,

Just my 2 cents (even less than 2):

On Thu, 01-01-1970, at 01:00, nimrod.rubinstein <nimrod.rubinstein at gmail.com> wrote:
> Hi all,
>
> I have the following data:
>
> 50 subjects that belong to 2 groups (25 in each group). For each subject I
> have replicated measurements (in the range of 15-30 replicates per subject)
> of the number of trails as well as the number successes out of these
> trails. Each single trial in each single replicate for each subject is
> assumed to be a Bernoulli experiment, therefore the number of successes in
> each replicate is assumed to be a binomially distributed random variable.
> To complicate things even more, for each subject from each group I have not
> 1 but 3 series of these replicated measurements, from 3 different
> environments (the 3 environments are the same for all subjects).
>
> A few example lines of how my data looks like:
> subject group environment     number.trails  number.successes
>
>     S1    G1          E1 10100,13209,17621  8500,11622,14684
>
>     S1    G1          E2 13322,12225,10024  10351,10222,7951
>
>     S1    G1          E3  9812,13500,13214  7245,10622,10684
>
>     .
>
>     .
>
>     .
>
>     S50    G2          E1 60547,75345,68412 12014,13214,15244
>
>     S50    G2          E2 50897,62541,61247 10141,13251,30645
>
>     S50    G2          E3 59214,84264,74156 17245,16425,11123
>
>
> Where the comma separated number of successes correspond to the comma
> separated number of trials in each replicate.
>
> I want to test whether the variances of the success probabilities of
> subjects from group G1 are significantly smaller than the variances of the
> success probabilities of subjects from group G2.
>

But, as you say in point 1. below (and in another reply), if this is really
binomial, then you already know that if probabilities in G1 are different
from probabilities in G2 your variances will differ.

So I am not sure I even understand this hypothesis/question.


> In other words, for each subject in each replicate a success probability
> can be estimated from the observed data. Then the variance of those success
> probabilities over all replicates from a specific environment can be
> estimated. I would like to know whether these variances are smaller in
> subjects from G1 vs. subjects from G2.
>
> The issues are:
>
>
>    1. The variance of a success probability depends on the success
>    probability itself (natural to binomially distributed data). I need to
>    account for this effect.
>    2. The number of trails (although at least in the hundreds for all
>    subjects) varies several orders of magnitude among subjects. That by itself
>    affects the accuracy of estimating the success probability, but in
>    addition, I cannot rule out an inherent dependency between the number of
>    observed trials and the success probability (there is no control over the
>    number of trials in the experiment - it is observed). In other words,
>    having a lower number of trails by itself may affect the success
>    probability.
>

That last issue seems very problematic to me. If I understand correctly, we
could think of a perverse situation where, say, you can never observe low
success probabilities if those are related to number of trials around 0.


Can you do a simple plot of probability of success (or its logit) vs number
of trials, to rule out perverse scenarios? Is probability of success always
larger than 0? And number of trials always sufficiently different from 0?


And you say "varies by subject", so the number of trials is always the same
for a subject? And the potential relationship prob <-> number of trials is
due to among-subject variation? So you could approach the problem via a
random effect for subject? If you compare G1 and G2 for number of trials:
are there differences?





>
> I though a simple way to test my hypothesis is to compute the success
> probability in each replicate as number.successes/number.trails, and then
> the sample mean and sample variance of these success probabilities over all
> replicates, as well as the mean number of trials over all replicates (added
> to my data as sample.mean, sample.var and mean.trials fields). Then I can
> use the lmer function in the lme4 package this way:
>
> null.model = lmer(sample.var ~ sample.mean + mean.trials + (1|environment),
> my.data, REML = F)
> alt.model = lmer(sample.var ~ sample.mean + mean.trials + group +
> (1|environment), my.data, REML = F)
>
> So fixed effects are: sample.mean, mean.trials and group, and random
> effects is environment since the three environments in my data are simply a
> random sample from many other possible environments.


I do not see how this would answer your questions, given the above
concerns/issues and other possible approaches (just compare prob. between
G1 and G2, accounting for environment).


Best,


R.


>
> To test whether group has a significant effect I'll do:
>
> anova(null.model, alt.model).
>
> My concerns are:
>
>
>    1. Should I add subject as a random effect? My intuition is not to since
>    subjects are unique to groups, which is the effect I'm after. However, I do
>    want to account for the fact that sample variances of the same subject from
>    the three different environments are correlated.
>    2. The response - sample variances, cannot be assumed to follow a normal
>    distribution (they only take positive values to begin with and as far as I
>    know (n-1)*sample.variance/population.variance ~ chi.square with df = n-1)
>    and therefore the assumptions of a linear mixed effects model do not
>    strictly hold. Is there a generalized linear model that would be
>    appropriate for this?
>
> Thanks a lot,
> nimrod
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-- 
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina
Universidad Autónoma de Madrid 
Arzobispo Morcillo, 4
28029 Madrid
Spain

Phone: +34-91-497-2412

Email: rdiaz02 at gmail.com
       ramon.diaz at iib.uam.es

http://ligarto.org/rdiaz



More information about the R-sig-mixed-models mailing list