[R-sig-ME] mixed effects model proportion data that are not 0 and 1's

Tue Oct 13 14:43:25 CEST 2015

Ok running this model:

y <- with(juv2, cbind(total females during, (total_counts_during - total
females during)))

 mod <- glmer(y ~ Proportion_infected + Month
+ (1 |Site/Month), family=binomial, weights = total_counts, data=d3)

gives the following warning:
 Model failed to converge: degenerate  Hessian with 1 negative eigenvalues

Overfit again?

On Tue, Oct 13, 2015 at 7:37 AM, M West <westm490 at gmail.com> wrote:

> Thanks all for these great answers - I really appreciate it!
>
> M
>
> On Mon, Oct 12, 2015 at 2:16 PM, Ben Bolker <bbolker at gmail.com> wrote:
>
>> On 15-10-12 01:42 PM, M West wrote:
>> > Hello,
>> >
>> > I am trying to decide the best approach for analyzing a short time
>> series.
>> >
>> > The goal is to see if there is a significant relationship between the
>> > percentage change in females (# females/total population during
>> epidemic -
>> > # females/total population pre-epidemic) and disease prevalence
>> (continuous
>> > variable: percentage infected).
>> >
>> > Samples are across 15 sites and 5 months. So a very short time series.
>> >
>> > Here's the model and weights that I've used thus far.
>> > v3 <- varComb(varIdent(form =~ 1 | Month) , varExp(form =~
>> > Proportion_infected))
>> > mod <- lme(Change_in_proportion_females ~ Proportion_infected + Month,
>> > random = ~ 1 |Site/Month, weights = v3)
>> >
>> > The main problems are:
>> > 1) If I plot the residuals vs. the fitted values, there is a strong
>> > relationship demonstrating that the variance increase with the mean.
>> How do
>> > I account for this in a mixed effects model? I've tried the weights
>> option
>> > (with a couple of variations), however, I receive an error message:
>> > "iteration limit reached without convergence"
>>
>>   weights= is the right way to go about it; we'd have to see more
>> details to know what the problem is in your particular case.
>>
>>  Your model looks like it might be overfitted: if you only have one
>> observation per Site/Month combination, then the Site-by-Month
>> interaction term (Site/Month expands to "Site plus Site-by-Month")
>> will be confounded with the residual variance.  I have found in the
>> past that lme may allow you to fit the overfitted model without
>> complaining; one way of diagnosing this problem is to try intervals() on
>> your fitted model (try it without the weights= argument for
>> simplicity) and see if you get warnings/errors.
>>
>> >
>> > 2) I also have a question about the way that R treats proportion data
>> that
>> > are not 0 and 1's when using glm (or glmer) with the binomial
>> distribution.
>> > Crawley, for example, suggests that you create a y variable where you
>> > account for the total number of observations (e.g., y <- cbind(total
>> > females, (total population - total females)) and then run a glm (or in
>> may
>> > case, a generalized linear mixed model) using the binomial distribution.
>> >
>> > But what is actually going on under the hood here? Is R running some
>> sort
>> > of proportion test?  All the examples that I have found for proportion
>> data
>> > or non-normal residuals suggest using a logistic regression approach
>> with
>> > the binomial distribution. Is this also the best option for proportion
>> data
>> > that are not 0 and 1's (i.e., my data on the percentage change would not
>> > match the typical logistic regression plot)?
>>
>>   Binomial and Bernoulli (binary) logistic regression are basically
>> equivalent; when you have multiple individuals observed for a particular
>> set of predictor variables (e.g. month/site), it's practical
>> and computationally efficient to collapse them to a binomial regression.
>> Under the hood there is indeed the equivalent of a proportion test --
>> not necessarily quite as accurate for small numbers of counts as some
>> classical proportion tests, but much more flexible.  In lme4 (the
>> successor to nlme, which implements generalized linear MMs that can
>> handle a binomial response), the 'weights' argument is simpler/can be
>> used to specify the denominator or total number of counts:
>>
>> library(lme4)
>>  mod <- glmer(Change_in_proportion_females ~ Proportion_infected + Month
>> + (1 |Site/Month),
>> family=binomial, weights = total_counts, data=...)
>>
>>  (here, including the site-by-month interaction is OK -- it represents
>> the possibility of extra-binomial variance (overdispersion).)
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
>

	[[alternative HTML version deleted]]