[R-sig-eco] mixed model with proportion data

Wed Mar 8 16:49:34 CET 2017

Mariano:  Just as a follow-up to Phil Dixon's comment, which I think is spot
on: you are probably better off modeling the response as the logit of the
proportions.  But to more easily deal with true zeros or ones, and to avoid
the back-transformation bias associated with means on nonlinear
transformations like the logit, you might want to consider estimating your
models with logistic quantile regression (see Bottai et al. 2010.
Statistics in Medicine 29: 309-317) rather than some mean regression model.
This is easily done for a fixed-effects model with the quantreg package.
There are also mixed-effects variants of quantile regression, but I've not
tried to use them in the logistic quantile framework.  Another poster
suggested beta regression, which also might be reasonable.  In my
experience, the logistic quantile regression model has greater flexibility
to handle true zeros and ones and odd dispersion patterns than beta
regression.  And of course, you can back-transform the quantile estimates
from the logit scale to the proportion scale without bias.
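
To illustrate the back-transformation point, here is a minimal sketch in
plain Python (standard library only, not the quantreg package).  It applies
a bounded logit transform in the spirit of Bottai et al. 2010 and compares
the sample median and the sample mean after a round trip through the logit
scale.  The offset EPS, the function names, and the proportions are
assumptions made up for this illustration, not values from Mariano's data.

```python
import math
import statistics

EPS = 0.001  # assumed small offset so that exact 0s and 1s get finite logits


def logit_bounded(y, lo=0.0 - EPS, hi=1.0 + EPS):
    """Logit of y on the interval (lo, hi); finite for y = 0 and y = 1."""
    return math.log((y - lo) / (hi - y))


def inv_logit_bounded(z, lo=0.0 - EPS, hi=1.0 + EPS):
    """Inverse of logit_bounded: maps the logit scale back to proportions."""
    return (lo + hi * math.exp(z)) / (1.0 + math.exp(z))


# Hypothetical proportions, including true zeros and ones (odd sample size,
# so the sample median is one of the observed values).
props = [0.0, 0.05, 0.10, 0.20, 0.35, 0.60, 0.90, 1.0, 1.0]
logits = [logit_bounded(p) for p in props]

# Quantiles commute with monotone transformations: the median of the logits,
# back-transformed, is the median of the proportions.
median_direct = statistics.median(props)
median_back = inv_logit_bounded(statistics.median(logits))

# Means do not: the back-transformed mean of the logits is not the mean of
# the proportions.
mean_direct = statistics.mean(props)
mean_back = inv_logit_bounded(statistics.mean(logits))

print("median:", median_direct, "back-transformed:", median_back)
print("mean:  ", mean_direct, "back-transformed:", mean_back)
```

The same equivariance holds for any quantile, which is why quantile
estimates from the logit scale can be mapped back to proportions directly.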

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  cadeb at usgs.gov <brian_cade at usgs.gov>
tel:  970 226-9326

On Wed, Mar 8, 2017 at 6:20 AM, Dixon, Philip M [STAT] <pdixon at iastate.edu>
wrote:

> Mariano,
>
> There is a huge and important difference between the two approaches
> suggested for your data.  A model for the log ratio of proportions (i.e.
> the empirical logit of the Yes proportion) estimates the residual
> variance from the data.  The binomial
> model assumes the residual variance is determined by the arbitrary (and
> made-up) sample size of 20 "tries" per response, in combination with the
> estimated mean proportions.  To see the arbitrariness, if you don't
> already, re-express your proportions out of 200, instead of 20, because
> 0/200, 10/200, ... 200/200 also give your observed responses.  The
> coefficient estimates will be approximately the same but their variances
> will not.  (If you didn't have additional random effects in the model, the
> coefficient estimates would be exactly the same but the variances would be
> 1/10 of those from N=20).
>
> If you are going to use the binomial GLM, I believe you must add
> overdispersion to the model, either as an individual random effect or by
> using a quasibinomial response distribution.  Overdispersion is not
> necessary for the log proportion response because the residual error
> variance conceptually estimates that overdispersion.
>
> Philip
>
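
Philip's point about the arbitrary denominator can be checked by hand for
the simplest case, an intercept-only binomial model with no random effects.
The sketch below is plain Python (standard library only) and uses the
closed-form maximum likelihood estimate rather than a fitted GLM.  The
counts and the function name are hypothetical, chosen only to show the
scaling.

```python
import math


def intercept_only_binomial(successes, trials):
    """MLE of the logit intercept and its asymptotic variance for an
    intercept-only binomial model (no covariates, no random effects)."""
    total_s = sum(successes)
    total_n = sum(trials)
    p_hat = total_s / total_n
    logit_hat = math.log(p_hat / (1.0 - p_hat))
    # Inverse Fisher information for the intercept: 1 / (N * p * (1 - p))
    var_logit = 1.0 / (total_n * p_hat * (1.0 - p_hat))
    return logit_hat, var_logit


# Hypothetical "Yes" counts out of a made-up 20 tries per response.
yes_20 = [2, 5, 9, 14, 18]
n_20 = [20] * len(yes_20)

# The same observed proportions re-expressed out of 200.
yes_200 = [10 * k for k in yes_20]
n_200 = [200] * len(yes_200)

est_20, var_20 = intercept_only_binomial(yes_20, n_20)
est_200, var_200 = intercept_only_binomial(yes_200, n_200)

print("estimate (N=20): ", est_20, " variance:", var_20)
print("estimate (N=200):", est_200, " variance:", var_200)
print("variance ratio:  ", var_200 / var_20)
```

The estimate depends only on the proportions, while its variance shrinks in
proportion to the assumed number of tries, which is the arbitrariness
described above.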