[R-sig-ME] Phylogenetic Logistic Regression for non-binary data: best practices and programs?

Thu May 31 17:19:33 CEST 2018

Hi Jon,

A few thoughts about your response variable first.

When dealing with proportions (values between 0 and 1), the beta
distribution is the usual choice. However, the beta distribution is only
defined on the open interval, so it cannot handle observations at the
boundaries (i.e. y = 0 or y = 1).
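
To see the boundary problem concretely, here is a small base R sketch using the example values from the original message. The shape parameters 2 and 3 are arbitrary values picked for illustration only. The beta log-density is not finite at 0 or 1, so the log-likelihood of data containing exact 0s or 1s is not finite either.

```r
# Example response values from the original message
y <- c(0, 0, 0, 0, 0, 0, 0.25, 0.4, 0.6, 0.9, 0.9, 1, 1, 1, 1)

# Beta log-likelihood over all observations (shape values are arbitrary)
ll_all <- sum(dbeta(y, shape1 = 2, shape2 = 3, log = TRUE))

# Same log-likelihood restricted to values strictly between 0 and 1
inside <- y > 0 & y < 1
ll_inside <- sum(dbeta(y[inside], shape1 = 2, shape2 = 3, log = TRUE))

is.finite(ll_all)     # FALSE: the boundary values break the likelihood
is.finite(ll_inside)  # TRUE: interior values are fine
```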

That's obviously a problem for your data. We have multiple options to deal
with that:

We can use a zero-one-inflated beta distribution, which models the data as
three separate processes (exact 0s, exact 1s, and everything in between).

Alternatively, and probably what I would prefer for your data, one
could model the response using an ordinal distribution. This requires the
values between 0 and 1 to be broken up into (not too many) discrete
categories, which will lead to some information loss but is at least more
informative than a simple 0/1 treatment as in logistic regression.
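
A minimal base R sketch of the binning step, assuming three ordered categories in which the exact 0s and 1s keep their own levels. The labels and the single middle bin are illustrative choices, not something the ordinal approach prescribes.

```r
# Example response values from the original message
y <- c(0, 0, 0, 0, 0, 0, 0.25, 0.4, 0.6, 0.9, 0.9, 1, 1, 1, 1)

# Assumed scheme: exact 0 -> "ground", exact 1 -> "tree",
# anything strictly in between -> "mixed"
y_cat <- ifelse(y == 0, "ground", ifelse(y == 1, "tree", "mixed"))

# Ordered factor, the response type that ordinal models expect
y_ord <- factor(y_cat, levels = c("ground", "mixed", "tree"), ordered = TRUE)

table(y_ord)
```

If more resolution is wanted, the middle bin could be split further (for example with `cut()`), as long as each category still contains enough observations.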

You can fit all of the models above in combination with phylogenetic
structures using the brms R package (some are available in MCMCglmm as
well). Type vignette("brms_phylogenetics") in R for more details.
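
As a rough sketch of how the two options might be specified, the helper below follows the pattern of the brms phylogenetics vignette. Everything about the data layout is an assumption: `fit_phylo_models` is a hypothetical helper name, `dat` is assumed to have columns `y` (proportion), `y_ord` (ordered factor), `x` (body size), and `phylo` and `species` (both holding the species name), and `tree` is assumed to be an ape `phylo` object whose tip labels match those names. It uses the `gr(..., cov = )` syntax of current brms releases; older versions passed the covariance matrix through a `cov_ranef` argument instead.

```r
# Hypothetical helper: all column names (y, y_ord, x, phylo, species) are assumptions.
fit_phylo_models <- function(dat, tree) {
  # Phylogenetic covariance matrix; its row names must match dat$phylo
  A <- ape::vcv.phylo(tree)

  # Option 1: zero-one-inflated beta on the raw proportions
  fit_zoib <- brms::brm(
    y ~ x + (1 | gr(phylo, cov = A)) + (1 | species),
    data = dat, data2 = list(A = A),
    family = brms::zero_one_inflated_beta()
  )

  # Option 2: cumulative-logit ordinal model on the binned response
  fit_ord <- brms::brm(
    y_ord ~ x + (1 | gr(phylo, cov = A)) + (1 | species),
    data = dat, data2 = list(A = A),
    family = brms::cumulative("logit")
  )

  list(zoib = fit_zoib, ordinal = fit_ord)
}
```

With brms and ape installed, this would be called as `fits <- fit_phylo_models(dat, tree)`. The `(1 | gr(phylo, cov = A))` term carries the phylogenetic effect, and `(1 | species)` is one way to absorb species-level variation not explained by the phylogeny. Priors and sampler settings are left at their defaults here and would need attention for a real analysis.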

Best,
Paul

2018-05-31 16:42 GMT+02:00 jonnations <jonnations using gmail.com>:

> Hi Listserv,
>
> I am new to this type of work and have tried to make this as clear as
> possible.
>
> I am working on a project that models habitat use (y = ground(0) vs.
> tree(1)) and body size (x = body size, continuous). My y variables are from
> the formula:
>
>  y = (tree captures / tree effort) /
>      ((tree captures / tree effort) + (ground captures / ground effort))
>
> which should provide a ratio of captures in a given habitat while
> accounting for effort. My y values are mostly binary, but some species'
> values are between 0 and 1. The data look like this example:
>
> y = c(0, 0, 0, 0, 0, 0, 0.25, 0.4, 0.6, 0.9, 0.9, 1, 1, 1, 1)
>
> My goal for the model is to use the species with known habitat "scores" to
> predict the habitat value (y) of species from their body size value (x).
>
> There are 2 "random" effects in the model, the relatedness of the species
> (the phylogeny, Rp) and the intraspecific variation of the x measurement
> (Rs). These are both very important as my 150 data points are distributed
> between 22 species.
>
> Using logistic regression, the model takes the form:
>
>  logit(Pr(Y = 1)) = a + Bx + Rp + Rs + e
>
> I have two questions for the group. First, is it appropriate to use
> logistic regression (or a logit link) on these kinds of non-binary y
> values? I have found several examples online of logistic regression with
> non-binary variables (links below) but I have not found a publication with
> a study design like mine.
>
> Second, any suggestions of programs for setting up the model? I am
> interested in using a Bayesian GLMM method (MCMCglmm, JAGS, etc.); however,
> I am worried that the programs will view these data as non-binary and
> either insist on an ordinal regression (not what I am doing) or otherwise
> impose categorical groupings on the response variable and produce strange
> results. Can any GLMM program handle my Rp, Rs, and the non-binary nature
> of the y variables?
>
> I hope this is clear. Any suggestions will be greatly appreciated! Thanks
> for your help and patience.
>
> Best,
> Jon
>
> Links mentioned above:
> https://stats.stackexchange.com/questions/33562/choose-best-model-between-logit-probit-and-nls?rq=1
> https://stats.stackexchange.com/questions/69886/using-logistic-regression-for-a-continuous-dependent-variable?rq=1
> --
> Jonathan A. Nations
> PhD Candidate
> Esselstyn Lab
> Museum of Natural Sciences
> Louisiana State University