[R-sig-ME] Phylogenetic Logistic Regression for non-binary data: best practices and programs?

Thu May 31 22:31:24 CEST 2018

Hi John,

See https://psyarxiv.com/x8swp/ for a detailed introduction to ordinal
models. For your data I think the cumulative() family probably makes the
most sense among the ordinal families.

Please keep in mind that ordinal models do not "automatically" categorize
the response. You have to categorize it yourself.
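Here is a minimal sketch of what categorizing the response yourself could look like in R. The function maps Jon's example values onto an ordered factor. `categorize_habitat` is a hypothetical helper, not part of brms, and the four categories and the 0.5 cut point are illustrative assumptions only. An ordered factor like `y_ord` is the kind of response an ordinal family such as `cumulative()` works with.

```r
# Jon's example response values (effort-corrected share of tree captures)
y <- c(0, 0, 0, 0, 0, 0, 0.25, 0.4, 0.6, 0.9, 0.9, 1, 1, 1, 1)

# Hypothetical helper: exact 0 and exact 1 keep their own categories and the
# interior (0, 1) is split at 0.5. The number of categories and the cut point
# are illustrative assumptions, to be chosen on biological grounds.
categorize_habitat <- function(y) {
  stopifnot(all(y >= 0 & y <= 1))
  idx <- ifelse(y == 0, 1L,
         ifelse(y == 1, 4L,
         ifelse(y <= 0.5, 2L, 3L)))
  factor(idx, levels = 1:4,
         labels = c("ground", "mostly_ground", "mostly_tree", "tree"),
         ordered = TRUE)
}

y_ord <- categorize_habitat(y)
table(y_ord)
```

For the example values, this gives 6, 2, 3 and 4 observations in the four categories. Fewer, coarser categories lose more information. Many sparsely populated categories are harder to estimate with only 22 species.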

Paul

2018-05-31 22:14 GMT+02:00 jonnations <jonnations using gmail.com>:

> Hi Paul,
>
> Thank you for the quick response! This is the exact kind of information I
> was hoping for. I had just heard of brms in passing, but after looking
> through the vignettes it seems like a good choice. I have been interested
> in Stan's algorithms, but correctly scripting a phylogenetic GLMM from
> scratch seemed daunting.
>
> Quick question concerning ordinal regression: I thought (perhaps naively)
> that ordinal models always "categorize" data and fit separate "slopes" for
> each category. My ultimate goal is to use the model to predict a response
> value from an explanatory variable for species lacking habitat (response)
> data. I had anticipated a posterior distribution of a continuous response
> variable for each "newdata" value in predict().
>
> I see that there are several additional ordinal families in brms that I am
> unfamiliar with. Perhaps one of these would be best for predicting a
> (continuous?) response value, or maybe it is just "better" to predict
> membership in an a priori discrete category.
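Jon's question about a continuous prediction can be illustrated with a sketch of how a cumulative logit model turns a linear predictor eta into category probabilities: P(Y <= k) = logistic(tau_k - eta) for ordered thresholds tau_1 < ... < tau_(K-1). The thresholds, the linear predictor, and the representative y value assigned to each category below are hypothetical numbers, not estimates from Jon's data. Weighting those representative values by the category probabilities is one possible way to get a single continuous summary. It is shown here as an illustration, not as something proposed in the thread.

```r
# Category probabilities under a cumulative logit model:
# P(Y <= k) = plogis(tau_k - eta) for k = 1, ..., K - 1, and P(Y <= K) = 1.
cumulative_probs <- function(eta, thresholds) {
  stopifnot(!is.unsorted(thresholds))
  cdf <- c(plogis(thresholds - eta), 1)
  diff(c(0, cdf))  # P(Y = k) = P(Y <= k) - P(Y <= k - 1)
}

# Hypothetical values for illustration only (not fitted estimates)
thresholds <- c(-0.5, 0.2, 1.0)          # tau_1 < tau_2 < tau_3 for 4 categories
eta <- 0.3                               # linear predictor, e.g. b * body size
category_values <- c(0, 0.25, 0.75, 1)   # assumed representative y per category

p <- cumulative_probs(eta, thresholds)
expected_y <- sum(p * category_values)   # probability-weighted summary
round(p, 3)
expected_y
```

In a Bayesian fit, this calculation would be repeated for every posterior draw of the thresholds and coefficients. That yields a posterior distribution of the summary for each new body size value.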
>
> Your thoughts would be greatly appreciated. Thanks again for the help!
>
> Jon
>
>
> On Thu, May 31, 2018 at 10:19 AM, Paul Buerkner <paul.buerkner using gmail.com>
> wrote:
>
>> Hi Jon,
>>
>> a few thoughts about your response variable first.
>>
>> When dealing with proportions (values between 0 and 1), the beta
>> distribution is what is usually used. However, the beta distribution
>> cannot handle observations at the boundary (i.e. y = 0 or 1).
>>
>> That's obviously a problem for your data. We have multiple options to
>> deal with that:
>>
>> We can use a zero-one-inflated beta distribution, which models the data as
>> three separate processes (0, 1, and everything in between).
>>
>> Alternatively, and probably something I would prefer for your data, one
>> could model the data using an ordinal distribution. This will require the
>> values between 0 and 1 to be broken up into (not too many) discrete
>> categories, which will lead to some information loss but is at least more
>> informative than a simple 0/1 treatment as in logistic regression.
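To make the three processes of the zero-one-inflated beta concrete, the sketch below splits Jon's example values into exact zeros, exact ones, and interior values. It then computes simple sample proportions. The names `zoi` (probability that an observation is exactly 0 or 1) and `coi` (probability of a 1, given that the observation is 0 or 1) follow the parameter names of the brms `zero_one_inflated_beta()` family. Here they are plain sample proportions with no predictors or phylogeny, not model estimates. The last line checks the mean decomposition E[y] = zoi * coi + (1 - zoi) * mu, where mu is the mean of the beta part.

```r
# Jon's example response values
y <- c(0, 0, 0, 0, 0, 0, 0.25, 0.4, 0.6, 0.9, 0.9, 1, 1, 1, 1)

is_boundary <- y == 0 | y == 1
zoi <- mean(is_boundary)            # share of observations that are exactly 0 or 1
coi <- mean(y[is_boundary] == 1)    # share of 1s among the boundary observations
mu  <- mean(y[!is_boundary])        # mean of the values handled by the beta part

c(zoi = zoi, coi = coi, mu = mu)

# The overall mean decomposes into the three processes
all.equal(mean(y), zoi * coi + (1 - zoi) * mu)
```

For these 15 values, 10 lie on the boundary (zoi = 2/3), 4 of those are 1s (coi = 0.4), and the five interior values average 0.61. With most observations on the boundary, the beta part is informed by only a handful of data points.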
>>
>> You can fit all of these models above in combination with phylogenetic
>> structures using the brms R package (some are available in MCMCglmm as
>> well). Type vignette("brms_phylogenetics") in R for more details.
>>
>> Best,
>> Paul
>>
>> 2018-05-31 16:42 GMT+02:00 jonnations <jonnations using gmail.com>:
>>
>>> Hi Listserv,
>>>
>>> I am new to this type of work and have tried to make this as clear as
>>> possible.
>>>
>>> I am working on a project that models habitat use (y = ground (0) vs.
>>> tree (1)) and body size (x = body size, continuous). My y values come
>>> from the formula:
>>>
>>>  y = (tree captures / tree effort) / ((tree captures / tree effort) +
>>> (ground captures / ground effort))
>>>
>>> which should provide a ratio of captures in a given habitat while
>>> accounting for effort. My y values are mostly binary, but some species'
>>> values are between 0 and 1. The data look like this example:
>>>
>>> y = c(0, 0, 0, 0, 0, 0, 0.25, 0.4, 0.6, 0.9, 0.9, 1, 1, 1, 1)
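The effort-weighted ratio above can be written as a small vectorized R function. `habitat_score` and the capture and effort numbers in the usage lines are hypothetical. They only show how the formula behaves. A species caught only on the ground scores 0, and one caught only in trees scores 1. A species with no captures in either habitat gives 0/0 = NaN, which would need handling before modelling.

```r
# Hypothetical helper implementing
# y = (tree captures / tree effort) /
#     ((tree captures / tree effort) + (ground captures / ground effort))
habitat_score <- function(tree_captures, tree_effort,
                          ground_captures, ground_effort) {
  tree_rate   <- tree_captures / tree_effort       # captures per unit effort in trees
  ground_rate <- ground_captures / ground_effort   # captures per unit effort on the ground
  tree_rate / (tree_rate + ground_rate)
}

# Made-up counts for three species: ground only, tree only, mixed
scores <- habitat_score(tree_captures   = c(0, 5, 3),
                        tree_effort     = c(10, 10, 100),
                        ground_captures = c(4, 0, 1),
                        ground_effort   = c(10, 10, 50))
scores
```

The third made-up species has a tree rate of 0.03 and a ground rate of 0.02, so its score is 0.03 / 0.05 = 0.6.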
>>>
>>> My goal for the model is to use the species with known habitat "scores"
>>> to predict the habitat value (y) of species from their body size value (x).
>>>
>>> There are 2 "random" effects in the model, the relatedness of the species
>>> (the phylogeny, Rp) and the intraspecific variation of the x measurement
>>> (Rs). These are both very important as my 150 data points are distributed
>>> between 22 species.
>>>
>>> Using logistic regression, the model takes the form:
>>> logit(Pr(Y = 1)) = a + Bx + Rp + Rs + e
>>>
>>> I have two questions for the group. First, is it appropriate to use
>>> logistic regression (or a logit link) on these kinds of non-binary y
>>> values? I have found several examples online of logistic regression with
>>> non-binary variables (links below), but I have not found a publication
>>> with a study design like mine.
>>>
>>> Second, any suggestions for programs for setting up the model? I am
>>> interested in using a Bayesian GLMM method (MCMCglmm, JAGS, etc.), but
>>> I am worried that the programs will view these data as non-binary and
>>> either insist on an ordinal regression (not what I am doing) or otherwise
>>> impose categorical groupings on the response variable and produce strange
>>> results. Can any GLMM program handle my Rp, Rs, and the non-binary nature
>>> of the y variables?
>>>
>>> I hope this is clear. Any suggestions will be greatly appreciated! Thanks
>>> for your help and patience.
>>>
>>> Best,
>>> Jon
>>>
>>> Links mentioned above:
>>> https://stats.stackexchange.com/questions/33562/choose-best-model-between-logit-probit-and-nls?rq=1
>>> https://stats.stackexchange.com/questions/69886/using-logistic-regression-for-a-continuous-dependent-variable?rq=1
>>> --
>>> Jonathan A. Nations
>>> PhD Candidate
>>> Esselstyn Lab
>>> Museum of Natural Sciences
>>> Louisiana State University
>>>
>>>
>>> _______________________________________________
>>> R-sig-mixed-models using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>
>>
>
>
> --
> Jonathan A. Nations
> PhD Candidate
> Esselstyn Lab <http://www.museum.lsu.edu/esselstyn>
> Museum of Natural Sciences <http://sites01.lsu.edu/wp/mns>
> Louisiana State University
>
>
