[R-sig-ME] Is it ok to use lmer() for an ordered categorical (5 levels) response variable?

Wed Mar 6 15:11:17 CET 2019

Hello Nicolas,

Each of your individual items is a Bernoulli variable. So the
background favors a binomial distribution, provided you can assume

 - that the probability of each of the 4 items of the score is the same
   (that is, p("having ivy") = p("having nettles") = ...);

 - that these events are independent.

If both are true, then the total score is exactly a binomial
variable. Otherwise, the sum is not guaranteed to be binomial.
But it may be worth trying, to see what results it gives!
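
To make the first condition concrete, here is a minimal sketch (in Python
rather than R, stdlib only). It computes the exact distribution of a sum of
four independent Bernoulli items and compares it with a binomial(4, p). The
presence probabilities are made-up values for illustration, not estimates
from the backyard data, and the function names are my own. The sketch only
covers unequal probabilities; it says nothing about dependence between items.

```python
from math import comb

def sum_of_bernoullis_pmf(probs):
    """Exact PMF of the sum of independent Bernoulli(p_i) variables."""
    pmf = [1.0]  # distribution of an empty sum: P(score = 0) = 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)   # this item is absent
            nxt[k + 1] += mass * p     # this item is present
        pmf = nxt
    return pmf

def binomial_pmf(n, p):
    """PMF of a binomial(n, p) variable, for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def variance(pmf):
    mean = sum(k * m for k, m in enumerate(pmf))
    return sum((k - mean) ** 2 * m for k, m in enumerate(pmf))

# Hypothetical presence probabilities for the 4 items (assumed values).
equal_probs = [0.4, 0.4, 0.4, 0.4]
unequal_probs = [0.1, 0.3, 0.5, 0.7]   # same average probability, 0.4

pmf_equal = sum_of_bernoullis_pmf(equal_probs)
pmf_unequal = sum_of_bernoullis_pmf(unequal_probs)
pmf_binom = binomial_pmf(4, 0.4)

# Largest absolute difference between each score distribution and the binomial.
max_gap_equal = max(abs(a - b) for a, b in zip(pmf_equal, pmf_binom))
max_gap_unequal = max(abs(a - b) for a, b in zip(pmf_unequal, pmf_binom))

print("max gap, equal p:  ", max_gap_equal)
print("max gap, unequal p:", max_gap_unequal)
print("variance, binomial: ", variance(pmf_binom))
print("variance, unequal p:", variance(pmf_unequal))
```

With equal probabilities the two distributions coincide; with unequal
probabilities the score has the same mean but a smaller variance than the
binomial, so a binomial model would be expected to look underdispersed.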

On Wed, Mar 06, 2019 at 03:00:32PM +0100, Nicolas Deguines wrote:
« Hello Phillip and all,
« 
« Thanks a lot Phillip for your very interesting and useful answer, and for
« the paper from Liddell & Kruschke. It helps a lot.
« 
« About trying other link and threshold functions in clmm: no huge difference
« in my case unfortunately. I tried different combinations of each.
« 'equidistant' did do better, but the improvement was far from enough.
« 
« I computed density plots for my response variable as observed and as
« predicted from my lmer() model (similar to what Liddell and Kruschke do in
« Figure 6): the linear mixed model does pretty well in fitting the data.
« => so I'd be inclined to trust the results from my lmer models in the
« present case (but Liddell and Kruschke did show very clear cases where a
« linear model fits the ordinal data very poorly).
« 
« Meanwhile, I thought of another alternative for analyzing this response
« variable and I would be curious to read what people may think about it.
« Before presenting that alternative, I need to say more about that 5-level
« response variable.
« It is a score built by Muratet and Fontaine (2015)* to assess the
« naturalness of a given private backyard (it is shown to be correlated with
« higher abundance of butterflies).
« In the backyard: fallow area, nettles (*Urtica dioica*), ivy (*Hedera helix*),
« and brambles (*Rubus spp.*) are each scored one if present, and the
« naturalness index was computed as the sum of these scores.
« => it results in a 5-level ordinal variable because it can go from 0 to 4,
« and each increase in 1 means a backyard with more features of 'naturalness'.
« I wonder thus if this could be modelled using a glmer() with family =
« binomial and feeding to the model two columns: cbind(sum of 1's, sum of
« 0's) (see R documentation for family{stats}, in the Details: "*As a
« two-column integer matrix: the first column gives the number of successes
« and the second the number of failures.*")
« I will try and see how the model fits the data. But I would be interested in
« getting a theoretical opinion.
« 
« I hope this can help others too
« 
« Best regards,
« Nicolas Deguines
« 
« *
« https://www.sciencedirect.com/science/article/abs/pii/S0006320714004704?via%3Dihub
« 
« ----------------------------------
« Postdoctoral Research Associate
« Laboratoire Ecologie, Systématique et Evolution
« Université Paris Sud, Orsay, France
« Website: http://nicolasdeguines.weebly.com/
« 
« 
« On Tue, 5 Mar 2019 at 13:04, Phillip Alday <phillip.alday using mpi.nl> wrote:
« 
« > Hi Nicolas,
« >
« > How much you can get away with bending the assumptions depends in some ways
« > on how well the resulting model fits your data. If the resulting model
« > is a poor fit, then it's not a great model for performing inference. The
« > other problem with bending assumptions is that a lot of 'error
« > statistics' (standard errors, t-values, and basically anything related
« > to significance testing) aren't guaranteed to do what they are supposed
« > to do. (In your case, the good behavior of your residuals suggests that
« > this won't be a huge problem, but there are no promises.)
« >
« > You can get around this a bit by doing things like cross-validation or
« > other inferential steps based on how well the model generalizes to /
« > predicts new data instead of significance testing of coefficients or
« > linear hypotheses.
« >
« > John Kruschke has written about this issue at some length and seems
« > convinced that it's (almost) always a bad idea to bend the
« > metric/continuous assumption when dealing with ordinal data:
« >
« >
« > http://doingbayesiandataanalysis.blogspot.com/2017/12/which-movie-is-rated-better-dont-treat.html
« >
« >
« > http://doingbayesiandataanalysis.blogspot.com/2018/09/analyzing-ordinal-data-with-metric.html
« >
« > The latter is largely a link/"press release" for the associated paper:
« >
« > Liddell, T. M., & Kruschke, J. K. (2018). Analyzing ordinal data with
« > metric models: What could possibly go wrong? Journal of Experimental
« > Social Psychology, 79, 328–348. doi:10.1016/j.jesp.2018.08.009
« >
« > Finally, have you tried other link and threshold functions in clmm?
« > Those can make a huge difference!
« >
« > Phillip
« >
« > On 5/3/19 11:00 am, Nicolas Deguines wrote:
« > > Hello everyone,
« > >
« > > I am investigating how engagement in a citizen science program can
« > > change participants' behavior in terms of implementing gardening
« > > techniques benefitting biodiversity.
« > > There are 2362 participants, distributed into 7 cohorts (= year in which
« > > they joined the program), and I have repeated gardening technique
« > > information for multiple years for each participant.
« > > So I need to use mixed modeling.
« > >
« > > One of the response variables is a score that can take 5 values: 0, 1, 2,
« > > 3, or 4. It's ordered, it's not continuous (there are 5 levels).
« > > I would analyze this with a cumulative link mixed model (using clmm()
« > > from the ordinal package), but the Hessian condition number I obtained
« > > with such a model is 5.0e+06, i.e. the assumption is violated
« > > (simplifying my initial full model did not help at all).
« > >
« > > As an alternative, I am wondering if I could treat this response variable
« > > as a continuous one in an lmer() model.
« > > When I do:
« > > - Normality of model residuals is nicely met
« > > - Homoscedasticity of model residuals is met as well.
« > > => is meeting these two assumptions enough to validate the use of an
« > > lmer() model for an ordered categorical response variable?
« > >
« > > In one of Douglas Bates' presentations (slide 3 of Jan. 2011, Madison:
« > > http://lme4.r-forge.r-project.org/slides/2011-01-11-Madison/5GLMM.pdf),
« > > it is said that
« > > "When using LMMs we assume that the response being modeled is on a
« > > continuous scale.
« > > Sometimes we can bend this assumption a bit if the response is an ordinal
« > > response with a moderate to large number of levels.
« > > For example, [...a response variable taking] integer values on the scale
« > > of 1 to 10."
« > > => is 5 levels too few to be treated as continuous? Or would it be ok
« > > given that the residuals behave nicely?
« > >
« > > I would appreciate any help and thoughts on this.
« > > I checked that this was not treated in a previous post and I hope I did
« > > not miss it (sorry if I did).
« > >
« > > Best,
« > > Nicolas Deguines
« > > ----------------------------------
« > > Postdoctoral Research Associate
« > > Laboratoire Ecologie, Systématique et Evolution
« > > Université Paris Sud, Orsay, France
« > > Website: http://nicolasdeguines.weebly.com/
« > >
« > > _______________________________________________
« > > R-sig-mixed-models using r-project.org mailing list
« > > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
« > >
« >
« 

-- 
                                Emmanuel CURIS
                                emmanuel.curis using parisdescartes.fr

Page WWW: http://emmanuel.curis.online.fr/index.html