[R-sig-ME] Is it ok to use lmer() for an ordered categorical (5 levels) response variable?

D. Rizopoulos d.rizopoulos at erasmusmc.nl
Wed Mar 6 15:11:43 CET 2019


Hi Nicolas,

You could instead use a continuation ratio model. It entails a couple
of data management steps to transform your ordinal response variable
into a binary one, which you then fit with a mixed effects logistic
regression. In addition, under this model it is straightforward to
assess/relax the ordinality assumption (i.e., that the effect of the
predictors is the same across the levels of the ordinal response).
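
As a rough illustration of the data management step, here is a minimal
base-R sketch of the forward continuation ratio expansion. The toy data,
the column names (id, x, score, cutpoint, y_bin) and the helper
expand_cr() are all made up for this example; the vignette linked below
may set the data up differently.

```r
# Hypothetical toy data: 6 subjects, 3 visits each, ordinal score in 0..4.
set.seed(1)
toy <- data.frame(
  id    = rep(1:6, each = 3),
  x     = rnorm(18),
  score = sample(0:4, 18, replace = TRUE)
)

# Forward continuation-ratio expansion, written by hand for illustration.
# For each observation and each cut-point k = 0, ..., K - 1 that the score
# has reached (score >= k), create one binary row:
#   y_bin = 1 if the score stops at k (score == k),
#   y_bin = 0 if it continues beyond k (score > k).
expand_cr <- function(data, response, K = max(data[[response]])) {
  rows <- lapply(seq_len(nrow(data)), function(i) {
    s  <- data[[response]][i]
    ks <- 0:min(s, K - 1)              # cut-points this observation is "at risk" for
    out <- data[rep(i, length(ks)), , drop = FALSE]
    out$cutpoint <- factor(ks, levels = 0:(K - 1))
    out$y_bin    <- as.integer(s == ks)
    out
  })
  long <- do.call(rbind, rows)
  rownames(long) <- NULL
  long
}

long <- expand_cr(toy, "score", K = 4)
head(long)

# The expanded data can then go into a mixed effects logistic regression,
# e.g. (not run here; needs the GLMMadaptive package and real data):
#   GLMMadaptive::mixed_model(fixed = y_bin ~ cutpoint + x,
#                             random = ~ 1 | id,
#                             data = long, family = binomial())
# Replacing cutpoint + x with cutpoint * x lets the effect of x differ
# across cut-points, which is one way to relax the ordinality assumption.
```

A score of 2, for example, becomes three binary rows (cut-points 0, 1, 2
with y_bin = 0, 0, 1), while a score of 4 becomes four rows that are all 0.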

You can find an example using the GLMMadaptive package here: 
https://drizopoulos.github.io/GLMMadaptive/articles/Ordinal_Mixed_Models.html

Best,
Dimitris


On 3/6/2019 3:00 PM, Nicolas Deguines wrote:
> Hello Phillip and all,
> 
> Thanks a lot Phillip for your very interesting and useful answer, and for
> the paper from Liddell & Kruschke. It helps a lot.
> 
> About trying other link and threshold functions in clmm: no huge difference
> in my case unfortunately. I tried different combinations of each.
> 'equidistant' did do better, but the improvement was far from enough.
> 
> I computed density plots for my response variable as observed and as
> predicted from my lmer() model (similar to what Liddell and Kruschke do in
> Figure 6): the linear mixed-model does pretty well in fitting the data.
> => so I'd be inclined to trust the results from my lmer models in the
> present case (but Liddell and Kruschke did show very clear cases where a
> linear model fit the ordinal data very poorly).
> 
> Meanwhile, I thought of another alternative for analyzing this response
> variable and I would be curious to read what people may think about it.
> Before presenting that alternative, I need to say more about that 5-level
> response variable.
> It is a score built by Muratet and Fontaine (2015)* to assess the
> naturalness of a given private backyard (it is shown to be correlated with
> higher abundance of butterflies).
> In the backyard: fallow area, nettles (*Urtica dioica*), ivy (*Hedera helix*),
> and brambles (*Rubus spp.*) are each scored one if present, and the
> naturalness index was computed as the sum of these scores.
> => it results in a 5-level ordinal variable because it can go from 0 to 4,
> and each increase in 1 means a backyard with more features of 'naturalness'.
> I wonder thus if this could be modelled using a glmer() with family =
> binomial and feeding to the model two columns: cbind(sum of 1's, sum of
> 0's) (see R documentation for family{stats}, in the Details: "*As a
> two-column integer matrix: the first column gives the number of successes
> and the second the number of failures.*")
> I will try and see how the model fits the data. But I would be interested in
> getting a theoretical opinion.
> 
> I hope this can help others too
> 
> Best regards,
> Nicolas Deguines
> 
> *
> https://www.sciencedirect.com/science/article/abs/pii/S0006320714004704?via%3Dihub
> 
> ----------------------------------
> Postdoctoral Research Associate
> Laboratoire Ecologie, Systématique et Evolution
> Université Paris Sud, Orsay, France
> Website: http://nicolasdeguines.weebly.com/
> 
> 
> On Tue, 5 Mar 2019 at 13:04, Phillip Alday <phillip.alday using mpi.nl> wrote:
> 
>> Hi Nicolas,
>>
>> How much you can get away with bending the assumptions depends in some ways
>> on how well the resulting model fits your data. If the resulting model
>> is a poor fit, then it's not a great model for performing inference. The
>> other problem with bending assumptions is that a lot of 'error
>> statistics' (standard errors, t-values, and basically anything related
>> to significance testing) aren't guaranteed to do what they are supposed
>> to do. (In your case, the good behavior of your residuals suggests that
>> this won't be a huge problem, but there are no promises.)
>>
>> You can get around this a bit by doing things like cross-validation or
>> other inferential steps based on how well the model generalizes to /
>> predicts new data instead of significance testing of coefficients or
>> linear hypotheses.
>>
>> John Kruschke has written about this issue at some length and seems
>> convinced that it's (almost) always a bad idea to bend the
>> metric/continuous assumption when dealing with ordinal data:
>>
>>
>> http://doingbayesiandataanalysis.blogspot.com/2017/12/which-movie-is-rated-better-dont-treat.html
>>
>>
>> http://doingbayesiandataanalysis.blogspot.com/2018/09/analyzing-ordinal-data-with-metric.html
>>
>> The latter is largely a link/"press release" for the associated paper:
>>
>> Liddell, T. M., & Kruschke, J. K. (2018). Analyzing ordinal data with
>> metric models: What could possibly go wrong? Journal of Experimental
>> Social Psychology, 79, 328–348. doi:10.1016/j.jesp.2018.08.009
>>
>> Finally, have you tried other link and threshold functions in clmm?
>> Those can make a huge difference!
>>
>> Phillip
>>
>> On 5/3/19 11:00 am, Nicolas Deguines wrote:
>>> Hello everyone,
>>>
>>> I am investigating how engagement in a citizen science program can
>>> change participants' behavior in terms of implementing gardening
>>> techniques benefitting biodiversity.
>>> There are 2362 participants, distributed into 7 cohorts (= year in which
>>> they joined the program), and I have repeated gardening technique
>>> information for multiple years for each participant.
>>> So I need to use mixed modeling.
>>>
>>> One of the response variables is a score that can take 5 values: 0, 1, 2,
>>> 3, or 4. It's ordered, it's not continuous (there are 5 levels).
>>> I would analyze this with a cumulative link mixed model (using clmm()
>>> from the ordinal package), but the Hessian condition number I obtained
>>> with such a model is 5.0e+06, i.e., the assumption is violated
>>> (simplifying my initial full model did not help at all).
>>>
>>> As an alternative, I am wondering if I could treat this response variable
>>> as a continuous one in an lmer() model.
>>> When I do:
>>> - Normality of model residuals is nicely met
>>> - Homoscedasticity of model residuals is met as well.
>>> => is meeting these two assumptions enough to validate the use of an
>>> lmer() model for an ordered categorical response variable?
>>>
>>> In one of Douglas Bates' presentations (slide 3 of Jan. 2011, Madison:
>>> http://lme4.r-forge.r-project.org/slides/2011-01-11-Madison/5GLMM.pdf),
>>> it is said that
>>> "When using LMMs we assume that the response being modeled is on a
>>> continuous scale.
>>> Sometimes we can bend this assumption a bit if the response is an ordinal
>>> response with a moderate to large number of levels.
>>> For example, [...a response variable taking] integer values on the scale
>>> of 1 to 10."
>>> => is 5 levels too few to be treated as continuous? Or would it be ok
>>> given residuals behave nicely?
>>>
>>> I would appreciate any help and thoughts on this.
>>> I checked that this was not treated in a previous post and I hope I did
>>> not miss it (sorry if I did).
>>>
>>> Best,
>>> Nicolas Deguines
>>>
>>>        [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-mixed-models using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>
> 

-- 
Dimitris Rizopoulos
Professor of Biostatistics
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web (personal): http://www.drizopoulos.com/
Web (work): http://www.erasmusmc.nl/biostatistiek/
Blog: http://iprogn.blogspot.nl/

