[R-sig-ME] What to do with zero inflated, negative skewed, negative data: a question about GLMMs
Ben Bolker
bbo|ker @end|ng |rom gm@||@com
Mon Nov 30 18:09:29 CET 2020
I think Gabriella may have abandoned the linear mixed model (i.e.
Gaussian distribution) because of a skewed distribution of responses. A
couple of things to keep in mind about this:
- you don't need to worry about the *marginal* distribution of the
data (i.e., what you get if you plot the histogram or density of your
response variable). The assumptions in LMMs (like most models) are about
the *conditional* distribution, i.e. the distribution of the residuals
(e.g., fit your model first, then examine lattice::qqmath(fitted_model)
or hist(residuals(fitted_model)))
- non-normality (including skewness) even in the conditional model
is much less important to the validity (accuracy of the parameter
estimates, confidence intervals, etc.) than many people think
- in principle you could transform the response variable to deal
with this, although admittedly the choice of transformations is much
more limited for non-positive data (e.g. Yeo-Johnson transformations,
see `?car::yjPower`), although there are some issues here about whether
you're transforming the marginal or the conditional distribution ...
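
A minimal sketch of the marginal-versus-conditional point above, using simulated data rather than Gabriella's. The variable names, the effect sizes in `mu`, and the choice of `nlme::lme` are all assumptions made for illustration. The response looks negatively skewed when pooled, mostly because the group means differ, while the residuals from the fitted model are close to symmetric.

```r
library(nlme)  # recommended package, installed with standard R distributions

set.seed(101)

# Hypothetical data: 200 people, 3 sentences each, one wording per person.
n_id    <- 200
id      <- rep(seq_len(n_id), each = 3)
wording <- rep(c("negative", "neutral", "positive"), length.out = n_id)[id]

# Assumed effect sizes, chosen only to make the marginal distribution skewed.
mu <- c(negative = -3, neutral = 0, positive = 0.5)
b  <- rnorm(n_id, sd = 0.5)              # person-level random intercepts
y  <- unname(mu[wording]) + b[id] + rnorm(length(id), sd = 0.5)

dat <- data.frame(y = y, wording = factor(wording), id = factor(id))

# Random-intercept Gaussian mixed model.
fit <- lme(y ~ wording, random = ~ 1 | id, data = dat)

# Sample skewness (third standardized moment).
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

marginal_skew    <- skew(dat$y)            # raw response, pooled over wordings
conditional_skew <- skew(residuals(fit))   # within-person residuals (lme default)

print(c(marginal = marginal_skew, conditional = conditional_skew))
```

For a visual check, `hist(residuals(fit))` or `qqnorm(residuals(fit)); qqline(residuals(fit))` could be run on the same object.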
cheers
Ben Bolker
On 11/30/20 2:50 AM, Thierry Onkelinx via R-sig-mixed-models wrote:
> Dear Gabriella,
>
> I'd try to fit a single model to the data. The response seems continuous to
> me. So I'd try a Gaussian distribution. You might need to fit a different
> variance for each of the questions.
>
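> library(nlme)
> # nlme specifies random effects via `random =`, not lme4-style (1|patient)
> lme(sentiment ~ question + age, random = ~ 1 | patient)
> # separate residual variance per question
> lme(sentiment ~ question + age, random = ~ 1 | patient,
>     weights = varIdent(form = ~ 1 | question))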
>
> Best regards,
>
> ir. Thierry Onkelinx
> Statisticus / Statistician
>
> Vlaamse Overheid / Government of Flanders
> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
> FOREST
> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
> thierry.onkelinx using inbo.be
> Havenlaan 88 bus 73, 1000 Brussel
> www.inbo.be
>
> On Mon 30 Nov 2020 at 01:24, Gabriella Kountourides <
> gabriella.kountourides using sjc.ox.ac.uk> wrote:
>
>> Hello everyone,
>>
>> This is my first question to this list :) I hope this email finds you all
>> well.
>>
>>
>> I have been struggling for the past few weeks to set an appropriate
>> model for my data. I have read Prof Bolker's practical guide for ecology
>> and evolution paper, as well as the GLMM FAQs which have been immensely
>> helpful. I am only just beginning my stats journey (and R!) and although I
>> am really enjoying it, I have found myself completely stumped with my
>> dataset. I will describe the data set below, and below that the various
>> attempts I have made to analyse it. I would be incredibly grateful to hear
>> your thoughts.
>>
>> All the very best
>>
>> Data:
>>
>>
>> I want to look at whether there is a relationship between the phrasing used
>> when a question is asked (positive, negative, neutral wording) and the
>> polarity of the response from the individual.
>>
>>
>> 2638 people were asked a question about medical symptoms.
>>
>> 1/3 of the people were asked it with a negative wording, 1/3 with a
>> neutral one, 1/3 with a positive one.
>>
>> The big question is: does the way the question is asked affect the
>> polarity of the response?
>>
>>
>> From this, I did sentiment analysis (using trinker's sentimentr package,
>> <https://github.com/trinker/sentimentr>), which provides a
>> polarity score (this can be negative, neutral or positive) to see whether
>> their responses were more positive or negative, depending on the wording of
>> the question.
>>
>>
>> Sentiment analysis breaks down responses into sentences, so I have 2638
>> people, but 7924 sentences, so I assume I should fit ID as a random effect.
>>
>>
>> Range: -4.0376 to +0.7915
>> Median: -0.1830
>> Mean: -0.2149
>> Mode: 0
>> Skew: -1.7
>>
>> There are many 0s in my data; these are true 0s, they represent a
>> 'neutral' response, which is important. My data is negatively skewed, so
>> more people answer in a negative way. But I still want to know whether the
>> phrasing affects the skew, i.e. does one phrasing lead to 'less negative'
>> responses?
>>
>> What I've tried:
>> Initially, I tried to do a glm with the raw data, but I can't use Poisson
>> as it is negative, it is skewed so it's not Gaussian, and it's not binomial.
>>
>> So next I made 3 new variables, which were counts. For example 'PosCount'
>> scored 1 for each row with a positive polarity score, and 0 if not.
>> Likewise for neutral (sentiment=0) and negative (sentiment<0). I decided
>> to run a zero-inflated Poisson model.
>>
>> I ran a glmm for each count variable; for example, for the positive one:
>> posmodel <- glmmTMB(PosCount ~ wordingQ + age + (1 | id), data = allprimesent,
>>                     ziformula = ~1, family = poisson)
>>
>> and then ran the 'overdisp_fun' function, which gave
>>> overdisp_fun(posmodel)
>>        chisq        ratio          rdf            p
>> 6268.8427185    0.8295412 7557.0000000    1.0000000
>>
>> So I suppose my questions are: do you think this is the best thing to do
>> with my data? Do you know of any better thing I can do with the raw data,
>> I'd rather not lose the information about the strength of the sentiment,
>> but if I keep it, I need a model that can deal with 0 inflation, negative
>> skew, and negative numbers.
>>
>> Many thanks if you've read this! I look forward to hearing from you!
>> All the best
>>
>> P.S. I am relatively new to stats and R, please bear that in mind with
>> your terminology if you are kind enough to answer
>>
>>
>> Gabriella Kountourides
>>
>> DPhil Student | Department of Anthropology
>>
>> Evolutionary Medicine and Public Health Group
>>
>> St. John’s College, University of Oxford
>>
>> gabriella.kountourides using sjc.ox.ac.uk
>>
>> Tweet me: https://twitter.com/GKountourides
>>
>> _______________________________________________
>> R-sig-mixed-models using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>