[R-sig-ME] What to do with zero inflated, negative skewed, negative data: a question about GLMMs

Mon Nov 30 08:50:49 CET 2020

Dear Gabriella,

I'd try to fit a single model to the data. The response seems continuous to
me. So I'd try a Gaussian distribution. You might need to fit a different
variance for each of the questions.

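library(nlme)
# `dataset` is a placeholder name for your data frame
lme(sentiment ~ question + age, random = ~ 1 | patient, data = dataset)
lme(sentiment ~ question + age, random = ~ 1 | patient, data = dataset,
    weights = varIdent(form = ~ 1 | question))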
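
To judge whether the per-question variance is worth keeping, the two fits
can be compared with a likelihood ratio test. Below is a minimal sketch on
simulated data. The data frame `dataset`, the sample sizes and the standard
deviations are all made up for illustration; only the column names follow
the models above.

```r
library(nlme)

set.seed(1)
n_patient <- 200   # hypothetical number of patients
n_sentence <- 3    # hypothetical number of sentences per patient

# One question wording per patient, several sentences per patient
patient <- factor(rep(seq_len(n_patient), each = n_sentence))
wording <- rep(c("negative", "neutral", "positive"), length.out = n_patient)
question <- factor(rep(wording, each = n_sentence))
age <- rep(round(runif(n_patient, 18, 70)), each = n_sentence)

# Patient-level random intercept plus residual noise whose SD
# depends on the question wording (values are invented)
patient_effect <- rep(rnorm(n_patient, sd = 0.2), each = n_sentence)
sd_by_question <- c(negative = 0.6, neutral = 0.3, positive = 0.4)
noise <- rnorm(length(patient), sd = sd_by_question[as.character(question)])
sentiment <- -0.2 + patient_effect + noise

dataset <- data.frame(sentiment, question, age, patient)

# Same residual variance for every question
m_equal <- lme(sentiment ~ question + age, random = ~ 1 | patient,
               data = dataset)

# Separate residual variance per question
m_hetero <- lme(sentiment ~ question + age, random = ~ 1 | patient,
                data = dataset,
                weights = varIdent(form = ~ 1 | question))

# Both models share the same fixed effects, so the REML fits are comparable
cmp <- anova(m_equal, m_hetero)
print(cmp)
```

A small p-value in the comparison would suggest keeping the separate
variances. summary(m_hetero) reports the estimated standard deviation of
each question relative to the reference level.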

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx using inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////


On Mon, 30 Nov 2020 at 01:24, Gabriella Kountourides
<gabriella.kountourides using sjc.ox.ac.uk> wrote:

> Hello everyone,
>
> This is my first question to this list :) I hope this email finds you all
> well.
>
>
> I have been struggling for the past few weeks to find an appropriate
> model for my data. I have read Prof Bolker's practical guide for ecology
> and evolution paper, as well as the GLMM FAQs which have been immensely
> helpful. I am only just beginning my stats journey (and R!) and although I
> am really enjoying it, I have found myself completely stumped with my
> dataset. I will describe the data set below, and below that the various
> attempts I have made to analyse it. I would be incredibly grateful to hear
> your thoughts.
>
> All the very best
>
> Data:
>
>
> I want to look at whether there is a relationship between the phrasing used
> when a question is asked (positive, negative, neutral wording) and the
> polarity of the response from the individual.
>
>
> 2638 people were asked a question about medical symptoms.
>
> 1/3 of the people were asked it with a negative wording, 1/3 with a
> neutral one, 1/3 with a positive one.
>
> The big question is: does the way the question is asked affect the
> polarity of the response?
>
>
> From this, I did sentiment analysis (using trinker's sentimentr package,
> <https://github.com/trinker/sentimentr>), which provides a polarity score
> (this can be negative, neutral or positive) to see whether their responses
> were more positive or negative, depending on the wording of the question.
>
>
> Sentiment analysis breaks down responses into sentences, so I have 2638
> people but 7924 sentences, so I assume I should fit ID as a random effect.
>
>
> Range: -4.0376 to +0.7915
> Median :-0.1830
> Mean   :-0.2149
>
> Mode: 0
> skew: -1.7
>
> There are many 0s in my data; these are true 0s that represent a
> 'neutral' response, which is important. My data is negatively skewed, so
> more people answer in a negative way. But I still want to know whether the
> phrasing affects the skew: does one phrasing lead to 'less negative'
> responses?
>
> What I've tried:
> Initially, I tried to do a glm with the raw data, but I can't use Poisson
> as the data are negative, it is skewed so it's not Gaussian, and it's not
> binomial.
>
> So next I made 3 new variables, which were counts. For example, 'PosCount'
> scored 1 for each row with a positive polarity score, and 0 if not. Idem for
> neutral (sentiment = 0) and negative (sentiment < 0). I decided to run a
> zero-inflated Poisson model.
>
> I ran a glmm for each count variable; for example, for the positive one:
> posmodel <- glmmTMB(PosCount ~ wordingQ + age + (1 | id), data = allprimesent,
>                     ziformula = ~1, family = poisson)
>
> and then the 'overdisp_fun' function which gave
> > overdisp_fun(posmodel)
>        chisq        ratio          rdf            p
> 6268.8427185    0.8295412 7557.0000000    1.0000000
>
> So I suppose my questions are: do you think this is the best thing to do
> with my data? Do you know of any better thing I can do with the raw data?
> I'd rather not lose the information about the strength of the sentiment,
> but if I keep it, I need a model that can deal with zero inflation, negative
> skew, and negative numbers.
>
> Many thanks if you've read this! I look forward to hearing from you!
> All the best
>
> P.S. I am relatively new to stats and R, so please bear that in mind with
> your terminology if you are kind enough to answer.
>
>
> Gabriella Kountourides
>
> DPhil Student | Department of Anthropology
>
> Evolutionary Medicine and Public Health Group
>
> St. John’s College, University of Oxford
>
> gabriella.kountourides using sjc.ox.ac.uk
>
> Tweet me: https://twitter.com/GKountourides
>
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
