[R-sig-ME] lme for data that is not normally distributed

Wed Aug 3 23:20:21 CEST 2016

A point to note is that it is the normality of the relevant sampling
distribution, not the normality of the residuals as given by the
function resid(), that matters for reliance on the standard errors
and t-statistics of parameters.  In Moses’ example, the ID effects
are what will matter for this purpose.

(In highly unbalanced designs, the estimated effects can have very 
non-normal distributions, even under strict model assumptions, and 
simulation may be the only way to get good insight on what is to be 
expected under those circumstances.)
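
As a rough illustration of the simulation idea, here is a minimal sketch using a parametric bootstrap via bootMer() in lme4. The data are made up (the group sizes, effect sizes and the names dat, n_per_id and est are assumptions for illustration, not Moses' data). It simulates responses from the fitted model, refits, and collects the Time coefficient so that its sampling distribution can be looked at directly.

```r
library(lme4)

set.seed(1)

# Hypothetical, deliberately unbalanced design: 8 IDs with very unequal
# numbers of observations (assumed values, for illustration only)
n_per_id <- c(2, 2, 3, 3, 4, 5, 20, 40)
ID   <- factor(rep(seq_along(n_per_id), times = n_per_id))
Time <- factor(sample(c("Pre_dry", "Post_dry"), length(ID), replace = TRUE),
               levels = c("Pre_dry", "Post_dry"))

# Simulated response: intercept + Time effect + ID effect + noise
id_eff   <- rnorm(length(n_per_id), sd = 1)
Distance <- 5 + 0.5 * (Time == "Post_dry") +
  id_eff[as.integer(ID)] + rnorm(length(ID), sd = 1)
dat <- data.frame(ID, Time, Distance)

m <- lmer(Distance ~ Time + (1 | ID), data = dat)

# Parametric bootstrap: simulate from the fitted model, refit, keep the
# fixed-effect estimates from each refit
bb <- bootMer(m, FUN = fixef, nsim = 200)

# Column 2 holds the Time coefficient (column 1 is the intercept)
est <- bb$t[, 2]

# Inspect the simulated sampling distribution of the Time coefficient
hist(est, main = "Bootstrap estimates of the Time effect")
qqnorm(est); qqline(est)
```

Changing n_per_id to a balanced layout and re-running gives a point of comparison for how much the imbalance matters.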

The plot of residuals against fitted values is useful in checking
linearity (for heteroscedasticity this is more fraught because again
it is homogeneity for the relevant sampling distribution that matters),
and for checking leverage effects (a few IDs with an over-riding 
potential influence on the fitted response).
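
For concreteness, the checks discussed here could be sketched as below. This is only an illustration on made-up data (the data frame dat and the effect sizes are assumptions, not Moses' data): a residuals-versus-fitted plot, a Q-Q plot of the residuals, and a Q-Q plot of the estimated ID effects from ranef().

```r
library(lme4)

set.seed(2)

# Hypothetical balanced data: 10 IDs, 3 observations per Time level each
dat <- data.frame(
  ID   = factor(rep(1:10, each = 6)),
  Time = factor(rep(c("Pre_dry", "Post_dry"), each = 3, times = 10),
                levels = c("Pre_dry", "Post_dry"))
)
id_eff <- rnorm(10)
dat$Distance <- 5 + 0.5 * (dat$Time == "Post_dry") +
  id_eff[as.integer(dat$ID)] + rnorm(nrow(dat))

m <- lmer(Distance ~ Time + (1 | ID), data = dat)

op <- par(mfrow = c(1, 3))

# Residuals against fitted values: look for curvature or flare
plot(fitted(m), resid(m), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Q-Q plot of the observation-level residuals
qqnorm(resid(m), main = "Q-Q: residuals"); qqline(resid(m))

# Q-Q plot of the estimated ID effects (conditional modes)
id_hat <- ranef(m)$ID[["(Intercept)"]]
qqnorm(id_hat, main = "Q-Q: ID effects"); qqline(id_hat)

par(op)
```

With only a handful of IDs the last plot will be noisy, so it is a rough guide at best.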

John Maindonald             email: john.maindonald at anu.edu.au

> On 4/08/2016, at 08:14, Ben Bolker <bbolker at gmail.com> wrote:
> 
>   For what it's worth, this graph is assessing
> linearity/heteroscedasticity rather than Normality (you would want a Q-Q
> plot, not a fitted vs residuals plot, for that).  This doesn't look too
> terrible, but there does seem to be a bit of 'flare' at the
> large-fitted-value end, which supports Paul's suggestion that you try a
> log transformation ...
> 
> On 16-08-03 03:58 PM, moses selebatso via R-sig-mixed-models wrote:
>> Thank you both Paul and Alain for your help. You both point out that
>> I shouldn't test for normality before running a model. I appreciate
>> that. Paul, I have tried your new script and I guess you were right
>> about experience in visually assessing normality. It is not
>> straightforward. Below is the plot, for your appreciation.
>>
>> library(lme4)
>> install.packages("devtools")
>> library(devtools)
>> devtools::install_github("pcdjohnson/GLMMmisc")
>> library(GLMMmisc)
>> data <- read.csv("clipboard", sep = "\t")
>> m <- lmer(Distance ~ Time + (1 | ID), data = data)
>> sim.residplot(m)
>>
>> Regards,
>> Moses SELEBATSO
>> Home: (+267) 318 5219 (H)  Mobile: (+267) 716 39370 or (+267) 738 39370
>> "Those who will ALWAYS agree with you may be oppressed by you"
>> 
>> On Wednesday, 3 August 2016, 15:54, Paul Johnson
>> <paul.johnson at glasgow.ac.uk> wrote:
>> 
>> 
>> 
>> Hi Moses,
>> 
>> I wouldn’t test normality of residuals — better to assess them by
>> eye. I know this sounds ad hoc but given that almost no real
>> distribution in nature is perfectly normal, the question should be
>> “how non-normal can the residuals be before seriously harming my
>> inferences?”. This is a more difficult question to answer and
>> basically requires experience. A test conflates the degree of
>> non-normality and sample size  so a significant result can mean
>> “quite normal but high n” while a non-significant result can mean
>> “very non-normal but low n”:
>> 
>> set.seed(1)
>> x <- rpois(1000, 50)
>> hist(x)  # looks beautifully normal
>> shapiro.test(x)  # significantly non-normal
>> hist(log(x[1:20]))  # looks pretty bad
>> shapiro.test(log(x[1:20]))  # passes the test
>> 
>> Given that your distance response measure is (probably) constrained
>> to be positive, there’s a good chance that it’s right-skewed and
>> potentially made more normal by log-transformation (if there are no
>> zero distances).
>> 
>> A good way to visually assess residuals is to plot them against the
>> fitted values, then compare these to residuals simulated from the
>> fitted model — they should look similar, give or take sampling
>> variation. You can do this with a function I recently wrote called
>> sim.residplot (available here:
>> https://github.com/pcdjohnson/GLMMmisc), although you’ll have to
>> refit your model using lmer in the lme4 package:
>> 
>> library(lme4)
>> library(GLMMmisc)
>> m <- lmer(Distance ~ Time + (1 | ID), data = data)
>> sim.residplot(m) # repeat a few times to allow for sampling variation
>> 
>> Good luck, Paul
>> 
>> 
>> 
>>> On 3 Aug 2016, at 14:25, moses selebatso via R-sig-mixed-models
>>> <r-sig-mixed-models at r-project.org> wrote:
>>> 
>>> Thank you very much for your helpful advice. I ran the model and tested
>>> the residuals. They are not normally distributed, and I am still
>>> stuck on how to proceed. I tried to copy the output into the email,
>>> but I get an error message that the message format cannot be sent.
>>>
>>> Regards, Moses
>>> 
>>> On Wednesday, 3 August 2016, 12:15, Highland Statistics Ltd
>>> <highstat at highstat.com> wrote:
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> Date: Wed, 3 Aug 2016 09:40:20 +0000 (UTC)
>>>> From: moses selebatso <selebatsom at yahoo.co.uk>
>>>> To: R-sig-mixed-models <r-sig-mixed-models at r-project.org>
>>>> Subject: [R-sig-ME] lme for data that is not normally distributed
>>>> Message-ID: <127496753.15122202.1470217220406.JavaMail.yahoo at mail.yahoo.com>
>>>> Content-Type: text/plain; charset="UTF-8"
>>>> 
>>>> Hello, I have some data that I would like to analyse with mixed models
>>>> (lme). As a standard procedure I tested the data for normality,
>>>> and it is not normal. Any ideas on how to deal with this kind
>>>> of data? I have a sample below, and the model that I was hoping to
>>>> use (if the data were normal) is:
>>>>
>>>> m <- lme(Distance ~ Time, random = ~ 1 | ID, data = data)
>>> 
>>> 
>>> Checking normality of the response variable before doing the
>>> analysis is a misconception. Why should it be normally distributed?
>>> Fit your model and check your residuals for normality.
>>> 
>>> 
>>> Alain
>>> 
>>>>
>>>> ID      Time      Distance
>>>> 10187A  Pre_dry   4.31287
>>>> 10187A  Pre_dry   6.867578
>>>> 10187A  Pre_dry   4.640427
>>>> 10187A  Post_dry  4.497807
>>>> 10187A  Post_dry  9.726069
>>>> 10187A  Post_dry  5.150089
>>>>
>>>> Regards, Moses SELEBATSO
>>> 
>>> --
>>> Dr. Alain F. Zuur
>>>
>>> First author of:
>>> 1. Beginner's Guide to GAMM with R (2014).
>>> 2. Beginner's Guide to GLM and GLMM with R (2013).
>>> 3. Beginner's Guide to GAM with R (2012).
>>> 4. Zero Inflated Models and GLMM with R (2012).
>>> 5. A Beginner's Guide to R (2009).
>>> 6. Mixed effects models and extensions in ecology with R (2009).
>>> 7. Analysing Ecological Data (2007).
>>>
>>> Highland Statistics Ltd.
>>> 9 St Clair Wynd
>>> UK - AB41 6DZ Newburgh
>>> Tel: 0044 1358 788177
>>> Email: highstat at highstat.com
>>> URL: www.highstat.com
>>> 
>>> _______________________________________________ 
>>> R-sig-mixed-models at r-project.org mailing list 
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models