[R-sig-ME] Ok with a "small amount" of non-normality?

Sat May 4 02:10:05 CEST 2013

The question that you ask does not admit of an easy answer!
I hope the following will shed some light on the general tack to take.

One can do simulations and check the effect of one or another level of
skewness (usually skewness to the right) on the parameter estimates.
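
A minimal sketch of such a simulation, in plain Python, might look like
the following.  Everything in it is an assumption made for illustration:
two treatments with 6 plots each and 4 subplots per plot (not the
kiwishade layout), normal plot effects, and right-skewed subplot errors
drawn from a centred exponential distribution.  The analysis is done on
plot means, and the quantities checked are the average estimate of the
treatment difference and the coverage of the nominal 95% interval.

```python
import random
import statistics

N_PLOTS = 6   # plots per treatment (assumed, for illustration)
N_SUB = 4     # subplots per plot (assumed, for illustration)
# 97.5% point of Student's t on 10 degrees of freedom (2 * N_PLOTS - 2).
T_CRIT_DF10 = 2.228


def plot_means(rng, treatment_effect):
    """Simulate one treatment group and return its plot means."""
    means = []
    for _ in range(N_PLOTS):
        plot_effect = rng.gauss(0.0, 1.0)  # normal plot-level effect
        # Subplot errors: exponential shifted to mean 0, so skewed to the right.
        values = [
            treatment_effect + plot_effect + (rng.expovariate(1.0) - 1.0)
            for _ in range(N_SUB)
        ]
        means.append(statistics.fmean(values))
    return means


def coverage_study(n_sim=2000, delta=1.0, seed=1):
    """Return (mean estimate of delta, coverage of the nominal 95% interval)."""
    rng = random.Random(seed)
    estimates = []
    covered = 0
    for _ in range(n_sim):
        a = plot_means(rng, 0.0)
        b = plot_means(rng, delta)
        estimate = statistics.fmean(b) - statistics.fmean(a)
        # Pooled variance of plot means, then the usual two-sample t interval.
        pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2.0
        half_width = T_CRIT_DF10 * (pooled_var * 2.0 / N_PLOTS) ** 0.5
        covered += abs(estimate - delta) <= half_width
        estimates.append(estimate)
    return statistics.fmean(estimates), covered / n_sim


mean_estimate, coverage = coverage_study()
print(f"mean estimate: {mean_estimate:.3f}, coverage: {coverage:.3f}")
```

Changing the error distribution in plot_means(), or the sizes N_PLOTS and
N_SUB (with the t critical value adjusted to match), shows how sensitive
the estimate and the coverage are to the kind of skewness assumed.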

The residuals (whether level 0 or level 1 in your case) are rarely the
right quantities to check for normality.  The way they are offered as
a source of insight in the Pinheiro and Bates book seems to me 
misleading in this respect.  

Consider a split plot design, with treatments estimated at the level of
plots within blocks, as in the kiwishade dataset in the DAAG package.
What matters for comparing treatments is the (approximate) normality
of what in the Genstat world would be called effects at the plot level.  
There are just 12 of these.  For this balanced design they can be
obtained by basing the analysis on the plot means; they are the 
residuals from that analysis.

Any skewness at the subplot level gets somewhat averaged out.
(There are 4 subplot values per plot.)  The residuals from the lme
model, whether at the subplot or plot level, will exaggerate any 
skewness that may be due to variation between subplots.  Even in this
simple case, these 'effect' estimates are correlated, which somewhat
complicates the checking for normality. 
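
The averaging-out of subplot skewness can be illustrated with a small
simulation, again in plain Python.  The exponential distribution is an
assumption chosen only because its skewness is known (2); for the mean
of 4 independent values, theory gives a skewness of 2/sqrt(4) = 1.  The
sketch does not address the correlation between effect estimates, or
the behaviour of lme residuals.

```python
import random
import statistics


def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(x - mean) ** 2 for x in values])
    m3 = statistics.fmean([(x - mean) ** 3 for x in values])
    return m3 / m2 ** 1.5


rng = random.Random(2013)
n_plots, n_sub = 20000, 4  # many plots, so the sample skewness is stable

# Right-skewed subplot values, grouped 4 to a plot.
plots = [[rng.expovariate(1.0) for _ in range(n_sub)] for _ in range(n_plots)]
subplot_values = [x for plot in plots for x in plot]
plot_mean_values = [statistics.fmean(plot) for plot in plots]

skew_subplot = skewness(subplot_values)
skew_plot_mean = skewness(plot_mean_values)
print(f"subplot skewness:   {skew_subplot:.2f}")
print(f"plot-mean skewness: {skew_plot_mean:.2f}")
```

With only 4 subplots per plot the skewness is halved rather than removed,
which is why "somewhat" is the right word.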

Direct checks for the distribution of the relevant quantities get quite messy
for unbalanced designs.  Bootstrap methods, having regard to the 
covariance structure, might be considered.  Or make a stab at the
distributions of the relevant component effects (now in the lme or lmer
sense), and simulate from them.
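
One possible reading of "having regard to the covariance structure" is
to resample whole plots rather than individual subplot values, so that
the within-plot correlation is carried along with each resampled plot.
The plain-Python sketch below does that for a hypothetical two-treatment
layout with made-up data; the function names, the data, and the choice
of a percentile interval are all assumptions for illustration, not a
recommendation of a specific procedure.

```python
import random
import statistics


def group_mean(plots):
    """Mean of the plot means for one treatment."""
    return statistics.fmean([statistics.fmean(plot) for plot in plots])


def plot_bootstrap(control, treated, n_boot=2000, seed=1):
    """Percentile 95% interval for the treatment difference,
    resampling whole plots (not subplots) within each treatment."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        c = rng.choices(control, k=len(control))  # plots, with replacement
        t = rng.choices(treated, k=len(treated))
        diffs.append(group_mean(t) - group_mean(c))
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]


def make_group(rng, effect, n_plots=6, n_sub=4):
    """Hypothetical data: normal plot effects plus right-skewed subplot errors."""
    group = []
    for _ in range(n_plots):
        plot_effect = rng.gauss(0.0, 1.0)
        group.append([effect + plot_effect + rng.expovariate(1.0) - 1.0
                      for _ in range(n_sub)])
    return group


data_rng = random.Random(42)
control = make_group(data_rng, 0.0)
treated = make_group(data_rng, 1.0)

estimate = group_mean(treated) - group_mean(control)
low, high = plot_bootstrap(control, treated)
print(f"estimate: {estimate:.2f}, bootstrap interval: ({low:.2f}, {high:.2f})")
```

With as few as 6 plots per treatment a bootstrap of this kind is crude;
simulating from assumed distributions for the component effects is the
alternative already mentioned.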

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

On 04/05/2013, at 4:36 AM, "Boulanger, Yan" <Yan.Boulanger at rncan-nrcan.gc.ca> wrote:

> Hi folks,
> This may be more of a "philosophical" student question. In Zuur et al. (2009), "Mixed effects models and extensions in ecology with R", it is mentioned on page 20 that "[...] we can get away with a small amount of non-normality".
> I'm a little bit puzzled when I face this kind of statement in a textbook. What is really "a small amount"?  Of course, it depends on your "judgement"...  In my case, I have level0 and level1 residuals that are unskewed and that show a relatively modest kurtosis (unbiased) of about 2.5 - 3.0. My models are based on several tens of thousands of individuals, and normality tests (e.g., shapiro.test) always fail for the residuals. QQ-plots show rather long tails, which correspond to "some" outliers (considering my data, there are several hundred "outliers" in this case). Homoscedasticity, whether or not random effects are considered, is not violated, so I wondered whether I could rely on these models' estimates given the non-normality of the residuals. My judgement in this case would be that the departure from normality is not that large and that this might not be a problem. But, as an ecologist, not a statistician, I have a hard time convincing myself of this...  Any thoughts?
> 
> Thanks
> 
> Yan
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models