[R-sig-ME] Model comparison with anova and AIC: p=0 and different AIC-values

Sun Nov 17 18:36:06 CET 2013

On 13-11-16 01:21 PM, Stefan Th. Gries wrote:
> I am attaching the data and the code separately, 
> trying the code you sent now.
> STG
> 

TL;DR It's pretty clear that the full model you're trying to fit is too
complicated, and weird, but it *ought* to fit or at least warn you that
it can't. I will try to work with it a bit and see if we can use it as a
test case for detecting badly fitted models/figure out what's going wrong.

================
  As a start, this model is attempting to fit 10 variance-covariance
parameters to only 72 observations; while I don't know a great rule of
thumb for how big a data set one should have for N var-cov parameters,
it should definitely be *more* conservative than the "10-20 observations
per parameter" rule of thumb (e.g. see Harrell _Regression Modeling
Strategies_) -- var-cov parameters are harder to estimate than location
(fixed-effect) parameters.

  A few other comments:

* using a quadratic effect poly(SAMPLE,2) for a variable with only three
levels (100, 200, 500) seems a bit odd, although I guess the model has
the same number of parameters as if you had just treated SAMPLE as a
factor in the first place (if you treat it as an *ordered* factor then
you will get orthogonal polynomial contrasts by default anyway).

* in general, if you want to correspond to a traditional aov()-style
analysis, you want to use a random effect of the form (1|A/B) rather
than (B|A); if A and B are both categorical, then the first (an
intercept effect at grouping levels A and B-within-A) is different from,
and more parsimonious than, the second (random effects of levels B
across grouping variables A) -- in particular, the latter includes all
the covariances between effects of B.

* the original aov()-style analysis has no room left for residual error:
since the experimental design has a single observation per
NAME:USED:SAMPLE combination (i.e. an unreplicated randomized block
design), using Error(NAME/(USED*SAMPLE)) gives one error value for each
observation -- so when you try to replicate this via

m.01c.reml <- lmer(OVERLAParcsine ~ USED*SAMPLE+
                   (1|NAME/(USED:SAMPLE)),
                   REML=TRUE,data=x)

you get

Error in checkNlevels(reTrms$flist, n = n, control) :
  number of levels of each grouping factor must be < number of observations

This seems to be OK and probably the closest in spirit to the original
aov() analysis.

lmer(OVERLAParcsine ~ USED*SAMPLE+
		   (1|NAME)+(1|NAME:USED)+(1|NAME:SAMPLE),
                   REML=TRUE,data=x,
                   control=lmerControl(check.nobs.vs.rankZ="ignore"))

(I don't know why the rank-Z testing check is giving an error here --
false positive, or this model is really *not* OK but I haven't figured
that out)

  Finally, if possible (i.e. if the denominator of the 'OVERLAP'
variable is known), I would consider a binomial GLMM rather than
trying to transform (e.g. see e.g. Warton and Hui "The arcsine is
asinine" Ecology 2011) ...