[R-sig-ME] likelihood-ratio tests in conflict with coefficients in maximal random effects model
Levy, Roger
rlevy at ucsd.edu
Fri Mar 7 18:13:46 CET 2014
On Mar 7, 2014, at 6:21 AM, Shravan Vasishth <vasishth.shravan at gmail.com> wrote:
> Hi Roger and Emilia, and others,
>
> I just wanted to say that in Emilia's data, she has 36 subjects and 20
> items. Roger, would you agree that it is very difficult with this amount of
> data to accurately estimate the full variance-covariance matrices for the
> subject and item random effects, especially the correlation parameters?
> The numbers that lmer returns for datasets of this size are pretty wild
> estimates, and often bear little relation to the true underlying
> correlations. I think that in this situation we might be asking too much
> of lmer without giving it enough data. If, on the other hand, we have a
> lot of data by subjects and items, it becomes possible to estimate these
> parameters.
>
> I believe this may have been, at least partly, the intent of Douglas Bates'
> original message about overparameterization.
That’s a good question. I imagine there is a fair bit of uncertainty in the correlation parameters, though I would guess it’s not huge for a dataset of this size. The point estimates that lme4(.0) gives us don’t quantify this uncertainty, but of course we could use Bayesian methods to get a better sense of it.
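(For what it’s worth, here is a quick sketch of how one could eyeball that uncertainty within lme4 itself, via profile or parametric-bootstrap intervals on the variance and correlation parameters. This is purely illustrative, run on lme4’s built-in sleepstudy data rather than on Emilia’s dataset, so none of the variable names are hers:

  library(lme4)
  fit <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
  ## intervals on the random-effect SDs and the intercept-slope correlation
  confint(fit, method = "profile", oldNames = FALSE)
  ## parametric bootstrap gives a similar picture, at more computational cost
  confint(fit, method = "boot", nsim = 200)

In my experience the interval on the correlation tends to come out quite wide even with reasonably rich data, which is rather the point.)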
More generally, this point that you raise, Shravan, is precisely the reason that I tend to favor likelihood-ratio tests over the t-statistic for confirmatory hypothesis tests like Emilia’s. As Baayen, Davidson and Bates (2008, page 396) crucially point out, the t-statistic is computed conditional on a point estimate of the random-effects covariance matrix, and fails to take into account uncertainty in the estimate of this matrix. The likelihood ratio does not have this problem. (It has other problems, namely that the likelihood-ratio test statistic is only approximately chi-squared distributed in finite samples, but with 20 items and 36 subjects in a balanced design I would expect the chi-squared approximation to be fairly close. And at any rate, the same problem exists for the t statistic.)
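Concretely, the kind of comparison I have in mind looks like the following (again sketched on sleepstudy rather than Emilia’s data): keep the maximal random-effects structure in both models and drop only the fixed effect under test, fitting by ML rather than REML so the likelihoods are comparable.

  library(lme4)
  m_full <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, REML = FALSE)
  m_null <- lmer(Reaction ~ 1    + (Days | Subject), sleepstudy, REML = FALSE)
  ## one fixed-effect parameter dropped, so the LR statistic is referred
  ## to a chi-squared distribution with 1 degree of freedom
  anova(m_null, m_full)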
So my take is that how much we should worry about these issues depends in part on our modeling goals. For a confirmatory hypothesis test like Emilia’s on her dataset, I wouldn’t worry much about overparameterization for the models she was showing us. If she wanted to aggressively interpret the parameter estimates resulting from a particular model fit, on the other hand, I would be much more cautious.
Best
Roger