[R-sig-ME] lmer: LRT and mcmcpvalue for fixed effects

Douglas Bates bates at stat.wisc.edu
Tue Jul 15 15:02:06 CEST 2008


On Tue, Jul 15, 2008 at 3:33 AM, Rune Haubo <rhbc at imm.dtu.dk> wrote:
> 2008/7/15 Simon Blomberg <s.blomberg1 at uq.edu.au>:
>> On Tue, 2008-07-15 at 13:43 +0800, Julie Marsh wrote:
>>
>>>
>>>
>>> Given that I am using 2 different tests for two different hypotheses I
>>> still would have expected these p-values to be more similar.
>>
>> Well, as Pinheiro and Bates say in their book (worth reading!), the LRT
>> for mixed effects models is anti-conservative. So your LRT p-value is
>> almost certainly too small. The posterior p-value might be more
>> accurate, if you accept the usual caveats re: priors and convergence
>> etc. Also, when calculating p-values by hand using pchisq, you should
>> probably use pchisq(..., lower.tail=FALSE) instead of 1-pchisq(...),
>> which is inaccurate. The log.p option might also be useful if you really
>> need to compare small probabilities. And why were you using pchisq with
>> 0 df (which always == 1)? I don't understand that at all.
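[Editor's note: the lower.tail point is easy to demonstrate. For a large statistic, 1 - pchisq(...) underflows to zero because pchisq() returns a value indistinguishable from 1, while the direct upper-tail call stays accurate. A minimal base-R sketch; the statistic 80 is just an illustrative value:]

```r
## Upper-tail p-value for a large chi-square statistic on 1 df.
x <- 80

p.naive  <- 1 - pchisq(x, df = 1)                  # rounds to exactly 0:
                                                   # pchisq() is within half an
                                                   # ulp of 1 in double precision
p.direct <- pchisq(x, df = 1, lower.tail = FALSE)  # ~3.7e-19, still accurate

## log.p = TRUE keeps even smaller probabilities comparable on the log scale
lp <- pchisq(x, df = 1, lower.tail = FALSE, log.p = TRUE)

c(naive = p.naive, direct = p.direct, log.p = lp)
```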
>
> Regarding the mixture of chi-square distributions: that is
> appropriate when testing a single variance component. The ordinary
> one-df test is conservative there, because the null value lies on
> the boundary of the parameter space. But Julie is not testing a
> variance component, so the mixture is not appropriate here.
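[Editor's note: for reference, the boundary correction described above works as follows. When testing a single variance component against zero, the LRT statistic is asymptotically a 50:50 mixture of a point mass at zero (chi-square with 0 df) and a chi-square with 1 df, so the naive one-df p-value is simply halved. A base-R sketch; the observed statistic here is made up for illustration:]

```r
## LRT for H0: a single variance component equals zero (boundary case).
## Reference distribution: 0.5 * chisq(0 df) + 0.5 * chisq(1 df).
lrt <- 3.2                                           # hypothetical statistic

p.naive <- pchisq(lrt, df = 1, lower.tail = FALSE)   # conservative
p.mix   <- 0.5 * p.naive                             # boundary-corrected:
                                                     # the chisq(0) component
                                                     # puts no mass above lrt > 0
c(naive = p.naive, mixture = p.mix)
```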
>
> I can think of three reasons why the p-values Julie obtains from the
> likelihood ratio test and from the posterior sampling might differ.

> 1) the extensive use of control parameters indicates that convergence
> may be an issue. If one or both of the models have not converged, the
> likelihood ratio test is obviously based on the wrong likelihoods and
> will be misleading. I suppose the MCMC sampling will also be
> inappropriate in this situation. The need for control parameters
> could also indicate problems related to the data structure, such as
> severe imbalance. Perhaps there is too little information on some
> parameters and convergence is hard to achieve?

I think that the use of those control parameters is based on examples
in various postings to email lists or in papers.  As you mention
below, it is important to take note of the version of the lme4 package
being used.  Most of those control parameters are, to quote Ron
Ziegler, President Nixon's press secretary, "no longer operative".  In
a way, it's a pity.  Some of the best mathematics I have ever done was
the derivation of the analytic gradient and the ECME updates for the
profiled deviance, only to find that these elegant formulas actually
slowed things down on large problems.  (There's a lesson there for
those who think that statistical computing research consists of
deriving a formula and letting others worry about the trivial
implementation details.)

I guess my question would be whether it really is necessary to set the
maximum number of iterations so high.  (By the way, changing the
maximum number of iterations doesn't have an effect in the current
version of the lme4 package.  I will reimplement it in the next
release.)

> If the models converged nicely and problems with data or models are
> unlikely, I would be inclined to trust the likelihood ratio test. It
> is a test on one df with 3000+ observations, so as far as I know,
> there should be no problems.

I agree.  If convergence is smooth then the likelihood ratio statistic
of 46 on 1 degree of freedom is definitive.  I imagine that the
t-statistic for that coefficient is also large in magnitude (close to
+/- 7).  In those realms, assigning a numeric p-value is less
important than the information that "it is very, very small".
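[Editor's note: the correspondence between the two statistics can be checked directly. For a single coefficient with many observations, the Wald z is approximately the square root of the one-df LRT statistic, and sqrt(46) is about 6.8, matching the "close to +/- 7" above. In base R:]

```r
## For 1 df, P(chisq >= x) equals the two-sided normal tail at sqrt(x),
## so the LRT and the Wald z-test agree asymptotically.
lrt <- 46
z   <- sqrt(lrt)                                   # about 6.78

p.wald <- 2 * pnorm(z, lower.tail = FALSE)         # two-sided Wald p-value
p.lrt  <- pchisq(lrt, df = 1, lower.tail = FALSE)  # same tail probability

c(z = z, p.wald = p.wald, p.lrt = p.lrt)           # both p-values are tiny
```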

> 2) the MCMC sampling could be influenced by the priors or the chain
> could get stuck in a specific region.

Yes, I would definitely want to look at the xyplot of the mcmcsamp
output, just to see if the chain managed to get itself stuck
somewhere.
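[Editor's note: the kind of check a trace plot gives can also be done numerically. Below is a sketch on a simulated chain with a deliberately "stuck" segment; the chain is artificial, not actual mcmcsamp output, and a real check would plot the mcmcsamp result with lattice's xyplot() as suggested above:]

```r
## Numerical stand-in for eyeballing a trace plot: flag a chain that
## got stuck.  The chain below is simulated for illustration only.
set.seed(1)
chain <- c(rnorm(500), rep(0.3, 200), rnorm(300))  # stuck for 200 draws

## Longest run of identical consecutive values; for a healthy sampler of
## a continuous parameter this should be very small.
longest.run <- max(rle(chain)$lengths)
longest.run
```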

> 3) which version of lmer are you using? Douglas Bates recently posted
> a message to this list reporting problems with MCMCsamp in the most
> recent version of lme4.

Indeed - although I am more concerned about the sample of the variance
component parameters than about the sample from the distribution of
the fixed-effects parameters.  If the sample of the variance component
parameters does not have the correct properties, that will of course
affect the sample from the distribution of the fixed-effects
parameters.  Still, I am more confident of the conditional sampling of
the fixed effects (it is just a multivariate normal - I could still
manage to botch that, but it is less likely than my getting the more
complicated distributions wrong) than of the conditional sampling of
the variance component parameters.

However, given the description of the problem, the mcmcsamp output
would need to have been created using lme4 version 0.99875-x (because
the 0.999375-y versions can't yet handle vector-valued random
effects).

I think the bottom line is that the likelihood ratio test and the
t-statistic are definitive and there is no need to follow up with the
mcmcsamp approach.

> /Rune
>
>>
>>> I am so
>>> sorry that I can't post the data and the lmer output but I am bound by
>>> confidentiality. <big sigh>  I understand completely if it is not
>>> possible to provide any help given this lack of further information.
>>>
>>> I have eagerly read and re-read the rwiki help page
>>>
>>> http://wiki.r-project.org/rwiki/doku.php?id=guides:lmer-tests
>>>
>>> but still am unable to explain why the results should be so
>>> different. Much as I would love to argue against the reliance on
>>> p-values I'm afraid I am a resigned pragmatist when it comes to trying
>>> to get anything published.  <sorry!>  Needless to say I will swamp the
>>> article with far more informative plots and CI's.
>>>
>>> Any help would be very much appreciated.
>>>
>>> kindest regards,  julie marsh.
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> --
>> Simon Blomberg, BSc (Hons), PhD, MAppStat.
>> Lecturer and Consultant Statistician
>> Faculty of Biological and Chemical Sciences
>> The University of Queensland
>> St. Lucia Queensland 4072
>> Australia
>> Room 320 Goddard Building (8)
>> T: +61 7 3365 2506
>> http://www.uq.edu.au/~uqsblomb
>> email: S.Blomberg1_at_uq.edu.au
>>
>> Policies:
>> 1.  I will NOT analyse your data for you.
>> 2.  Your deadline is your problem.
>>
>> The combination of some data and an aching desire for
>> an answer does not ensure that a reasonable answer can
>> be extracted from a given body of data. - John Tukey.
>>
>>
>
>
>
> --
> Rune Haubo Bojesen Christensen
>
> Master Student, M.Sc. Eng.
> Phone: (+45) 30 26 45 54
> Mail: rhbc at imm.dtu.dk, rune.haubo at gmail.com
>
> DTU Informatics, Section for Statistics
> Technical University of Denmark, Build.321, DK-2800 Kgs. Lyngby, Denmark
>
>



