[R-sig-ME] Fwd: Interpretation of lmer output in R

Douglas Bates bates at stat.wisc.edu
Thu Feb 24 19:34:17 CET 2011


On Thu, Feb 24, 2011 at 7:23 AM, Thilo Kellermann
<thilo.kellermann at rwth-aachen.de> wrote:
> Dear Thierry, Ben, Douglas, Julia and list :-)

> sorry for mixing in with another (maybe stupid) question, but Julia's
> result of the likelihood-ratio test (LRT) reveals a confusion I came
> across with my own data: the test states that the two models differ
> significantly from each other, and in Pinheiro & Bates (2000) I read
> that I should (or at least can) prefer the model with the lower AIC or
> BIC value.

> In Julia's model comparison it happens that these values seem to be
> inconsistent, since the AIC "prefers" model fd whereas the BIC "prefers"
> model fd1 (see below). As mentioned above I observed similar results
> with my own data (using lme) which rendered the LRT inconclusive to
> me...?

Not an uncommon situation.  Generally AIC is more liberal than BIC in
that it allows more terms into the model.  The AIC criterion penalizes
each additional parameter by two units on the deviance scale.  That
is, an additional parameter must reduce the deviance by at least two
units to reduce the AIC.  BIC is more conservative in that it requires
each additional parameter to reduce the deviance by log(n) units,
where n is the number of observations.
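
As a check of that arithmetic, the AIC and BIC values in the anova
table quoted below can be reconstructed from the log-likelihoods (a
sketch using the rounded logLik values and n = 2249 from Julia's
output, so the results match the table only to rounding):

dev <- -2 * c(fd1 = -952.52, fd = -949.40)  # deviance = -2 * logLik
k   <- c(fd1 = 2, fd = 3)                   # parameters per model
n   <- 2249                                 # observations
dev + 2 * k       # AIC: approx. 1909.04 1904.80
dev + log(n) * k  # BIC: approx. 1920.48 1921.95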

If I have nested models, as these are, I prefer the likelihood ratio
test rather than AIC or BIC comparisons.  I regard these "2 units per
parameter" or "log(n) units per parameter" as somewhat arbitrary.  Of
course the likelihood ratio test on a single parameter gives a p-value
less than 5% when the change in the deviance is greater than

> qchisq(0.95, df=1)
[1] 3.841459

so it is rather arbitrary too.  Nonetheless I prefer to put the change
in the deviance on a probability scale where I think I can interpret
it more meaningfully.
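
For example, with the log-likelihoods from Julia's anova table quoted
below, the change in deviance and its tail probability work out as (a
sketch from the rounded logLik values, so it reproduces the table's
6.2385 and 0.0125 only approximately):

2 * (-949.40 - (-952.52))                 # change in deviance, 6.24
pchisq(6.24, df = 1, lower.tail = FALSE)  # approx. 0.0125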

> Best regards,
> Thilo
>
>> Data:
>> Models:
>> fd1: Foraging ~ 1 + (1 | Bird)
>> fd: Foraging ~ Depth + (1 | Bird)
>>     Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)
>> fd1  2 1909.0 1920.5 -952.52
>> fd   3 1904.8 1922.0 -949.40 6.2385      1     0.0125 *
>> ---
>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>
>
>
> On Thu, 2011-02-24 at 12:56 +0000, ONKELINX, Thierry wrote:
>> Dear Julia,
>>
>> Plotting the mean probability of foraging versus depth will make things clearer.
>>
>> Depth <- seq(0, 50, length = 101)
>> plot(Depth, plogis(-1.6898789+0.0007075*Depth), type = "l")
>>
>> Note that the difference in probability between Depth = 0 and Depth = 50 is quite small and thus not relevant. So you have a variable (depth) in the model which improves the fit, but has no biological relevance. Such things can happen when you have a lot of data.
>>
>> I would keep depth in the model, and conclude that it has a negligible effect. This is a case where rejecting the STATISTICAL null hypothesis allows one to accept the BIOLOGICAL null hypothesis, which is stronger than not being able to reject the BIOLOGICAL null hypothesis.
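>>
>> Concretely, with the fixed-effect estimates from Julia's summary (a
>> quick numeric check of the claim above, not part of her output):
>>
>> plogis(-1.6898789 + 0.0007075 * c(0, 50))
>> # approx. 0.156 at 0 m and 0.160 at 50 m: a change of less than
>> # half a percentage point over the whole depth range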
>>
>> Best regards,
>>
>> Thierry
>>
>> ----------------------------------------------------------------------------
>> ir. Thierry Onkelinx
>> Instituut voor natuur- en bosonderzoek
>> team Biometrie & Kwaliteitszorg
>> Gaverstraat 4
>> 9500 Geraardsbergen
>> Belgium
>>
>> Research Institute for Nature and Forest
>> team Biometrics & Quality Assurance
>> Gaverstraat 4
>> 9500 Geraardsbergen
>> Belgium
>>
>> tel. + 32 54/436 185
>> Thierry.Onkelinx at inbo.be
>> www.inbo.be
>>
>> To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
>> ~ Sir Ronald Aylmer Fisher
>>
>> The plural of anecdote is not data.
>> ~ Roger Brinner
>>
>> The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
>> ~ John Tukey
>>
>>
>> > -----Original message-----
>> > From: r-sig-mixed-models-bounces at r-project.org
>> > [mailto:r-sig-mixed-models-bounces at r-project.org] On behalf of
>> > Julia Sommerfeld
>> > Sent: Thursday, 24 February 2011 13:34
>> > To: r-sig-mixed-models at r-project.org
>> > Subject: [R-sig-ME] Fwd: Interpretation of lmer output in R
>> >
>> > ---------- Forwarded message ----------
>> > From: Julia Sommerfeld <Julia.Sommerfeld at utas.edu.au>
>> > Date: 2011/2/24
>> > Subject: Re: [R-sig-ME] Interpretation of lmer output in R
>> > To: Douglas Bates <bates at stat.wisc.edu>
>> >
>> >
>> > ... it's illuminating the confusion in my brain a little bit
>> > more every day!
>> >
>> > Yes, the confusion comes from my coding both parameters as 0/1.
>> > I hope you don't mind that I keep asking....
>> >
>> > 1. Maybe it helps if I have a look at a different data set
>> > where only the response variable has a coding of 0,1.
>> >
>> > I want to test if foraging (0/1, where 0 means "NO foraging"
>> > and 1 means "foraging") is influenced by water depth (Depth, in
>> > meters), i.e. does my model including "Depth" fit better than
>> > the one without?
>> >
>> > My models:
>> >
>> > > fd <- lmer(Foraging ~ Depth + (1 | Bird), family = binomial)
>> > > fd1 <- lmer(Foraging ~ 1 + (1 | Bird), family = binomial)
>> > > anova(fd, fd1)
>> >
>> > Data:
>> > Models:
>> > fd1: Foraging ~ 1 + (1 | Bird)
>> > fd: Foraging ~ Depth + (1 | Bird)
>> >     Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)
>> > fd1  2 1909.0 1920.5 -952.52
>> > fd   3 1904.8 1922.0 -949.40 6.2385      1     0.0125 *
>> > ---
>> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>> >
>> >
>> > > summary(fd)
>> > Generalized linear mixed model fit by the Laplace approximation
>> > Formula: Foraging ~ Depth + (1 | Bird)
>> >   AIC  BIC logLik deviance
>> >  1905 1922 -949.4     1899
>> > Random effects:
>> >  Groups Name        Variance Std.Dev.
>> >  Bird   (Intercept)     0.29505  0.54319
>> > Number of obs: 2249, groups: Bird, 23
>> >
>> > Fixed effects:
>> >               Estimate Std. Error z value Pr(>|z|)
>> > (Intercept) -1.6898789  0.1463908 -11.544   <2e-16 ***
>> > Depth        0.0007075  0.0002884   2.453   0.0142 *
>> > ---
>> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>> >
>> > Correlation of Fixed Effects:
>> >       (Intr)
>> > Depth 0.244
>> >
>> >
>> > From this output I conclude that the model including "Depth"
>> > (fd) explains the observed data significantly better than the
>> > model without Depth (fd1).
>> >
>> > 2. But what if I want to know HOW Depth is influencing/related
>> > to the foraging event (1)?  I.e., do birds preferably forage in
>> > shallow or deep waters?  How can I get this information out of
>> > the summary?  Would plogis() still be the right approach in this
>> > case?
>> >
>> > plogis(-1.6898789)
>> >
>> > plogis(-1.6898789+0.0007075*second_column?)
>> >
>> > The plogis above is based on foraging=1, but what about "Depth"?
>> >
>> > I think I'm still getting something wrong here...
>> >
>> > Cheers,
>> >
>> > Julia
>> >
>> >
>> >
>> >
>> >
>> > 2011/2/23 Douglas Bates <bates at stat.wisc.edu>
>> >
>> > > On Wed, Feb 23, 2011 at 6:17 AM, Julia Sommerfeld
>> > > <Julia.Sommerfeld at utas.edu.au> wrote:
>> > > > Dear Douglas and list,
>> > >
>> > > > Again, thanks for the thorough answers.
>> > >
>> > > > Regarding my last email: I'm mixing up my own data! Sorry for
>> > > > that.
>> > >
>> > > > 1. plogis(x)
>> > >
>> > > > SameSite=1 means that birds did use the SAME nest site, hence:
>> > > > they show site-fidelity.
>> > >
>> > > > SameSite=0 means that birds did CHANGE nest site, hence: no
>> > > > site-fidelity.
>> > >
>> > > > This means that the site-fidelity of a bird that failed
>> > > > breeding corresponds to a probability of 44% and that of a
>> > > > bird that was successful corresponds to a probability of 72%
>> > > > (see below).
>> > >
>> > > Yes.  When you code the response as 0 and 1 then the model provides
>> > > the probability that the response will be a 1.
>> > >
>> > > > Thus, with "plogis" of the Intercept and/or Intercept + Fixed
>> > > > Effect Estimate I can calculate the probability of my observed
>> > > > event (R-help: plogis gives the distribution function, being
>> > > > the inverse logit function)?
>> > >
>> > > Yes.  It happens that the logit function (see
>> > > http://en.wikipedia.org/wiki/Logit) is the quantile function for
>> > > the standard logistic distribution (the R function qlogis) and
>> > > its inverse is the cumulative probability function for the
>> > > standard logistic (the R function plogis).  Because I know that
>> > > the qlogis and plogis functions are carefully written to handle
>> > > difficult cases gracefully I use them instead of writing out the
>> > > expressions explicitly.
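>> > >
>> > > A one-line check of that inverse relationship, using the 72%
>> > > probability mentioned above (any p in (0, 1) would do):
>> > >
>> > > qlogis(0.72)          # log-odds, approx. 0.944
>> > > plogis(qlogis(0.72))  # back to 0.72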
>> > >
>> > > > That is: plogis of the intercept estimate is based upon
>> > > > BreedSuc1=0, SameSite=1 and plogis of the intercept+BreedSuc1
>> > > > Estimate is based upon BreedSuc1=1, SameSite=1?
>> > >
>> > > Yes.  Another way of looking at it would be to extract the model
>> > > matrix for your fitted model, using the function model.matrix.
>> > > You will see that the first column is all 1's and the second
>> > > column will be 0's and 1's according to that bird's breeding
>> > > success.  Technically the fixed-effects part of the predicted
>> > > log-odds for a particular bird is (Intercept) * first_column +
>> > > BreedSuc1 * second_column but, because every row is either 1 0
>> > > or 1 1, the only possible values of the predicted log-odds are
>> > > (Intercept) and (Intercept) + BreedSuc1
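>> > >
>> > > For instance, with a few made-up breeding outcomes (a toy
>> > > illustration, not your data; attributes of the printed matrix
>> > > omitted):
>> > >
>> > > BreedSuc1 <- factor(c(0, 1, 1, 0))
>> > > model.matrix(~ BreedSuc1)
>> > > #   (Intercept) BreedSuc11
>> > > # 1           1          0
>> > > # 2           1          1
>> > > # 3           1          1
>> > > # 4           1          0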
>> > >
>> > > > And the "result" will always depend upon my coding (BreedSuc1=1
>> > > > means successful, SameSite=1 means site-fidelity).  This is
>> > > > still a bit confusing...
>> > >
>> > > I would consider the two factors separately.  Regarding the
>> > > SameSite encoding just think of it as you are eventually modeling
>> > > the probability of an "event" for which there are two possible
>> > > outcomes.  You can call them Failure/Success or Off/On or No/Yes
>> > > or whatever you want but when you get to a numerical coding as
>> > > 0/1 the model will be for the probability of getting a 1.
>> > >
>> > > As for the breeding success factor, you again need to
>> > > characterize it according to some numerical encoding to be able
>> > > to create the model matrix.  You happened to choose a 0/1
>> > > encoding so in the model the coefficient for that term is added
>> > > to the intercept when there is a 1 for that factor and not added
>> > > when there is a 0.
>> > >
>> > > > My question was:
>> > > > What if both codings change?  I.e. BreedSuc1=0 means successful
>> > > > and SameSite=0 means site-fidelity?  Would it then still be
>> > > > 1 - p instead of p?
>> > >
>> > > If you change the encoding of the response, then your coefficients
>> > > will all flip signs.  What was previously a log-odds of site
>> > > fidelity of, say, 1.2 will now become a log-odds of site
>> > > switching of -1.2.  You can check that
>> > >
>> > > plogis(-x) = 1 - plogis(x)
>> > >
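>> > > For example, with the log-odds of 1.2 above (a quick numeric
>> > > check):
>> > >
>> > > plogis(-1.2)     # approx. 0.2315
>> > > 1 - plogis(1.2)  # the same value
>> > >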
>> > > If you change the encoding of BreedSuc then the (Intercept)
>> > > coefficient will be the log-odds of a '1' outcome (whether that
>> > > means site fidelity or site switching) for a bird who was
>> > > successful and (Intercept) + BreedFail1 will be the log-odds of
>> > > a 1 outcome for a bird whose breeding failed.
>> > >
>> > > I think part of the confusion for you is that you are trying to
>> > > combine the encoding of site-fidelity and breeding success.  It
>> > > is best to keep them distinct because site-fidelity is the
>> > > response and breeding success is a covariate.
>> > >
>> > > > 2. Z-value:
>> > > >
>> > > > I assume that z-value is the same as z-score?  I've found the
>> > > > following link:
>> > > > http://resources.esri.com/help/9.3/arcgisdesktop/com/gp_toolref/spatial_statistics_toolbox/what_is_a_z_score_what_is_a_p_value.htm
>> > > >
>> > > > There, the z-score is defined as:
>> > > >
>> > > > "The Z score is a test of statistical significance that helps
>> > > > you decide whether or not to reject the null hypothesis".
>> > > >
>> > > > "Z scores are measures of standard deviation.  For example, if
>> > > > a tool returns a Z score of +2.5 it is interpreted as "+2.5
>> > > > standard deviations away from the mean".  P-values are
>> > > > probabilities.  Both statistics are associated with the
>> > > > standard normal distribution.  This distribution relates
>> > > > standard deviations with probabilities and allows significance
>> > > > and confidence to be attached to Z scores and p-values".
>> > >
>> > > > Is this correct?
>> > >
>> > > Well, not really.  The logic behind statistical hypothesis
>> > > testing is subtle and frequently misunderstood.  When you try to
>> > > simplify the explanation, as is done above, it often ends up not
>> > > quite accurate.
>> > >
>> > > The way I present it to students is that we have two competing
>> > > claims, which for us will boil down to a simpler model being
>> > > adequate (called the "null hypothesis" as in "nothing going on
>> > > here, folks") or the simpler model is not adequate and we need
>> > > to extend it.  The second claim is called the alternative
>> > > hypothesis.  In your case the null hypothesis is that breeding
>> > > success doesn't influence the probability of site fidelity and
>> > > the alternative hypothesis is that it does.  You want to
>> > > establish the alternative hypothesis but, because of the
>> > > variability in the data, you can't "prove" it directly.  You
>> > > have to use an indirect method, which is to assume that it is
>> > > not the case (i.e. the null hypothesis is true) and demonstrate
>> > > that it is unlikely to obtain the data that you did, or
>> > > something even more unusual, if the null hypothesis were true.
>> > >
>> > > So that is the subtlety.  You can't prove your point directly
>> > > so you set up the "straw man" argument (the null hypothesis)
>> > > and try to refute it.  But nothing is certain.  You can't prove
>> > > that it is impossible to get the data that you did if breeding
>> > > success did not influence site fidelity.  The best you can do
>> > > is say it is extremely unlikely.
>> > >
>> > > Now we need to formalize this argument by settling on a "test
>> > > statistic", which is a quantity that can be calculated from the
>> > > observed data.  Along with the test statistic we need a
>> > > reference distribution which is the distribution of the test
>> > > statistic when the null hypothesis is true.  Formally, we are
>> > > supposed to set up all the rules about the test statistic and
>> > > the reference distribution before we see the data - sort of
>> > > like the "pistols at dawn, march ten paces then turn and fire"
>> > > rules of a duel.  In practice we sometimes look at the data
>> > > before we decide exactly what hypotheses we will test.  This is
>> > > the "data snooping" that Ben wrote about.
>> > >
>> > > Anyway, when we have set up the rules and we have the data we
>> > > evaluate the test statistic and then determine the probability
>> > > of seeing that value or something even more extreme when the
>> > > null hypothesis is true.  This is what we call the p-value.  If
>> > > that probability is very small then either we encountered a
>> > > rare case or the null hypothesis is false.  That's why you, as
>> > > the researcher, want the p-value to be small, when you are
>> > > trying to establish the alternative.
>> > >
>> > > (As an aside, this is where most people lose track of the
>> > > argument.  They think the p-value is related to the probability
>> > > of one of the hypotheses being true.  That is not the case.
>> > > The p-value is the probability of seeing the data that we did,
>> > > or something more unusual, if the null hypothesis - the "no
>> > > change" assumption - were true.)
>> > >
>> > > Now we get to the point of how we calculate the p-value.  We
>> > > need to come up with a way of characterizing "unusual" data.
>> > > What I prefer to do is to see how good the fit is without
>> > > allowing for differences due to breeding success and how good
>> > > it is when allowing for differences due to breeding success.
>> > > The measure of "how good is the fit" is called the deviance.
>> > > So we fit the model with the BreedSuc1 term and without it and
>> > > compare the deviance values.  The reference distribution for
>> > > this difference is generally taken to be a chi-squared
>> > > distribution.  Technically all we can say is that when the
>> > > sample sizes get very large, the difference in the deviance
>> > > values tends to a chi-squared distribution.  But I would claim
>> > > that it is a reasonable way of judging the difference in
>> > > quality of fit for finite samples too.
>> > >
>> > > An alternative approach to judging if a coefficient is
>> > > "significantly different" from zero is to take the value of the
>> > > coefficient and divide by its standard error.  If everything is
>> > > behaving well, this ratio would have a standard Gaussian (or
>> > > "normal") distribution when the null hypothesis is true.  We
>> > > write a standard normal random variable as Z so this ratio is
>> > > called a z-statistic.
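>> > >
>> > > With the Depth coefficient from your summary, for instance, that
>> > > ratio and its two-sided tail probability reproduce the z value
>> > > and Pr(>|z|) columns (a check from the printed estimates):
>> > >
>> > > z <- 0.0007075 / 0.0002884  # approx. 2.453
>> > > 2 * pnorm(-abs(z))          # approx. 0.0142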
>> > >
>> > > For me the problem with using a z-statistic is that you are
>> > > comparing two models (with and without the term of interest) by
>> > > fitting only one of them and then extrapolating to decide what
>> > > the other fit should be like.  This was a necessary short-cut
>> > > when fitting a model might involve weeks of hand calculation.
>> > > But if fitting a model involves only a few seconds of computer
>> > > time, why not fit both and do a comparison of the actual
>> > > difference in the quality of the fits?
>> > >
>> > > So, why should we even bother printing the z-statistics?  I
>> > > consider them to be a guide, but if I actually want to do the
>> > > hypothesis test involving a simple model versus a more complex
>> > > model I fit both models.
>> > >
>> > > I hope this is more illuminating than confusing.
>> > >
>> >
>> >
>> >
>> > --
>> > Julia Sommerfeld - PhD Candidate
>> > Institute for Marine and Antarctic Studies University of
>> > Tasmania Private Bag 129, Hobart TAS 7001
>> >
>> > Phone: +61 458 247 348
>> > Email: julia.somma at gmx.de
>> > Julia.Sommerfeld at utas.edu.au
>> >
>> >
>> >
>
> --
> Thilo Kellermann
> RWTH Aachen University
> Department of Psychiatry, Psychotherapy and Psychosomatics
> JARA Translational Brain Medicine
> Pauwelsstr. 30
> 52074 Aachen
> Germany
> Tel.: +49 (0)241 / 8089977
> Fax.: +49 (0)241 / 8082401
> E-Mail: thilo.kellermann at rwth-aachen.de
>



