[R-sig-ME] P value value for a large number of degree of freedom in lmer

Tue Nov 23 23:09:31 CET 2010

I think maybe the focus here should be on the substantive interpretation of the actual model parameters. Let's assume for the moment some effect has a significant p-value: the key then is to actually understand what the corresponding coefficient tells the analyst about the strength of that effect on the outcome. In some cases it may be trivial in practical importance, even if it is statistically distinguishable from zero (i.e., has a significant p-value). Unless you understand the units of measurement associated with each variable and you have some external reference for what constitutes an effect size that actually matters, you can't properly interpret the results. That reference must be informed by subject-matter knowledge about the phenomenon being studied and the context in which it occurs. 

For example, imagine that we have an effect showing us that, on average, 2 different groups of physicians have annual incomes that differ by $10/year and the corresponding effect is statistically significant (because we have a HUGE sample size). I should hope that the analyst would look at the size of that difference and recognize that it is absolutely trivial when compared to the actual average incomes for both kinds of physicians (perhaps on the order of $150,000/year). Heck, $10 is barely enough to buy 2 or 3 cups of coffee in places like Starbucks. However, if the difference were $10,000/year, then it might be worth saying it has some real practical significance in addition to statistical significance. 
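
The physician example above can be sketched in R with simulated data. Everything numeric here is an assumption made for illustration: the group means ($150,000 and $150,010), the sample size of one million per group, and especially the standard deviation of $1,000, which is deliberately far smaller than real income variation so that the example runs quickly.

```r
# Simulated illustration only: every number below is an assumption.
set.seed(1)
n <- 1e6                                          # physicians per group (hypothetical)
income_a <- rnorm(n, mean = 150000, sd = 1000)    # group A incomes
income_b <- rnorm(n, mean = 150010, sd = 1000)    # group B incomes, $10 higher on average

tt <- t.test(income_b, income_a)                  # Welch two-sample t-test
p_value <- tt$p.value

raw_diff  <- mean(income_b) - mean(income_a)      # difference in dollars per year
pooled_sd <- sqrt((var(income_a) + var(income_b)) / 2)
cohens_d  <- raw_diff / pooled_sd                 # standardized mean difference
rel_diff  <- raw_diff / mean(income_a)            # difference relative to mean income

print(c(p_value = p_value, raw_diff = raw_diff,
        cohens_d = cohens_d, rel_diff = rel_diff))
```

Reading the raw difference and the relative difference next to the p-value is what makes the practical triviality visible; the p-value alone cannot show it.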

Steven J. Pierce, Ph.D. 
Associate Director 
Center for Statistical Training & Consulting (CSTAT) 
Michigan State University 
178 Giltner Hall 
East Lansing, MI 48824 
E-mail: pierces1 at msu.edu 
Web: http://www.cstat.msu.edu 

-----Original Message-----
From: Ben Bolker [mailto:bbolker at gmail.com] 
Sent: Tuesday, November 23, 2010 3:10 PM
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] P value value for a large number of degree of freedom in lmer

  What may be getting lost in this discussion is that in simple cases
(normally distributed data, balanced orthogonal designs etc. etc.) there
is at least a coherent *definition* of the p-value, which would involve
the upper (or extreme/two-sided) tail of the t distribution with the
appropriate degrees of freedom or an F distribution with equivalent
denominator df. [I was going to write that in this case it would be
g-1=31 degrees of freedom, but I actually realized (I think) that would
only be the df for determining an effect in the intercept or in other
factors that varied across groups.  Here I think you have something like
199,999 df left to make inferences on the slope, because it is assumed
*not* to vary across groups ...] The best thing to do (IMO) if you want
to use these approaches is to look at a classical textbook, find the
appropriate calculation for your design, and apply the df to the test
statistic by hand. (You can also try the problem in lme and see what it
guesses, although it may guess wrong.) In many, many cases (unbalanced,
GLMMs, crossed designs, R-side correlation structures, ...) a
universally accepted definition does not exist.
  I do agree with the previous posters that you should think hard about
what (if anything) the p-values mean in this case, though.  Suppose your
t-statistic was 100.  Does the difference between p=exp(-184) and
p=exp(-10011) mean anything?

> 2*pnorm(-abs(100),log.p=TRUE)
[1] -10011.05
> 2*pt(-abs(100),df=31,log.p=TRUE)
[1] -184.448
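
The "apply the df to the test statistic by hand" step described above might look like the following sketch. The helper name `p_from_t` and the t-statistic of 2.5 are hypothetical, and the two df values are simply the ones mentioned in this message, not a recommendation for any particular design.

```r
# Two-sided p-value for a t-statistic with a user-supplied df.
# p_from_t is a hypothetical helper, not part of lme4 or nlme.
p_from_t <- function(t_stat, df) 2 * pt(-abs(t_stat), df = df)

t_stat <- 2.5                                   # hypothetical t-statistic for the slope

p_small_df <- p_from_t(t_stat, df = 31)         # group-level df used in the session above
p_large_df <- p_from_t(t_stat, df = 199999)     # approximate df suggested for the slope
p_normal   <- 2 * pnorm(-abs(t_stat))           # normal (infinite-df) limit

print(c(df31 = p_small_df, df199999 = p_large_df, normal = p_normal))
```

With df this large the t and normal results are practically indistinguishable, so the choice of df matters mainly for effects tested against the between-group variation.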

On 10-11-23 02:45 PM, Joshua Wiley wrote:
> Dear Arnaud,
> 
> Having a large amount of data *is* exactly what increases confidence
> in results.  A p-value is the probability of obtaining results at least
> as extreme as yours, given that the null hypothesis is true *in the
> population*.  If you have a
> lot of data, you have a lot of the population, and can more
> confidently say "this is what the population is or is not like".  The
> p-value is serving its purpose exactly as it was meant to, there is no
> need to "correct" or "alter" it.  The real question is, does anyone
> care about your effect?  Effect sizes are often a good way to get at
> the idea of is the effect meaningful, does it have practical
> significance, could an average person notice the difference?
> 
> Cheers,
> 
> Josh
> 
> On Tue, Nov 23, 2010 at 11:25 AM, Arnaud Mosnier <a.mosnier at gmail.com> wrote:
>> I agree, but how can I test that a significant result is due to a real
>> effect rather than to the amount of data?
>> I thought about subsetting my dataset and rerunning the model X times to see
>> if the result still persists ... but you could also say that by doing so I
>> will eventually find a (small enough) subset size at which I no longer
>> detect the effect :-)
>> I also agree that the term "bias" was not used correctly ... but is there a
>> method to increase confidence in those results?
>>
>> cheers,
>>
>> Arnaud
>>
>> 2010/11/23 Rolf Turner <r.turner at auckland.ac.nz>
>>
>>>
>>> It is well known amongst statisticians that having a large enough data set
>>> will result in the rejection of *any* null hypothesis, i.e. will result in a
>>> small p-value.  There is no ``bias'' involved.
>>>
>>>        cheers,
>>>
>>>                Rolf Turner
>>>
>>> On 24/11/2010, at 4:06 AM, Arnaud Mosnier wrote:
>>>
>>>> Dear UseRs,
>>>>
>>>> I am using a database containing nearly 200 000 observations occurring in
>>>> 33 groups.
>>>> With a model of the form ( y ~ x + (1|group) ) in lmer, my number of
>>>> degrees of freedom is really large.
>>>> I am wondering whether this large df has an impact on the p-values, mainly
>>>> whether it could lead me to consider the effect of a variable significant
>>>> when it is not.
>>>> ... and if that is the case, is there a correction to apply to the
>>>> results to take that bias into account?
>>>>
>>>> thanks !
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models