# [R-sig-ME] P value for a large number of degrees of freedom in lmer

Ben Bolker bbolker at gmail.com
Tue Nov 23 21:09:50 CET 2010

```
What may be getting lost in this discussion is that in simple cases
(normally distributed data, balanced orthogonal designs etc. etc.) there
is at least a coherent *definition* of the p-value, which would involve
the upper (or extreme/two-sided) tail of the t distribution with the
appropriate degrees of freedom or an F distribution with equivalent
denominator df. [I was going to write that in this case it would be
g-1=31 degrees of freedom, but I actually realized (I think) that would
only be the df for determining an effect in the intercept or in other
factors that varied across groups.  Here I think you have something like
199,999 df left to make inferences on the slope, because it is assumed
*not* to vary across groups ...] The best thing to do (IMO) if you want
to use these approaches is to look at a classical textbook, find the
appropriate calculation for your design, and apply the df to the test
statistic by hand. (You can also try the problem in lme and see what it
guesses, although it may guess wrong.) In many, many cases (unbalanced,
GLMMs, crossed designs, R-side correlation structures, ...) a
universally accepted definition does not exist.

I do agree with the previous posters that you should think hard about
what (if anything) the p-values mean in this case, though.  Suppose your
t-statistic was 100.  Does the difference between p=exp(-184) and
p=exp(-10011) mean anything?

> 2*pnorm(-abs(100),log.p=TRUE)
[1] -10011.05
> 2*pt(-abs(100),df=31,log.p=TRUE)
[1] -184.448

On 10-11-23 02:45 PM, Joshua Wiley wrote:
> Dear Arnaud,
>
> Having a large amount of data *is* exactly what increases confidence
> in results.  A p-value is the probability of obtaining results at least
> as extreme as yours, given that the null hypothesis is true *in the
> population*.  If you have a
> lot of data, you have a lot of the population, and can more
> confidently say "this is what the population is or is not like".  The
> p-value is serving its purpose exactly as it was meant to; there is no
> need to "correct" or "alter" it.  The real question is, does anyone
> care about your effect?  Effect sizes are often a good way to judge
> whether the effect is meaningful: does it have practical
> significance, and could an average person notice the difference?
>
> Cheers,
>
> Josh
>
> On Tue, Nov 23, 2010 at 11:25 AM, Arnaud Mosnier <a.mosnier at gmail.com> wrote:
>> I agree, but how can I test that a significant result is due to a real
>> effect and not merely to the amount of data?
>> I thought about subsetting my dataset and rerunning the model X times to
>> see if the result still persists ... but you could also say that by doing
>> so I will eventually find a (small enough) subset size at which I no
>> longer detect the effect :-)
>> I also agree that the term "bias" was not used correctly ... but is there a
>> method to increase confidence in those results?
>>
>> cheers,
>>
>> Arnaud
>>
>> 2010/11/23 Rolf Turner <r.turner at auckland.ac.nz>
>>
>>>
>>> It is well known amongst statisticians that having a large enough data set
>>> will result in the rejection of *any* null hypothesis, i.e. will result in
>>> a small p-value.  There is no ``bias'' involved.
>>>
>>>        cheers,
>>>
>>>                Rolf Turner
>>>
>>> On 24/11/2010, at 4:06 AM, Arnaud Mosnier wrote:
>>>
>>>> Dear UseRs,
>>>>
>>>> I am using a database containing nearly 200 000 observations occurring in
>>>> 33 groups.
>>>> With a model of the form ( y ~ x + (1|group) ) in lmer, my number of
>>>> degrees of freedom is really large.
>>>> I am wondering whether this large df has an impact on the p-values, mainly
>>>> whether it could lead me to consider the effect of a variable significant
>>>> when it is not.
>>>> ... and if that is the case, is there a correction to apply to the
>>>> results to take that bias into account?
>>>>
>>>> thanks !
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>>
>>

```
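
The two points raised in the thread (that the choice of denominator df hardly matters once the test statistic is huge, and that a tiny p-value says little about practical importance) can be illustrated with a short sketch in base R. Everything below is an assumption made for illustration: the helper name `log_p_two_sided`, the simulated data, the true slope of 0.02, and the use of a plain `lm` fit instead of `lmer` to keep the example dependency-free. The log p-values it prints will not match the console output quoted in the post, because here the two-sided factor of 2 is applied on the log scale as `+ log(2)`, whereas the quoted calls multiply the log-probability itself by 2.

```r
# Two-sided p-value on the log scale for a t-statistic.
# `log_p_two_sided` is a hypothetical helper, not part of base R or lme4.
log_p_two_sided <- function(t_stat, df = Inf) {
  # pt() accepts df = Inf and then uses the normal distribution.
  # On the log scale, doubling the tail area means adding log(2).
  log(2) + pt(-abs(t_stat), df = df, log.p = TRUE)
}

# Same t-statistic, three candidate reference distributions:
# the small df used in the post, a residual-style df, and the normal limit.
t_stat <- 100
dfs <- c(31, 199999, Inf)
log_p <- sapply(dfs, function(d) log_p_two_sided(t_stat, d))
print(data.frame(df = dfs, log_p = log_p))

# Large n, tiny effect: simulated data (assumed values, not the poster's).
set.seed(1)
n <- 200000
x <- rnorm(n)
y <- 0.02 * x + rnorm(n)          # true slope is small relative to the noise

fit <- lm(y ~ x)                  # plain linear model, no random effects
fit_summary <- summary(fit)
p_value <- fit_summary$coefficients["x", "Pr(>|t|)"]
r_squared <- fit_summary$r.squared

cat("slope p-value:", format(p_value, digits = 3), "\n")
cat("R-squared:    ", format(r_squared, digits = 3), "\n")
```

In the first part, every choice of df gives a p-value far below any conventional threshold, so the conclusion does not depend on which one is "right". In the second part, the slope is expected to be clearly significant while the proportion of variance explained stays very small, which is the distinction between statistical and practical significance made in the replies.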