[R] P values

Fri May 7 19:31:50 CEST 2010

Robert A LaBudde wrote:
> At 07:10 AM 5/7/2010, Duncan Murdoch wrote:
>   
>> Robert A LaBudde wrote:
>>     
>>> At 01:40 PM 5/6/2010, Joris Meys wrote:
>>>
>>>       
>>>> On Thu, May 6, 2010 at 6:09 PM, Greg Snow <Greg.Snow at imail.org> wrote:
>>>>
>>>>
>>>>         
>>>>> Because if you use the sample standard deviation then it is a t test not a
>>>>> z test.
>>>>>
>>>>>
>>>>>           
>>>> I'm doubting that seriously...
>>>>
>>>> You calculate normalized Z-values by subtracting the sample mean and
>>>> dividing by the sample sd. So Thomas is correct. It becomes a Z-test since
>>>> you compare these normalized Z-values with the Z distribution, instead of
>>>> the (more appropriate) T-distribution. The T-distribution is essentially a
>>>> Z-distribution that is corrected for the finite sample size. In Asymptopia,
>>>> the Z and T distribution are identical.
>>>>
>>>>         
>>> And it is only in Utopia that any P-value less than 0.01 actually 
>>> corresponds to reality.
>>>
>>>
>>>       
>> I'm not sure what you mean by this.  P-values are simply statistics 
>> calculated from the data; why wouldn't they be real if they are small?
>>     
>
> Do you truly believe an actual real-life distribution is accurately
> fit by a normal distribution at quantiles of 0.001, 0.0001 or beyond?
>   

Not often, but I don't see how that is relevant.  I would normally 
conclude that a P-value of 0.01, 0.001, or especially 0.0001 didn't come 
from the null distribution. 
My model for the null distribution and the distribution that actually 
generated the data and the P-value differ by *a lot*, not just a little 
bit.  (This is somewhat obvious with samples that aren't too large. With 
really large samples, "a lot" may need to be interpreted carefully.)

> "The map is not the territory", and just because you can calculate 
> something from a model doesn't mean it's true.
>
> The real world is composed of mixture distributions, not pure ones.
>
> The P-value may be real, but its reality is subordinate to the 
> distributional assumption involved, which always fails at some level. 
> I'm simply asserting that level is in the tails at probabilities of 
> 0.01 or less.
>
> Statisticians, even eminent ones such as yourself and lesser lights 
> such as myself, frequently fail to keep this in mind. We accept such 
> assumptions as "normality", "equal variances", etc., on an 
> "eyeballometric" basis, without any quantitative understanding of 
> what this means about limitations on inference, including P-values.
>
> Inference in statistics is much cruder and more judgmental than we 
> like to portray. We should at least be honest among ourselves about 
> the degree to which our hand-waving assumptions work.
>   

I think I agree with you that I would have a hard time arguing against a 
test based on a slightly different null distribution, and that test 
would likely give a P-value quite different from the one I calculated 
based on my assumption.  But my conclusion would be the same:  P < 
0.0001 means there's likely something wrong with the assumptions in the 
null distribution.

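To make the point about a slightly different null distribution concrete, here is a minimal sketch in Python (standard library only). It compares the upper-tail probability a normal model reports for a standardized value with the tail probability of the same value under a contaminated normal mixture. The mixture (95% N(0, 1) plus 5% N(0, 3^2)) and all names in the code are illustrative assumptions, not anything taken from this thread.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)

# Hypothetical "true" distribution: 95% N(0, 1) plus 5% N(0, 3^2).
WEIGHT, WIDE_SD = 0.05, 3.0
# Standard deviation of the mixture (both components have mean 0).
MIX_SD = math.sqrt((1.0 - WEIGHT) * 1.0 + WEIGHT * WIDE_SD ** 2)

def nominal_tail(z):
    """Upper-tail probability the normal model reports for a standardized value z."""
    return STD_NORMAL.cdf(-z)  # symmetry avoids computing 1 - cdf(z)

def mixture_tail(z):
    """Upper-tail probability of the same standardized value under the mixture."""
    x = z * MIX_SD  # convert back to the raw scale of the mixture
    return ((1.0 - WEIGHT) * STD_NORMAL.cdf(-x)
            + WEIGHT * NormalDist(0.0, WIDE_SD).cdf(-x))

for z in (2.326, 3.090, 3.719):  # nominal tails near 0.01, 0.001, 0.0001
    print(f"z={z:.3f}  nominal={nominal_tail(z):.2e}  mixture={mixture_tail(z):.2e}")
```

With these made-up parameters the two tail probabilities are close near 0.01 and differ by more than an order of magnitude near 0.0001. Either way, the observation is very unlikely under the assumed null, which is the conclusion drawn above.
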
> I remember at the O. J. Simpson trial, the DNA expert asserted that a 
> match would occur only once in 7 billion people. I wondered at the 
> time how you could evaluate such an assertion, given there were fewer 
> than 7 billion people on earth at the time.
>   

So that's clear evidence that the null model he was using was not the 
truth.  It would have been just as clear if he'd said 1 in a million, or 
1 in a trillion.

> When I was at a conference on optical disk memories when they were 
> being developed, I heard a talk about validating disk specifications 
> against production. One statement was that the company would also 
> validate the "undetectable error rate" specification of 1 in 10^16 
> bits. I amusingly asked how they planned to validate the 
> "undetectable" error rate. The response was handwaving and "Just as 
> we do everything else". The audience laughed, and the speaker didn't 
> seem to know what the joke was.
>   

That's not a p-value, that's a probability of an error, which is quite a 
different thing.  There the number does matter: an error rate of 1 in 
10^6 is quite different from an error rate of 1 in 10^16.
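As a rough illustration of why the size of an error rate matters, the sketch below estimates how many error-free bits one would have to check before a one-sided 95% upper confidence bound on the rate drops to a target value. It assumes independent bit errors and that every error could somehow be detected by an independent reference; the function name and the 95% level are my own choices for illustration.

```python
import math

def bits_needed(target_rate, confidence=0.95):
    """Error-free bits required so the upper bound on the error rate is <= target_rate.

    With zero errors in n independent trials, the one-sided upper bound p
    solves (1 - p)^n = 1 - confidence, so n = log(1 - confidence) / log(1 - p).
    """
    # log1p(-p) stays accurate when p is far below floating-point epsilon.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-target_rate))

for rate in (1e-6, 1e-16):
    print(f"target rate {rate:.0e}: about {bits_needed(rate):.3e} error-free bits")
```

This is the familiar "rule of three": roughly 3 / p error-free trials are needed, so about 3 million bits for a rate of 1 in 10^6 but about 3 x 10^16 bits for a rate of 1 in 10^16.
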

Duncan Murdoch

> In both these cases the values were calculable, but that didn't mean 
> that they applied to reality.
>
> ================================================================
> Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
> Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
> 824 Timberlake Drive                     Tel: 757-467-0954
> Virginia Beach, VA 23464-3239            Fax: 757-467-2947
>
> "Vere scire est per causas scire"
> ================================================================
>
>