[R] regression towards the mean, AS paper November 2007

Rolf Turner r.turner at auckland.ac.nz
Mon Dec 17 20:24:53 CET 2007


On 18/12/2007, at 7:32 AM, Duncan Murdoch wrote:

> On 12/17/2007 1:21 PM, Troels Ring wrote:
>> Dear friends, regression towards the mean is interesting in medical
>> circles, and a very recent paper (The American Statistician November
>> 2007;61:302-307 by Krause and Pinheiro) treats it at length. An initial
>> example specifies (p 303):
>> "Consider the following example: we draw 100 samples from a bivariate
>> Normal distribution with X0~N(0,1), X1~N(0,1) and cov(X0,X1)=0.7. We
>> then calculate the p value for the null hypothesis that the means of X0
>> and X1 are equal, using a paired Student's t test. The procedure is
>> repeated 1000 times, producing 1000 simulated p values. Because X0 and
>> X1 have identical marginal distributions, the simulated p values behave
>> like independent Uniform(0,1) random variables." This I did not
>> understand, and simulating like shown below produced far from uniform
>> (0,1) p values - but I fail to see how it is wrong. I contacted the
>> authors of the paper but they did not answer. So, please, doesn't the
>> code below specify a bivariate N(0,1) with covariance 0.7? I get p
>> values = 1 all over - not interesting, but how wrong?
>> Best wishes
>> Troels
>>
>> library(MASS)
>> Sigma <- matrix(c(1,0.7,0.7,1),2,2)
>> Sigma
>> res <- NULL
>> for (i in 1:1000){
>> ff <-(mvrnorm(n=100, rep(0, 2), Sigma, empirical = TRUE))
>> res[i] <- t.test(ff[,1],ff[,2],paired=TRUE)$p.value}
>
> Specifying empirical=TRUE means that your sampled values are not
> independent, the means are guaranteed to match exactly, and the mean
> difference is exactly zero.  Thus all of the t statistics are exactly
> zero, and the p-values are exactly 1.
>
> Set empirical=FALSE (the default), and you'll see more reasonable results.
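
A minimal sketch of the comparison described in the quoted reply, assuming the MASS package is installed. The helper name sim_p and the seed are illustrative choices, not part of the original code; only the empirical argument differs between the two runs.

```r
library(MASS)  # provides mvrnorm()

set.seed(42)   # arbitrary seed, used only for reproducibility
Sigma <- matrix(c(1, 0.7, 0.7, 1), 2, 2)

# Hypothetical helper: one paired t-test p-value from a fresh sample of size n.
sim_p <- function(n, empirical) {
  ff <- mvrnorm(n = n, mu = rep(0, 2), Sigma = Sigma, empirical = empirical)
  t.test(ff[, 1], ff[, 2], paired = TRUE)$p.value
}

p_emp  <- replicate(1000, sim_p(100, empirical = TRUE))
p_rand <- replicate(1000, sim_p(100, empirical = FALSE))

# empirical = TRUE forces both sample means to equal mu, so t is
# (numerically) zero and every p-value is essentially 1.
range(p_emp)

# empirical = FALSE gives genuinely random samples; the quantiles of the
# p-values should sit near 0.1, 0.5 and 0.9, as for a Uniform(0,1) variable.
quantile(p_rand, c(0.1, 0.5, 0.9))
```

Preallocating or using replicate() as above also avoids growing res inside the loop, though that is a matter of style and not the cause of the p-values of 1.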

	This has nothing to do really with the question that Troels asked,
	but the exposition quoted from the AS paper is unnecessarily confusing.
	The phrase ``Because X0 and X1 have identical marginal distributions ...''
	throws the reader off the track.  The identical marginal distributions
	are irrelevant.  All one needs is that the ***means*** of X0 and X1
	be the same, and then the null hypothesis tested by a paired t-test
	is true and so the p-values are (asymptotically) Uniform[0,1].  With
	a sample size of 100, the ``asymptotically'' bit can be safely ignored
	for any ``decent'' joint distribution of X0 and X1.  If one further
	assumes that X0 - X1 is Gaussian (which has nothing to do with X0 and
	X1 having identical marginal distributions) then ``asymptotically''
	turns into ``exactly''.
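
	A rough sketch of the point about marginals, using base R only.
	The particular distributions, the helper name sim_p and the seed
	are assumptions made for illustration: X0 is exponential, X1 is a
	noisy linear function of X0, so the marginals differ but both
	means equal 1.

```r
set.seed(1)  # arbitrary seed, used only for reproducibility
n <- 100     # sample size, as in the quoted example

# Hypothetical helper: one paired t-test p-value for a pair (X0, X1)
# with equal means but clearly different marginal distributions.
sim_p <- function() {
  x0 <- rexp(n, rate = 1)                       # mean 1, skewed
  x1 <- 1 + 0.5 * (x0 - 1) + rnorm(n, sd = 2)   # mean 1, correlated with x0
  t.test(x0, x1, paired = TRUE)$p.value
}

p <- replicate(2000, sim_p())

# If the p-values are approximately Uniform[0,1], these quantiles should
# be near 0.1, 0.5 and 0.9, and about 5% of p-values should fall below 0.05.
quantile(p, c(0.1, 0.5, 0.9))
mean(p < 0.05)
```

	Here X0 - X1 is not Gaussian, so uniformity is only approximate,
	but with n = 100 the departure should be hard to see.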

				cheers,

					Rolf Turner