[R] Generating correlated data from uniform distribution
(Ted Harding)
Ted.Harding at nessie.mcc.ac.uk
Sat Jul 2 13:22:19 CEST 2005
On 02-Jul-05 Peter Dalgaard wrote:
> "Jim Brennan" <jfbrennan at rogers.com> writes:
>
>> OK now I am skeptical especially when you say in a weird way:-)
>> This may be OK but look at plot(x,y) and I am suspicious. Is it still
>> alright with this kind of relationship?
> ...
>> N <- 10000
>> rho <- .6
>> x <- runif(N, -.5,.5)
>> y <- x * sample(c(1,-1), N, replace=T, prob=c((1+rho)/2,(1-rho)/2))
>
> Well, the covariance is (everything has mean zero, of course)
>
> E(XY) = (1+rho)/2*EX^2 + (1-rho)/2*E(X*-X) = rho*EX^2
>
> The marginal distribution of Y is a mixture of two identical uniforms
> (X and -X) so is uniform and in particular has the same variance as X.
>
> In summary, EXY/sqrt(EX^2EY^2) == rho
>
> So as I said, it satisfies the formal requirements. X and Y are
> uniformly distributed and their correlation is rho.
>
> If for nothing else, I suppose that this example is good for
> demonstrating that independence and uncorrelatedness is not the same
> thing.

That was a nice sneaky solution! I was toying with something similar,
but less sneaky, until I saw Peter's, on the lines of

x <- runif(2*N, -0.5, 0.5); ix <- (N-k):(N+k); y <- x; y[ix] <- (-y[ix])

(which makes the same point about independence and correlation).
The larger k is as a fraction of N, the further you swing from rho = 1
towards rho = -1, but you cannot achieve, as Peter did, an arbitrary
correlation coefficient rho, since the value depends on k, which
can only take integer values.
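
A minimal sketch of the sign-flip idea, to see how the sample rho moves
with k. The seed, N = 5000, the helper name flip_cor and the particular
values of k are assumptions made purely for illustration; 2*k + 1 of the
2*N entries are flipped, so rho should come out roughly at
1 - 2*(2*k + 1)/(2*N).

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
N <- 5000                        # x has length 2*N

x <- runif(2 * N, -0.5, 0.5)

# Flip the sign of the 2*k + 1 entries centred on position N and
# return the sample correlation between x and the modified copy.
# (flip_cor is a hypothetical helper name; k must lie in 0..(N-1).)
flip_cor <- function(k) {
  ix <- (N - k):(N + k)
  y <- x
  y[ix] <- -y[ix]
  cor(x, y)
}

ks   <- c(0, 500, 1250, 2500, 4000, 4999)   # illustrative values of k
rhos <- sapply(ks, flip_cor)

# Compare the flipped fraction with the resulting sample correlation.
print(round(cbind(k = ks, flipped = (2 * ks + 1) / (2 * N), rho = rhos), 3))
```

Only the values of rho on this grid are reachable, one per integer k,
which is the limitation compared with Peter's construction.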

Another approach which leads to a less "special" joint distribution
is

x <- sort(runif(N, -0.5, 0.5)); y <- sort(runif(N, -0.5, 0.5))

followed by a rho-dependent permutation of y. I'm still pondering
a way of choosing the permutation so as to get a desired rho.
The extremes are the identity, which for a given sample will
give as close as you can get to rho = +1, and reversal, which
gives as close as you can get to rho = -1.
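
The two extremes can be checked directly. This is only a sketch of the
end points, not a recipe for hitting a chosen rho: the seed, N = 1000
and the variable names are assumptions for illustration, and the random
permutation is included just as a point of comparison.

```r
set.seed(2)                      # arbitrary seed, for reproducibility only
N <- 1000

x <- sort(runif(N, -0.5, 0.5))
y <- sort(runif(N, -0.5, 0.5))

rho_identity <- cor(x, y)          # identity permutation: upper extreme
rho_reversal <- cor(x, rev(y))     # reversal: lower extreme
rho_shuffled <- cor(x, sample(y))  # one random permutation, for comparison

print(c(identity = rho_identity,
        reversal = rho_reversal,
        shuffled = rho_shuffled))
```

Since permuting y changes neither its mean nor its variance, only the
sum of products x[i]*y[i] varies, and that sum is largest when both are
sorted the same way and smallest when they are sorted oppositely. Any
other permutation therefore lands between the two printed extremes.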
However, the maximum theoretical rho which you can get (as opposed
to what is possible for particular samples, which may get arbitrarily
close to +1) depends on N. For instance, with N=3, it looks as
though the theoretical rho is about 0.9 with the "identity"
permutation (for N=1000, however, just about all samples give
rho > 0.99).
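
A quick simulation along these lines lets one look at the dependence on
N. The seed, the replication counts and the helper name identity_cor are
assumptions for illustration, and the numbers it prints are simulation
estimates rather than exact values.

```r
set.seed(3)                      # arbitrary seed, for reproducibility only

# Sample correlation of two independently drawn, sorted uniform samples
# paired by the identity permutation (identity_cor is a hypothetical name).
identity_cor <- function(N) {
  cor(sort(runif(N, -0.5, 0.5)), sort(runif(N, -0.5, 0.5)))
}

r3    <- replicate(20000, identity_cor(3))     # many small samples
r1000 <- replicate(200,   identity_cor(1000))  # fewer large samples

print(summary(r3))                             # spread of rho for N = 3
print(c(min_N1000        = min(r1000),
        share_above_0.99 = mean(r1000 > 0.99)))
```

The mean of r3 is the simulated counterpart of the "theoretical rho"
for N = 3 under the identity permutation.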
I smell a source of interesting exam questions ...
Over to you!
Best wishes,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 02-Jul-05 Time: 12:22:09
------------------------------ XFMail ------------------------------