[R] Generating a binomial random variable correlated with a
(Ted Harding)
Ted.Harding at nessie.mcc.ac.uk
Sun Apr 17 15:10:43 CEST 2005
On 17-Apr-05 Ted Harding wrote:
> [...]
> So I'd suggest experimenting on the following lines.
>
> 1. Let X1 be a sample of size N using rbinom(N,1,p)
> (where, in general, p need not be 0.5)
>
> 2. Let Y be a sample of size N using rnorm(N,mu,sigma)
> (and again, in general, mu need not be 0 nor sigma 1).
>
> This is as in your example. So far, you have a true
> Binomial variate X1 and a true Normal variate Y.
>
> 3. X1 will have r 1s in it, and (N-r) 0s. Now consider
> setting up a correspondence between the 1s and 0s in
> X1 and the values in Y, in such a way that the 1s
> tend to get allocated to the higher values of Y and
> the 0s to lower values of Y. The resulting set of
> pairs (X1,Y) is then your correlated sample.
>
> In other words, permute the sample X1 and cbind the
> result to Y.
> [...]
> Here is an example of a possible approach:
>
> X0 <- rbinom(100,1,0.5)
> Y <- rnorm(100) ; Y<-sort(Y)
> p0<-((1:100)-0.5)/100 ; p0<-p0/sum(p0); p<-cumsum(p0)
> r<-sum(X0); ix<-sample((1:100),r,replace=FALSE,prob=p)
> X1<-numeric(100); X1[ix]<-1;sum(X1)
> ## [1] 50
> cor(X1,Y)
> ## [1] 0.6150937
Here is a follow-up (somewhat inspired by a remark in Bill's
suggestions). Consider the function
test<-function(C,N){
R<-numeric(N); corrs<-numeric(N)
for(i in (1:N)){
X0 <- rbinom(100,1,0.5) ; r<-sum(X0)
Y <- rnorm(100) ; Y<-sort(Y)
p0<-((1:100)-0.5)/100 ; L<-log(p0/(1-p0));
p<-exp(C*L);p<-p/(1+p);
ix<-sample((1:100),r,replace=FALSE,prob=p)
X1<-numeric(100); X1[ix]<-1;
R[i]<-r ; corrs[i]<-cor(X1,Y)
}
list(R=R,corrs=corrs)
}
By experimenting with different values of C, you can see what
you get. For instance:
mean(test(0.1,1000)$corrs)
## [1] 0.05958913
mean(test(0.5,1000)$corrs)
## [1] 0.2722242
mean(test(1.0,1000)$corrs)
## [1] 0.4347375
and finally (in my trials):
mean(test(5.2,1000)$corrs)
## [1] 0.702623
mean(test(5.22,10000)$corrs)
## [1] 0.6998803
and that might just be close enough to your desired 0.7 ...!
So now you have a true Binomial sample, an associated true
Normal sample, and a Pearson correlation of 0.7 between their
values.
Now of course comes the question which has been intriguing
us all: What is the purpose of achieving a given Pearson
correlation between a Normal variate and a binary variate?
Best wishes,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 17-Apr-05 Time: 14:10:43
------------------------------ XFMail ------------------------------
More information about the R-help
mailing list