[R] Generating correlated data from uniform distribution

Greg Snow greg.snow at ihc.com
Tue Jul 5 18:34:39 CEST 2005


Here is an approach that uses 'optim' with simulated annealing (method "SANN") to search for a reordering of y whose correlation with x is close to 0.6:

x <- sort(runif(1000))
y <- sort(runif(1000))

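ord <- 1:1000   # start from the identity permutation of y
# objective: squared distance between cor(x, y[ord]) and the target 0.6
target <- function(ord){ ( cor(x, y[ord]) - 0.6 ) ^2 }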
# candidate generator: swap the entries at two random positions
new.point <- function(ord){
	tmp <- sample(length(ord), 2)
	ord[tmp] <- ord[rev(tmp)]
	ord
}

# alternative generator: swap two positions at most 100 apart
new.point2 <- function(ord){
	tmp <- sample(length(ord) -100, 1)
	tmp2 <- sample(100, 1)
	ord[ c(tmp, tmp+tmp2) ] <- ord[ c(tmp+tmp2, tmp) ]
	ord
}

# for method="SANN", optim's third argument generates candidate points
res <- optim(ord, target, new.point, method="SANN",
	control = list(maxit=6000, temp=2000, trace=TRUE))

res2 <- optim(ord, target, new.point2, method="SANN",
	control = list(maxit=60000, temp=200, trace=TRUE))

y <- y[res$par]   # apply the first solution

par(mfrow=c(2,2))
hist(x)
hist(y)
plot(x,y)
cor(x,y)


y <- sort(y)[res2$par]   # undo the first reordering, then apply the second

par(mfrow=c(2,2))
hist(x)
hist(y)
plot(x,y)
cor(x,y)
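Both candidate generators only swap entries of the permutation, so the reordered y contains exactly the same values as before. Its uniform marginal is therefore untouched, and only the pairing with x changes. Below is a compact, self-contained sketch of the first run that checks both properties. The smaller N = 200, the fixed seed, and maxit = 2000 are my own choices for speed, not values from the post. For method = "SANN", optim() uses its third argument as the candidate generator and reports the best point it found in res$par.

```r
set.seed(1)                      # assumption: fixed seed for reproducibility
N <- 200                         # assumption: smaller than the post's 1000
x <- sort(runif(N))
y <- sort(runif(N))

# objective: squared distance between cor(x, y[ord]) and the target 0.6
target <- function(ord) (cor(x, y[ord]) - 0.6)^2

# candidate generator: swap the entries at two random positions
new.point <- function(ord) {
  tmp <- sample(length(ord), 2)
  ord[tmp] <- ord[rev(tmp)]
  ord
}

res <- optim(1:N, target, new.point, method = "SANN",
             control = list(maxit = 2000, temp = 2000))

y.new <- y[res$par]
print(cor(x, y.new))                    # close to 0.6
print(identical(sort(y.new), sort(y)))  # TRUE: same values, only reordered
```

The walk starts from a correlation near 1, because both vectors are sorted. Random swaps then push the correlation down, and the best point visited along the way sits close to the target.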

Hope this helps,

Greg Snow, Ph.D.
Statistical Data Center, LDS Hospital
Intermountain Health Care
greg.snow at ihc.com
(801) 408-8111

>>> "Jim Brennan" <jfbrennan at rogers.com> 07/01/05 05:25PM >>>
OK, now I am skeptical, especially when you say "in a weird way" :-)
This may be OK, but look at plot(x,y); it makes me suspicious. Is it still
all right with this kind of relationship?

For large N, Spencer's method appears to return a slightly lower
correlation for the uniforms than for the normals, so maybe there is a
problem!?!

Hope we are all learning something and that Menghui gets/has what he wants. :-)

-----Original Message-----
From: pd at pubhealth.ku.dk [mailto:pd at pubhealth.ku.dk] On Behalf Of Peter
Dalgaard
Sent: July 1, 2005 6:59 PM
To: Jim Brennan
Cc: 'Tony Plate'; 'Menghui Chen'; r-help at stat.math.ethz.ch 
Subject: Re: [R] Generating correlated data from uniform distribution

"Jim Brennan" <jfbrennan at rogers.com> writes:

> Yes, you are right; I guess this works only for normal data. Free advice
> sometimes comes with too little consideration :-)

Worth every cent...

> Sorry about that and thanks to Spencer for the correct way.

Hmm, but is it? Or rather, what is the relation between the
correlation of the normals and that of the transformed variables?
Looks nontrivial to me.
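One common construction draws bivariate normals and transforms each margin with pnorm(). I am assuming this is the kind of method meant here, since Spencer's message is not quoted in this thread. For that construction the relation has a closed form: if the normals have correlation r, the uniforms have correlation (6/pi) * asin(r/2). This is slightly smaller than r (about 0.582 for r = 0.6), which would be consistent with the lower correlation Jim saw. Inverting it gives the normal correlation to use for a target rho, r = 2 * sin(pi * rho / 6). The following is a minimal sketch in which N and the seed are arbitrary choices:

```r
set.seed(42)                       # assumption: arbitrary seed
N <- 100000
rho <- 0.6                         # desired correlation of the uniforms

# correlation of pnorm(Z1), pnorm(Z2) when cor(Z1, Z2) = r
unif.cor <- function(r) (6 / pi) * asin(r / 2)
# inverse: normal correlation needed for a given uniform correlation
norm.cor <- function(rho) 2 * sin(pi * rho / 6)

r <- norm.cor(rho)                 # about 0.618
z1 <- rnorm(N)
z2 <- r * z1 + sqrt(1 - r^2) * rnorm(N)   # cor(z1, z2) = r
u <- pnorm(z1)                     # each margin is Uniform(0, 1)
v <- pnorm(z2)

print(unif.cor(0.6))               # about 0.582: using r = rho undershoots
print(cor(u, v))                   # close to 0.6
```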

Incidentally, here's a way that satisfies the criteria, but in a
rather weird way:

N <- 10000
rho <- .6
x <- runif(N, -.5,.5)
# flip the sign of x with probability (1-rho)/2
y <- x * sample(c(1,-1), N, replace=TRUE, prob=c((1+rho)/2,(1-rho)/2))
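Here is why this works. The random sign s is independent of x, and x is symmetric about 0, so y = s*x is also Uniform(-.5, .5). At the same time, Cov(x, y) = E[s] Var(x) = ((1+rho)/2 - (1-rho)/2) Var(x) = rho Var(x). The "weird" part is the joint distribution: every point lies on y = x or y = -x, which produces the X shape Jim noticed in plot(x,y). A quick check of these facts follows; the seed is an arbitrary choice.

```r
set.seed(7)                        # assumption: arbitrary seed
N <- 10000
rho <- 0.6
x <- runif(N, -0.5, 0.5)
s <- sample(c(1, -1), N, replace = TRUE, prob = c((1 + rho) / 2, (1 - rho) / 2))
y <- x * s

print(cor(x, y))                   # close to rho
print(all(abs(y) == abs(x)))       # TRUE: every point is on y = x or y = -x
print(range(y))                    # stays inside (-0.5, 0.5), like x
```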

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help 
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
