[R] Generating uniformly distributed correlated data.

Mon Feb 21 16:19:00 CET 2011

One simple idea is to generate correlated normals (a multivariate normal vector X)
and then apply the cumulative distribution function F_i of component i to that
component: F_i(X_i) is uniform on (0,1).

Kjetil

(This will not preserve the value of the correlation coefficient, so
you must experiment.)
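
A minimal sketch of this idea in base R, in case it helps. The seed, sample
size and rho = 0.7 are arbitrary choices for illustration. The arcsine
relation in the last lines is the standard result for the bivariate normal
(it is not part of the suggestion above); it may reduce the amount of
experimenting needed.

```r
# Sketch: correlated uniforms from correlated normals (base R only).
set.seed(1)                      # arbitrary seed, for reproducibility
n   <- 100000
rho <- 0.7                       # correlation of the underlying normals

# Correlated standard normals: multiply independent normals by the
# Cholesky factor of the 2 x 2 correlation matrix.
Sigma <- matrix(c(1, rho, rho, 1), nrow = 2)
Z <- matrix(rnorm(2 * n), ncol = 2) %*% chol(Sigma)

# pnorm() is the CDF F_i of each standard normal margin, so each
# column of U is uniform on (0,1).
U <- pnorm(Z)

# The correlation of the uniforms is somewhat smaller than rho.
r_obs    <- cor(U[, 1], U[, 2])
r_theory <- (6 / pi) * asin(rho / 2)

# Inverting the relation: the rho needed for a target correlation
# r_target of the uniforms.
r_target   <- 0.5
rho_needed <- 2 * sin(pi * r_target / 6)
```

Comparing r_obs with r_theory shows how far the correlation of the uniforms
drifts from the rho used for the normals.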

On Mon, Feb 21, 2011 at 7:30 PM, Mike Marchywka <marchywka at hotmail.com> wrote:
>
> ----------------------------------------
>> Date: Mon, 21 Feb 2011 15:53:26 +0100
>> From: erich.neuwirth at univie.ac.at
>> To: marchywka at hotmail.com
>> CC: soren.faurby at biology.au.dk; r-help at r-project.org
>> Subject: Re: [R] Generating uniformly distributed correlated data.
>>
>> We want to generate a distribution on the unit square with the following
>> properties
>> * It is concentrated on a "reasonable" subset of the square,
>> and the restricted distribution is uniform on this subset.
>> * Both marginal distributions are uniform on the unit interval.
>> * All horizontal and all vertical cross sections are sets of line
>> segments with the same total length.
>>
>> If we find a geometric figure with these properties, we have solved the
>> problem.
>>
>> So we define the distribution to be uniform on the following area:
>> (it is distorted but should give the idea)
>>
>> x***/-----------------/***x
>> |**/-----------------/****|
>> |*/-----------------/*****|
>> |/-----------------/******|
>> |-----------------/******/|
>> |----------------/******/-|
>> |---------------/******/--|
>> |--------------/******/---|
>> |-------------/******/----|
>> |------------/******/-----|
>> |-----------/******/------|
>> |----------/******/-------|
>> |---------/******/--------|
>> |--------/******/---------|
>> |-------/******/----------|
>> |------/******/-----------|
>> |-----/******/------------|
>> |----/******/-------------|
>> |---/******/--------------|
>> |--/******/---------------|
>> |-/******/----------------|
>> |/******/-----------------|
>> |******/-----------------/|
>> |*****/-----------------/*|
>> |****/-----------------/**|
>> x***/-----------------/***x
>>
>> There is the same number of stars in each horizontal row and each
>> vertical column.
>>
>>
>> So we define
>> g(x1,x2) = 1 if abs(x1-x2) <= a or
>>               abs(x1-x2+1) <= a or
>>               abs(x1-x2-1) <= a,
>>            0 elsewhere
>>
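
The indicator g can be checked numerically. The following is only a rough
sketch: the grid size m = 400 and the value a = 0.2 are arbitrary, and the
name g is simply taken from the definition above. It evaluates g on a grid
of midpoints and looks at the covered fraction of the square and of every
row and column, all of which should come out close to 2*a.

```r
# Indicator of the band around the diagonal, wrapped at the edges.
g <- function(x1, x2, a) {
  d <- x1 - x2
  as.numeric(abs(d) <= a | abs(d + 1) <= a | abs(d - 1) <= a)
}

a   <- 0.2                         # arbitrary value in (0, 1/2]
m   <- 400                         # grid resolution (assumption)
mid <- (seq_len(m) - 0.5) / m      # midpoints of m equal cells

G <- outer(mid, mid, g, a = a)     # G[i, j] = g(mid[i], mid[j])

area    <- mean(G)                 # fraction of the unit square covered
row_len <- rowMeans(G)             # covered length of each cross section
col_len <- colMeans(G)             # in either direction
```

Up to grid error, every entry of row_len and col_len agrees with area,
which is the "same number of stars in each row and column" property.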
>> The total area of the shape is 2*a.
>> The admissible range for a is 0 < a <= 1/2,
>> therefore
>> f(x1,x2)=g(x1,x2)/(2*a)
>> is a density function.
>> This is where simple algebra comes in.
>> This distribution has
>> expected value 1/2 and variance 1/12 for both margins
>> (uniform distribution), and it has
>> covariance = (1-3*a+2*a^2)/12
>> and correlation = 1 - 3*a + 2*a^2
>>
>> The inverse function of r = 1 - 3*a + 2*a^2 is
>> a = (3-sqrt(1+8*r))/4
>>
>> Therefore our distribution with
>> a=(3-sqrt(1+8*r))/4
>> will produce a given correlation r (for 0 <= r < 1).
>>
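
The relation between a and r is easy to check in R. In this small sketch
the function names r_from_a and a_from_r are made up for illustration, and
the test values of r are arbitrary.

```r
# Correlation produced by a band of half-width a.
r_from_a <- function(a) 1 - 3 * a + 2 * a^2

# Half-width a needed for a target correlation r (root with a <= 1/2).
a_from_r <- function(r) (3 - sqrt(1 + 8 * r)) / 4

r <- c(0, 0.25, 0.5, 0.9)          # arbitrary target correlations
a <- a_from_r(r)
round_trip <- r_from_a(a)          # should reproduce r
```

Note that r = 0 gives a = 1/2 (the band covers the whole square) and that
a shrinks toward 0 as r approaches 1.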
>>
>> How do we create random numbers from this distribution?
>> By using conditional densities.
>> x1 is sampled from the uniform distribution, and for a given x1
>> we produce x2 from a uniform distribution along the vertical cross
>> section of the geometrical shape (which is either 1 or 2 intervals).
>> This is most easily implemented with the modulo operator %%.
>>
>> This mechanism is NOT a convolution. Applying the modulo after the addition
>> makes it a nonconvolution. Adding independent random variables
>> without doing anything further is a convolution; by applying a trimming
>> operation, the convolution property gets lost.
>>
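
The sampling scheme described above (uniform x1, then x2 uniform along the
vertical cross section, wrapped with %%) can be sketched in R roughly as
follows. The function name runif_cor, the seed and the target r = 0.6 are
assumptions made for illustration, and the sketch only covers 0 <= r < 1,
the range for which the formula for a gives an admissible value.

```r
# Sketch: pairs of uniforms with correlation r, for 0 <= r < 1.
runif_cor <- function(n, r) {
  stopifnot(r >= 0, r < 1)
  a  <- (3 - sqrt(1 + 8 * r)) / 4      # half-width of the band
  x1 <- runif(n)                       # first margin
  # Uniform offset in [-a, a]; %% 1 moves values outside [0, 1)
  # back into the unit interval.
  x2 <- (x1 + runif(n, -a, a)) %% 1
  cbind(x1 = x1, x2 = x2)
}

set.seed(42)                           # arbitrary seed
xy    <- runif_cor(100000, r = 0.6)
r_obs <- cor(xy[, "x1"], xy[, "x2"])   # sample correlation
```

With a large n the sample correlation should be close to the requested r,
and hist(xy[, "x2"]) should look flat.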
>>
> The sum inside the mod can still be treated as a convolution; as I
> mentioned, the effect of the mod is to move the pieces that fall outside
> the desired range back into it, and they happen to restore the uniform
> distribution. I thought my explanation was simple and easy after the
> fact, but I am not sure it would have motivated the original design too well.