[R] Generating uniformly distributed correlated data.
Erich Neuwirth
erich.neuwirth at univie.ac.at
Mon Feb 21 15:53:26 CET 2011
We want to generate a distribution on the unit square with the following
properties
* It is concentrated on a "reasonable" subset of the square,
and the restricted distribution is uniform on this subset.
* Both marginal distributions are uniform on the unit interval.
* All horizontal and all vertical cross sections are sets of line
segments with the same total length.
If we find a geometric figure with these properties, we have solved the
problem.
So we define the distribution to be uniform on the following area:
(it is distorted but should give the idea)
x***/-----------------/***x
|**/-----------------/****|
|*/-----------------/*****|
|/-----------------/******|
|-----------------/******/|
|----------------/******/-|
|---------------/******/--|
|--------------/******/---|
|-------------/******/----|
|------------/******/-----|
|-----------/******/------|
|----------/******/-------|
|---------/******/--------|
|--------/******/---------|
|-------/******/----------|
|------/******/-----------|
|-----/******/------------|
|----/******/-------------|
|---/******/--------------|
|--/******/---------------|
|-/******/----------------|
|/******/-----------------|
|******/-----------------/|
|*****/-----------------/*|
|****/-----------------/**|
x***/-----------------/***x
There is the same number of stars in each horizontal row and each
vertical column.
So we define
g(x1,x2) = 1  if abs(x1-x2) <= a or
                 abs(x1-x2+1) <= a or
                 abs(x1-x2-1) <= a
           0  elsewhere
The total area of the shape is 2*a.
The admissible range for a is 0 < a <= 1/2,
therefore
f(x1,x2)=g(x1,x2)/(2*a)
is a density function.
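The definitions of g and f can be written down directly in R. The following is only a sketch: the function names, the value a = 0.2, and the grid size are illustrative assumptions, not part of the original post. It checks numerically that f integrates to roughly 1 over the unit square.

```r
# Indicator of the wrapped diagonal band with half-width a.
g <- function(x1, x2, a) {
  d <- x1 - x2
  as.numeric(abs(d) <= a | abs(d + 1) <= a | abs(d - 1) <= a)
}

# Density: the indicator scaled by the band area 2*a.
f <- function(x1, x2, a) g(x1, x2, a) / (2 * a)

# Midpoint-rule check: the mean of f over a regular grid on the unit
# square approximates its integral and should be close to 1.
a     <- 0.2                          # illustrative choice, 0 < a <= 1/2
m     <- 400                          # grid points per axis
mid   <- (seq_len(m) - 0.5) / m       # cell midpoints
total <- mean(outer(mid, mid, f, a = a))
total
```

The result is only approximately 1 because cells on the band boundary are counted as entirely inside or entirely outside.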
This is where simple algebra comes in.
This distribution has
expected value 1/2 and variance 1/12 for both margins
(uniform distribution), and it has
covariance = (1 - 3*a + 2*a^2)/12
and correlation = 1 - 3*a + 2*a^2
The inverse function of 1 - 3*a + 2*a^2 on the admissible range of a is
(3-sqrt(1+8*r))/4
Therefore our distribution with
a=(3-sqrt(1+8*r))/4
will produce a given r between 0 and 1.
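As a quick sanity check, the two formulas can be coded and composed. The helper names r_from_a and a_from_r are hypothetical; the sketch assumes a target correlation r between 0 and 1, which corresponds to a between 1/2 and 0.

```r
# Correlation implied by the band half-width a (formula from the text).
r_from_a <- function(a) 1 - 3 * a + 2 * a^2

# Half-width a that yields a target correlation r, assuming 0 <= r <= 1.
a_from_r <- function(r) {
  stopifnot(all(r >= 0 & r <= 1))
  (3 - sqrt(1 + 8 * r)) / 4
}

r <- c(0, 0.25, 0.5, 0.9, 1)
a <- a_from_r(r)

# The last column should reproduce the first one.
cbind(r = r, a = a, r_back = r_from_a(a))
```

At the endpoints, r = 0 gives a = 1/2 (the band covers the whole square, so x1 and x2 are independent) and r = 1 gives a = 0 (the band collapses onto the diagonal).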
How do we create random numbers from this distribution?
By using conditional densities.
x1 is sampled from the uniform distribution, and for a given x1
we produce x2 from a uniform distribution along the vertical cross
section of the geometrical shape (which consists of either 1 or 2
intervals). This is most easily implemented with the modulo operator %%.
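One way the sampling step might look in R is sketched below. The function name r_unif_cor and its arguments are assumptions made for illustration; the sketch draws x1 uniformly, shifts it by a uniform offset in [-a, a], and wraps the result back into the unit interval with %%.

```r
# Hypothetical helper: name and signature are not from the original post.
r_unif_cor <- function(n, r) {
  stopifnot(r >= 0, r <= 1)
  a  <- (3 - sqrt(1 + 8 * r)) / 4     # band half-width for target r
  x1 <- runif(n)                      # first coordinate, uniform on [0, 1]
  x2 <- (x1 + runif(n, -a, a)) %% 1   # offset within the band, then wrap
  cbind(x1 = x1, x2 = x2)
}

set.seed(1)
xy <- r_unif_cor(1e5, r = 0.6)
cor(xy[, "x1"], xy[, "x2"])           # should be close to 0.6
colMeans(xy)                          # both should be close to 1/2
```

The wrap handles both cases of the cross section at once: when the interval [x1 - a, x1 + a] sticks out of [0, 1], the part that sticks out reappears at the opposite edge as the second interval.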
This mechanism is NOT a convolution. Applying the modulo operation after
the addition makes it a nonconvolution. Adding independent random
variables without doing anything further is a convolution; by applying a
trimming operation, the convolution property gets lost.
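The difference can be seen by simulating the plain sum and the wrapped sum side by side. This is a sketch with an arbitrarily chosen a = 0.3 and sample size; the variable names are illustrative.

```r
set.seed(2)
n  <- 1e5
a  <- 0.3                 # illustrative band half-width
x1 <- runif(n)
u  <- runif(n, -a, a)     # independent uniform offset

s <- x1 + u               # plain sum: a convolution of two uniforms
w <- s %% 1               # wrapped sum: result of applying the modulo

range(s)                  # extends beyond [0, 1], roughly [-a, 1 + a]
range(w)                  # stays inside the unit interval
var(s)                    # about 1/12 + a^2/3
var(w)                    # about 1/12, as for a uniform margin
```

The plain sum has a trapezoidal density on [-a, 1 + a], while the wrapped sum is again uniform on the unit interval.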