[R] Generating uniformly distributed correlated data.

Erich Neuwirth erich.neuwirth at univie.ac.at
Mon Feb 21 15:53:26 CET 2011


We want to generate a distribution on the unit square with the following
properties
* It is concentrated on a "reasonable" subset of the square,
  and the restricted distribution is uniform on this subset.
* Both marginal distributions are uniform on the unit interval.
* All horizontal and all vertical cross sections are sets of line
  segments with the same total length.

If we find a geometric figure with these properties, we have solved the
problem.

So we define the distribution to be uniform on the area marked with
stars in the following sketch
(it is distorted but should give the idea)

x***/-----------------/***x
|**/-----------------/****|
|*/-----------------/*****|
|/-----------------/******|
|-----------------/******/|
|----------------/******/-|
|---------------/******/--|
|--------------/******/---|
|-------------/******/----|
|------------/******/-----|
|-----------/******/------|
|----------/******/-------|
|---------/******/--------|
|--------/******/---------|
|-------/******/----------|
|------/******/-----------|
|-----/******/------------|
|----/******/-------------|
|---/******/--------------|
|--/******/---------------|
|-/******/----------------|
|/******/-----------------|
|******/-----------------/|
|*****/-----------------/*|
|****/-----------------/**|
x***/-----------------/***x

There is the same number of stars in each horizontal row and each
vertical column.


So we define
g(x1,x2) = 1  if abs(x1-x2)   <= a or
                 abs(x1-x2+1) <= a or
                 abs(x1-x2-1) <= a
           0  elsewhere

The total area of the shape is 2*a.
The admissible range for a is 0 < a <= 1/2,
therefore
f(x1,x2)=g(x1,x2)/(2*a)
is a density function.
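
As a minimal sketch, g and f can be written as vectorized R functions.
The argument order and the midpoint-grid check are my own choices for
illustration, and a = 0.2 is an arbitrary example value.

```r
# Indicator of the band around the diagonal, including the two
# wrapped-around corners of the unit square.
g <- function(x1, x2, a) {
  d <- x1 - x2
  as.numeric(abs(d) <= a | abs(d + 1) <= a | abs(d - 1) <= a)
}

# Density: g scaled by the area 2*a of the shape (assumes 0 < a <= 1/2).
f <- function(x1, x2, a) {
  stopifnot(a > 0, a <= 0.5)
  g(x1, x2, a) / (2 * a)
}

a <- 0.2
f(0.5, 0.6, a)    # inside the central band
f(0.05, 0.95, a)  # inside a wrapped corner
f(0.2, 0.7, a)    # outside the shape

# Midpoint-rule check that f integrates to roughly 1 over the unit square.
n  <- 1001
xs <- (seq_len(n) - 0.5) / n
total_mass <- mean(outer(xs, xs, f, a = a))
total_mass
```

The grid value is only an approximation, because the indicator jumps
along the edges of the band.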
This is where simple algebra comes in.
This distribution has
expected value 1/2 and variance 1/12 for both margins
(uniform distribution), and it has
covariance = (1-3*a+2*a^2)/12
and correlation = 1 - 3*a + 2*a^2
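
The correlation formula can be checked numerically. The following is a
rough sketch that replaces the density by weights on a midpoint grid;
the grid size and a = 0.2 are arbitrary choices, and the result is only
approximate.

```r
a  <- 0.2
n  <- 1001
xs <- (seq_len(n) - 0.5) / n   # cell midpoints on the unit interval

d      <- outer(xs, xs, "-")   # d[i, j] = x1[i] - x2[j]
inside <- abs(d) <= a | abs(d + 1) <= a | abs(d - 1) <= a
w      <- inside / sum(inside) # discrete stand-in for the density f

p1 <- rowSums(w)               # marginal weights of x1
p2 <- colSums(w)               # marginal weights of x2
m1 <- sum(p1 * xs)
m2 <- sum(p2 * xs)
v1 <- sum(p1 * xs^2) - m1^2
v2 <- sum(p2 * xs^2) - m2^2

e12       <- sum(w * outer(xs, xs))          # approximates E[x1 * x2]
r_grid    <- (e12 - m1 * m2) / sqrt(v1 * v2)
r_formula <- 1 - 3 * a + 2 * a^2

c(grid = r_grid, formula = r_formula)
```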

The inverse function of r = 1 - 3*a + 2*a^2 on this range is
a = (3-sqrt(1+8*r))/4

Therefore our distribution with
a=(3-sqrt(1+8*r))/4
will produce a given correlation r (for 0 <= r < 1).
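
A small sketch of this inversion in R. The function names a_from_r and
r_from_a are hypothetical, and the restriction to 0 <= r < 1 is my
reading of the admissible range of a.

```r
# Width parameter a that yields correlation r (assumes 0 <= r < 1).
a_from_r <- function(r) {
  stopifnot(all(r >= 0), all(r < 1))
  (3 - sqrt(1 + 8 * r)) / 4
}

# Correlation produced by a given width parameter a.
r_from_a <- function(a) 1 - 3 * a + 2 * a^2

r <- c(0, 0.25, 0.5, 0.9)
a <- a_from_r(r)
round_trip <- r_from_a(a)   # should reproduce r up to rounding

cbind(r = r, a = a, round_trip = round_trip)
```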


How do we create random numbers from this distribution?
By using conditional densities.
x1 is sampled from the uniform distribution, and for a given x1
we produce x2 from a uniform distribution along the vertical cross
section of the geometrical shape (which consists of either 1 or 2 intervals).
This is most easily implemented by using the modulo operator %%.
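
One way to read this recipe in R is sketched below. The function name
rcorunif, the seed, and the sample size are assumptions made for the
example, and it covers only 0 <= r < 1.

```r
# Draw n pairs with uniform margins and (approximately) correlation r.
# rcorunif is a hypothetical name, not an existing R function.
rcorunif <- function(n, r) {
  stopifnot(r >= 0, r < 1)
  a  <- (3 - sqrt(1 + 8 * r)) / 4
  x1 <- runif(n)
  # Uniform on [x1 - a, x1 + a], wrapped back into the unit interval.
  x2 <- (x1 + runif(n, -a, a)) %% 1
  cbind(x1 = x1, x2 = x2)
}

set.seed(1)
xy <- rcorunif(100000, r = 0.6)
sample_cor <- cor(xy[, "x1"], xy[, "x2"])
sample_cor
```

The sample correlation should be close to the requested r, up to
sampling error; plot(xy) with a smaller n shows the band and the two
wrapped corners.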

This mechanism is NOT a convolution. Applying the modulo operation after
the addition makes it a nonconvolution. Adding independent random variables
without doing anything further is a convolution; by applying a trimming
operation, the convolution property gets lost.
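
The difference can be seen in a small simulation. This is only an
illustrative sketch; the value a = 0.3, the seed, and the helper
bin_counts are my own choices.

```r
set.seed(42)
n  <- 100000
a  <- 0.3
x1 <- runif(n)
u  <- runif(n, -a, a)

plain   <- x1 + u          # sum of independent variables: a convolution
wrapped <- (x1 + u) %% 1   # same sum wrapped back into the unit interval

range(plain)     # extends below 0 and above 1
range(wrapped)   # stays inside the unit interval

# Counts in ten equal bins on [0, 1]; values outside are dropped.
bin_counts <- function(x) {
  as.vector(table(cut(x, breaks = seq(0, 1, by = 0.1),
                      include.lowest = TRUE)))
}
plain_counts   <- bin_counts(plain)
wrapped_counts <- bin_counts(wrapped)

rbind(plain = plain_counts, wrapped = wrapped_counts)
```

The wrapped values fill the ten bins about evenly, while the plain sum
is thinner near 0 and 1 and spills outside the unit interval.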


