[R] Random Normal Variable Correlated to an Existing Binomial Variable
Petr Savicky
savicky at praha1.ff.cuni.cz
Mon Apr 25 11:58:43 CEST 2011
On Sun, Apr 24, 2011 at 07:00:26PM -0400, Shane Phillips wrote:
> Hi, R-Helpers!
>
> I have a dataframe that contains a binomial variable. I need to add another random variable drawn from a normal distribution with a specific mean and standard deviation. This variable also needs to be correlated with the existing binomial variable with a specific correlation (say .75). Any ideas?
Hi.
If X, Y are dependent random variables and we want to generate y, so
that (x, y) is a pair from their joint distribution with known x,
then y should be generated from the conditional distribution P(Y|X=x).
If the probability P(X=x) is not too small, then this may be done by
rejection sampling: Generate pairs (X, Y) until the condition X=x is
satisfied and use the corresponding Y.
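A minimal sketch of the rejection step in R may make this concrete. The names rcond_y, rpair, max_tries and toy_pair are hypothetical, introduced only for illustration, and the toy joint distribution is not the one from the question.

```r
# Draw one y from P(Y | X = x) by rejection: simulate pairs until X equals x.
# `rpair` is a hypothetical function returning list(x = ..., y = ...) drawn
# from the joint distribution; `max_tries` guards against a very small P(X = x).
rcond_y <- function(x, rpair, max_tries = 1e5) {
  for (i in seq_len(max_tries)) {
    pr <- rpair()
    if (pr$x == x) return(pr$y)
  }
  stop("no pair with X = x found; P(X = x) may be too small")
}

# Toy joint distribution, for illustration only:
# Y is standard normal and X = 1 if Y > 0, otherwise X = 0.
toy_pair <- function() {
  y <- rnorm(1)
  list(x = as.integer(y > 0), y = y)
}

set.seed(1)
x_known <- c(1, 0, 1, 1)
ys <- vapply(x_known, rcond_y, numeric(1), rpair = toy_pair)
```

Each element of ys is drawn conditionally on the corresponding element of x_known, so the existing x values are left untouched.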
It remains to generate pairs (X, Y), where Y is a normal variable
and X a binomial one. The parameters of Y are known, the parameters
of X should be chosen somehow and the correlation of X and Y is
known. I suggest the following. Compute the distribution of X as a
vector of probabilities p_0, ..., p_n (see ?dbinom). Find a nondecreasing
function f() from reals to {0, ..., n} such that f(Y) has distribution
p_0, ..., p_n. The function may be determined by a sequence of
cutpoints a_1, ..., a_n defining f(y) as follows:

   y                f(y)
   (-infty, a_1)    0
   [a_1, a_2)       1
   ...
   [a_n, infty)     n

For each i, the cutpoint a_i is the (p_0 + ... + p_{i-1})-quantile of Y
(see ?qnorm). See ?cut for computing f().
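The cutpoints and f() might be computed along the following lines. The values of n, prob, mu and sigma are assumptions chosen for illustration. I use pbinom() for the cumulative sums of dbinom() and findInterval() in place of cut(); cut(y, breaks = c(-Inf, a, Inf), labels = FALSE, right = FALSE) - 1 should give the same result as long as the cutpoints are distinct.

```r
# Illustrative parameters (assumptions, not from the original question)
n     <- 10     # binomial size
prob  <- 0.3    # binomial success probability
mu    <- 50     # mean of Y
sigma <- 10     # standard deviation of Y

p <- dbinom(0:n, size = n, prob = prob)   # p_0, ..., p_n

# a_i is the (p_0 + ... + p_{i-1})-quantile of Y for i = 1, ..., n;
# pbinom(i - 1, ...) is exactly that cumulative sum.
a <- qnorm(pbinom(0:(n - 1), size = n, prob = prob), mean = mu, sd = sigma)

# f(y) counts the cutpoints a_i <= y, so it is 0 below a_1, i on [a_i, a_{i+1})
# and n on [a_n, infty).
f <- function(y) findInterval(y, a)

# Check on a sample that f(Y) has approximately the distribution p_0, ..., p_n
set.seed(1)
y <- rnorm(1e5, mean = mu, sd = sigma)
x <- f(y)
freq <- tabulate(x + 1, nbins = n + 1) / length(x)
round(rbind(empirical = freq, target = p), 3)
```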
The pair (f(Y), Y) has the required marginal distributions and, in my
opinion, the maximal possible correlation. If this correlation is lower
than the requested one, then I think there is no solution.
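The correlation of (f(Y), Y) can be checked before going further. The sketch below is one way to do it, with assumed values of n and prob. It uses the fact that f(Y) is the sum over i of the indicators 1{Y >= a_i} and that Cov(1{Y >= a}, Y) = sigma * dnorm((a - mu)/sigma) for a normal Y, so the result does not depend on the mean and standard deviation of Y. A simulation is included as a cross-check.

```r
n <- 10; prob <- 0.3   # illustrative binomial parameters (assumed)

# Standardized cutpoints z_i = (a_i - mu) / sigma
z <- qnorm(pbinom(0:(n - 1), size = n, prob = prob))

# cov(f(Y), Y) = sigma * sum(dnorm(z)); dividing by sd(X) * sigma
# removes sigma, so only the binomial parameters matter.
rho_max <- sum(dnorm(z)) / sqrt(n * prob * (1 - prob))

# Monte Carlo cross-check with a standard normal Y
set.seed(1)
y <- rnorm(1e5)
rho_sim <- cor(findInterval(y, z), y)

c(exact = rho_max, simulated = rho_sim)
```

If the requested correlation (for example 0.75) exceeds rho_max, the construction described here cannot reach it.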
If the correlation of (f(Y), Y) is at least the required one, then use
a mixture of the distribution (f(Y), Y) and (X, Y), where X has the
required marginal distribution of X, but is generated independently
from Y. The mixture parameter may be determined as a solution of an
equation with one variable.
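A sketch of the mixture in R follows, again with assumed values for n, prob, mu and sigma; rho = 0.75 is the value mentioned in the question. As far as I can see, the equation for the mixture parameter is linear here: both components have the same marginals, and the independent component has zero covariance, so the mixture has correlation lambda * rho_max, which gives lambda = rho / rho_max. The names lambda, rho_max and rpairs are introduced only for this illustration.

```r
n <- 10; prob <- 0.3      # illustrative binomial parameters (assumed)
mu <- 50; sigma <- 10     # illustrative mean and sd of Y (assumed)
rho <- 0.75               # requested correlation

# Standardized cutpoints, cutpoints of Y, and correlation of (f(Y), Y)
z <- qnorm(pbinom(0:(n - 1), size = n, prob = prob))
a <- mu + sigma * z
rho_max <- sum(dnorm(z)) / sqrt(n * prob * (1 - prob))
stopifnot(rho <= rho_max)  # otherwise this construction cannot reach rho

# Mixture weight of the component (f(Y), Y)
lambda <- rho / rho_max

# Generate N pairs: with probability lambda use X = f(Y),
# otherwise draw X from the binomial distribution independently of Y.
rpairs <- function(N) {
  y <- rnorm(N, mean = mu, sd = sigma)
  coupled <- runif(N) < lambda
  x <- ifelse(coupled, findInterval(y, a), rbinom(N, size = n, prob = prob))
  data.frame(x = x, y = y)
}

set.seed(1)
d <- rpairs(1e5)
cor(d$x, d$y)   # should be close to rho
```

Pairs produced by rpairs() can then be fed to rejection sampling on X = x to obtain a y for each existing x.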
Hope this helps.
Petr Savicky.