[R] Generation of correlated variables
Petr Savicky
savicky at cs.cas.cz
Fri Mar 16 08:33:33 CET 2012
On Thu, Mar 15, 2012 at 11:23:28PM -0000, Ted Harding wrote:
> On 15-Mar-2012 Filoche wrote:
> > Hi everyone.
> >
> > Based on a dependent variable (y), I'm trying to generate some
> > independent variables with specified correlations to y. That part
> > is no problem.
> > However, I would like all my "regressors" to be
> > orthogonal (i.e. uncorrelated with each other).
> >
> > For example,
> >
> > y = x1 + x2 + x3, where the correlations of y with x1, x2 and
> > x3 are 0.7, 0.4 and 0.8, respectively. However, x1, x2 and x3
> > should not be correlated with each other.
> >
> > Can anyone help me?
> >
> > Regards,
> > Phil
>
> Your fundamental problem here (with the correlations you specify)
> is the following.
>
> Your desired correlation matrix can be constructed by
>
> C <- cbind( c(1.0,0.7,0.4,0.8),c(0.7,1.0,0.0,0.0),
> c(0.4,0.0,1.0,0.0),c(0.8,0.0,0.0,1.0) )
> rownames(C) <- c("y","x1","x2","x3")
> colnames(C) <- c("y","x1","x2","x3")
>
> C
> #      y  x1  x2  x3
> # y  1.0 0.7 0.4 0.8
> # x1 0.7 1.0 0.0 0.0
> # x2 0.4 0.0 1.0 0.0
> # x3 0.8 0.0 0.0 1.0
>
> And now:
>
> det(C)
> # [1] -0.29
>
> and it is impossible for a correlation matrix to have a
> negative determinant: a correlation matrix must be positive
> semidefinite, and therefore has a non-negative determinant.
>
> An alternative check is to look at the eigen-structure of C:
>
> eigen(C)
> # $values
> # [1]  2.1357817  1.0000000  1.0000000 -0.1357817
> #
> # $vectors
> #           [,1]          [,2]       [,3]       [,4]
> # [1,] 0.7071068  0.000000e+00  0.0000000  0.7071068
> # [2,] 0.4358010 -1.172802e-16  0.7874992 -0.4358010
> # [3,] 0.2490291 -8.944272e-01 -0.2756247 -0.2490291
> # [4,] 0.4980582  4.472136e-01 -0.5512495 -0.4980582
>
> so one of the eigenvalues (-0.1357817) is negative, again
> impossible for a correlation matrix.
Thank you for this analysis. For general correlations s1, s2, s3
of y with x1, x2, x3, the correlation matrix is
      y  x1  x2  x3
y     1  s1  s2  s3
x1   s1   1   0   0
x2   s2   0   1   0
x3   s3   0   0   1
and its determinant is 1 - s1^2 - s2^2 - s3^2. Since there
was also a requirement that y = x1 + x2 + x3, y is an exact linear
combination of x1, x2, x3, so the covariance matrix of
(y, x1, x2, x3), and hence also its correlation matrix, must be
singular. Hence, the required correlation structure implies
s1^2 + s2^2 + s3^2 = 1.
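The determinant formula can be checked numerically with a short R sketch. The helper corr_matrix() and the variable s_unit are hypothetical names introduced only for this illustration. The closed form follows from the Schur complement of the identity block: det(C) = det(I) * (1 - s' s) = 1 - s1^2 - s2^2 - s3^2.

```r
# Hypothetical helper: correlation matrix of (y, x1, x2, x3) when y has
# correlations s = c(s1, s2, s3) with x1, x2, x3 and the x's are uncorrelated
corr_matrix <- function(s) {
  nm <- c("y", "x1", "x2", "x3")
  C <- diag(4)
  C[1, 2:4] <- s
  C[2:4, 1] <- s
  dimnames(C) <- list(nm, nm)
  C
}

# Closed form versus numerical determinant for arbitrary correlations
set.seed(1)
s <- runif(3, min = -1, max = 1)
print(all.equal(det(corr_matrix(s)), 1 - sum(s^2)))   # TRUE

# Values from the question: 1 - (0.49 + 0.16 + 0.64) = -0.29,
# so the smallest eigenvalue is negative as well
C <- corr_matrix(c(0.7, 0.4, 0.8))
print(det(C))
print(min(eigen(C, symmetric = TRUE, only.values = TRUE)$values))

# Rescaled to unit length (an illustrative choice), the same pattern
# of correlations gives a singular matrix
s_unit <- c(0.7, 0.4, 0.8) / sqrt(1.29)
print(det(corr_matrix(s_unit)))   # zero up to rounding
```

For the values in the question the sum of squares is 1.29, so the condition s1^2 + s2^2 + s3^2 = 1 fails and no valid correlation matrix exists.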
If s1^2 + s2^2 + s3^2 = 1 holds, then the multivariate
distribution obtained by multiplying a vector from the
three-dimensional N(0, I) by the 4 x 3 matrix
  (s1 s2 s3)
  (s1  0  0)
  ( 0 s2  0)
  ( 0  0 s3)
has the required correlation structure, provided that s1, s2, s3
are positive, and it satisfies y = x1 + x2 + x3 exactly.
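A minimal R sketch of this construction follows. The names s, A, R, Z and V are illustrative only; rescaling the question's values 0.7, 0.4, 0.8 to unit length is an assumption made so that the condition holds, and the code assumes all s_i are positive. It checks the population correlation matrix exactly with cov2cor() and then compares it with sample correlations from simulated data.

```r
# Target correlations of y with x1, x2, x3 (assumed positive).
# The question's values are rescaled so that s1^2 + s2^2 + s3^2 = 1;
# this rescaling is an illustrative choice, not part of the post.
s <- c(0.7, 0.4, 0.8)
s <- s / sqrt(sum(s^2))

# 4 x 3 matrix from the text: row 1 produces y, rows 2-4 produce x1, x2, x3
A <- rbind(s, diag(s))
dimnames(A) <- list(c("y", "x1", "x2", "x3"), NULL)

# Exact check: if z ~ N(0, I), then A %*% z has covariance A %*% t(A)
R <- cov2cor(A %*% t(A))
print(round(R, 4))

# Simulation: each column of Z is one draw from three-dimensional N(0, I)
set.seed(42)
n <- 10000
Z <- matrix(rnorm(3 * n), nrow = 3)
V <- t(A %*% Z)   # n x 4 matrix with columns y, x1, x2, x3

# y equals x1 + x2 + x3 up to floating-point rounding
print(all.equal(V[, "y"], V[, "x1"] + V[, "x2"] + V[, "x3"]))

# Sample correlations agree with R up to sampling error
print(round(cor(V), 2))
```

Since Var(x_i) = s_i^2 and Cov(y, x_i) = s_i^2 under this construction, the correlation of y with x_i is s_i^2 / |s_i|, which equals s_i only when s_i > 0.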
However, this is still not a solution to the original question,
since the original requirement was to find x1, x2, x3 for a
given y. I do not know whether a solution exists for an arbitrary
y, even if the above condition on the correlations is satisfied.
Petr Savicky.