[R] Generation of correlated variables

Petr Savicky savicky at cs.cas.cz
Fri Mar 16 08:33:33 CET 2012


On Thu, Mar 15, 2012 at 11:23:28PM -0000, Ted Harding wrote:
> On 15-Mar-2012 Filoche wrote:
> > Hi everyone.
> > 
> > Based on a dependent variable (y), I'm trying to generate some
> > independent variables with a specified correlation. That part
> > is no problem.
> > However, I would like all my "regressors" to be
> > orthogonal (i.e. no correlation among them).
> > 
> > For example, 
> > 
> > y = x1 + x2 + x3 where the correlations of y with x1, x2 and x3
> > are 0.7, 0.4 and 0.8.  However, x1, x2 and x3 should not be
> > correlated with each other.
> > 
> > Can anyone help me?
> > 
> > Regards,
> > Phil
> 
> Your fundamental problem here (with the correlations you specify)
> is the following.
> 
> Your desired correlation matrix can be constructed by
> 
>   C <- cbind( c(1.0,0.7,0.4,0.8),c(0.7,1.0,0.0,0.0),
>               c(0.4,0.0,1.0,0.0),c(0.8,0.0,0.0,1.0) )
>   rownames(C) <- c("y","x1","x2","x3")
>   colnames(C) <- c("y","x1","x2","x3")
> 
>   C
>   #      y  x1  x2  x3
>   # y  1.0 0.7 0.4 0.8
>   # x1 0.7 1.0 0.0 0.0
>   # x2 0.4 0.0 1.0 0.0
>   # x3 0.8 0.0 0.0 1.0
> 
> And now:
> 
>   det(C)
>   # [1] -0.29
> 
> and it is impossible for a correlation matrix to have a
> negative determinant: a correlation matrix must be
> positive-semidefinite, and therefore have a non-negative
> determinant.
> 
> An alternative check is to look at the eigen-structure of C:
> 
>   eigen(C)
>   # $values
>   # [1]  2.1357817  1.0000000  1.0000000 -0.1357817
>   # 
>   # $vectors
>   #           [,1]          [,2]       [,3]       [,4]
>   # [1,] 0.7071068  0.000000e+00  0.0000000  0.7071068
>   # [2,] 0.4358010 -1.172802e-16  0.7874992 -0.4358010
>   # [3,] 0.2490291 -8.944272e-01 -0.2756247 -0.2490291
>   # [4,] 0.4980582  4.472136e-01 -0.5512495 -0.4980582
> 
> so one of the eigenvalues (-0.1357817) is negative, again
> impossible for a correlation matrix.

Thank you for this analysis. For general correlations,
say, s1, s2, s3, the matrix is

           y  x1  x2  x3

     y     1  s1  s2  s3  
     x1   s1   1   0   0
     x2   s2   0   1   0
     x3   s3   0   0   1

and its determinant is 1 - s1^2 - s2^2 - s3^2. Since there
was also a requirement that y = x1 + x2 + x3, the four variables
are linearly dependent, so the correlation matrix must be
singular, i.e. its determinant must be 0. Hence, the required
correlation structure implies s1^2 + s2^2 + s3^2 = 1.
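
The determinant formula can be checked numerically. The following is
only a sketch: make_C is a hypothetical helper name, and rescaling s to
unit length is an assumption made for illustration (it changes the
correlations that were originally requested).

```r
# Hypothetical helper: build the matrix shown above for given s1, s2, s3.
make_C <- function(s) {
  C <- diag(length(s) + 1)
  C[1, -1] <- s                      # first row:    1 s1 s2 s3
  C[-1, 1] <- s                      # first column: 1 s1 s2 s3
  nm <- c("y", paste0("x", seq_along(s)))
  dimnames(C) <- list(nm, nm)
  C
}

s <- c(0.7, 0.4, 0.8)                # the correlations from the question
det(make_C(s))                       # compare with the closed form below
1 - sum(s^2)

# Assumption for illustration: rescale s so that s1^2 + s2^2 + s3^2 = 1.
s_unit <- s / sqrt(sum(s^2))
det(make_C(s_unit))                  # zero up to rounding error
```

With the rescaled values the matrix is singular, as the constraint
y = x1 + x2 + x3 requires.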

If this condition is satisfied and s1, s2, s3 are positive, then
the multivariate distribution obtained by multiplying a vector
from the three-dimensional N(0, I) by the matrix

  (s1   s2   s3)
  (s1    0    0)
  ( 0   s2    0)    
  ( 0    0   s3)

has the required correlation structure.
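
A minimal simulation sketch of this construction follows. The seed, the
sample size n, and the rescaling of (0.7, 0.4, 0.8) to unit length are
all assumptions chosen for illustration; the rescaled correlations are
not the ones originally asked for.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
s <- c(0.7, 0.4, 0.8)
s <- s / sqrt(sum(s^2))              # assumption: enforce s1^2 + s2^2 + s3^2 = 1

A <- rbind(s, diag(s))               # the 4 x 3 matrix shown above
rownames(A) <- c("y", "x1", "x2", "x3")

n <- 1e5                             # illustrative sample size
Z <- matrix(rnorm(3 * n), nrow = 3)  # each column is one draw from N(0, I)
V <- t(A %*% Z)                      # n x 4 sample with columns y, x1, x2, x3

Sigma <- A %*% t(A)                  # theoretical covariance matrix
round(cov2cor(Sigma), 3)             # theoretical correlation matrix
round(cor(V), 3)                     # sample correlations, close to the above
```

In the theoretical correlation matrix the first row is (1, s1, s2, s3)
and the x's are mutually uncorrelated; each simulated row also satisfies
y = x1 + x2 + x3 up to rounding error.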

However, this is still not a solution to the original question,
since the original requirement was to find x1, x2, x3 when y is
given. I do not know whether a solution exists for an arbitrary y,
even if the above condition on the correlations is satisfied.

Petr Savicky.


