[R] polychoric correlation: issue with coefficient sign

John Fox jfox at mcmaster.ca
Wed Jan 14 03:40:16 CET 2009


Dear Dorothee,

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf Of Dorothee
> Sent: January-13-09 8:49 PM
> To: r-help at r-project.org
> Subject: Re: [R] polychoric correlation: issue with coefficient sign
> 
> 
> Thank you so much for all your answers! And sorry for being scarce on the
> details.
> My dataset has 12 variables (6 ordinal coded from 1 to 5, and 6 binary) and
> 384 cases without missing values. High values mean a 'positive' attitude toward
> the object of study.
> 
> I probably went too fast in my earlier impression that the variables'
> distributions were almost symmetrically opposed. I got confused by the high
> frequency of the combination (0, 1), sorry. Here is the crosstab:
> 
> Observed counts
>    x2
> x1    0   1
>   0  23   0
>   1 334  27
> 
> Expected counts
>    x2
> x1          0         1
>   0  21.38281  1.617188
>   1 335.61719 25.382812
> 
> The actual counts for (0, 0) and (1, 1) being slightly above the expected
> counts, I can now understand the positive correlation.  But does the high
> polychoric correlation make sense when the variables are so skewed and the
> difference between the actual and expected counts of the crosstab is so
> small?

This problem is very ill-conditioned (i.e., there is little information in the data to estimate the thresholds and correlation), and the standard error of the correlation can't be estimated by either the 2-step approach or ML. In fact, if you add 0.5 to each cell to get rid of the sampling zero, you get very different results:

> tab <- matrix(c(23, 334, 0, 27), 2, 2)
> tab
     [,1] [,2]
[1,]   23    0
[2,]  334   27

> polychor(tab + 0.5, ML=TRUE, std.err=TRUE)

Polychoric Correlation, ML est. = 0.2629 (0.2281)

  Row Threshold
  Threshold Std.Err.
     -1.537   0.1003


  Column Threshold
  Threshold Std.Err.
      1.457  0.09567
> polychor(tab + 0.5, std.err=TRUE)

Polychoric Correlation, 2-step est. = 0.2629 (0.2279)
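
As a rough check on the numbers above, here is a minimal base-R sketch (it does not use the polycor package). It recomputes the expected counts under independence from the observed table, and it assumes that the thresholds for a 2 x 2 table can be approximated as normal quantiles of the marginal proportions of the adjusted table. The names expected, adj, row_threshold and col_threshold are made up for this illustration.

```r
# Observed 2 x 2 table from the thread (rows = x1, columns = x2)
tab <- matrix(c(23, 334, 0, 27), 2, 2)

# Expected counts under independence: (row total * column total) / n
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Table with 0.5 added to each cell, as in the polychor(tab + 0.5, ...) calls
adj <- tab + 0.5

# Assumption: thresholds taken as normal quantiles of the cumulative
# marginal proportions (first row and first column of the adjusted table)
row_threshold <- qnorm(rowSums(adj)[1] / sum(adj))
col_threshold <- qnorm(colSums(adj)[1] / sum(adj))

print(round(expected, 5))
print(round(c(row = row_threshold, column = col_threshold), 3))
```

The expected counts should agree with the crosstab quoted above, and the two quantiles should land close to the row and column thresholds in the ML output. With both thresholds so far out in the tails, very few observations fall in the off-diagonal corner cells, which is one way to see why the correlation is so poorly determined.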


>
> Regarding the difference of correlation coefficient between x1 and x2 with
> polychor and hetcor:
> I used 'hetcor' (polycor package) with 2-step and ML estimations on the
> whole dataset. The data were first declared as 'factor' otherwise hetcor
> would just compute Pearson correlations.
> 
> hc = hetcor(thedata,ML=F, std.err=F)
> (correlation x1x2) 0.8013
> 
> hc = hetcor(thedata,ML=T, std.err=T)
> "Error in solve.default(result$hessian) :
>   Lapack routine dgesv: system is exactly singular".
> 
> Using polychor with the 2-step, and ML estimates:
> 
> polychor(x1,x2, ML=F, std.err=F)
> [1] 0.9330044
> 
> polychor(x1,x2, ML=T, std.err=T)
> "Error in solve.default(result$hessian) :
>   Lapack routine dgesv: system is exactly singular".
> 
> 
> Murray, you mentioned that the correlation between my two variables could be
> affected by other variables, hence the difference between polychor (on only
> two variables) and hetcor (on all the variables).

That's not correct. The computations are done pairwise (although by default, as I explained in my previous message, only complete observations are used).

> I ran polychor and hetcor on created variables (correlated and not
> correlated). Although I thought that the heterogeneous correlations were run
> only within each pair of variables (therefore, not being affected by other
> variables), a third variable correlated with x1 and x2 does slightly affect
> the correlation between x1 and x2. Thanks for this suggestion. I need to
> look better into the computation of polychoric correlations…!

There are two possible sources of difference, but again, since I don't have the data, I can't check. (1) As I mentioned before, if there are missing data, then the subset of cases used by hetcor() and polychor() can differ. (2) As stated in ?hetcor, by default hetcor() coerces the returned correlation matrix to be positive-definite; you can set pd=FALSE in the call to hetcor() to turn this off.
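
To illustrate point (2), here is a hedged sketch on simulated data (not the dataset under discussion, which I don't have). The data frame sim and its variables are hypothetical, and the code runs only if the polycor package is installed. With no missing values and pd=FALSE, the pairwise entry from hetcor() would be expected to match the direct polychor() call for the same pair.

```r
# Hypothetical illustration with simulated data; not the poster's dataset.
if (requireNamespace("polycor", quietly = TRUE)) {
  set.seed(123)
  n <- 384
  z1 <- rnorm(n)
  z2 <- 0.5 * z1 + sqrt(1 - 0.5^2) * rnorm(n)
  z3 <- 0.5 * z1 + sqrt(1 - 0.5^2) * rnorm(n)

  # Two binary factors and one ordered factor, cut from latent normals
  sim <- data.frame(
    x1 = factor(z1 > -0.5),
    x2 = factor(z2 > 0.5),
    x3 = cut(z3, breaks = c(-Inf, -1, 0, 1, Inf), ordered_result = TRUE)
  )

  # pd = FALSE turns off the positive-definite adjustment of the matrix
  hc <- polycor::hetcor(sim, ML = FALSE, std.err = FALSE, pd = FALSE)

  # Direct 2-step estimate for the same pair of variables
  pc <- polycor::polychor(sim$x1, sim$x2, ML = FALSE, std.err = FALSE)

  print(c(hetcor = hc$correlations["x1", "x2"], polychor = pc))
}
```

If the two numbers differ on real data even with pd=FALSE, the remaining candidate is point (1), i.e., different subsets of cases being used because of missing values.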

Regards,
 John
