[R] polychoric correlation: issue with coefficient sign

Dorothee ddurpoix at gmail.com
Wed Jan 14 02:49:28 CET 2009


Thank you so much for all your answers! And sorry for being sparse on the
details.
My dataset has 12 variables (6 ordinal, coded from 1 to 5, and 6 binary) and
384 cases with no missing values. High values indicate a 'positive' attitude
toward the object of study.

I was probably too hasty in my earlier impression that the variables'
distributions were almost symmetrically opposed. I was misled by the high
frequency of the combination (0, 1), sorry. Here is the crosstab:

Observed counts
   x2
x1    0   1
  0  23   0
  1 334  27

Expected counts
   x2
x1          0         1
  0  21.38281  1.617188
  1 335.61719 25.382812
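
For reference, the expected counts above are the usual independence values,
(row total * column total) / N. A minimal base-R sketch that reproduces them
from the observed table (the object names obs and expected are only
illustrative):

```r
# Observed 2x2 table from the crosstab (rows = x1, columns = x2).
# matrix() fills column by column: (23, 334) is the x2 = 0 column.
obs <- matrix(c(23, 334, 0, 27), nrow = 2,
              dimnames = list(x1 = c("0", "1"), x2 = c("0", "1")))

# Expected counts under independence: row total * column total / N
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
print(expected)

# Observed minus expected: positive on the diagonal, negative off it
print(obs - expected)
```

The diagonal differences are about +1.6 each, which matches the small positive
association described in this message.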

Since the actual counts for (0, 0) and (1, 1) are slightly above the expected
counts, I can now understand the positive correlation.  But does such a high
polychoric correlation make sense when the variables are so skewed and the
difference between the actual and expected counts in the crosstab is so
small?
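
One way to look at this question is through the empty cell rather than the
raw differences. The sketch below is base R only, uses illustrative object
names, and is not the polychor computation itself. It shows that
odds-ratio-based measures sit at their boundary for this table, and that the
thresholds implied by the margins (the quantities the 2-step estimator fixes
first) lie far out in the tails:

```r
# Observed 2x2 table (rows = x1, columns = x2), filled column by column
obs <- matrix(c(23, 334, 0, 27), nrow = 2,
              dimnames = list(x1 = c("0", "1"), x2 = c("0", "1")))

n00 <- obs["0", "0"]; n01 <- obs["0", "1"]
n10 <- obs["1", "0"]; n11 <- obs["1", "1"]

# Sample odds ratio: the empty (x1 = 0, x2 = 1) cell makes it infinite
odds_ratio <- (n00 * n11) / (n01 * n10)

# Yule's Q, an odds-ratio-based association measure bounded in [-1, 1]
yule_q <- (n00 * n11 - n01 * n10) / (n00 * n11 + n01 * n10)

# Normal thresholds implied by the marginal proportions of the 0 category
tau_x1 <- qnorm(sum(obs["0", ]) / sum(obs))  # P(x1 = 0) = 23/384
tau_x2 <- qnorm(sum(obs[, "0"]) / sum(obs))  # P(x2 = 0) = 357/384

print(c(odds_ratio = odds_ratio, yule_q = yule_q,
        tau_x1 = tau_x1, tau_x2 = tau_x2))
```

With an empty cell and such extreme margins, the table carries little
information to pin the latent correlation down, so a large estimate can
coexist with small observed-minus-expected differences. Whether this is also
what makes the Hessian singular in the ML fits is only a guess.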

Regarding the difference in the correlation coefficient between x1 and x2
from polychor and hetcor:
I used 'hetcor' (polycor package) with 2-step and ML estimation on the
whole dataset. The data were first declared as 'factor'; otherwise hetcor
would just compute Pearson correlations.
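
A minimal sketch of that conversion step, using a small made-up data frame
(toy, item1 and flag1 are hypothetical names, not the real dataset). Ordered
factors are used here so that the category order is explicit:

```r
# Hypothetical stand-in for the real data: one 1-5 item and one binary item
toy <- data.frame(
  item1 = c(1, 3, 5, 2, 4, 5),
  flag1 = c(0, 1, 1, 0, 1, 1)
)

# Convert every column to an ordered factor; toy[] keeps the data.frame shape
toy[] <- lapply(toy, function(col) factor(col, ordered = TRUE))

str(toy)
```

Note that factor() only creates levels for values that actually occur, so a
category that is never observed will not appear among the levels unless it is
supplied through the levels argument.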

hc <- hetcor(thedata, ML = FALSE, std.err = FALSE)
(correlation x1x2) 0.8013

hc <- hetcor(thedata, ML = TRUE, std.err = TRUE)
"Error in solve.default(result$hessian) : 
  Lapack routine dgesv: system is exactly singular".

Using polychor with the 2-step and ML estimators:

polychor(x1, x2, ML = FALSE, std.err = FALSE)
[1] 0.9330044

polychor(x1, x2, ML = TRUE, std.err = TRUE)
"Error in solve.default(result$hessian) : 
  Lapack routine dgesv: system is exactly singular".


Murray, you mentioned that the correlation between my two variables could be
affected by other variables, hence the difference between polychor (on only
two variables) and hetcor (on all the variables).
I ran polychor and hetcor on variables I created (correlated and
uncorrelated). Although I thought that the heterogeneous correlations were
computed separately within each pair of variables (and therefore not affected
by other variables), a third variable correlated with x1 and x2 does slightly
affect the correlation between x1 and x2. Thanks for this suggestion. I need
to look more closely into the computation of polychoric correlations…!
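
A comparison of this kind could be set up roughly as follows. This is only a
sketch on simulated binary variables (sim, a, b, c are made-up names), and it
assumes a reasonably recent polycor in which hetcor has a pd argument. With
pd = TRUE, hetcor may adjust the whole matrix when it is not positive
definite, which is one possible way other variables could influence a single
pairwise entry; whether that is what happened with the real data is an
assumption, not something established in this thread.

```r
set.seed(1)
n <- 384
z <- rnorm(n)  # shared latent factor, so a, b and c are mutually correlated

# Three simulated binary variables, stored as ordered factors
sim <- data.frame(
  a = as.ordered(as.integer(z + rnorm(n) > 0)),
  b = as.ordered(as.integer(z + rnorm(n) > 0)),
  c = as.ordered(as.integer(z + rnorm(n) > 0))
)

r_pair <- r_all <- NA_real_
if (requireNamespace("polycor", quietly = TRUE)) {
  # 2-step polychoric correlation on the pair alone
  r_pair <- polycor::polychor(sim$a, sim$b)

  # Same pair taken from the full matrix, with the
  # positive-definiteness adjustment switched off
  hc <- polycor::hetcor(sim, ML = FALSE, std.err = FALSE, pd = FALSE)
  r_all <- hc$correlations["a", "b"]
}
print(c(pair_only = r_pair, from_hetcor = r_all))
```

Rerunning the hetcor call with pd = TRUE and comparing the entries is a quick
check of whether the adjustment changes anything for a given dataset.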

-- 
View this message in context: http://www.nabble.com/polychoric-correlation%3A-issue-with-coefficient-sign-tp21425977p21448444.html



