[R] polychoric correlation: issue with coefficient sign
Stas Kolenikov
skolenik at gmail.com
Wed Jan 14 05:39:51 CET 2009
Olsson's original paper
(http://www.citeulike.org/user/ctacmo/article/553309) did mention that
the greatest biases and numerical problems were encountered when the two
variables had opposite skewness. Your example is even more extreme:
tetrachoric and polychoric correlations do not like zero counts. A zero
cell actually means that your data sit on a straight line, but that line
does not pass through the intersection of the thresholds. The nominal
estimate of the correlation should be 1, and what you see should not be
significantly different from 1. No wonder you get LAPACK errors: at
some point, you had to invert matrix( c(1,1,1,1), 2, 2) or compute its
determinant in the ML computations. My own Stata implementation of
polychoric correlation choked on your data and stopped with an
error... which I should've handled more gracefully :)). The data with
0.5 added produced the same correlation estimate but different
standard errors.
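
To make the zero-count point concrete, here is a rough base-R sketch. It is
not the polycor code and not my Stata code: the 2x2 counts are invented for
illustration, and pbinorm() and loglik() are hypothetical helper names. It
fixes the thresholds at the marginal quantiles (the 2-step idea) and
maximizes the likelihood over rho alone, with the search interval capped at
+/-0.99 as an arbitrary choice.

```r
# The degenerate matrix mentioned above: its determinant is zero,
# so solve() stops with a LAPACK singularity error.
m <- matrix(c(1, 1, 1, 1), 2, 2)
det_m <- det(m)
inv_m <- tryCatch(solve(m), error = function(e) conditionMessage(e))

# Hypothetical 2x2 table (counts invented for illustration, N = 384).
# Rows are X = 0/1, columns are Y = 0/1; the (X = 1, Y = 0) cell is empty.
tab <- matrix(c(300, 0, 60, 24), nrow = 2,
              dimnames = list(X = c("0", "1"), Y = c("0", "1")))

# P(Z1 < a, Z2 < b) for a standard bivariate normal with correlation rho,
# computed by integrating the conditional normal CDF over Z1.
pbinorm <- function(a, b, rho) {
  integrate(function(z) dnorm(z) * pnorm((b - rho * z) / sqrt(1 - rho^2)),
            lower = -Inf, upper = a)$value
}

# Log-likelihood in rho with the thresholds fixed at the marginal quantiles.
loglik <- function(rho, tab) {
  n <- sum(tab)
  a <- qnorm(sum(tab[1, ]) / n)   # threshold for X
  b <- qnorm(sum(tab[, 1]) / n)   # threshold for Y
  p11 <- pbinorm(a, b, rho)
  # Cell probabilities in the same column-major order as as.vector(tab)
  p <- c(p11,
         pnorm(b) - p11,
         pnorm(a) - p11,
         1 - pnorm(a) - pnorm(b) + p11)
  # Guard against log(0) or tiny negative values from numerical integration
  sum(as.vector(tab) * log(pmax(p, 1e-12)))
}

# Maximize over rho; the cap at 0.99 is arbitrary.
fit_rho <- function(tab) {
  optimize(function(r) loglik(r, tab), interval = c(-0.99, 0.99),
           maximum = TRUE)$maximum
}

rho_zero <- fit_rho(tab)        # zero cell: pushed to the upper bound
rho_cc   <- fit_rho(tab + 0.5)  # 0.5 added to every cell: interior maximum
```

With the empty cell, the likelihood keeps increasing in rho, so rho_zero
ends up at whatever upper bound you allow. Adding 0.5 to every cell gives
an interior maximum for these invented counts. What happens with your
actual data may differ.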
John Fox offered all the other feasible explanations, such as the
handling of missing data in the pairwise and full data set computations.
But with unstable computations you can end up just about anywhere in the
range of estimates; the standard errors should tell you that your
estimate is quite imprecise.
On 1/12/09, Dorothee <ddurpoix at gmail.com> wrote:
>
> Hello,
>
> I am running polychoric correlations on a dataset composed of 12 ordinal and
> binary variables (N =384), using the polycor package.
> One of the associations (between 2 dichotomous variables) is very high using
> the 2-step estimate (0.933 when polychoric is run only between the two
> variables, but 0.801 when polychoric is run on the 12 variables). The same
> correlation run with the ML estimate returns a singularity message.
>
> First, I would like to know why the estimation between only the two
> dichotomous variables and the estimation with all the variables at once
> (both with the 2-step estimate) return slightly different results.
>
> Secondly, when I checked the distributions of these two dichotomous
> variables, they appeared to be roughly symmetrically opposed. Therefore,
> one should indeed expect a strong association between them, but a negative
> one, shouldn't one? Why does the polychoric correlation return a positive
> coefficient? What does it mean for the rest of the coefficients? Should I
> trust them?
>
> I have to say I'm new to R and not very strong in statistics, I hope I
> haven't posted a stupid question...
>
--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.