# [R] Tetrachoric correlation in R vs. stata

Janet Rosenbaum jrosenba at rand.org
Fri Jun 23 23:33:31 CEST 2006

```
Peter, thanks for pointing out the omitted information.  The hazards
of attempting to be brief.

In R, I am using polychor(vec1, vec2, std.err=T) and have used both the
ML and two-step estimates, which give virtually identical answers.  I am
explicitly using only the 632 complete cases in R to make sure missing
data is handled the same way as in Stata.

Here's my data:

522	54
34	22

> polychor(v1, v2, std.err=T, ML=T)

Polychoric Correlation, ML est. = 0.5172 (0.08048)
Test of bivariate normality: Chisquare = 8.063e-06, df = 0, p = NaN

Row Thresholds
  Threshold Std.Err.
1     1.349  0.07042

Column Thresholds
  Threshold Std.Err.
1     1.174  0.06458
Warning message:
NaNs produced in: pchisq(q, df, lower.tail, log.p)

In Stata, I get:

. tetrachoric t1_v19a ct1_ix17

Tetrachoric correlations (N=632)

----------------------------------
    Variable |  t1_v19a  ct1_ix17
-------------+--------------------
     t1_v19a |        1
    ct1_ix17 |    .6169         1
----------------------------------

Thanks for your help.

Janet

Peter Dalgaard wrote:
> Janet Rosenbaum <jrosenba at rand.org> writes:
>
>> I hope someone here knows the answer to this since it will save me from
>> delving deep into documentation.
>>
>> Based on 22 pairs of vectors, I have noticed that tetrachoric
>> correlation coefficients in stata are almost uniformly higher than those
>> in R, sometimes dramatically so (TCC=.61 in stata, .51 in R;  .51 in
>> stata, .39 in R).  Stata's estimate is higher than R's in 20 out of 22
>> computations, although the estimates always fall within the 95% CI for
>> the TCC calculated by R.
>>
>> Do stata and R calculate TCC in dramatically different ways?  Is the
>> handling of missing data perhaps different?  Any thoughts?
>>
>> Btw, I am sending this question only to the R-help list.
>
>
> A bit more information seems necessary:
>
> - tetrachoric correlations depend on 4 numbers, so you should be able
>   to give a direct example
>
> - you're not telling us how you calculate the TCC in R. This is not
>   obvious (package polycor?).
>


```
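
Since the two packages disagree on the same 2x2 table (522, 54, 34, 22), an independent check outside both may help. The sketch below is one way to do that, not the method either package uses. It assumes the usual tetrachoric setup: two latent standard normal variables, each cut at one threshold. The thresholds are taken from the marginal proportions, and the correlation is then chosen so that the bivariate normal probability of the first cell equals its observed proportion. With df = 0, as in the polychor output above, the model has as many parameters as free cell proportions, so this exact fit should agree with the ML estimate. The function names (`bvn_density`, `bvn_cdf`, `tetrachoric`) are illustrative, and zero cells are not handled.

```python
import math
from statistics import NormalDist


def bvn_density(h, k, r):
    """Standard bivariate normal density at (h, k) with correlation r."""
    one_minus_r2 = 1.0 - r * r
    exponent = -(h * h - 2.0 * r * h * k + k * k) / (2.0 * one_minus_r2)
    return math.exp(exponent) / (2.0 * math.pi * math.sqrt(one_minus_r2))


def bvn_cdf(h, k, rho, steps=2000):
    """P(X <= h, Y <= k) for a standard bivariate normal with correlation rho.

    Uses the identity
        P = Phi(h) * Phi(k) + integral from 0 to rho of density(h, k, r) dr
    and evaluates the integral with Simpson's rule (steps must be even).
    """
    nd = NormalDist()
    base = nd.cdf(h) * nd.cdf(k)
    if rho == 0.0:
        return base
    width = rho / steps  # negative when rho < 0, which keeps the sign right
    total = bvn_density(h, k, 0.0) + bvn_density(h, k, rho)
    for i in range(1, steps):
        weight = 4.0 if i % 2 else 2.0
        total += weight * bvn_density(h, k, i * width)
    return base + total * width / 3.0


def tetrachoric(a, b, c, d):
    """Tetrachoric correlation for the 2x2 table [[a, b], [c, d]].

    Returns (rho, row_threshold, column_threshold).
    Assumes all four cells are non-zero; no continuity correction is applied.
    """
    n = a + b + c + d
    nd = NormalDist()
    row_threshold = nd.inv_cdf((a + b) / n)  # from the first-row margin
    col_threshold = nd.inv_cdf((a + c) / n)  # from the first-column margin
    target = a / n                           # observed proportion in cell (1, 1)

    # bvn_cdf increases with rho, so plain bisection is enough.
    lo, hi = -0.999, 0.999
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if bvn_cdf(row_threshold, col_threshold, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0, row_threshold, col_threshold


if __name__ == "__main__":
    rho, row_t, col_t = tetrachoric(522, 54, 34, 22)
    print(f"rho = {rho:.4f}, row threshold = {row_t:.3f}, "
          f"column threshold = {col_t:.3f}")
```

The printed thresholds can be compared with the Row and Column Thresholds reported by polychor, and the correlation with the two figures quoted in the message (0.5172 from R, .6169 from Stata).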