# [R] Tetrachoric correlation in R vs. stata

Peter Dalgaard p.dalgaard at biostat.ku.dk
Sat Jun 24 00:30:03 CEST 2006

```Janet Rosenbaum <jrosenba at rand.org> writes:

> Peter --- Thanks for pointing out the omitted information.  The
> hazards of attempting to be brief.
>
> In R, I am using polychor(vec1, vec2, std.err=T) and have used both
> the ML and 2 step estimates, which give virtually identical answers.
> I am explicitly using only the 632 complete cases in R to make sure
> missing data is handled the same way as in stata.
>
> Here's my data:
>
> 522	54
> 34	22
>
> > polychor(v1, v2, std.err=T, ML=T)
>
> Polychoric Correlation, ML est. = 0.5172 (0.08048)
> Test of bivariate normality: Chisquare = 8.063e-06, df = 0, p = NaN
>
>     Row Thresholds
>     Threshold Std.Err.
>   1     1.349  0.07042
>
>
>     Column Thresholds
>     Threshold Std.Err.
>   1     1.174  0.06458
>   Warning message:
>   NaNs produced in: pchisq(q, df, lower.tail, log.p)
>
> In stata, I get:
>
> . tetrachoric t1_v19a ct1_ix17
>
> Tetrachoric correlations (N=632)
>
> ----------------------------------
>      Variable |  t1_v19a  ct1_ix17
> -------------+--------------------
>       t1_v19a |        1
>      ct1_ix17 |    .6169         1
> ----------------------------------

Well,

> pmvnorm(c(1.349,1.174),c(Inf,Inf),
+    sigma=matrix(c(1,.5172,.5172,1),2))*632
[1] 22.00511
attr(,"error")
[1] 1e-15
attr(,"msg")
[1] "Normal Completion"
> pnorm(1.349)*632
[1] 575.9615
> pnorm(1.174)*632
[1] 556.0352

so the estimates from R appear to be consistent with the table. In
contrast, plugging in the .6169 from Stata gives

> pmvnorm(c(1.349,1.174),c(Inf,Inf),
+     sigma=matrix(c(1,.6169,.6169,1),2))*632
[1] 26.34487
...

You might want to follow up on

http://www.ats.ucla.edu/stat/stata/faq/tetrac.htm

>
> Janet
>
>
>
> Peter Dalgaard wrote:
> > Janet Rosenbaum <jrosenba at rand.org> writes:
> >
> >> I hope someone here knows the answer to this since it will save me
> >> from delving deep into documentation.
> >>
> >> Based on 22 pairs of vectors, I have noticed that tetrachoric
> >> correlation coefficients in stata are almost uniformly higher than
> >> those in R, sometimes dramatically so (TCC=.61 in stata, .51 in R;
> >> .51 in stata, .39 in R).  Stata's estimate is higher than R's in 20
> >> out of 22 computations, although the estimates always fall within
> >> the 95% CI for the TCC calculated by R.
> >>
> >> Do stata and R calculate TCC in dramatically different ways?  Is
> >> the handling of missing data perhaps different?  Any thoughts?
> >>
> >> Btw, I am sending this question only to the R-help list.
> > - tetrachoric correlations depend on 4 numbers, so you should be able
> >   to give a direct example
> > - you're not telling us how you calculate the TCC in R. This is not
> >   obvious (package polycor?).
> >
>
>
> --------------------
>
> This email message is for the sole use of the intended rec...{{dropped}}

```