[R] Tetrachoric and polychoric ceofficients (for sem) - any tips?

John Fox jfox at mcmaster.ca
Tue Nov 30 03:25:34 CET 2004


Dear Michael,

I had some time this evening, so I programmed the two-step procedure
described in the article that I mentioned. This isn't the ML estimate of the
correlation, but apparently it performs reasonably well and it is quite
fast. I checked the function on some examples and it appears to work
correctly. I'd be curious to see how the this function compares with the one
that you received.

A more complete solution would take a data frame and, as appropriate,
calculate polychoric, polyserial, and product-moment correlations among its
columns. I don't think that writing a polyserial-correlation function would
be any more difficult. Perhaps I'll add this to the sem package; I hesitate
to do so because the resulting standard errors and likelihoods from sem()
won't be right.

I'm taking the liberty of copying this reply to the r-help list, since the
question was originally raised there. I hope that's OK.

Regards,
 John 


-------------- snip ------------

polychor <- function (x, y){
# x: a contingency table of counts or an ordered categorical variable
# y: if x is a variable, a second ordered categorical variable
# returns the polychoric correlation for the table
#  or between x and y, using the two-step approximation
#  to the ML estimate described in F. Drasgow, "Polychoric and
#  polyserial correlations," in S. Kotz and N. Johnson, eds.
#  The Encyclopedia of Statistics, Volume 7, New York: Wiley,
#  1986.
  f <- function(rho) {
    P <- matrix(0, r, c)
    R <- matrix(c(1, rho, rho, 1), 2, 2)
    for (i in 1:r){
      for (j in 1:c){
        P[i,j] <- pmvnorm(lower=c(row.cuts[i], col.cuts[j]),
                          upper=c(row.cuts[i+1], col.cuts[j+1]),
                          corr=R)
        }
      }
     - sum(tab * log(P))
    }
  tab <- if (missing(y)) x else table(x, y)
  r <- nrow(tab)
  c <- ncol(tab)
  n <- sum(tab)
  row.cuts <- c(-Inf, qnorm(cumsum(rowSums(tab))/n))
  col.cuts <- c(-Inf, qnorm(cumsum(colSums(tab))/n))
  optimise(f, interval=c(0, 1))$minimum
  }


> -----Original Message-----
> From: Michael Dewey [mailto:m.dewey at iop.kcl.ac.uk] 
> Sent: Monday, November 29, 2004 1:38 PM
> To: John Fox
> Subject: RE: [R] Tetrachoric and polychoric ceofficients (for 
> sem) - any tips?
> 
> At 18:07 28/11/04, you wrote:
> >Dear Michael,
> >
> >I'm not aware of pre-existing R code for tetrachoric or polychoric 
> >correlations. I may at some point incorporate such functions 
> into the 
> >sem package but I don't have concrete plans for doing so. On 
> the other 
> >hand, I don't think that it would be very hard to do so. (A 
> discussion, 
> >references, and an example are in Kotz and Johnson, eds., 
> Encyclopedia 
> >of Statistics, Vol 7.)
> 
> Someone has kindly emailed me some code for the polychoric 
> case (which he says is quite slow) I will try it out, it may 
> be better to treat the 2 by 2 case separately. If it is OK I 
> plan to write an equivalent to cor for matrices of 
> tetrachoric/polychoric. It will not happen any time soon now, 
> but if I do manage t would you be interested in it for sem? 
> As you say there remains the problem that they are estimates 
> and introduce further uncertainties.
> 
> 
> >Tetrachoric and polychoric correlations are estimates, and in 
> >subsequently estimating a SEM from these, one should take 
> that into account.
> 
> Michael Dewey
> m.dewey at iop.kcl.ac.uk
>




More information about the R-help mailing list