# [R-meta] Calculating variances and z transformation for tetrachoric, biserial correlations?

Viechtbauer Wolfgang (SP) wolfgang.viechtbauer at maastrichtuniversity.nl
Sun Jul 2 23:30:50 CEST 2017

```
Let me address the computations first (that's the easy part).

Tetrachoric correlation: For tetrachoric correlations, escalc() computes the MLE (this requires an iterative routine; optim() is used for that). The sampling variance is estimated based on the inverse of the Hessian evaluated at the MLE. There is no closed-form solution for either of these.
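To illustrate (this is just a sketch with made-up 2x2 cell counts, not from any real dataset):

library(metafor)

# ai, bi = counts in row 1 of the 2x2 table; ci, di = counts in row 2
escalc(measure="RTET", ai=40, bi=10, ci=15, di=35)

Here yi is the ML estimate of the tetrachoric correlation and vi is the sampling variance based on the inverse of the Hessian at the MLE (so it depends on the iterative fit).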

Biserial correlation (from *t*- or *F*-statistic): You can use a trick here if you still want to use escalc(). If you know t (or can obtain it via t = sqrt(F) for an F-statistic with one numerator degree of freedom), then just use escalc(measure="RBIS", m1i=t*sqrt(2)/sqrt(n), m2i=0, sd1i=1, sd2i=1, n1i=n, n2i=n), where n is the size of each group (not the total sample size). For example, using the example from Jacobs & Viechtbauer (2017):

escalc(measure="RBIS", m1i=1.68*sqrt(2)/sqrt(10), m2i=0, sd1i=1, sd2i=1, n1i=10, n2i=10)

yields yi = 0.4614 and vi = 0.0570, exactly as in the example. You used equation (13) to compute the sampling variances, which is the approximate equation. escalc() uses the 'exact' one (equation 12). That way, you are also consistent with what you get for the case of "Biserial correlation (from *M* and *SD*)".

Biserial correlation (from *M* and *SD*): As mentioned above, escalc() uses equation (12) from Jacobs & Viechtbauer (2017) to compute/estimate the sampling variance.
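For example, with made-up group means and SDs (the numbers here are just for illustration):

library(metafor)

escalc(measure="RBIS", m1i=10.4, m2i=11.2, sd1i=2.4, sd2i=2.8, n1i=10, n2i=10)

yi is then the biserial correlation and vi its sampling variance based on equation (12).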

Square-root of eta-squared: You cannot use the large-sample variance of a regular correlation coefficient for this. The right thing to do is to compute a polyserial correlation coefficient here (the extension of the biserial to more than two groups). You can do this using the polycor package. Technically, the polyserial() function from that package requires you to input the raw data, which you don't have. If you have the means and SDs, you can just simulate raw data with exactly those means and SDs and use that as input to polyserial(). The means and SDs are sufficient statistics here, so you should always get the same result regardless of what specific values are simulated. Here is an example:

library(polycor)

# simulate raw data with exactly the given group means and SDs
x1 <- scale(rnorm(10)) * 2.4 + 10.4
x2 <- scale(rnorm(10)) * 2.8 + 11.2
x3 <- scale(rnorm(10)) * 2.1 + 11.5

x <- c(x1, x2, x3)
y <- rep(1:3, each=10)

polyserial(x, y, ML=TRUE, std.err=TRUE, control=list(reltol=1e-12))

If you run this over and over, you will (should) always get the same polyserial correlation coefficient of 0.2127. The standard error is ~0.195, but it changes very slightly from run to run due to minor numerical differences in the optimization routine. Note that I tightened the convergence tolerance a bit so that those numerical issues do not also affect the estimate itself. But these minor differences are essentially inconsequential anyway.

If you do not have the means and SDs, then, well, I don't know what to do off the top of my head. But again, do not treat the converted value as if it were a correlation coefficient. It is not.

Now for your question about what and how to combine:

The various coefficients (Pearson product-moment correlations, biserial correlations, polyserial correlations, tetrachoric correlations) are directly comparable, at least in principle (assuming that the underlying assumptions hold -- e.g., bivariate normality for the observed/latent variables). I just saw that James also posted an answer; he raises an important issue about the theoretical comparability of the various coefficients, especially when they arise from different sampling designs. I very much agree that this needs to be considered. You could, however, take a pragmatic/empirical approach by coding the type of coefficient / the design from which each coefficient arose and then examining empirically (i.e., via a meta-regression analysis) whether there are any systematic differences between the types.

As James also points out, you can use Fisher's r-to-z transformation on all of these coefficients, but to be absolutely clear: Only for Pearson product-moment correlation coefficients is the variance then approximately 1/(n-3). I have seen many cases where people converted all kinds of statistics to 'correlations', then applied Fisher's r-to-z transformation, and then used 1/(n-3) as the variance, which is just flat out wrong in most cases. Various books on meta-analysis even make such faulty suggestions.
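For actual Pearson product-moment correlations, escalc() gives you exactly this; for example (with a made-up r and n):

library(metafor)

escalc(measure="ZCOR", ri=0.30, ni=50)

yields yi = atanh(0.30) and vi = 1/(50-3).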

Also, Fisher's r-to-z transformation will *only* be a variance-stabilizing transformation for Pearson product-moment correlation coefficients (e.g., the actual variance-stabilizing transformation for biserial correlation coefficients is given by equation 17 in Jacobs & Viechtbauer, 2017 -- and even that is just an approximation, since it is based on Soper's approximate formula). If you apply Fisher's r-to-z transformation to other types of coefficients, you have to use the right sampling variance (see James' mail). Also note: you cannot mix different transformations across studies -- if you transform at all, then use Fisher's r-to-z transformation for all of the coefficients.
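To sketch what the 'right sampling variance' looks like (this is a standard delta-method argument, not something from James' mail): if r has sampling variance Var(r), then z = atanh(r) has approximate variance Var(r)/(1 - r^2)^2. Using the biserial example from above (yi = 0.4614, vi = 0.0570):

yi <- 0.4614
vi <- 0.0570

zi  <- atanh(yi)             # Fisher's r-to-z transformation
vzi <- vi / (1 - yi^2)^2     # delta-method variance of zi

This gives vzi of about 0.092, which is quite different from 1/(n-3) = 1/17 (about 0.059) -- so plugging in 1/(n-3) here would be wrong.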

Whether applying Fisher's r-to-z transformation to other coefficients (other than 'regular' correlation coefficients) is actually advantageous is debatable. Again, you do not get the nice variance stabilizing properties here (the transformation may still have some normalizing properties). If I remember correctly, James examined this in his 2014 paper, at least for biserial correlations (James, please correct me if I misremember).

Best,
Wolfgang

--
Wolfgang Viechtbauer, Ph.D., Statistician | Department of Psychiatry and
Neuropsychology | Maastricht University | P.O. Box 616 (VIJV1) | 6200 MD
Maastricht, The Netherlands | +31 (43) 388-4170 | http://www.wvbauer.com

>-----Original Message-----
>From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-
>project.org] On Behalf Of Mark White
>Sent: Sunday, July 02, 2017 20:05
>To: r-sig-meta-analysis at r-project.org
>Subject: [R-meta] Calculating variances and z transformation for
>tetrachoric, biserial correlations?
>
>Hello,
>
>I have converted a number of summary statistics (contingency tables, *t*-
>and *F*-statistics, *M*s and *SD*s) to tetrachoric and biserial correlations.
>The other effect sizes that I directly observed were raw correlations. I
>have my model all set up to run, but I am unsure as to what to do about
>these effect sizes. I see two options:
>
>1. Submit raw, tetrachoric, and biserial correlations and their variances
>to analyses directly (what I have now).
>
>2. Do Fisher's r-to-z transformation and *then* submit those to analyses.
>The problem here is: How do I convert tetrachoric and biserial correlations
>to Fisher's z? And if I do that, can I just use N to calculate the
>variance? Or, do I have to also convert the variances of tetrachoric and
>biserial correlations?
>
>In either case, I am not sure how `metafor::escalc` calculates variances
>for tetrachoric (`RTET`) and biserial (`RBIS`) correlations. I tried
>looking through the code for `metafor::escalc` on GitHub, but could not
>figure out the calculations.
>
>I have included a table describing my effect sizes and how I calculated
>them/their variances below.
>
>What do you all think would be the best way to handle these data?
>
>*Effect size* | *k* | *Effect size calculation* | *Variance calculation*
>--- | --- | --- | ---
>Raw correlation | 217 | Directly observed | Typical large-sample estimation (see Hedges, 1989, Equation 5), using `metafor::escalc`
>Tetrachoric correlation | 12 | From 2 x 2 contingency tables, using `metafor::escalc` | From 2 x 2 contingency tables, using `metafor::escalc` (*unsure what the formula is*)
>Biserial correlation (from *t*- or *F*-statistic) | 8 | From *t*- or *F*-statistic to point-biserial correlation (using `compute.es::tes` and `compute.es::fes`), then to biserial correlation (self-written function based on Jacobs & Viechtbauer, 2016, assuming *n*s equal across conditions) | From *n*, using self-written function based on Soper's method (Jacobs & Viechtbauer, 2016, Equation 13, assuming *n*s equal across conditions)
>Biserial correlation (from *M* and *SD*) | 2 | From means and standard deviations directly, using `metafor::escalc` | From means and standard deviations directly, using `metafor::escalc` (*unsure what the formula is*)
>Square-root of eta-squared | 1 | *F*-statistic to Cohen's *f* (Cohen, 1988) to eta-squared to square-root of eta-squared as an approximation of a raw correlation coefficient (Lakens, 2013), using self-written function (*this was a one-way ANOVA with three means: low, medium, high prejudice*) | Typical large-sample estimation (see Hedges, 1989, Equation 5), using `metafor::escalc`
>
>Best,
>Mark
```