# [R] Re: point-biserial correlation

Bernd Weiss weiss at wiso-r610.wiso.uni-koeln.de
Mon Mar 31 19:23:20 CEST 2003

```On 31 Mar 2003 at 15:07, Noel Yvonnick wrote:

[...]

> Note that the point-biserial correlation is nothing but the standard
> correlation coefficient when one of the variables is dichotomous, so
> that cor(.) is OK.

Yes, the subject line is misleading.

> The biserial is different and includes a correction for the so-called
> "point of dichotomy". The following should work (translating a formula
> found in a psychometric manual) :
>

[...]

>   # Biserial correlation
>   # Be cautious in interpreting the sign :
>   # depends upon the ordering of levels(x)
>   ((m[1]-m[2])/Sy)*(f[1]*f[2]/dnorm(f[1]-.5))
>

Thanks a lot for your help. Your code inspired me to make a few modifications.

(1) Following a German statistics textbook (Bortz, Jürgen, 1993: Statistik. Heidelberg:
Springer), I use the term "dnorm(qnorm(f))" instead of "dnorm(f-.5)", i.e. the
standard normal density at the point that cuts off the proportion f.

(2) I added some code for handling NAs.

(3) Finally, it is now possible to run a significance test for rbis.

Bernd

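# Modification of Noel Yvonnick's function for computing biserial correlations
# x.na: dichotomous variable, a factor with exactly two levels (may contain NAs)
# y.na: continuous variable (may contain NAs)
cor.biserial = function(x.na,y.na)
{
# Keep only the cases that are complete on both variables
x <- x.na[!is.na(y.na) & !is.na(x.na)]
y <- y.na[!is.na(y.na) & !is.na(x.na)]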

stopifnot(is.factor(x))
stopifnot(length(levels(x))==2)
stopifnot(length(x)==length(y))

N = length(y)

# Success / Failure frequencies
n <- table(x)
f = table(x)/length(x)

# Means of success/failure groups on the global score
m = tapply(y,x,mean)

# Standard deviation of the global score (population form, divisor N)
Sy = sqrt(var(y)*(N-1)/N)

# Biserial correlation
# Be cautious in interpreting the sign :
# depends upon the ordering of levels(x)
rbis <- ((m[1]-m[2])/Sy)*(f[1]*f[2]/dnorm(qnorm(f[1])))

# significance test for rbis
# rhobis: standard error of rbis; z: test statistic; alpha: one-sided p-value
rhobis <- sqrt(n[1]*n[2])/(dnorm(qnorm(f[1]))*N*sqrt(N))
z <- rbis/rhobis
alpha <- ifelse(z<0,pnorm(z),1-pnorm(z))

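# return() accepts a single object, so bundle the results in a named list
return(list(rbis=rbis,rhobis=rhobis,z=z,alpha=alpha,N=N))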
}

```
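To illustrate the distinction made in this thread, here is a minimal, self-contained sketch on simulated data. It is not the code from the post: the helper name `biserial_sketch`, the 0/1 numeric coding of `x`, and the simulated variables are all assumptions made for illustration. It computes the point-biserial coefficient from the group means, compares it with plain `cor()`, and then applies the `dnorm(qnorm(p))` ordinate to get the biserial coefficient, which presumes the dichotomy comes from cutting a latent normal variable.

```r
# Hypothetical helper: x is a 0/1 numeric vector, y is continuous.
biserial_sketch <- function(x, y) {
  ok <- !is.na(x) & !is.na(y)            # drop incomplete pairs
  x <- x[ok]
  y <- y[ok]
  N <- length(y)
  p <- mean(x == 1)                      # proportion of cases in group 1
  q <- 1 - p
  sy <- sqrt(var(y) * (N - 1) / N)       # population SD of y (divisor N)
  m1 <- mean(y[x == 1])
  m0 <- mean(y[x == 0])
  h <- dnorm(qnorm(p))                   # normal density at the cut point
  r_pb  <- (m1 - m0) / sy * sqrt(p * q)  # point-biserial coefficient
  r_bis <- (m1 - m0) / sy * (p * q / h)  # biserial coefficient
  list(r_pb = r_pb, r_bis = r_bis)
}

# Simulated data: x is a latent normal variable cut at an arbitrary threshold
set.seed(1)
latent <- rnorm(200)
y <- 0.6 * latent + rnorm(200, sd = 0.8)
x <- as.numeric(latent > 0.3)

res <- biserial_sketch(x, y)
print(res)
# Compare the point-biserial coefficient with the ordinary Pearson correlation
print(all.equal(res$r_pb, cor(x, y)))
```

The two coefficients differ only by the factor `sqrt(p * q) / dnorm(qnorm(p))`, so the biserial value is always larger in absolute size than the point-biserial one.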