[R] Re: point-biserial correlation
Bernd Weiss
weiss at wiso-r610.wiso.uni-koeln.de
Mon Mar 31 19:23:20 CEST 2003
On 31 Mar 2003 at 15:07, Noel Yvonnick wrote:
[...]
> Note that the point-biserial correlation is nothing but the standard
> correlation coefficient when one of the variables is dichotomous, so
> that cor(.) is OK.
Yes, the subject line is misleading.
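To illustrate the quoted remark, here is a minimal sketch on made-up data (x, y and the simulated values are assumptions for illustration, not from this thread). It compares cor() on a 0/1 coding with the textbook point-biserial formula; the two should agree up to floating-point error.

```r
# Illustrative data only (simulated, not from the thread)
set.seed(1)
x <- factor(rep(c("fail", "pass"), times = c(12, 18)))
y <- rnorm(30, mean = ifelse(x == "pass", 1, 0))

# Pearson correlation with the dichotomous variable coded 0/1
r_cor <- cor(as.numeric(x) - 1, y)

# Textbook point-biserial formula, using the SD of y with divisor N
N  <- length(y)
m  <- as.numeric(tapply(y, x, mean))   # group means: fail, pass
f  <- as.numeric(table(x)) / N         # group proportions: fail, pass
Sy <- sqrt(var(y) * (N - 1) / N)
r_pb <- (m[2] - m[1]) / Sy * sqrt(f[1] * f[2])

all.equal(r_cor, r_pb)
```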
> The biserial is different and includes a correction for the so-called
> "point of dichotomy". The following should work (translating a formula
> found in a psychometric manual) :
>
[...]
> # Biserial correlation
> # Be cautious in interpreting the sign :
> # depends upon the ordering of levels(x)
> ((m[1]-m[2])/Sy)*(f[1]*f[2]/dnorm(f[1]-.5))
>
Thanks a lot for your help. Your code inspired me to make some modifications.
(1) Following a German statistics textbook (Bortz, Jürgen, 1993: Statistik. Heidelberg:
Springer), I use the term "dnorm(qnorm(f[1]))" instead of "dnorm(f[1]-.5)".
(2) I added some code for handling NAs.
(3) Finally, it is now possible to run a significance test for rbis.
Bernd
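On modification (1): a small sketch of how the two expressions differ. dnorm(qnorm(p)) is the standard normal density at the z-value that cuts off a proportion p, whereas dnorm(p-.5) evaluates the density at the proportion itself, shifted by one half. The proportions in p are arbitrary values chosen for illustration.

```r
# Arbitrary proportions for the first group (illustration only)
p <- c(0.1, 0.3, 0.5, 0.7, 0.9)

# Normal density at the quantile that cuts off proportion p
ordinate_quantile <- dnorm(qnorm(p))

# Expression used in the quoted code
ordinate_shifted <- dnorm(p - 0.5)

round(cbind(p, ordinate_quantile, ordinate_shifted), 4)
```

The two agree only at p = 0.5, where both reduce to dnorm(0); for unequal group sizes the quantile-based ordinate is smaller, so the resulting biserial coefficient is larger in absolute value.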
# Modification of Noel Yvonnick's function for computing biserial correlations
# x.na: dichotomous variable (a factor with two levels), may contain NA
# y.na: continuous variable, may contain NA
cor.biserial = function(x.na,y.na)
{
stopifnot(length(x.na)==length(y.na))
# Keep only cases that are complete on both variables
ok <- !is.na(y.na) & !is.na(x.na)
x <- x.na[ok]
y <- y.na[ok]
stopifnot(is.factor(x))
stopifnot(length(levels(x))==2)
N = length(y)
# Success / Failure frequencies
n <- table(x)
f = table(x)/length(x)
# Means of success/failure groups on the global score
m = tapply(y,x,mean)
# Standard deviation of the global score (divisor N, not N-1)
Sy = sqrt(var(y)*(N-1)/N)
# Biserial correlation
# Be cautious in interpreting the sign :
# depends upon the ordering of levels(x)
rbis <- ((m[1]-m[2])/Sy)*(f[1]*f[2]/dnorm(qnorm(f[1])))
# significance test for rbis
rhobis <- sqrt(n[1]*n[2])/(dnorm(qnorm(f[1]))*N*sqrt(N))
z <- rbis/rhobis
# One-sided p-value in the direction of the observed sign
alpha <- ifelse(z<0,pnorm(z),1-pnorm(z))
# return() accepts a single object, so the results are collected in a list
return(list(rbis=rbis,rhobis=rhobis,z=z,alpha=alpha,N=N))
}
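A possible usage sketch follows. So that it runs on its own, it defines a condensed helper, biserial_sketch (a hypothetical name, not part of the original post), that follows the same formulas and returns a list. The data are simulated for illustration. As a sanity check, the biserial coefficient should equal the point-biserial correlation rescaled by sqrt(p*q)/h, where h is the normal ordinate at the point of dichotomy.

```r
# Hypothetical helper condensed from the formulas above (illustration only)
biserial_sketch <- function(x, y) {
  ok <- !is.na(x) & !is.na(y)          # drop incomplete pairs
  x <- x[ok]; y <- y[ok]
  stopifnot(is.factor(x), nlevels(x) == 2)
  N  <- length(y)
  n  <- as.numeric(table(x))           # group sizes
  f  <- n / N                          # group proportions
  m  <- as.numeric(tapply(y, x, mean)) # group means
  Sy <- sqrt(var(y) * (N - 1) / N)     # SD with divisor N
  h  <- dnorm(qnorm(f[1]))             # ordinate at the point of dichotomy
  rbis   <- (m[1] - m[2]) / Sy * f[1] * f[2] / h
  rhobis <- sqrt(n[1] * n[2]) / (h * N * sqrt(N))
  list(rbis = rbis, rhobis = rhobis, z = rbis / rhobis, N = N)
}

# Simulated data (assumed for this example)
set.seed(42)
x <- factor(rep(c("0", "1"), times = c(20, 30)))
y <- rnorm(50) + 0.8 * (x == "1")
y[c(3, 17)] <- NA                      # two missing scores
res <- biserial_sketch(x, y)

# Consistency check against the point-biserial correlation
ok  <- !is.na(y)
p   <- mean(x[ok] == "0")
rpb <- cor(as.numeric(x[ok] == "0"), y[ok])
all.equal(res$rbis, rpb * sqrt(p * (1 - p)) / dnorm(qnorm(p)))
```

The sign of rbis is negative here only because level "0" comes first in levels(x) and has the lower mean; reordering the levels flips it.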