# [R] Goodman-Kruskal tau

Antti Arppe aarppe at ling.helsinki.fi
Tue Aug 7 12:09:38 CEST 2007

```On Wed, 1 Aug 2007, Upasna Sharma <upasna at iitb.ac.in> wrote:
> From: "Upasna Sharma" <upasna at iitb.ac.in>
> Subject: [R] Goodman Kruskal's tau
>
> I need to know which package in R calculates the Goodman Kruskal's
> tau statistic for nominal data. Also is there any implementation for
> multiple classification analysis (Andrews at al 1973) in R? Any
> information on this would be greatly appreciated.

As far as I know, the Goodman-Kruskal tau has not yet been implemented
in any package, but I've coded it as a function for my own research
purposes following Liebetrau 1983, see the code attached below (save
it and load it into R with the source-command. You simply provide the
function with a matrix 'x' with the nominal data, and get the values
of the G-K tau as well as the significance values and asymptotic
standard errors, calculated in both directions, i.e. GK.tau(dat=x) or
GK.tau(x), e.g.

> source("GK.tau.R")
> x
[,1] [,2] [,3]
[1,]   30   10    1
[2,]    2   20    5
[3,]    1   10   15
> GK.tau(x)
tau.RC    tau.CR     p.tau.RC     p.tau.CR  var.tau.RC ASE.tau.RC
1 0.3522439 0.3219675 2.002024e-13 3.065436e-12 0.004584542 0.06770924
ASE.tau.CR
1 0.07004566

GK.tau <- function(dat)
{ N <- sum(dat);
dat.rows <- nrow(dat);
dat.cols <- ncol(dat);
max.col <- sum.col <- L.col <- matrix(,dat.cols);
max.row <- sum.row <- L.row <- matrix(,dat.rows);
for(i in 1:dat.cols)
{ sum.col[i] <- sum(dat[,i]); max.col[i] <- max(dat[,i]); }
for(i in 1:dat.rows)
{ sum.row[i] <- sum(dat[i,]); max.row[i] <- max(dat[i,]); }
max.row.margin <- max(apply(dat,1,sum));
max.col.margin <- max(apply(dat,2,sum));

# Goodman-Kruskal tau
# Tau Column|Row
n.err.unconditional <- N^2;
for(i in 1:dat.rows)
n.err.unconditional <- n.err.unconditional-N*sum(dat[i,]^2/sum.row[i]);
n.err.conditional <- N^2-sum(sum.col^2);
tau.CR <- 1-(n.err.unconditional/n.err.conditional);
v <- n.err.unconditional/(N^2);
d <- n.err.conditional/(N^2);
f <- d*(v+1)-2*v;
var.tau.CR <- 0;
for(i in 1:dat.rows)
for(j in 1:dat.cols)
var.tau.CR <- var.tau.CR + dat[i,j]*(-2*v*(sum.col[j]/N)+d*((2*dat[i,j]/sum.row[i])-sum((dat[i,]/sum.row[i])^2))-f)^2/(N^2*d^4);
ASE.tau.CR <- sqrt(var.tau.CR);
z.tau.CR <- tau.CR/ASE.tau.CR;
U.tau.CR <- (N-1)*(dat.cols-1)*tau.CR; # Chi-squared approximation for H0 according to Margolin & Light 1974, see also Liebetrau 1983
p.tau.CR <- pchisq(U.tau.CR,df=(dat.rows-1)*(dat.cols-1),lower=FALSE);
# Tau Row|Column
n.err.unconditional <- N^2;
for(j in 1:dat.cols)
n.err.unconditional <- n.err.unconditional-N*sum(dat[,j]^2/sum.col[j]);
n.err.conditional <- N^2-sum(sum.row^2);
tau.RC <- 1-(n.err.unconditional/n.err.conditional);
v <- n.err.unconditional/(N^2);
d <- n.err.conditional/(N^2);
f <- d*(v+1)-2*v;
var.tau.RC <- 0;
for(i in 1:dat.rows)
for(j in 1:dat.cols)
var.tau.RC <- var.tau.RC + dat[i,j]*(-2*v*(sum.row[i]/N)+d*((2*dat[i,j]/sum.col[j])-sum((dat[,j]/sum.col[j])^2))-f)^2/(N^2*d^4);
ASE.tau.RC <- sqrt(var.tau.RC);
z.tau.RC <- tau.CR/ASE.tau.RC;
U.tau.RC <- (N-1)*(dat.rows-1)*tau.RC; # Chi-squared approximation for H0 according to Margolin & Light 1974, see also Liebetrau 1983
p.tau.RC <- pchisq(U.tau.RC,df=(dat.rows-1)*(dat.cols-1),lower=FALSE);
results <- data.frame(tau.RC, tau.CR, p.tau.RC, p.tau.CR, var.tau.RC, ASE.tau.RC, ASE.tau.CR);
return(results);
}

Hope this helps.

-Antti
--
======================================================================
Antti Arppe - Master of Science (Engineering)
Researcher & doctoral student (Linguistics)
E-mail: antti.arppe at helsinki.fi
WWW: http://www.ling.helsinki.fi/~aarppe

```