[R] Convert Contingency Table to Flat File
Marc Schwartz
MSchwartz at mn.rr.com
Wed Oct 18 04:50:54 CEST 2006
Just a quick update on this thread.
The version of expand.dft() that I posted earlier has a bug in it.
This is the result of the use of lapply() and the evaluation of the
additional arguments passed to type.convert().
I noted this when testing the function on the UCBAdmissions data set,
which is a multi-way table used in some help file examples such
as ?as.data.frame.table.
Here is a corrected version:
expand.dft <- function(x, na.strings = "NA", as.is = FALSE, dec = ".")
{
  DF <- sapply(1:nrow(x), function(i) x[rep(i, each = x$Freq[i]), ],
               simplify = FALSE)

  DF <- subset(do.call("rbind", DF), select = -Freq)

  for (i in 1:ncol(DF))
  {
    DF[[i]] <- type.convert(as.character(DF[[i]]),
                            na.strings = na.strings,
                            as.is = as.is, dec = dec)
  }

  DF
}
Thus if we now take the UCBAdmissions multi-way table data and convert
it to a flat contingency table:
> FCT <- as.data.frame(UCBAdmissions)
> FCT
      Admit Gender Dept Freq
1  Admitted   Male    A  512
2  Rejected   Male    A  313
3  Admitted Female    A   89
4  Rejected Female    A   19
5  Admitted   Male    B  353
6  Rejected   Male    B  207
7  Admitted Female    B   17
8  Rejected Female    B    8
9  Admitted   Male    C  120
10 Rejected   Male    C  205
11 Admitted Female    C  202
12 Rejected Female    C  391
13 Admitted   Male    D  138
14 Rejected   Male    D  279
15 Admitted Female    D  131
16 Rejected Female    D  244
17 Admitted   Male    E   53
18 Rejected   Male    E  138
19 Admitted Female    E   94
20 Rejected Female    E  299
21 Admitted   Male    F   22
22 Rejected   Male    F  351
23 Admitted Female    F   24
24 Rejected Female    F  317
Thus, there should be:
> sum(FCT$Freq)
[1] 4526
rows in the final 'raw' data frame.
> DF <- expand.dft(FCT)
> str(DF)
'data.frame': 4526 obs. of 3 variables:
$ Admit : Factor w/ 2 levels "Admitted","Rejected": 1 1 1 1 1 1 1 1 1 1 ...
$ Gender: Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
$ Dept : Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...
Note that the three columns are coerced back to factors. This follows
from the as.is = FALSE default, which is passed through to type.convert().
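One way to sanity-check the expansion is to tabulate the 'raw' data frame again and compare the counts with the original frequencies. The following is only a sketch: it repeats the function definition so that it runs on its own, and the names 'back' and 'chk' are purely illustrative.

```r
expand.dft <- function(x, na.strings = "NA", as.is = FALSE, dec = ".")
{
  DF <- sapply(1:nrow(x), function(i) x[rep(i, each = x$Freq[i]), ],
               simplify = FALSE)
  DF <- subset(do.call("rbind", DF), select = -Freq)
  for (i in 1:ncol(DF))
  {
    DF[[i]] <- type.convert(as.character(DF[[i]]),
                            na.strings = na.strings,
                            as.is = as.is, dec = dec)
  }
  DF
}

FCT <- as.data.frame(UCBAdmissions)
DF  <- expand.dft(FCT)

# Re-tabulate the expanded rows and line the counts up with the originals.
# merge() matches on the factor labels, so the differing level order of
# Gender ("Male","Female" vs. "Female","Male") does not matter here.
back <- as.data.frame(table(DF))
chk  <- merge(FCT, back, by = c("Admit", "Gender", "Dept"))

# TRUE if every cell count survived the round trip
all(chk$Freq.x == chk$Freq.y)
```

If the final expression is TRUE, each of the 24 cells has been expanded into the right number of rows.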
If we now use:
> DF <- expand.dft(FCT, as.is = TRUE)
> str(DF)
'data.frame': 4526 obs. of 3 variables:
$ Admit : chr "Admitted" "Admitted" "Admitted" "Admitted" ...
$ Gender: chr "Male" "Male" "Male" "Male" ...
$ Dept : chr "A" "A" "A" "A" ...
The three columns stay as character vectors. It was this behavior that
did not work properly in the first version.
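As an aside, and not part of the function above: the row replication step on its own can also be sketched without the per-row sapply(), by repeating the row indices with rep() and indexing once. This is an alternative idiom under my own assumptions (the names 'idx' and 'raw' are illustrative), and it skips the type.convert() step, so the columns simply keep whatever type they have in FCT.

```r
FCT <- as.data.frame(UCBAdmissions)

# Repeat each row index as many times as its frequency
idx <- rep(seq_len(nrow(FCT)), times = FCT$Freq)

# Pull the replicated rows and drop the Freq column
raw <- FCT[idx, names(FCT) != "Freq"]
rownames(raw) <- NULL

# Should match sum(FCT$Freq)
nrow(raw)
```

Since no conversion takes place, the factor levels keep their original order here (for example "Male" before "Female"), which may or may not be what is wanted.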
HTH,
Marc Schwartz