[R] Convert Contingency Table to Flat File

Marc Schwartz MSchwartz at mn.rr.com
Wed Oct 18 04:50:54 CEST 2006


Just a quick update on this thread.

The version of expand.dft() that I posted earlier has a bug in it.

The bug comes from the use of lapply() and the way it evaluated the
additional arguments passed to type.convert().

I noticed this while testing the function on the UCBAdmissions data set,
a multi-way table used in some help file examples, such as
?as.data.frame.table.

Here is a corrected version:

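expand.dft <- function(x, na.strings = "NA", as.is = FALSE, dec = ".")
{
  # Replicate each row of 'x' Freq times, giving a list of data frames
  DF <- sapply(seq_len(nrow(x)), function(i) x[rep(i, each = x$Freq[i]), ],
               simplify = FALSE)

  # Stack the pieces and drop the now-redundant Freq column
  DF <- subset(do.call("rbind", DF), select = -Freq)

  # Re-derive each column's type from its character representation
  for (i in seq_along(DF))
  {
    DF[[i]] <- type.convert(as.character(DF[[i]]),
                            na.strings = na.strings,
                            as.is = as.is, dec = dec)
  }

  DF
}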
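A possible alternative, sketched here rather than taken from the post, avoids building one data frame per row and rbind()-ing them all: compute a single row index with rep(..., times = Freq) and subset the data frame once. The helper name expand_rows() and its 'freq' argument are hypothetical. Unlike expand.dft(), it does not pass the columns through type.convert(), so column types and factor level order are kept exactly as they are in the input, and there is no counterpart to the na.strings, as.is and dec arguments.

```r
# Hypothetical helper (not from the original post): expand a flat
# contingency table by repeating row indices instead of rbind()-ing.
expand_rows <- function(x, freq = "Freq") {
  # One index per observation: row i appears x[[freq]][i] times
  idx <- rep(seq_len(nrow(x)), times = x[[freq]])
  # Subset once and drop the count column
  out <- x[idx, setdiff(names(x), freq), drop = FALSE]
  # Replace the 1, 1.1, 1.2, ... row names with 1..n
  rownames(out) <- NULL
  out
}

FCT <- as.data.frame(UCBAdmissions)
DF2 <- expand_rows(FCT)
nrow(DF2)                          # 4526, i.e. sum(FCT$Freq)
# Factor levels are untouched, so tabulating recovers the original counts
all(table(DF2) == UCBAdmissions)   # TRUE
```

Indexing once tends to scale better than rbind()-ing one small data frame per cell when the flat table has many rows.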


For example, if we take the UCBAdmissions multi-way table and convert
it to a flat contingency table (one row per cell, plus a Freq column):

> FCT <- as.data.frame(UCBAdmissions)

> FCT
      Admit Gender Dept Freq
1  Admitted   Male    A  512
2  Rejected   Male    A  313
3  Admitted Female    A   89
4  Rejected Female    A   19
5  Admitted   Male    B  353
6  Rejected   Male    B  207
7  Admitted Female    B   17
8  Rejected Female    B    8
9  Admitted   Male    C  120
10 Rejected   Male    C  205
11 Admitted Female    C  202
12 Rejected Female    C  391
13 Admitted   Male    D  138
14 Rejected   Male    D  279
15 Admitted Female    D  131
16 Rejected Female    D  244
17 Admitted   Male    E   53
18 Rejected   Male    E  138
19 Admitted Female    E   94
20 Rejected Female    E  299
21 Admitted   Male    F   22
22 Rejected   Male    F  351
23 Admitted Female    F   24
24 Rejected Female    F  317


Thus, there should be:

> sum(FCT$Freq)
[1] 4526

rows in the final 'raw' data frame.


> DF <- expand.dft(FCT)

> str(DF)
'data.frame':   4526 obs. of  3 variables:
 $ Admit : Factor w/ 2 levels "Admitted","Rejected": 1 1 1 1 1 1 1 1 1 1 ...
 $ Gender: Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
 $ Dept  : Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...


Note that, with the default as.is = FALSE, the three columns are coerced
back to factors, mirroring the default stringsAsFactors = TRUE behavior
of data.frame() (a default that R 4.0.0 later changed to FALSE).
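The str() output above lists the Gender levels as "Female", "Male", whereas FCT lists Male first. This appears to be because type.convert() rebuilds the factor levels from the character values in sorted order rather than reusing the original ones, so a table() of the expanded data would put Female first. A minimal sketch of the effect, and of one way to restore the table's original order with factor(levels = ...), follows; the variable names g and g2 are illustrative only.

```r
# Factor built by type.convert(): levels come out in sorted order
g <- type.convert(c("Male", "Male", "Female"), as.is = FALSE)
levels(g)    # "Female" "Male"

# Reapply the level order stored in the original table's dimnames
g2 <- factor(as.character(g), levels = dimnames(UCBAdmissions)$Gender)
levels(g2)   # "Male" "Female"
```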

If we now use:

> DF <- expand.dft(FCT, as.is = TRUE)

> str(DF)
'data.frame':   4526 obs. of  3 variables:
 $ Admit : chr  "Admitted" "Admitted" "Admitted" "Admitted" ...
 $ Gender: chr  "Male" "Male" "Male" "Male" ...
 $ Dept  : chr  "A" "A" "A" "A" ...


The three columns stay as character vectors. It was this behavior that
did not work properly in the first version.
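A small sketch of how the type.convert() arguments that expand.dft() passes through behave on their own: as.is decides only whether non-numeric strings become factors or stay character, numeric-looking strings are converted to numbers either way, and na.strings lists the strings to be read as NA. The input vectors below are made up for illustration.

```r
# as.is only governs values that stay character-like
chr <- type.convert(c("A", "B", "A"), as.is = TRUE)
fac <- type.convert(c("A", "B", "A"), as.is = FALSE)
class(chr)   # "character"
class(fac)   # "factor"

# Numeric-looking strings become numbers regardless of as.is, e.g. a table
# dimension labelled "1", "2", "3" comes back as integer, not factor
num <- type.convert(c("1", "2", "3"), as.is = TRUE)
class(num)   # "integer"

# na.strings marks which strings are read as NA during conversion
with_na <- type.convert(c("1", "-", "3"), na.strings = "-", as.is = TRUE)
is.na(with_na)   # FALSE TRUE FALSE
```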

HTH,

Marc Schwartz



More information about the R-help mailing list