[R] Convert Contingency Table to Flat File

Marc Schwartz MSchwartz at mn.rr.com
Tue Oct 17 21:04:11 CEST 2006


On Tue, 2006-10-17 at 13:09 +0200, Philipp Pagel wrote:
> On Tue, Oct 17, 2006 at 03:08:49AM -0700, Marco LO wrote:
> >   Is there any R function out there to turn a multi-way contingency
> >   table back to a flat file table of individual rows and attribute
> >   columns.?
> 
> Are you looking for something like this?
> 
> # generate some data
> x = sample(c(0,1), 100, replace=T)
> y = sample(c(0,1), 100, replace=T)
> z = sample(c(0,1), 100, replace=T)
> # contingency table
> mytab = table(x,y,z)
> # flat contingency table
> as.data.frame( mytab )


This thread reminds me of a discussion from a while back, which I cannot
seem to find in the archives at the moment.

The steps elucidated by Philipp result in a flattened contingency table,
which contains the various cross-classifying factors as unique rows and
the addition of a frequency column indicating the number of occurrences
of each unique row.

It does not, however, result in what might be considered the original 'raw
data frame' containing a single row per observation, if that is what one
desires.

In other words, we get the following:

set.seed(1)
x <- sample(c(0, 1), 100, replace = TRUE)
y <- sample(c(0, 1), 100, replace = TRUE)
z <- sample(c(0, 1), 100, replace = TRUE)
 
# contingency table
mytab <- table(x, y, z)
 
> mytab
, , z = 0

   y
x    0  1
  0 17 19
  1 11 15

, , z = 1

   y
x    0  1
  0  6 10
  1 12 10

 
# flattened contingency table
FCT <- as.data.frame(mytab)
 
> FCT
  x y z Freq
1 0 0 0   17
2 1 0 0   11
3 0 1 0   19
4 1 1 0   15
5 0 0 1    6
6 1 0 1   12
7 0 1 1   10
8 1 1 1   10
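
As a quick check of the distinction drawn above, here is a minimal sketch.
The flattened table has one row per combination of factor levels, while
its Freq column sums to the number of original observations. The data are
regenerated so that the snippet stands alone. The individual cell counts
depend on the random number generator of the R version in use, so none are
shown here.

```r
# Regenerate the example data and the flattened contingency table
set.seed(1)
x <- sample(c(0, 1), 100, replace = TRUE)
y <- sample(c(0, 1), 100, replace = TRUE)
z <- sample(c(0, 1), 100, replace = TRUE)
FCT <- as.data.frame(table(x, y, z))

# One row per cell of the table, not one row per observation
nrow(FCT)

# The Freq column accounts for every original observation
sum(FCT$Freq)

# The cross-classifying columns come back as factors
sapply(FCT, class)
```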



In order to take 'FCT' and convert it to 'raw data rows', we can do the
following:

expand.dft <- function(x, na.strings = "NA", as.is = FALSE, dec = ".")
{
  # Take each row in the source data frame table and replicate it
  # using the Freq value
  DF <- lapply(seq_len(nrow(x)),
               function(i) x[rep(i, times = x$Freq[i]), ])

  # Take the above list and rbind it to create a single DF
  # Also subset the result to eliminate the Freq column
  DF <- subset(do.call("rbind", DF), select = -Freq)

  # Now apply type.convert to the character coerced factor columns
  # to facilitate data type selection for each column
  DF <- as.data.frame(lapply(DF,
                             function(x) 
                             type.convert(as.character(x),
                                          na.strings = na.strings,
                                          as.is = as.is,
                                          dec = dec)))

  # Return data frame
  DF
}
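
The same expansion can also be sketched without the per-row loop, by
building a single index vector with rep() and subsetting once. This is
only an alternative sketch, not a replacement for the function above. The
name expand_rows and its 'freq' argument are made up for illustration.

```r
# Hypothetical vectorized variant: repeat each row index Freq times
expand_rows <- function(x, freq = "Freq", as.is = FALSE) {
  # Row i appears x[[freq]][i] times in the index; zero counts drop out
  idx <- rep(seq_len(nrow(x)), times = x[[freq]])

  # Subset once, leaving out the frequency column
  out <- x[idx, names(x) != freq, drop = FALSE]

  # Attempt to restore data types column by column
  out[] <- lapply(out, function(col)
                  type.convert(as.character(col), as.is = as.is))

  rownames(out) <- NULL
  out
}

# Small made-up table: two cells with counts 2 and 3
flat <- data.frame(g = factor(c("a", "b")),
                   h = factor(c("0", "1")),
                   Freq = c(2L, 3L))
raw <- expand_rows(flat)
str(raw)
```

Subsetting once avoids the repeated rbind of many small data frames, which
may matter for tables with a large number of cells.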


# Now use expand.dft() on the table from above
new.DF <- expand.dft(FCT)

> str(new.DF)
'data.frame':   100 obs. of  3 variables:
 $ x: int  0 0 0 0 0 0 0 0 0 0 ...
 $ y: int  0 0 0 0 0 0 0 0 0 0 ...
 $ z: int  0 0 0 0 0 0 0 0 0 0 ...


# Re-create the multi-way table
new.tab <- table(new.DF)

> new.tab
, , z = 0

   y
x    0  1
  0 17 19
  1 11 15

, , z = 1

   y
x    0  1
  0  6 10
  1 12 10


# Compare to initial mytab
> identical(new.tab, mytab)
[1] TRUE



So, if one needs it, expand.dft() can be used to take a multi-way
contingency table that has been coerced to a data frame and convert it
back to the raw data frame.

I'm not sure if this functionality is available elsewhere, but thought
that it might be helpful.

I included the use of type.convert() in order to make a reasonable
attempt at restoring the original data types; without this step, all
columns would remain factors.
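
To illustrate why the conversion goes through as.character(), here is a
small sketch on a made-up factor. Coercing a factor directly to integer
returns its internal level codes rather than the values printed.

```r
# A factor whose levels happen to look like numbers
f <- factor(c("0", "1", "1", "0"))

# Direct coercion gives the level codes, not the values
as.integer(f)
# 1 2 2 1

# Going through character and type.convert() recovers the values
type.convert(as.character(f), as.is = FALSE)
# 0 1 1 0
```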

I wonder if it might make sense to add an 'expand' argument to
as.data.frame.table(), which would default to FALSE. It could be then
set to TRUE and utilize expand.dft() to take the additional step and
return the raw data frame as above.
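
As a rough sketch of what such an argument might look like from the user's
side, here is a stand-alone wrapper. To be clear, table_to_df() and its
'expand' argument are hypothetical names for illustration only.
as.data.frame.table() itself has no such argument. The expansion is
written inline so that the snippet runs on its own, using the Titanic
table from the datasets package.

```r
# Hypothetical wrapper; not part of base R
table_to_df <- function(tab, expand = FALSE, as.is = FALSE) {
  flat <- as.data.frame(tab)
  if (!expand) return(flat)

  # Replicate each row Freq times and drop the Freq column
  idx <- rep(seq_len(nrow(flat)), times = flat$Freq)
  raw <- flat[idx, names(flat) != "Freq", drop = FALSE]

  # Attempt to restore data types, as in expand.dft()
  raw[] <- lapply(raw, function(col)
                  type.convert(as.character(col), as.is = as.is))
  rownames(raw) <- NULL
  raw
}

flat <- table_to_df(Titanic)                 # one row per cell
raw  <- table_to_df(Titanic, expand = TRUE)  # one row per person

nrow(flat)
nrow(raw) == sum(Titanic)
```

One caveat with this approach: type.convert() rebuilds factors with their
levels in sorted order, so a table made from the expanded data may list
its levels in a different order than the original. Comparing cells by
name rather than by position avoids that problem.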

Anyway, I hope that this might be helpful.

Regards,

Marc Schwartz


