[R] generalizing expand.table: table -> data.frame

Michael Friendly friendly at yorku.ca
Tue Jan 20 17:38:15 CET 2009


In http://tolstoy.newcastle.edu.au/R/e2/help/06/10/3064.html
a method was given for converting a frequency table to an expanded data
frame that represents each observation as a set of factors. A slightly
modified version was later included in the NCStats package, which is
available only from http://rforge.net/ (and has too many dependencies
to be useful).

I've tried to make it more general, so that it also accepts a data frame
in frequency form, including one whose frequency variable is not named
"Freq". This is my working version:

__begin__ expand.table.R
expand.table <- function (x, var.names = NULL, freq="Freq", ...)
{
#  allow: a table object, or a data frame in frequency form
   if(inherits(x,"table")) {
     x <- as.data.frame.table(x)
   }
##  This fails:
#   df <- sapply(1:nrow(x), function(i) x[rep(i, each = x[i,freq]), ],
#                simplify = FALSE)
#   df <- subset(do.call("rbind", df), select = -freq)

#  This works, when the frequency variable is named Freq
   df <- sapply(1:nrow(x), function(i) x[rep(i, each = x[i,"Freq"]), ],
                simplify = FALSE)
   df <- subset(do.call("rbind", df), select = -Freq)

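#  convert each column with type.convert(); extra arguments in ...
#  are passed on to it
   for (i in 1:ncol(df)) {
       df[[i]] <- type.convert(as.character(df[[i]]), ...)
   }
   rownames(df) <- NULL
#  optionally rename the columns; var.names must match ncol(df)
   if (!is.null(var.names)) {
       if (length(var.names) < dim(df)[2])
           stop("Too few var.names given.")
       else if (length(var.names) > dim(df)[2])
           stop("Too many var.names given.")
       else names(df) <- var.names
   }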
   df
}
__end__   expand.table.R

For example, take the following table:

library(vcd)
art <- xtabs(~Treatment + Improved, data = Arthritis)


 > art
         Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21

expand.table (above) gives a data frame of sum(art) = 84 observations,
with factors Treatment and Improved:

 > artdf <- expand.table(art)
 > str(artdf)
'data.frame':   84 obs. of  2 variables:
 $ Treatment: Factor w/ 2 levels "Placebo","Treated": 1 1 1 1 1 1 1 1 1 1 ...
 $ Improved : Factor w/ 3 levels "Marked","None",..: 2 2 2 2 2 2 2 2 2 2 ...
 >
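As a cross-check on that count: the six cells of art sum to
29 + 7 + 7 + 13 + 7 + 21 = 84, and cross-tabulating the expanded data
frame should give back the original table. A minimal base-R sketch of
that round trip follows. It assumes the table is re-entered by hand from
the counts printed above (so vcd isn't needed), and it swaps the
sapply()/rbind() loop for a single vectorized rep() over row indices; it
is an illustration, not the expand.table() code itself.

__begin__ roundtrip.R
# Re-enter the Treatment x Improved table from the counts printed above
# (hand-entered, so the vcd package is not needed for this check).
art <- as.table(matrix(c(29, 7,  7,
                         13, 7, 21),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(Treatment = c("Placebo", "Treated"),
                                       Improved  = c("None", "Some", "Marked"))))

# Frequency form: one row per cell, plus a "Freq" column of counts.
freq_df <- as.data.frame(art)

# Repeat each row index Freq times, then keep only the factor columns.
idx   <- rep(seq_len(nrow(freq_df)), times = freq_df$Freq)
artdf <- freq_df[idx, c("Treatment", "Improved")]
rownames(artdf) <- NULL

nrow(artdf)    # 84, i.e. sum(art)
table(artdf)   # re-tabulating recovers the original 2 x 3 counts
__end__   roundtrip.R

Because this sketch keeps the factors produced by as.data.frame() rather
than passing them through type.convert(), the levels of Improved stay in
table order (None, Some, Marked) instead of the alphabetical order shown
in the str() output above.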

I've generalized this so it also works with a data frame in frequency form:

 > as.data.frame(art)
  Treatment Improved Freq
1   Placebo     None   29
2   Treated     None   13
3   Placebo     Some    7
4   Treated     Some    7
5   Placebo   Marked    7
6   Treated   Marked   21

 > art.df2 <- expand.table(as.data.frame(art))
 > str(art.df2)
'data.frame':   84 obs. of  2 variables:
 $ Treatment: Factor w/ 2 levels "Placebo","Treated": 1 1 1 1 1 1 1 1 1 1 ...
 $ Improved : Factor w/ 3 levels "Marked","None",..: 2 2 2 2 2 2 2 2 2 2 ...
 >

But here's the rub: when the frequency variable in a data frame is
called something other than "Freq", as in this example,

 > GSS
     sex party count
1 female   dem   279
2   male   dem   165
3 female indep    73
4   male indep    47
5 female   rep   225
6   male   rep   191

every change I've tried that uses the freq= argument of expand.table()
fails, in various ways.
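For what it's worth, the commented-out "This fails" lines look like they
break at subset(): its select argument is evaluated non-standardly, so
-freq is first looked up as a column name and, since no column is
called freq, is then taken from the calling function, where it is the
character string "Freq" (or "count"), and negating a string is an error.
Picking the column by name with ordinary [ ] indexing seems to sidestep
this. Below is a minimal sketch along those lines. The function name
expand_table2 is hypothetical, the as.is argument is an addition (recent
R versions warn when type.convert() is called without it), and the GSS
counts are re-entered by hand from the listing above.

__begin__ expand_table2.R
# expand_table2() is a hypothetical name used only for this sketch.
#   x:     a "table", or a data frame in frequency form
#   freq:  name of the column holding the cell counts
#   as.is: FALSE asks type.convert() to return factors for non-numeric
#          columns, matching the factor output shown above
expand_table2 <- function(x, var.names = NULL, freq = "Freq",
                          as.is = FALSE, ...) {
  if (inherits(x, "table")) {
    # name the count column after `freq` so the code below is uniform
    x <- as.data.frame.table(x, responseName = freq)
  }
  if (!freq %in% names(x))
    stop("x has no column named '", freq, "'")

  # repeat each row index by its count, in one vectorized step
  idx <- rep(seq_len(nrow(x)), times = x[[freq]])

  # drop the count column by name with ordinary indexing, which avoids
  # subset()'s non-standard evaluation of `select`
  df <- x[idx, setdiff(names(x), freq), drop = FALSE]

  for (i in seq_along(df)) {
    df[[i]] <- type.convert(as.character(df[[i]]), as.is = as.is, ...)
  }
  rownames(df) <- NULL

  if (!is.null(var.names)) {
    if (length(var.names) != ncol(df))
      stop("expected ", ncol(df), " var.names, got ", length(var.names))
    names(df) <- var.names
  }
  df
}

# The GSS data, re-entered by hand from the listing above
GSS <- data.frame(sex   = c("female", "male", "female", "male", "female", "male"),
                  party = c("dem", "dem", "indep", "indep", "rep", "rep"),
                  count = c(279, 165, 73, 47, 225, 191))

gssdf <- expand_table2(GSS, freq = "count")
str(gssdf)   # 980 obs. of 2 variables, both factors
__end__   expand_table2.R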

Can someone help?

-Michael

-- 
Michael Friendly     Email: friendly AT yorku DOT ca 
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA




More information about the R-help mailing list