[R] generalizing expand.table: table -> data.frame
Michael Friendly
friendly at yorku.ca
Tue Jan 20 17:38:15 CET 2009
In
http://tolstoy.newcastle.edu.au/R/e2/help/06/10/3064.html
a method was given for converting a frequency table to an expanded data
frame representing each observation as a set of factors. A slightly
modified version was later included in the NCStats package, which is
available only on http://rforge.net/ (and it has too many dependencies
to be useful).

I've tried to make it more general, allowing an input data frame in
frequency form, including one where the frequency variable is not named
"Freq". This is my working version:
__begin__ expand.table.R
expand.table <- function (x, var.names = NULL, freq="Freq", ...)
{
  # allow: a table object, or a data frame in frequency form
  if (inherits(x, "table")) {
    x <- as.data.frame.table(x)
  }
  ## This fails:
  # df <- sapply(1:nrow(x), function(i) x[rep(i, each = x[i,freq]), ], simplify = FALSE)
  # df <- subset(do.call("rbind", df), select = -freq)
  # This works, when the frequency variable is named Freq
  df <- sapply(1:nrow(x), function(i) x[rep(i, each = x[i,"Freq"]), ],
               simplify = FALSE)
  df <- subset(do.call("rbind", df), select = -Freq)
  for (i in 1:ncol(df)) {
    df[[i]] <- type.convert(as.character(df[[i]]), ...)
  }
  rownames(df) <- NULL
  if (!is.null(var.names)) {
    if (length(var.names) < dim(df)[2])
      stop("Too few var.names given.")
    else if (length(var.names) > dim(df)[2])
      stop("Too many var.names given.")
    else names(df) <- var.names
  }
  df
}
__end__ expand.table.R
Thus, for the following table,

> library(vcd)
> art <- xtabs(~Treatment + Improved, data = Arthritis)
> art
         Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21

expand.table (above) gives a data frame of sum(art)=84 observations,
with factors Treatment and Improved.

> artdf <- expand.table(art)
> str(artdf)
'data.frame': 84 obs. of 2 variables:
$ Treatment: Factor w/ 2 levels "Placebo","Treated": 1 1 1 1 1 1 1 1 1 1 ...
$ Improved : Factor w/ 3 levels "Marked","None",..: 2 2 2 2 2 2 2 2 2 2 ...
>
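
As a quick sanity check on what "expanded" means here, the following
sketch rebuilds the same 2 x 3 table by hand (so it does not need vcd),
expands it by repeating each row of the frequency form Freq times, and
confirms that tabulating the result recovers the original counts. It is
an illustration under stated assumptions, not the posted function: it
skips the type.convert() step, so the factor levels keep their original
order (None, Some, Marked) instead of the alphabetical order seen in the
str() output above. The object names freq_df and long are made up for
this example.

```r
# Rebuild the Arthritis cross-tabulation by hand (counts copied from the post).
art <- as.table(matrix(c(29, 13, 7, 7, 7, 21), nrow = 2,
                       dimnames = list(Treatment = c("Placebo", "Treated"),
                                       Improved  = c("None", "Some", "Marked"))))

# Frequency form: one row per cell, with the count in column Freq.
freq_df <- as.data.frame(art)

# Case form: repeat each row index Freq times and keep only the factors.
long <- freq_df[rep(seq_len(nrow(freq_df)), times = freq_df$Freq),
                c("Treatment", "Improved")]
rownames(long) <- NULL

# Round trip: one row per observation, and the same cell counts as before.
stopifnot(nrow(long) == sum(art))
stopifnot(all(table(long) == art))
```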
I've generalized this so it works with data frames in frequency form:

> as.data.frame(art)
  Treatment Improved Freq
1   Placebo     None   29
2   Treated     None   13
3   Placebo     Some    7
4   Treated     Some    7
5   Placebo   Marked    7
6   Treated   Marked   21
> art.df2 <- expand.table(as.data.frame(art))
> str(art.df2)
'data.frame': 84 obs. of 2 variables:
$ Treatment: Factor w/ 2 levels "Placebo","Treated": 1 1 1 1 1 1 1 1 1 1 ...
$ Improved : Factor w/ 3 levels "Marked","None",..: 2 2 2 2 2 2 2 2 2 2 ...
>
But here's the rub: when the frequency variable in a data frame is
called something other than "Freq", as in this example,

> GSS
     sex party count
1 female   dem   279
2   male   dem   165
3 female indep    73
4   male indep    47
5 female   rep   225
6   male   rep   191

all the changes I've tried, using the freq= argument in expand.table(),
fail in various ways.
Can someone help?
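
One possible way around this, offered as a sketch rather than a tested
replacement: the likely culprit in the commented-out version is
subset(..., select = -freq). subset() evaluates select with the column
names standing in for column positions, so -Freq works, but freq is a
character string such as "count", and negating a string is an error.
Dropping the column by name with ordinary indexing avoids that. The
function name expand.table2 is hypothetical, chosen only to keep it
apart from the version above. It also omits the ... argument and calls
type.convert() with as.is = FALSE so that character columns come back as
factors; both are assumptions of this sketch.

```r
# Hypothetical variant of expand.table(); a sketch, not the posted function.
expand.table2 <- function(x, var.names = NULL, freq = "Freq") {
  # Allow a table object, or a data frame in frequency form.
  if (inherits(x, "table")) {
    x <- as.data.frame(x, responseName = freq)
  }
  stopifnot(is.data.frame(x), freq %in% names(x))

  # Repeat each row index by its count; drop the count column by name.
  idx <- rep(seq_len(nrow(x)), times = x[[freq]])
  df <- x[idx, names(x) != freq, drop = FALSE]

  # Re-derive column types from the character values (factors for text).
  for (i in seq_along(df)) {
    df[[i]] <- type.convert(as.character(df[[i]]), as.is = FALSE)
  }
  rownames(df) <- NULL

  if (!is.null(var.names)) {
    if (length(var.names) != ncol(df))
      stop("var.names must have one name per remaining column.")
    names(df) <- var.names
  }
  df
}

# The GSS data from the post, entered by hand.
GSS <- data.frame(
  sex   = rep(c("female", "male"), times = 3),
  party = rep(c("dem", "indep", "rep"), each = 2),
  count = c(279, 165, 73, 47, 225, 191)
)

GSSdf <- expand.table2(GSS, freq = "count")
str(GSSdf)
```

With the counts above, the result should have sum(GSS$count) rows and
the two columns sex and party.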
-Michael
--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT M3J 1P3 CANADA