[R] generalizing expand.table: table -> data.frame
Marc Schwartz
marc_schwartz at comcast.net
Tue Jan 20 19:05:51 CET 2009
On 01/20/2009 10:38 AM, Michael Friendly wrote:
> In http://tolstoy.newcastle.edu.au/R/e2/help/06/10/3064.html
> a method was given for converting a frequency table to an expanded data
> frame that represents each observation as a set of factors. A slightly
> modified version was later included in the NCStats package, which is
> available only on http://rforge.net/ (and has too many dependencies to
> be useful).
>
> I've tried to make it more general, so that it also accepts a data frame
> in frequency form, including one where the frequency variable is not
> named "Freq". This is my working version:
>
> __begin__ expand.table.R
> expand.table <- function (x, var.names = NULL, freq = "Freq", ...)
> {
>   # allow: a table object, or a data frame in frequency form
>   if (inherits(x, "table")) {
>     x <- as.data.frame.table(x)
>   }
>   ## This fails:
>   # df <- sapply(1:nrow(x), function(i) x[rep(i, each = x[i, freq]), ],
>   #              simplify = FALSE)
>   # df <- subset(do.call("rbind", df), select = -freq)
>
>   # This works, when the frequency variable is named Freq
>   df <- sapply(1:nrow(x), function(i) x[rep(i, each = x[i, "Freq"]), ],
>                simplify = FALSE)
>   df <- subset(do.call("rbind", df), select = -Freq)
>
>   for (i in 1:ncol(df)) {
>     df[[i]] <- type.convert(as.character(df[[i]]), ...)
>   }
>   rownames(df) <- NULL
>   if (!is.null(var.names)) {
>     if (length(var.names) < dim(df)[2])
>       stop("Too few var.names given.")
>     else if (length(var.names) > dim(df)[2])
>       stop("Too many var.names given.")
>     else names(df) <- var.names
>   }
>   df
> }
> __end__ expand.table.R
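Why the commented-out lines fail, as far as I can tell from how base R's subset() evaluates its select argument: each column name is bound to its column position, and any other symbol is looked up in the calling function. Inside expand.table, freq is not a column name, so -freq becomes -"Freq" and R stops with "invalid argument to unary operator". A minimal sketch on a hypothetical data frame toy, with hypothetical helper names, showing the failure and a name-based alternative:

```r
# Hypothetical two-factor data frame in frequency form
toy <- data.frame(a = factor(c("x", "y")), b = factor(c("u", "v")),
                  Freq = c(2, 3))

drop_by_subset <- function(df, freq = "Freq") {
  # Mirrors the failing line: 'freq' is not a column of df, so subset()
  # falls back to this function's own 'freq' and computes -"Freq"
  subset(df, select = -freq)
}

drop_by_name <- function(df, freq = "Freq") {
  # Select by name instead; drop = FALSE keeps a data frame even if
  # only one column is left
  df[, names(df) != freq, drop = FALSE]
}

res <- tryCatch(drop_by_subset(toy), error = function(e) e)
inherits(res, "error")    # TRUE
names(drop_by_name(toy))  # "a" "b"
```

Selecting with names(df) != freq works whatever the count column is called, which is essentially what the freq.col approach below does by position.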
>
> Thus for the following table
>
> library(vcd)
> art <- xtabs(~Treatment + Improved, data = Arthritis)
>
>
>> art
>          Improved
> Treatment None Some Marked
>   Placebo   29    7      7
>   Treated   13    7     21
>
> expand.table (above) gives a data frame of sum(art) = 84 observations,
> with factors Treatment and Improved.
>> artdf <- expand.table(art)
>> str(artdf)
> 'data.frame': 84 obs. of 2 variables:
>  $ Treatment: Factor w/ 2 levels "Placebo","Treated": 1 1 1 1 1 1 1 1 1 1 ...
>  $ Improved : Factor w/ 3 levels "Marked","None",..: 2 2 2 2 2 2 2 2 2 2 ...
>>
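The levels of Improved come out sorted ("Marked", "None", "Some") instead of in the table's None/Some/Marked order. That looks like a side effect of the type.convert(as.character(...)) step: the character conversion drops the level order, and the rebuilt factor gets alphabetically sorted levels. A small check, passing as.is = FALSE explicitly so that a factor is returned (my understanding is that recent R versions warn and return character when as.is is left unspecified):

```r
# Improved levels in their original (ordinal) order
improved <- factor(c("None", "Some", "Marked", "None"),
                   levels = c("None", "Some", "Marked"))

# The round trip used in expand.table(): factor -> character -> factor
converted <- type.convert(as.character(improved), as.is = FALSE)

levels(improved)   # "None"   "Some" "Marked"
levels(converted)  # "Marked" "None" "Some"

# One way to restore the original order afterwards
restored <- factor(converted, levels = levels(improved))
identical(levels(restored), levels(improved))  # TRUE
```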
>
> I've generalized this so it works with data frames in frequency form,
>
>> as.data.frame(art)
>   Treatment Improved Freq
> 1   Placebo     None   29
> 2   Treated     None   13
> 3   Placebo     Some    7
> 4   Treated     Some    7
> 5   Placebo   Marked    7
> 6   Treated   Marked   21
>
>> art.df2 <- expand.table(as.data.frame(art))
>> str(art.df2)
> 'data.frame': 84 obs. of 2 variables:
>  $ Treatment: Factor w/ 2 levels "Placebo","Treated": 1 1 1 1 1 1 1 1 1 1 ...
>  $ Improved : Factor w/ 3 levels "Marked","None",..: 2 2 2 2 2 2 2 2 2 2 ...
>>
>
> But here's the rub: when the frequency variable in a data frame is
> called something other than "Freq", as in this example,
>
>> GSS
>      sex party count
> 1 female   dem   279
> 2   male   dem   165
> 3 female indep    73
> 4   male indep    47
> 5 female   rep   225
> 6   male   rep   191
>
> all the changes I've tried that use the freq= argument in expand.table()
> fail in various ways.
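One possible workaround with the existing function, assuming only base R: turn the frequency-form data frame back into a table with xtabs(), putting the count variable on the left-hand side of the formula. Converting that table with as.data.frame() names the count column Freq again. The GSS values below are typed in from the listing above; GSS.tab and GSS.freq are just illustrative names.

```r
# GSS data from the listing above, in frequency form
GSS <- data.frame(
  sex   = c("female", "male", "female", "male", "female", "male"),
  party = c("dem", "dem", "indep", "indep", "rep", "rep"),
  count = c(279, 165, 73, 47, 225, 191)
)

# count on the left-hand side is summed within each sex x party cell
GSS.tab <- xtabs(count ~ sex + party, data = GSS)

# Back to frequency form; the count column is now called "Freq"
GSS.freq <- as.data.frame(GSS.tab)
names(GSS.freq)     # "sex" "party" "Freq"
sum(GSS.freq$Freq)  # 980
```

Because the table route restores the Freq name, GSS.tab could be passed to the function above as it stands, though that only helps when rebuilding a table first is acceptable.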
>
> Can someone help?
Hi Michael,
I think that the following modifications to my original code, which also
incorporate the changes made in the NCStats package, should work:
expand.dft <- function(x, var.names = NULL, freq = "Freq", ...)
{
  # allow: a table object, or a data frame in frequency form
  if (inherits(x, "table"))
    x <- as.data.frame.table(x, responseName = freq)

  freq.col <- which(colnames(x) == freq)
  if (length(freq.col) == 0)
    stop(paste(sQuote(freq), "not found in column names"))

  # one data frame per input row, that row repeated by its frequency
  DF <- sapply(1:nrow(x),
               function(i) x[rep(i, each = x[i, freq.col]), ],
               simplify = FALSE)

  # drop = FALSE keeps a data frame even when only one column remains
  DF <- do.call("rbind", DF)[, -freq.col, drop = FALSE]

  for (i in 1:ncol(DF))
  {
    DF[[i]] <- type.convert(as.character(DF[[i]]), ...)
  }

  rownames(DF) <- NULL

  if (!is.null(var.names))
  {
    if (length(var.names) < dim(DF)[2])
    {
      stop(paste("Too few", sQuote("var.names"), "given."))
    } else if (length(var.names) > dim(DF)[2]) {
      stop(paste("Too many", sQuote("var.names"), "given."))
    } else {
      names(DF) <- var.names
    }
  }

  DF
}
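For larger tables, building one small data frame per row and rbind()-ing them can get slow. A sketch of an alternative, assuming the same table-or-data-frame input and freq argument (the name expand_freq is hypothetical and not part of any package): repeat each row index by its count and subset once. Unlike expand.dft(), it leaves the factor columns and their level order untouched instead of passing them through type.convert().

```r
# Hypothetical helper: expand a table or frequency-form data frame
# to one row per observation by repeating row indices.
expand_freq <- function(x, freq = "Freq") {
  if (inherits(x, "table"))
    x <- as.data.frame.table(x, responseName = freq)
  if (!(freq %in% names(x)))
    stop(paste(sQuote(freq), "not found in column names"))
  # Each row index appears x[[freq]] times (zero-count rows vanish)
  idx <- rep(seq_len(nrow(x)), times = x[[freq]])
  # drop = FALSE keeps a data frame even for a one-way table
  out <- x[idx, names(x) != freq, drop = FALSE]
  rownames(out) <- NULL
  out
}

# The Arthritis table, rebuilt by hand so vcd is not needed
art <- as.table(matrix(c(29, 13, 7, 7, 7, 21), nrow = 2,
                       dimnames = list(Treatment = c("Placebo", "Treated"),
                                       Improved = c("None", "Some", "Marked"))))
art.long <- expand_freq(art)
dim(art.long)  # 84 2
```

For a frequency-form data frame such as GSS, expand_freq(GSS, freq = "count") should give one row per respondent (980 rows).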
> art
         Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21
> head(expand.dft(art), 10)
   Treatment Improved
1    Placebo     None
2    Placebo     None
3    Placebo     None
4    Placebo     None
5    Placebo     None
6    Placebo     None
7    Placebo     None
8    Placebo     None
9    Placebo     None
10   Placebo     None
> art.dft <- as.data.frame.table(art)
> art.dft
  Treatment Improved Freq
1   Placebo     None   29
2   Treated     None   13
3   Placebo     Some    7
4   Treated     Some    7
5   Placebo   Marked    7
6   Treated   Marked   21
> names(art.dft)[3] <- "count"
> art.dft
  Treatment Improved count
1   Placebo     None    29
2   Treated     None    13
3   Placebo     Some     7
4   Treated     Some     7
5   Placebo   Marked     7
6   Treated   Marked    21
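The rename step can also be folded into the conversion itself: as.data.frame.table() has a responseName argument (the same argument expand.dft() uses for table input). A short check, rebuilding art by hand so that vcd is not needed:

```r
# The Arthritis table, rebuilt by hand
art <- as.table(matrix(c(29, 13, 7, 7, 7, 21), nrow = 2,
                       dimnames = list(Treatment = c("Placebo", "Treated"),
                                       Improved = c("None", "Some", "Marked"))))

# Name the count column "count" directly instead of renaming afterwards
art.dft <- as.data.frame(art, responseName = "count")
names(art.dft)  # "Treatment" "Improved" "count"
```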
> head(expand.dft(art.dft, freq = "count"), 10)
   Treatment Improved
1    Placebo     None
2    Placebo     None
3    Placebo     None
4    Placebo     None
5    Placebo     None
6    Placebo     None
7    Placebo     None
8    Placebo     None
9    Placebo     None
10   Placebo     None
HTH,
Marc Schwartz