Marc Schwartz
marc_schwartz at comcast.net
Fri Oct 5 22:17:23 CEST 2007
On Fri, 2007-10-05 at 13:09 -0700, Deepayan Sarkar wrote:
> On 10/5/07, born.to.b.wyld at gmail.com <born.to.b.wyld at gmail.com> wrote:
> > I have an application that would generate a cross-tabulation in array
> > format in R. In particular, my application would give me a result
> > similar to that of :
> >
> > array(5,c(2,2,2,2,2))
> >
> > The above could be seen as a cross-tabulation of 5 variables with 2
> > levels each (could be 0 and 1). In this case, the data were such that
> > each cell has exactly 5 observations.
> >
> > Now, I want the output to look like the output of 'xtabs' utility, so
> > that I can use this output inside 'loglm(MASS)'. In particular, I want
> > to add (variable) names to each dimension and indicate the levels that
> > correspond to each cell. The output from 'xtabs' for a data set of
> > this kind would look like:
>
> Simplifying your example:
>
> > foo <- array(5,c(2,2), dimnames = list(xx = c("x=0", "x=1"), yy = c("A", "B")))
> > foo
>      yy
> xx    A B
>   x=0 5 5
>   x=1 5 5
>
> You can also do this in two steps:
>
> foo <- array(5,c(2,2))
> dimnames(foo) <- list(xx = c("x=0", "x=1"), yy = c("A", "B"))
>
> [...]
>
> > Now, I do not generate my output using the 'xtabs' utility. In fact,
> > my simulations would generate the cross-table directly and not the
> > dataset.
> >
> > Can anyone help? R examples have some reference to the 'dimnames'
> > attribute, but I am not exactly sure.
> > Also, is there an R function that could do the exact opposite of
> > 'xtabs', that is, maybe generate a data frame given its cross-table?
>
> Sort of (there is only one column for each combination, giving `frequencies'):
>
> > as.data.frame.table(foo)
>    xx yy Freq
> 1 x=0  A    5
> 2 x=1  A    5
> 3 x=0  B    5
> 4 x=1  B    5
>
> If foo has class "table", then as.data.frame(foo) would also work.

Following up on Deepayan's reply: I have posted this before, but here
is a function that takes the result of applying as.data.frame.table()
to a table and regenerates the raw data, one row per observation. For
example, using Deepayan's table above:

> foo
     yy
xx    A B
  x=0 5 5
  x=1 5 5

> f.dft <- as.data.frame.table(foo)
> f.dft
   xx yy Freq
1 x=0  A    5
2 x=1  A    5
3 x=0  B    5
4 x=1  B    5

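As a quick check that the frequency form loses nothing, xtabs() can
rebuild the table from it. This is only a minimal sketch, assuming the
count column is called Freq (the default name) and using foo2 as a
made-up name for the rebuilt table:

```r
foo <- array(5, c(2, 2),
             dimnames = list(xx = c("x=0", "x=1"), yy = c("A", "B")))
f.dft <- as.data.frame.table(foo)

# A variable on the left-hand side of the formula is taken as the
# cell counts rather than as a classifying factor
foo2 <- xtabs(Freq ~ xx + yy, data = f.dft)

# Compare counts and dimension names with the original array
all.equal(as.vector(foo2), as.vector(foo))
all.equal(dimnames(foo2), dimnames(foo))
```
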
Here is the function, called expand.dft():
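expand.dft <- function(x, na.strings = "NA", as.is = FALSE, dec = ".")
{
  # Repeat row i of the frequency table x$Freq[i] times
  DF <- sapply(seq_len(nrow(x)),
               function(i) x[rep(i, each = x$Freq[i]), ],
               simplify = FALSE)

  # Stack the pieces and drop the count column
  DF <- subset(do.call("rbind", DF), select = -Freq)

  # Re-derive each column's type from its character representation
  for (i in seq_len(ncol(DF)))
  {
    DF[[i]] <- type.convert(as.character(DF[[i]]),
                            na.strings = na.strings,
                            as.is = as.is, dec = dec)
  }

  DF
}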
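
The same idea can probably be written without the row-by-row loop, by
repeating the row indices directly. The following is a sketch only, not
the function above: expand.dft2 is a made-up name, it assumes the count
column is called Freq and holds non-negative whole numbers, and unlike
expand.dft() it resets the row names to 1, 2, 3, ...

```r
# Hypothetical variant for illustration; not part of the function above
expand.dft2 <- function(x, na.strings = "NA", as.is = FALSE, dec = ".")
{
  # Row index i appears x$Freq[i] times
  idx <- rep(seq_len(nrow(x)), times = x$Freq)

  # Select the repeated rows and drop the count column
  DF <- x[idx, names(x) != "Freq", drop = FALSE]

  # Re-derive each column's type, keeping the data frame structure
  DF[] <- lapply(DF, function(col)
    type.convert(as.character(col), na.strings = na.strings,
                 as.is = as.is, dec = dec))

  # Plain sequential row names instead of 1, 1.1, 1.2, ...
  rownames(DF) <- NULL
  DF
}

# Usage with a 2 x 2 table of 5 observations per cell
foo <- array(5, c(2, 2),
             dimnames = list(xx = c("x=0", "x=1"), yy = c("A", "B")))
raw <- expand.dft2(as.data.frame.table(foo))
nrow(raw)   # 20 rows, one per observation
```
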
Now, applying that to 'f.dft' from above:

> expand.dft(f.dft)
     xx yy
1   x=0  A
1.1 x=0  A
1.2 x=0  A
1.3 x=0  A
1.4 x=0  A
2   x=1  A
2.1 x=1  A
2.2 x=1  A
2.3 x=1  A
2.4 x=1  A
3   x=0  B
3.1 x=0  B
3.2 x=0  B
3.3 x=0  B
3.4 x=0  B
4   x=1  B
4.1 x=1  B
4.2 x=1  B
4.3 x=1  B
4.4 x=1  B

and of course:

> table(expand.dft(f.dft))
     yy
xx    A B
  x=0 5 5
  x=1 5 5

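For the five-way array in the original question, the same steps should
carry over. A minimal sketch follows; the variable names V1 to V5 and
the levels "0" and "1" are placeholders of mine, not anything from the
original data:

```r
# Placeholder names and levels; substitute the real ones
lev <- c("0", "1")
dn  <- setNames(rep(list(lev), 5), paste0("V", 1:5))

# Named dimnames plus class "table", which is what xtabs() returns
# apart from its extra "xtabs" class and "call" attribute
tab <- as.table(array(5, dim = rep(2, 5), dimnames = dn))

# Long format: one row per cell, with the count in Freq
tab.df <- as.data.frame(tab)
nrow(tab.df)       # 32 cells
sum(tab.df$Freq)   # 160 observations in total
```

The data frame tab.df has the same layout as f.dft above, so it can be
passed to expand.dft() in the same way.
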
HTH,
Marc Schwartz
