[R] Re: Excluding levels in table and xtabs
Michael Friendly
friendly at yorku.ca
Thu Dec 12 19:01:03 CET 2002
Having looked over the replies and examined the code, I can't
see any reason for table (and xtabs) to avoid honoring the
exclude= argument for factors. There are often reasons for wanting
to exclude certain levels, even non-missing ones, when making a table.
In my application, John Fox suggested that I could circumvent
the problem by reading in the .csv file with na.strings="".
However, it was only for making tables that I wanted to exclude
the "" categories.
The change to table() to have it honor the exclude option for
factors is quite straightforward. I wonder if the R team
will consider placing this on its list. (revised version below)
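To illustrate the effect being asked for, here is a minimal sketch that
re-creates a factor with factor(x, exclude=) just for tabulation, which is
the same step the revised table() applies to each argument. The vector
`answer` and its values are made up for illustration.

```r
# Hypothetical data: a factor in which "" marks an unrecorded answer
answer <- factor(c("yes", "no", "", "yes", "", "no", "yes"))

# Re-create the factor for tabulation only, dropping the "" level;
# the original factor keeps all of its levels
tab <- table(factor(answer, exclude = ""))
tab
```

The original factor is untouched, so the "" level stays available
elsewhere in the analysis.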
More generally, in working with tables I often find the need
to collapse or reorder the levels of some dimensions of an
n-way table. I've written a collapse.table function to do the first,
e.g.,
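sex <- c("Male", "Female")
age <- letters[1:6]
education <- c("low", "med", "high")
data <- expand.grid(sex=sex, age=age, education=education)
# add a named column of simulated cell counts
data <- cbind(data, counts=rpois(36, 100))
# cross-tabulate the counts (t1 is assumed to be built this way)
t1 <- xtabs(counts ~ sex + age + education, data=data)
# collapse age to 3 levels
t2 <- collapse.table(t1, age=c("A", "A", "B", "B", "C", "C"))
# collapse age to 3 levels and education to 2 levels
t3 <- collapse.table(t1, age=c("A", "A", "B", "B", "C", "C"),
    education=c("low", "low", "high"))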
and it's not too hard to do the second. However, I wonder if some
more general and convenient tools for working with tables are
available somewhere I've missed.
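One way such a collapse could be done with base tools is sketched here.
This is not the collapse.table mentioned above: the helper name
collapse_dim and its arguments are hypothetical. It assumes the table has
named dimnames, and it relies on the fact that assigning duplicate labels
to levels() merges those levels.

```r
# Hypothetical helper: merge levels of one dimension of a table.
#   tab     - a table with named dimnames
#   dimname - name of the dimension to collapse
#   groups  - new label for each existing level, in level order
collapse_dim <- function(tab, dimname, groups) {
  df <- as.data.frame(tab)               # one row per cell, counts in Freq
  stopifnot(dimname %in% names(df),
            length(groups) == nlevels(df[[dimname]]))
  levels(df[[dimname]]) <- groups        # duplicate labels merge levels
  xtabs(Freq ~ ., data = df)             # re-tabulate, summing merged cells
}

# Small deterministic 2 x 6 table for illustration
t1 <- as.table(array(1:12, dim = c(2, 6),
                     dimnames = list(sex = c("Male", "Female"),
                                     age = letters[1:6])))
t2 <- collapse_dim(t1, "age", c("A", "A", "B", "B", "C", "C"))
t2
```

Going through a data frame is not the most efficient route for large
tables, but it keeps the sketch short and works for any number of
dimensions.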
For example, for mosaicplots
it is often crucial to be able to treat table variables as
ordered factors, where the ordering is that which shows the
pattern of association, not the default. For a data frame,
this can be done with
subset$Skin.Colour <- factor(subset$Skin.Colour, levels=c("White",
"Brown", "Other", "Black"))
but it's more unwieldy with a table object.
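For a table object, one possible approach is to index the dimension by
level names in the desired order. The helper below is only a sketch: the
name reorder_dim and the example counts are hypothetical, and it assumes
the table has named dimnames.

```r
# Hypothetical helper: reorder the levels of one dimension of a table.
#   tab       - a table with named dimnames
#   dimname   - name of the dimension to reorder
#   new_order - character vector of level names in the desired order
reorder_dim <- function(tab, dimname, new_order) {
  k <- match(dimname, names(dimnames(tab)))
  stopifnot(!is.na(k), all(new_order %in% dimnames(tab)[[k]]))
  idx <- rep(list(TRUE), length(dim(tab)))   # TRUE keeps a dimension whole
  idx[[k]] <- new_order
  # index every dimension at once, then make sure the result is a table
  as.table(do.call(`[`, c(list(tab), idx, list(drop = FALSE))))
}

# Made-up counts for illustration
tab <- as.table(array(1:8, dim = c(2, 4),
                      dimnames = list(Sex = c("F", "M"),
                                      Skin.Colour = c("Black", "Brown",
                                                      "Other", "White"))))
tab2 <- reorder_dim(tab, "Skin.Colour",
                    c("White", "Brown", "Other", "Black"))
tab2
```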
-Michael
------- table.R ------
# modified to respect the exclude argument for factors
# use exclude=NULL for former behavior for factors (or change default)
table <- function (..., exclude = c(NA, NaN),
                   dnn = list.names(...), deparse.level = 1)
{
    list.names <- function(...) {
        l <- as.list(substitute(list(...)))[-1]
        nm <- names(l)
        fixup <- if (is.null(nm))
            seq(along = l)
        else nm == ""
        dep <- sapply(l[fixup], function(x)
            switch (deparse.level + 1,
                    "",
                    if (is.symbol(x)) as.character(x) else "",
                    deparse(x)[1]
                    )
            )
        if (is.null(nm))
            dep
        else {
            nm[fixup] <- dep
            nm
        }
    }
    args <- list(...)
    if (length(args) == 0)
        stop("nothing to tabulate")
    if (length(args) == 1 && is.list(args[[1]])) {
        args <- args[[1]]
        if (length(dnn) != length(args))
            dnn <- if (!is.null(argn <- names(args)))
                argn
            else
                paste(dnn[1],1:length(args),sep='.')
    }
    bin <- 0
    lens <- NULL
    dims <- integer(0)
    pd <- 1
    dn <- NULL
    for (a in args) {
        if (is.null(lens)) lens <- length(a)
        else if (length(a) != lens)
            stop("all arguments must have the same length")
        # MF: make exclude work for factors too
        # if (is.factor(a))
        #     cat <- a
        # else
        cat <- factor(a, exclude = exclude)
        nl <- length(l <- levels(cat))
        dims <- c(dims, nl)
        dn <- c(dn, list(l))
        ## requiring all(unique(as.integer(cat)) == 1:nlevels(cat)) :
        bin <- bin + pd * (as.integer(cat) - 1)
        pd <- pd * nl
    }
    names(dn) <- dnn
    bin <- bin[!is.na(bin)]
    if (length(bin)) bin <- bin + 1 # otherwise, that makes bin NA
    y <- array(tabulate(bin, pd), dims, dimnames = dn)
    class(y) <- "table"
    y
}
--
Michael Friendly friendly at yorku.ca
York University http://www.math.yorku.ca/SCS/friendly.html
Psychology Department
4700 Keele Street Tel: (416) 736-5115 x66249
Toronto, Ontario, M3J 1P3 Fax: (416) 736-5814