[R] Re: summarizing dataframe
Jim Rogers
jrogers at cantatapharm.com
Mon Jan 13 14:44:03 CET 2003
The options I know of are:
1. aggregate (in the base package), with FUN = length. But this converts
character vectors to factors, which is sometimes annoying and sometimes
dangerous.
2. summarize, in the Hmisc package (again, with FUN = length). I find
summarize to be a very useful function in general, but it has a lot of
overhead if all you want is counts, and it is very slow with a large data frame.
3. A wrapper that calls tabulate directly. I use:
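table.mat <- function(x) {
  ## One key per row. "\r" is used as the separator (instead of paste's
  ## default " ") so that different rows are unlikely to share a key.
  uid <- do.call("paste", c(unname(as.list(x)), sep = "\r"))
  ## Counts per key, in the sorted order of the factor levels
  count <- tabulate(factor(uid))
  ## Sort rows by key and keep the first row of each key; drop = FALSE
  ## keeps a one-column data frame from collapsing to a vector
  x <- x[order(uid), , drop = FALSE]
  i <- !duplicated(sort(uid))
  out <- x[i, , drop = FALSE]
  out$Count <- count
  ## Reorder by the original columns (everything except Count)
  last <- length(out)
  o <- do.call("order", unname(as.list(out[-last])))
  out <- out[o, ]
  rownames(out) <- NULL
  out
}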
This is based on my memory of a function that I think Scott Chasalow
wrote and often used. My memory is only of what the function did, not of
the code, so Scott may have something a bit better (I am cc'ing him).
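For comparison with option 1, here is a minimal sketch of aggregate with FUN = length. The original data frame is not shown in the question, so the spell/loc values below are made up for illustration. Any column can be passed as the first argument, since length only counts the rows in each group; as noted above, the grouping columns in the result may come back as factors.

```r
## Hypothetical data: the poster's data frame is not shown,
## so these spell/loc values are invented for illustration
d <- data.frame(
  spell = c("a", "a", "b", "b", "b", "c"),
  loc   = c("x", "x", "x", "y", "y", "y"),
  stringsAsFactors = FALSE
)

## Count the rows in each spell/loc combination; the column being
## "aggregated" (here spell itself) only supplies something to count
counts <- aggregate(d$spell, by = list(spell = d$spell, loc = d$loc),
                    FUN = length)

## aggregate names the value column "x"; rename it to "count"
names(counts)[3] <- "count"
counts
```

Only combinations that actually occur get a row, which matches the spell, loc, count layout asked for below.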
> From: Alexander.Herr at csiro.au
> To: r-help at stat.math.ethz.ch
> Date: Mon, 13 Jan 2003 14:22:23 +1000
> Subject: [R] summarizing dataframe
>
> Hi Listers,
>
> Surely I just have a mental block, and there is a more elegant way of
> creating a summary count (other than extracting it from ftable). I'd
> like to create a new data.frame containing counts of spell by loc, i.e.
> with three columns showing spell, loc, count. Below is the data.frame...
>
> Any help appreciated
> Thanks Herry
Jim
James A. Rogers, Ph.D. <rogers at cantatapharm.com>
Statistical Scientist
Cantata Pharmaceuticals
3-G Gill St
Woburn, MA 01801
617.225.9009
Fax 617.225.9010