[R] Data aggregation question
William Dunlap
wdunlap at tibco.com
Fri Jul 29 00:12:08 CEST 2011
Have you tried using table()?
E.g.,
> df <- data.frame(x=c("A","A","B","C"), y=c("ii","ii","i","ii"), Age=2^(1:4))
> tab <- do.call("table", df[c("x","y")])
> tab
   y
x   i ii
  A 0  2
  B 1  0
  C 0  1
> as.data.frame(tab)
  x  y Freq
1 A  i    0
2 B  i    1
3 C  i    0
4 A ii    2
5 B ii    0
6 C ii    1
> str(.Last.value)
'data.frame':   6 obs. of  3 variables:
 $ x   : Factor w/ 3 levels "A","B","C": 1 2 3 1 2 3
 $ y   : Factor w/ 2 levels "i","ii": 1 1 1 2 2 2
 $ Freq: int  0 1 0 2 0 1
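
The same idea should carry over to the three-way crossing asked about in the original post. Below is a minimal sketch, assuming the three variables are stored as factor columns; the column names A, B, C and the toy data are hypothetical, since the real dataset is not shown. Declaring the levels explicitly means a level that never appears in the data still gets its zero-count rows.

```r
# Hypothetical toy data: three factors with three levels each.
# Column names and values are assumptions for illustration only.
d <- data.frame(
  A = factor(c("A1", "A1", "A2", "A3"), levels = c("A1", "A2", "A3")),
  B = factor(c("B1", "B1", "B2", "B3"), levels = c("B1", "B2", "B3")),
  C = factor(c("C1", "C1", "C3", "C2"), levels = c("C1", "C2", "C3"))
)

# table() given a data frame cross-classifies all of its columns at once;
# as.data.frame() then turns the 3 x 3 x 3 array into one row per cell.
counts <- as.data.frame(table(d[c("A", "B", "C")]))

nrow(counts)           # one row per combination, observed or not
sum(counts$Freq == 0)  # number of combinations that never occurred
```

No explicit loop is involved, so this should be workable on a couple of million rows; the result has one row per cell, i.e. the product of the numbers of levels.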
Bill Dunlap
Spotfire, TIBCO Software
wdunlap at tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of David Warren
> Sent: Thursday, July 28, 2011 1:25 PM
> To: r-help at r-project.org
> Subject: [R] Data aggregation question
>
> Hi all,
>
> I'm working with a sizable dataset that I'd like to summarize, but I
> can't find a tool or function that will do quite what I'd like. Basically,
> I'd like to summarize the data by fully crossing three variables and getting
> a count of the number of observations for every level of that 3-way
> interaction. For example, if factors A, B, and C each have 3 levels (all of
> which were observed someplace in the dataset), I'd like to know how many
> times A1, B1, and C1 co-occurred in the dataset. Functions like aggregate
> and summaryBy do a decent job when I sum a vector of ones of the same length
> as the original dataset, but I'm getting stuck on the fact that neither will
> return 0-count combinations of the three variables in question. I
> understand that this is a desirable outcome (if A1, B1, C2 didn't occur, it
> shouldn't be counted and isn't), but I need to know both when these
> combinations of factors did and did not occur. I'm stuck on this one, and
> would really appreciate any help. Thanks in advance!
>
> Dave Warren
>
> PS A functional solution would be best; the original dataset contains about
> 2.3 million observations, so any looping is going to be very slow.
>
> --
> Post-doctoral Fellow
> Neurology Department
> University of Iowa Hospitals and Clinics
> davideugenewarren at gmail.com