[R] Data aggregation question
dwinsemius at comcast.net
Fri Jul 29 00:39:53 CEST 2011
On Jul 28, 2011, at 4:24 PM, David Warren wrote:
> Hi all,
> I'm working with a sizable dataset that I'd like to summarize, but I
> can't find a tool or function that will do quite what I'd like.
> I'd like to summarize the data by fully crossing three variables and
> getting a count of the number of observations for every level of that
> 3-way interaction. For example, if factors A, B, and C each have 3
> levels (all of which were observed somewhere in the dataset), I'd like
> to know how many times A1, B1, and C1 co-occurred in the dataset.
> Functions like and summaryBy do a decent job when I sum a vector of
> ones of the same length as the original dataset, but I'm getting stuck
> on the fact that neither will return 0-count combinations of the three
> variables in question.
I think that may depend on what functions and arguments you use.
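A minimal sketch of the difference, assuming the three grouping variables are stored as factors in a data frame (the data frame `dat` and its toy values are hypothetical, not from the original post): `table()` cross-classifies over every level of every factor, so unobserved combinations appear as 0, whereas summing a column of ones with the formula method of `aggregate()` returns only the combinations that actually occur.

```r
# Hypothetical toy data: factors A, B, C, each declared with 3 levels.
# Observed combinations: (A1,B1,C1) twice, (A2,B2,C3), (A3,B3,C2), (A3,B1,C2).
dat <- data.frame(
  A = factor(c("A1", "A1", "A2", "A3", "A3"), levels = c("A1", "A2", "A3")),
  B = factor(c("B1", "B1", "B2", "B3", "B1"), levels = c("B1", "B2", "B3")),
  C = factor(c("C1", "C1", "C3", "C2", "C2"), levels = c("C1", "C2", "C3"))
)

# table() returns a 3 x 3 x 3 array of counts; combinations that never
# occur are kept with a count of 0.
tab <- with(dat, table(A, B, C))
tab["A1", "B1", "C1"]  # 2
tab["A1", "B1", "C2"]  # 0

# Summing a vector of ones with aggregate() keeps only observed groups.
dat$one <- 1
agg <- aggregate(one ~ A + B + C, data = dat, FUN = sum)
nrow(agg)  # 4
```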
> I understand that this is a desirable outcome (if A1, B1, C2 didn't
> occur, it shouldn't be counted and isn't), but I need to know both when
> these combinations of factors did and did not occur. I'm stuck on this
> one and would really appreciate any help. Thanks in advance!
> Dave Warren
> PS: A functional solution would be best; the original dataset contains
> about 2.3 million observations, so any looping is going to be very slow.
In general, tabulations like these are very efficient.
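For a long-format result with one row per A x B x C combination, zeros included, a sketch using `xtabs()` and `as.data.frame()` on hypothetical toy data. Neither step requires writing an explicit loop over the rows in R code. One assumption to keep in mind: a level is only counted (as 0) if it is among the factor's declared `levels`; a level that appears nowhere in the data must be added explicitly via `factor(..., levels = ...)`.

```r
# Hypothetical toy data: three factors with 3 declared levels each.
dat <- data.frame(
  A = factor(c("A1", "A1", "A2", "A3", "A3"), levels = c("A1", "A2", "A3")),
  B = factor(c("B1", "B1", "B2", "B3", "B1"), levels = c("B1", "B2", "B3")),
  C = factor(c("C1", "C1", "C3", "C2", "C2"), levels = c("C1", "C2", "C3"))
)

# xtabs() counts rows in each A x B x C cell, keeping all declared levels.
tab <- xtabs(~ A + B + C, data = dat)

# as.data.frame() flattens the table into columns A, B, C and Freq:
# one row per combination (3 * 3 * 3 = 27 rows), zero counts included.
counts <- as.data.frame(tab)
nrow(counts)           # 27
sum(counts$Freq == 0)  # 23 combinations never occurred
head(counts[counts$Freq == 0, ])
```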
David Winsemius, MD
West Hartford, CT