[R] Data aggregation question

Sarah Goslee sarah.goslee at gmail.com
Fri Jul 29 00:10:53 CEST 2011


You don't offer a reproducible example, but what do you need that table()
doesn't provide?

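# sampling 20 values from 1:3 needs replace=TRUE, otherwise sample() errors
testdata <- data.frame(A=factor(sample(1:3, 20, replace=TRUE)),
                       B=factor(sample(1:3, 20, replace=TRUE)),
                       C=factor(sample(1:3, 20, replace=TRUE)))
# table() counts every combination of factor levels, including those with 0 count
table(testdata)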
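If a flat table with one row per A/B/C combination is more convenient than the 3-way array, the result of table() can be converted with as.data.frame(). The sketch below assumes factors with the levels 1:3, as in the example above; declaring the levels explicitly with `levels = 1:3` makes sure every level is tabulated even if a random draw happens to miss one. Unobserved combinations show up as rows with Freq equal to 0.

```r
set.seed(1)  # only so the example is reproducible
# explicit levels guarantee all 3 levels exist even if one is never sampled
testdata <- data.frame(A = factor(sample(1:3, 20, replace = TRUE), levels = 1:3),
                       B = factor(sample(1:3, 20, replace = TRUE), levels = 1:3),
                       C = factor(sample(1:3, 20, replace = TRUE), levels = 1:3))

# one row per A/B/C combination, with the count in the Freq column
counts <- as.data.frame(table(testdata))
head(counts)

nrow(counts)         # all 3 * 3 * 3 = 27 combinations are present
sum(counts$Freq)     # and the counts still add up to the 20 observations
counts[counts$Freq == 0, ]  # the combinations that never occurred
```

table() works on the factors' integer codes without looping over rows at the R level, so the same call should be practical on a data frame with a few million rows.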

Sarah

On Thu, Jul 28, 2011 at 4:24 PM, David Warren
<davideugenewarren at gmail.com> wrote:
> Hi all,
>
>     I'm working with a sizable dataset that I'd like to summarize, but I
> can't find a tool or function that will do quite what I'd like.  Basically,
> I'd like to summarize the data by fully crossing three variables and getting
> a count of the number of observations for every level of that 3-way
> interaction.  For example, if factors A, B, and C each have 3 levels (all of
> which were observed someplace in the dataset), I'd like to know how many
> times A1, B1, and C1 co-occurred in the dataset.  Functions like aggregate
> and summaryBy do a decent job when I sum a vector of ones of the same length
> as the original dataset, but I'm getting stuck on the fact that neither will
> return 0-count combinations of the three variables in question.  I
> understand that this is a desirable outcome (if A1, B1, C2 didn't occur, it
> shouldn't be counted and isn't), but I need to know both when these
> combinations of factors did and did not occur.  I'm stuck on this one, and
> would really appreciate any help.  Thanks in advance!
>
> Dave Warren
>
> PS A functional solution would be best; the original dataset contains about
> 2.3 million observations, so any looping is going to be very slow.
>
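If you would rather keep the aggregate() approach described in the question, one way to recover the missing zero-count combinations is to merge the aggregated result against the full grid of levels built with expand.grid(). This is a sketch under the assumption that A, B and C are factors with levels 1:3; the column name `n` for the vector of ones is illustrative only.

```r
set.seed(2)  # only so the example is reproducible
testdata <- data.frame(A = factor(sample(1:3, 20, replace = TRUE), levels = 1:3),
                       B = factor(sample(1:3, 20, replace = TRUE), levels = 1:3),
                       C = factor(sample(1:3, 20, replace = TRUE), levels = 1:3))
testdata$n <- 1  # hypothetical vector of ones, as in the question

# aggregate() only returns the combinations that were observed
agg <- aggregate(n ~ A + B + C, data = testdata, FUN = sum)

# every possible combination of the declared levels
full <- expand.grid(A = levels(testdata$A),
                    B = levels(testdata$B),
                    C = levels(testdata$C))

# left join: unobserved combinations get NA, which we turn into 0
filled <- merge(full, agg, all.x = TRUE)
filled$n[is.na(filled$n)] <- 0

nrow(filled)    # 27 rows, one per combination
sum(filled$n)   # 20, the number of observations
```

This gives the same counts as table(), but with an extra join step, so table() remains the simpler option.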

-- 
Sarah Goslee
http://www.functionaldiversity.org



More information about the R-help mailing list