[R] Data aggregation question

David Winsemius dwinsemius at comcast.net
Fri Jul 29 00:39:53 CEST 2011


On Jul 28, 2011, at 4:24 PM, David Warren wrote:

> Hi all,
>
> I'm working with a sizable dataset that I'd like to summarize, but I
> can't find a tool or function that will do quite what I'd like.
> Basically, I'd like to summarize the data by fully crossing three
> variables and getting a count of the number of observations for every
> level of that 3-way interaction.  For example, if factors A, B, and C
> each have 3 levels (all of which were observed someplace in the
> dataset), I'd like to know how many times A1, B1, and C1 co-occurred
> in the dataset.  Functions like aggregate and summaryBy do a decent
> job when I sum a vector of ones of the same length as the original
> dataset, but I'm getting stuck on the fact that neither will return
> 0-count combinations of the three variables in question.

I think that may depend on what functions and arguments you use.

> I understand that this is a desirable outcome (if A1, B1, C2 didn't
> occur, it shouldn't be counted and isn't), but I need to know both
> when these combinations of factors did and did not occur.  I'm stuck
> on this one, and would really appreciate any help.  Thanks in advance!

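?xtabs

xtabs (like table) cross-classifies the factors and reports a count for every combination of their levels, including the combinations that never occur.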
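Here is a minimal sketch of what I mean, not a tested solution for your data. The data frame `dat` and its values are made up for illustration; only the column names A, B and C come from your description. Every level has to be known to the factor (either observed somewhere in the column or listed in `levels=`) for its zero cells to show up.

```r
# Hypothetical toy data: three factors with three levels each,
# but only 3 of the 27 possible combinations actually occur.
dat <- data.frame(
  A = factor(c("A1", "A1", "A2", "A3"), levels = c("A1", "A2", "A3")),
  B = factor(c("B1", "B1", "B2", "B3"), levels = c("B1", "B2", "B3")),
  C = factor(c("C1", "C1", "C2", "C3"), levels = c("C1", "C2", "C3"))
)

# Cross-tabulate all three factors; the result is a 3 x 3 x 3 array of
# counts in which unobserved combinations are kept with a count of 0.
tab <- xtabs(~ A + B + C, data = dat)

# Convert to long format: one row per A/B/C combination, counts in Freq.
counts <- as.data.frame(tab)

# Combinations that never occurred.
zero_cells <- counts[counts$Freq == 0, ]
```

With real data you would replace `dat` with your own data frame. `table(dat$A, dat$B, dat$C)` should give the same counts; the formula interface of xtabs just keeps the variable names as dimension names.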

>
> Dave Warren
>
> PS A functional solution would be best; the original dataset contains
> about 2.3 million observations, so any looping is going to be very slow.

In general, tabulations like these are very efficient.

-- 

David Winsemius, MD
West Hartford, CT


