[R] identify subsets based on two grouping factors

Mon Jan 31 23:26:58 CET 2011

Indeed, tapply is what I needed. To clarify Phil's question, what I needed was

tapply(x$obs, list(cut.grp1, cut.grp2), table)
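For comparison, here is a minimal base-R sketch that is not from the thread. Passing the response and both cut factors to table() gives one three-way contingency table instead of tapply's list-matrix of small tables. prop.table() with margin = c(2, 3) then turns the counts into proportions within each (grp1, grp2) cell. The data are simulated as in the original question. The set.seed() call is an added assumption, used only so the run is reproducible.

```r
set.seed(42)  # added for reproducibility; not in the original post
x <- data.frame(obs  = sample(c("low", "high"), 100, replace = TRUE),
                grp1 = sample(1:10, 100, replace = TRUE),
                grp2 = runif(100))
cut.grp1 <- cut(x$grp1, 3)
cut.grp2 <- cut(x$grp2, 3)

# tapply version (applied to the obs column): a 3 x 3 list-matrix,
# each cell holding a table of obs for that (grp1 bin, grp2 bin) pair
res.list <- tapply(x$obs, list(cut.grp1, cut.grp2), table)

# One 3-way table: obs levels x grp1 bins x grp2 bins
counts <- table(obs = x$obs, grp1 = cut.grp1, grp2 = cut.grp2)

# Proportion of each obs level within every (grp1, grp2) cell;
# a cell with no observations would give NaN (0/0)
props <- prop.table(counts, margin = c(2, 3))

# Flattened view that is easier to read at the console
ftable(counts, row.vars = c("grp1", "grp2"))
```

The array form is convenient when you want to index or sum over bins directly. The tapply form keeps each cell's table as a separate object.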

On Mon, Jan 31, 2011 at 4:50 PM, Bert Gunter <gunter.berton at gene.com> wrote:
> ?tapply is the basic R function for this. There are many other packages
> (e.g. plyr) and functions (e.g. ave) that simplify and streamline this for
> more complicated applications.
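Bert mentions ave() as a lighter-weight option. Below is a minimal sketch using the same simulated data, with set.seed() added as an assumption. ave() returns a vector as long as its input, and every row carries its group's summary. Here the summary is the share of "high" values in each (grp1 bin, grp2 bin) cell, which is handy for adding a per-group column back onto the data frame. The column name p.high is hypothetical.

```r
set.seed(42)  # reproducible simulated data (an addition, not from the thread)
x <- data.frame(obs  = sample(c("low", "high"), 100, replace = TRUE),
                grp1 = sample(1:10, 100, replace = TRUE),
                grp2 = runif(100))
x$cut.grp1 <- cut(x$grp1, 3)
x$cut.grp2 <- cut(x$grp2, 3)

# ave() splits its first argument by the interaction of the grouping
# factors, applies FUN within each group, and writes the result back
# to every row of that group. p.high is a hypothetical column name.
x$p.high <- ave(as.numeric(x$obs == "high"), x$cut.grp1, x$cut.grp2,
                FUN = mean)

head(x)
```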
>
> -- Bert
>
> On Mon, Jan 31, 2011 at 1:43 PM, Rajarshi Guha <rajarshi.guha at gmail.com> wrote:
>>
>> Hi, I have a data.frame with a categorical variable, and I would like
>> to look at the distribution of its levels within groups defined by
>> two other variables.
>>
>> As an example:
>>
>> x <- data.frame(obs = sample(c('low', 'high'), 100, replace = TRUE),
>>                 grp1 = sample(1:10, 100, replace = TRUE),
>>                 grp2 = runif(100))
>>
>> cut.grp1 <- cut(x$grp1, 3)
>> cut.grp2 <- cut(x$grp2, 3)
>>
>> Thus, for each combination of levels in cut.grp1 and cut.grp2, I'd
>> like to obtain the distribution of the levels of obs. I know I can
>> loop over each pair of levels in cut.grp1 and cut.grp2, but is there
>> a more elegant way to achieve this?
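For reference, here is a sketch of the explicit double loop described above. It was written for illustration and is not taken from the post. It uses the same simulated data, with set.seed() added as an assumption. The loop fills a list-matrix indexed by the two sets of bin levels, so for every non-empty cell it holds the same per-cell table that tapply builds in a single call.

```r
set.seed(42)  # added for reproducibility
x <- data.frame(obs  = sample(c("low", "high"), 100, replace = TRUE),
                grp1 = sample(1:10, 100, replace = TRUE),
                grp2 = runif(100))
cut.grp1 <- cut(x$grp1, 3)
cut.grp2 <- cut(x$grp2, 3)

# Empty list-matrix with one cell per (grp1 bin, grp2 bin) pair
loop.res <- vector("list", nlevels(cut.grp1) * nlevels(cut.grp2))
dim(loop.res) <- c(nlevels(cut.grp1), nlevels(cut.grp2))
dimnames(loop.res) <- list(levels(cut.grp1), levels(cut.grp2))

# Visit every pair of levels and tabulate obs for the rows in that cell
for (a in levels(cut.grp1)) {
  for (b in levels(cut.grp2)) {
    in.cell <- cut.grp1 == a & cut.grp2 == b
    loop.res[[a, b]] <- table(x$obs[in.cell])
  }
}

loop.res
```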
>>
>> --
>> Rajarshi Guha
>> NIH Chemical Genomics Center
>>
>
>
>
> --
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 467-7374
> http://devo.gene.com/groups/devo/depts/ncb/home.shtml
>

-- 
Rajarshi Guha
NIH Chemical Genomics Center