[R] Subsetting a data frame by a factor, using the level that occurs the most times

Thu Jan 20 18:40:29 CET 2005

> From: Douglas Bates
> 
> Liaw, Andy wrote:
> >>From: Douglas Bates
> >>
> >>michael watson (IAH-C) wrote:
> >>
> >>>I think that title makes sense... I hope it does...
> >>>
> >>>I have a data frame, one of the columns of which is a factor.  I want
> >>>the rows of data that correspond to the level in that factor which
> >>>occurs the most times.  
> >>
> >>So first you want to determine the mode (in the sense of the most 
> >>frequently occurring value) of the factor.   One way to do this is
> >>
> >>names(which.max(table(fac)))
> >>
> >>Use this comparison for the subset as
> >>
> >>subset(data, pattern == names(which.max(table(pattern))))
> > 
> > 
> > Just be careful that if there are ties (i.e., more than one level having the
> > max) which.max() will randomly pick one of them.  That may or may not be
> > what's desired.  If that is a possibility, Mick will need to think what he
> > wants in such cases.
> 
> According to the documentation it picks the first one.  Also, that's 
> what Martin Maechler told me and he wrote the code so I trust him on 
> that.  I figure that if you have to trust someone to be meticulous and 
> precise then a German-speaking Swiss is a good choice.

My apologies!  I got it mixed up with max.col(), which breaks ties at random by default.

Andy
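
For readers of the archive, here is a minimal sketch of the approach discussed in this thread. The data frame `dat` and its values are made up for illustration; only the column name `pattern` and the calls to names(), which.max(), table() and subset() come from the posts above. The last two steps, which keep every tied level, are one possible way to handle ties and were not proposed by anyone in the thread.

```r
# Hypothetical example data; levels "b" and "c" are tied for most frequent.
dat <- data.frame(
  pattern = factor(c("a", "b", "b", "c", "c")),
  value   = 1:5
)

# Frequency of each level of the factor.
counts <- table(dat$pattern)

# which.max() returns the first maximum, so a tie goes to the earliest level.
top <- names(which.max(counts))
most_common <- subset(dat, pattern == top)

# Assumption: if all tied levels are wanted, compare against max() instead.
tied <- names(counts)[as.vector(counts) == max(counts)]
all_tied <- subset(dat, pattern %in% tied)

# max.col() is the function with a choice of tie-breaking rule;
# ties.method = "first" makes it deterministic.
first_col <- max.col(matrix(c(1, 1), nrow = 1), ties.method = "first")
```

With this data, `top` is "b" because "b" comes before "c" in the level order, while `all_tied` keeps the rows for both "b" and "c".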