[R] Subsetting a data frame by a factor, using the level that occurs the most times
Liaw, Andy
andy_liaw at merck.com
Thu Jan 20 18:40:29 CET 2005
> From: Douglas Bates
>
> Liaw, Andy wrote:
> >>From: Douglas Bates
> >>
> >>michael watson (IAH-C) wrote:
> >>
> >>>I think that title makes sense... I hope it does...
> >>>
> >>>I have a data frame, one of the columns of which is a factor. I want
> >>>the rows of data that correspond to the level in that factor which
> >>>occurs the most times.
> >>
> >>So first you want to determine the mode (in the sense of the most
> >>frequently occurring value) of the factor. One way to do this is
> >>
> >>names(which.max(table(fac)))
> >>
> >>Use this comparison for the subset as
> >>
> >>subset(data, pattern == names(which.max(table(pattern))))
> >
> >
> > Just be careful that if there are ties (i.e., more than one level having the
> > max) which.max() will randomly pick one of them. That may or may not be
> > what's desired. If that is a possibility, Mick will need to think what he
> > wants in such cases.
>
> According to the documentation it picks the first one. Also, that's
> what Martin Maechler told me and he wrote the code so I trust him on
> that. I figure that if you have to trust someone to be meticulous and
> precise then a German-speaking Swiss is a good choice.
My apologies! I got it mixed up with max.col(), which breaks ties at random.
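
For anyone following along, here is a minimal sketch of the tie behaviour in question. The data frame `dat` and its columns `pattern` and `value` are made up for illustration (they are not Mick's data), and the last step assumes a version of R in which max.col() accepts a `ties.method` argument.

```r
# Hypothetical data: levels "a" and "b" are tied for most frequent.
dat <- data.frame(
  pattern = factor(c("b", "a", "b", "a", "c")),
  value   = 1:5
)

counts <- table(dat$pattern)

# which.max() returns the first maximum, so a tie resolves to the
# first level in the table (levels are in sorted order here).
first_mode <- names(which.max(counts))
one_level  <- subset(dat, pattern == first_mode)

# To keep the rows for every tied level, compare counts to their maximum.
all_modes  <- names(counts)[as.vector(counts) == max(counts)]
all_levels <- subset(dat, pattern %in% all_modes)

# max.col() breaks ties at random by default; ties.method = "first"
# makes it deterministic.
m <- rbind(c(2, 2, 1), c(1, 3, 3))
first_cols <- max.col(m, ties.method = "first")
```

Whether `one_level` or `all_levels` is the right answer depends on what is wanted when two levels share the maximum count.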
Andy