[R] subsetting a data.frame to the 'unique' of a column
Berton Gunter
gunter.berton at gene.com
Thu Dec 23 18:47:41 CET 2004
Spencer's solution is considerably less efficient than using duplicated()
and subscripting: in a small example with 3 columns and 10000 rows, it took
5 times as long on my Windows setup.
The reason is that aggregate() is basically a wrapper for tapply and tapply
basically loops in R. duplicated() loops in C (and uses hashing, I believe).
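For anyone who wants to check this on their own machine, here is a rough sketch of the comparison. The test data frame is made up for illustration (only the 3-column, 10000-row shape comes from the example above), and the timings will vary with platform and R version, so the ratio may not match the factor of 5 I saw.

```r
# Hypothetical test data: 3 columns, 10000 rows (values are invented).
set.seed(1)
n <- 10000
DF <- data.frame(a = sample(1000, n, replace = TRUE),
                 b = seq_len(n),
                 c = sample(letters, n, replace = TRUE),
                 stringsAsFactors = FALSE)

# duplicated() and subscripting: keep the first row seen for each value of a.
by_dup <- DF[!duplicated(DF$a), ]

# The aggregate() approach: take the first element of each group.
by_agg <- aggregate(DF[2:3], DF[1], function(x) x[1])

# Compare elapsed times; the numbers depend on machine and R version.
t_dup <- system.time(DF[!duplicated(DF$a), ])[["elapsed"]]
t_agg <- system.time(aggregate(DF[2:3], DF[1], function(x) x[1]))[["elapsed"]]
cat("duplicated():", t_dup, "sec   aggregate():", t_agg, "sec\n")
```

Both results hold one row per distinct value of a. The aggregate() result comes back sorted by a, while the duplicated() result keeps the original row order.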
Cheers,
-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
"The business of the statistician is to catalyze the scientific learning
process." - George E. P. Box
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Spencer Graves
> Sent: Thursday, December 23, 2004 9:06 AM
> To: Göran Broström
> Cc: Rudi Alberts; r-help at stat.math.ethz.ch
> Subject: Re: [R] subsetting a data.frame to the 'unique' of a column
>
> What about "aggregate"?
>
> DF <- data.frame(a=c(1,1,2), b=1:3, c=letters[1:3])
> aggregate(DF[2:3], DF[1], function(x)x[1])
>   a b c
> 1 1 1 1
> 2 2 3 3
>
> hope this helps. spencer graves
>
> Göran Broström wrote:
>
> >On Thu, Dec 23, 2004 at 11:28:31AM -0800, Rudi Alberts wrote:
> >
> >
> >>Hi,
> >>
> >>I often run into this problem:
> >>I have a data.frame with one column containing entries that are not
> >>unique. What I then want is a subset of the data.frame in which
> >>the entries in that column have become the 'unique' of the original
> >>column.
> >>Normally I program around it by taking the unique of the column and
> >>making a new data.frame with it and filling the rest of the data.
> >>
> >>(By the way, when moving to the smaller data.frame for example 5 rows
> >>with the same value in that column will be replaced by one row for that
> >>value. I don't mind which of the rows now..)
> >>
> >>
> >>something like this, however, this gives me the complete df.
> >>
> >>df[df$colname %in% unique(df$colname),]
> >>
> >>or this, which doesn't work
> >>
> >>df[df$colname == unique(df$colname),]
> >>
> >>
> >>
> > Use 'duplicated':
> >
> >
> >
> >>df[!duplicated(df$colname), ]
> >>
> >>
> >
> >
> >
>
> --
> Spencer Graves, PhD, Senior Development Engineer
> O: (408)938-4420; mobile: (408)655-4567
>