[R] difference between unique() and !duplicated()

jim holtman jholtman at gmail.com
Thu Sep 13 12:42:33 CEST 2007


Try this:

> a <- read.table(textConnection("x y   z
+ 1 A 1 yes
+ 2 A 2 yes
+ 3 B 1  no
+ 4 B 1 yes
+ 5 C 2  no
+ 6 C 3  no
+ 7 D 1  no"), header=TRUE)
> do.call('rbind', by(a, a$x, function(.sub){
+     .sub[which.max(.sub$y),]
+ }))
  x y   z
A A 2 yes
B B 1  no
C C 3  no
D D 1  no
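
On the question of why unique() fails where !duplicated() works, here is a minimal sketch using the data from the post below. It assumes x is stored as a factor, which was the data.frame() default when this thread was written, so stringsAsFactors = TRUE is set explicitly. The names ux, bad, good, wrong_mask, o and one_step are made up for illustration. The idea: unique() returns the distinct values of x, and a factor used as a row index is read as its integer codes, whereas !duplicated() returns a logical mask that lines up with the rows of whatever vector it was computed on.

```r
# Data from the original post; stringsAsFactors = TRUE reproduces the old
# default, under which x and z are stored as factors.
test <- data.frame(
  x = c("A", "A", "B", "B", "C", "C", "D"),
  y = c(1, 2, 1, 1, 2, 3, 1),
  z = c("yes", "yes", "no", "yes", "no", "no", "no"),
  stringsAsFactors = TRUE
)

# Sort so that the largest y (ties broken by z) comes first.
test1 <- test[rev(order(test$y, test$z)), ]

# unique() gives the distinct VALUES of x (C, A, B, D), not row positions.
# As a row index the factor is read as its integer codes 3, 1, 2, 4, so
# this simply picks the 3rd, 1st, 2nd and 4th rows of test1.
ux  <- unique(test1$x)
bad <- test1[ux, ]
print(as.integer(ux))   # 3 1 2 4
print(rownames(bad))    # "5" "6" "2" "4"

# !duplicated() gives one TRUE/FALSE per row of test1, TRUE at the first
# occurrence of each x, so it selects the intended rows.
good <- test1[!duplicated(test1$x), ]
print(rownames(good))   # "6" "2" "4" "7"

# The one-step attempt fails because the mask is computed on the UNSORTED
# test$x and then applied to the sorted rows.
wrong_mask <- test[rev(order(test$y, test$z)), ][!duplicated(test$x), ]
print(rownames(wrong_mask))  # "6" "5" "1" "3"

# One way to do it in a single step: compute the mask on x in sorted order.
o <- rev(order(test$y, test$z))
one_step <- test[o[!duplicated(test$x[o])], ]
print(rownames(one_step))    # "6" "2" "4" "7"
```

Note that for the tie in group B this keeps row 4 (z = "yes"), because the sort breaks ties by z, while the which.max() approach above keeps the first row of the group (row 3).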


On 9/13/07, T.Lok <T.Lok at rug.nl> wrote:
> Yesterday I spent the whole day struggling with how to get
> the maximum value of "y" for every unique value of "x"
> from the data frame "test". In The R Book (Crawley, 2007)
> an example of this can be found on page 121. I tried to do
> it this way, but I failed.
>
> In the end, I figured out how to get it working (first
> order(), and afterwards use !duplicated()). My question is:
> why does it not work with the unique() function on p. 121,
> i.e. test[rev(order(x)),][unique(y),]?
>
> As a simple example, I used the following code:
>
> > x <- c("A","A","B","B","C","C","D")
> > y <- c(1,2,1,1,2,3,1)
> > z <- c("yes","yes","no","yes","no","no","no")
> > test <- data.frame(x,y,z)
> > test
>
>   x y   z
> 1 A 1 yes
> 2 A 2 yes
> 3 B 1  no
> 4 B 1 yes
> 5 C 2  no
> 6 C 3  no
> 7 D 1  no
>
> > test[rev(order(test$y, test$z)),][unique(test$x),]
>
>   x y   z
> 6 C 3  no
> 2 A 2 yes
> 5 C 2  no
> 4 B 1 yes
>
> # this clearly does not give unique values for x, since
> # there are 2 C's and no D!
>
> > test[rev(order(test$y, test$z)),][!duplicated(test$x),]
>
>   x y   z
> 6 C 3  no
> 5 C 2  no
> 1 A 1 yes
> 3 B 1  no
>
> # this also doesn't work
> # then I thought, maybe first use the order() function,
> # then unique()
>
> > test[rev(order(test$y, test$z)),]
>
>   x y   z
> 6 C 3  no
> 2 A 2 yes
> 5 C 2  no
> 4 B 1 yes
> 1 A 1 yes
> 7 D 1  no
> 3 B 1  no
>
> > test1 <- test[rev(order(test$y, test$z)),]
> > test1[unique(test1$x),]
>
>   x y   z
> 5 C 2  no
> 6 C 3  no
> 2 A 2 yes
> 4 B 1 yes
>
> # still no unique values for x
>
> > test1[!duplicated(test1$x),]
>
>   x y   z
> 6 C 3  no
> 2 A 2 yes
> 4 B 1 yes
> 7 D 1  no
>
> # finally I get unique values for x, with the maximum value
> # of y (and z).
>
> But why does this not work when order() and !duplicated() are
> given in the same command? And why does only !duplicated()
> work, and not unique()?
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


