[R] difference between unique() and !duplicated()
T.Lok
T.Lok at rug.nl
Thu Sep 13 11:47:50 CEST 2007
Yesterday I spent the whole day struggling to get
the maximum value of "y" for every unique value of "x"
from the data frame "test". In The R Book (Crawley, 2007)
an example of this can be found on page 121. I tried to do
it that way, but it did not work.
In the end, I figured out how to get it working (first
sort with order(), and afterwards use !duplicated()). My question is:
why does it not work with the unique() function, as on p. 121
(i.e. test[rev(order(x)),][unique(y),])?
As a simple example, I used the following code:
> x <- c("A","A","B","B","C","C","D")
> y <- c(1,2,1,1,2,3,1)
> z <- c("yes","yes","no","yes","no","no","no")
> test <- data.frame(x,y,z)
> test
x y z
1 A 1 yes
2 A 2 yes
3 B 1 no
4 B 1 yes
5 C 2 no
6 C 3 no
7 D 1 no
> test[rev(order(test$y, test$z)),][unique(test$x),]
x y z
6 C 3 no
2 A 2 yes
5 C 2 no
4 B 1 yes
# this clearly does not give unique values of x, since
# there are two C's and no D!
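A minimal sketch of what seems to happen here, assuming x is a factor (data.frame() converted strings to factors by default before R 4.0.0, so the sketch passes stringsAsFactors = TRUE explicitly): unique() returns the distinct values of x, not row positions, and a factor used as a row index is treated as its integer codes. The names `sorted` and `u` are only for illustration.

```r
# Rebuild the example; stringsAsFactors = TRUE mimics the old default
x <- c("A","A","B","B","C","C","D")
y <- c(1,2,1,1,2,3,1)
z <- c("yes","yes","no","yes","no","no","no")
test <- data.frame(x, y, z, stringsAsFactors = TRUE)

sorted <- test[rev(order(test$y, test$z)), ]

# unique() gives the distinct values A, B, C, D (a factor), not row numbers
u <- unique(test$x)
print(u)

# As a row index, the factor is replaced by its integer codes 1, 2, 3, 4 ...
print(as.integer(u))

# ... so only the first four rows of 'sorted' come back (rows 6, 2, 5, 4)
print(sorted[u, ])
```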
> test[rev(order(test$y, test$z)),][!duplicated(test$x),]
x y z
6 C 3 no
5 C 2 no
1 A 1 yes
3 B 1 no
# this also doesn't work
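A sketch of the likely cause, under the same factor assumption: duplicated() returns one TRUE/FALSE flag per element, in the order of the vector it was given. Here the flags are computed on test$x in the original row order but then applied to the reordered rows, so flags and rows no longer line up. The names `ord`, `sorted`, `keep_original` and `keep_sorted` are illustrative only.

```r
x <- c("A","A","B","B","C","C","D")
y <- c(1,2,1,1,2,3,1)
z <- c("yes","yes","no","yes","no","no","no")
test <- data.frame(x, y, z, stringsAsFactors = TRUE)

ord <- rev(order(test$y, test$z))
sorted <- test[ord, ]

# Flags computed on the ORIGINAL order of x: A A B B C C D
keep_original <- !duplicated(test$x)
print(keep_original)

# Applied to 'sorted', they simply pick positions 1, 3, 5, 7 of the sorted rows
print(sorted[keep_original, ])

# Flags computed on the sorted column stay aligned with the sorted rows
keep_sorted <- !duplicated(sorted$x)
print(sorted[keep_sorted, ])
```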
# then I thought: maybe first use order(), and only then
# unique()
> test[rev(order(test$y, test$z)),]
x y z
6 C 3 no
2 A 2 yes
5 C 2 no
4 B 1 yes
1 A 1 yes
7 D 1 no
3 B 1 no
> test1 <- test[rev(order(test$y, test$z)),]
> test1[unique(test1$x),]
x y z
5 C 2 no
6 C 3 no
2 A 2 yes
4 B 1 yes
# still no unique values for x
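The same mechanism seems to explain this case too. A short sketch, again assuming x is a factor: in test1 the first appearances are C, A, B, D, whose integer codes (levels A < B < C < D) are 3, 1, 2, 4, and those codes are what end up being used as row positions in test1.

```r
x <- c("A","A","B","B","C","C","D")
y <- c(1,2,1,1,2,3,1)
z <- c("yes","yes","no","yes","no","no","no")
test <- data.frame(x, y, z, stringsAsFactors = TRUE)
test1 <- test[rev(order(test$y, test$z)), ]

# Distinct values in order of first appearance within test1
u1 <- unique(test1$x)
print(u1)              # the levels are still A < B < C < D

# Their integer codes, which is what row indexing actually uses
print(as.integer(u1))

# Positions 3, 1, 2, 4 of test1, i.e. original rows 5, 6, 2, 4
print(test1[u1, ])
```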
> test1[!duplicated(test1$x),]
x y z
6 C 3 no
2 A 2 yes
4 B 1 yes
7 D 1 no
# finally I get unique values of x, each with the maximum
# value of y (and z).
But why does this not work when order() and !duplicated()
are combined in a single command? And why does only
!duplicated() work, and not unique()?
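For what it's worth, a sketch of the version that worked, written so that duplicated() always sees the same (sorted) column that is being subset. The tapply() cross-check is not from The R Book; it is only there to confirm that the y values kept are the per-group maxima. It again assumes x is a factor, and `best` and `max_y` are illustrative names.

```r
x <- c("A","A","B","B","C","C","D")
y <- c(1,2,1,1,2,3,1)
z <- c("yes","yes","no","yes","no","no","no")
test <- data.frame(x, y, z, stringsAsFactors = TRUE)

# Sort first, then flag duplicates on the SORTED frame's own column
test1 <- test[rev(order(test$y, test$z)), ]
best <- test1[!duplicated(test1$x), ]

# Optional: put the groups back in A, B, C, D order
best <- best[order(best$x), ]
print(best)

# Cross-check: per-group maximum of y computed directly
max_y <- tapply(test$y, test$x, max)
print(max_y)
print(all(best$y == max_y[as.character(best$x)]))
```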