[R] Problems using unique function and !duplicated
(Ted Harding)
efh at wlandres.net
Mon Feb 28 17:20:02 CET 2011
On 28-Feb-11 15:51:17, JonC wrote:
> Hi, I am trying to remove rows that are duplicated on two or more
> variables in a small R data.frame. I am trying to reproduce the SAS
> statements from a Proc Sort with Nodupkey for those familiar with SAS.
>
> Here's my example data :
>
> test <- read.csv("test.csv", sep=",", as.is=TRUE)
>> test
> date var1 var2 num1 num2
> 1 28/01/11 a 1 213 71
> 2 28/01/11 b 1 141 47
> 3 28/01/11 c 2 867 289
> 4 29/01/11 a 2 234 78
> 5 29/01/11 b 2 666 222
> 6 29/01/11 c 2 912 304
> 7 30/01/11 a 3 417 139
> 8 30/01/11 b 3 108 36
> 9 30/01/11 c 2 288 96
>
> I am trying to obtain the following, where duplicates of date AND var2
> are removed from the above data.frame.
>
> date var1 var2 num1 num2
> 28/01/2011 a 1 213 71
> 28/01/2011 c 2 867 289
> 29/01/2011 a 2 234 78
> 30/01/2011 c 2 288 96
> 30/01/2011 a 3 417 139
>
>
>
> If I use the !duplicated function with one variable everything works
> fine.
> However I wish to remove duplicates of both Date and var2.
>
> test[!duplicated(test$date),]
> date var1 var2 num1 num2
> 1 0011-01-28 a 1 213 71
> 4 0011-01-29 a 2 234 78
> 7 0011-01-30 a 3 417 139
>
> test2 <- test[!duplicated(test$date),!duplicated(test$var2),]
> Error in `[.data.frame`(test, !duplicated(test$date),
> !duplicated(test$var2), : undefined columns selected
> I got different errors when using the unique() function.
>
> Can anybody solve this ?
>
> Thanks in advance.
> Jon
The following gives what you state you wish to obtain (though
not quite in the same order of rows). Call the original dataframe 'df':
df
# date var1 var2 num1 num2
# 1 28/01/11 a 1 213 71
# 2 28/01/11 b 1 141 47
# 3 28/01/11 c 2 867 289
# 4 29/01/11 a 2 234 78
# 5 29/01/11 b 2 666 222
# 6 29/01/11 c 2 912 304
# 7 30/01/11 a 3 417 139
# 8 30/01/11 b 3 108 36
# 9 30/01/11 c 2 288 96
ix <- which(duplicated(data.frame(df$date, df$var2)))
ix
# [1] 2 5 6 8
df[-ix,]
# date var1 var2 num1 num2
# 1 28/01/11 a 1 213 71
# 3 28/01/11 c 2 867 289
# 4 29/01/11 a 2 234 78
# 7 30/01/11 a 3 417 139
# 9 30/01/11 c 2 288 96
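If you also want the rows in the order you listed, one option is to sort the de-duplicated rows by date and then by var2. This is only a sketch: I am assuming the dates are day/month/two-digit-year (hence the "%d/%m/%y" format string), and the names 'kept' and 'ord' are mine, not from your post. The data frame is rebuilt inline so the snippet runs on its own:

```r
# Rebuild the example data so the snippet is self-contained
df <- data.frame(
  date = rep(c("28/01/11", "29/01/11", "30/01/11"), each = 3),
  var1 = rep(c("a", "b", "c"), times = 3),
  var2 = c(1, 1, 2, 2, 2, 2, 3, 3, 2),
  num1 = c(213, 141, 867, 234, 666, 912, 417, 108, 288),
  num2 = c(71, 47, 289, 78, 222, 304, 139, 36, 96),
  stringsAsFactors = FALSE
)

# Drop rows whose (date, var2) pair has already been seen
ix   <- which(duplicated(data.frame(df$date, df$var2)))
kept <- df[-ix, ]

# Sort by real dates (assumed day/month/year), then by var2
ord <- order(as.Date(kept$date, format = "%d/%m/%y"), kept$var2)
kept[ord, ]
```

Converting with as.Date() before ordering matters in general: the date strings happen to sort correctly here only because every row falls in the same month.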
Does this help?
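One caveat with the negative index, in case you reuse this on other data: if there are no duplicates at all, ix is empty and df[-ix,] then selects no rows rather than all of them. A logical index avoids that. A minimal sketch with a made-up two-row data frame (the names 'nodup' and 'keep' are purely illustrative):

```r
# Hypothetical data with no duplicated (date, var2) pairs
nodup <- data.frame(date = c("28/01/11", "29/01/11"),
                    var2 = c(1, 2),
                    stringsAsFactors = FALSE)

# Negative index: ix is empty here, so no rows are selected
ix <- which(duplicated(nodup[, c("date", "var2")]))
nrow(nodup[-ix, ])   # 0

# Logical index: keep the first row of each (date, var2) pair
keep <- !duplicated(nodup[, c("date", "var2")])
nrow(nodup[keep, ])  # 2
```

On your data the same pattern would be df[!duplicated(df[, c("date", "var2")]), ], which should keep the same rows as df[-ix,].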
Ted.
PS I'm posting this from a temporarily subscribed alternative
address (for testing purposes) instead of my usual
ted.harding at wlandres.net
--------------------------------------------------------------------
E-Mail: (Ted Harding) <efh at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 28-Feb-11 Time: 16:19:59
------------------------------ XFMail ------------------------------