[R] Problems using unique function and !duplicated

JonC jon_d_cooke at yahoo.co.uk
Mon Feb 28 16:51:17 CET 2011


Hi, I am trying to remove rows that are duplicated on two or more variables
simultaneously in a small R data.frame. For those familiar with SAS, I am
trying to reproduce the behaviour of Proc Sort with the Nodupkey option.

Here's my example data:

> test <- read.csv("test.csv", sep=",", as.is=TRUE)
> test
      date var1 var2 num1 num2
1 28/01/11    a    1  213   71
2 28/01/11    b    1  141   47
3 28/01/11    c    2  867  289
4 29/01/11    a    2  234   78
5 29/01/11    b    2  666  222
6 29/01/11    c    2  912  304
7 30/01/11    a    3  417  139
8 30/01/11    b    3  108   36
9 30/01/11    c    2  288   96

I am trying to obtain the following, where any row that repeats an earlier
combination of date AND var2 is removed from the above data.frame.

      date var1 var2 num1 num2
28/01/2011    a    1  213   71
28/01/2011    c    2  867  289
29/01/2011    a    2  234   78
30/01/2011    c    2  288   96
30/01/2011    a    3  417  139
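
One way the target above might be reached is to call duplicated() on a data frame holding only the key columns, so that rows are compared on the (date, var2) pair rather than on each variable separately. The following is only a sketch: the data are rebuilt inline instead of being read from test.csv, the names `keep` and `dedup` are made up for illustration, and it assumes the first row of each combination is the one to keep.

```r
# Example data rebuilt inline (assumption: same values as test.csv above)
test <- data.frame(
  date = rep(c("28/01/11", "29/01/11", "30/01/11"), each = 3),
  var1 = rep(c("a", "b", "c"), times = 3),
  var2 = c(1, 1, 2, 2, 2, 2, 3, 3, 2),
  num1 = c(213, 141, 867, 234, 666, 912, 417, 108, 288),
  num2 = c(71, 47, 289, 78, 222, 304, 139, 36, 96),
  stringsAsFactors = FALSE
)

# duplicated() on a data frame compares whole rows, so passing only the
# key columns flags repeats of the (date, var2) combination
keep  <- !duplicated(test[, c("date", "var2")])
dedup <- test[keep, ]

# Optional: sort by the keys, parsing the dd/mm/yy strings as dates so
# the ordering is chronological rather than alphabetical
dedup <- dedup[order(as.Date(dedup$date, format = "%d/%m/%y"), dedup$var2), ]
dedup
```

With the nine rows above this keeps original rows 1, 3, 4, 9 and 7, in that order, which matches the var1, var2, num1 and num2 values of the desired table.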



If I use !duplicated() on a single variable, everything works fine.
However, I wish to remove duplicates of both date and var2.

> test[!duplicated(test$date),]
        date var1 var2 num1 num2
1 0011-01-28    a    1  213   71
4 0011-01-29    a    2  234   78
7 0011-01-30    a    3  417  139

> test2 <- test[!duplicated(test$date),!duplicated(test$var2),]
Error in `[.data.frame`(test, !duplicated(test$date),
!duplicated(test$var2),  :   undefined columns selected

Why do I get this error?
I also got different errors when using the unique() function.
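
A possible reading of the error, offered as a sketch rather than a definitive diagnosis: in test[i, j] the first index selects rows and the second selects columns, so a second !duplicated() vector is treated as a column selector. The snippet below rebuilds the data inline (an assumption, in place of test.csv) and uses the made-up names `res`, `both_first` and `keys` to show three behaviours side by side.

```r
# Example data rebuilt inline (assumption: same values as test.csv above)
test <- data.frame(
  date = rep(c("28/01/11", "29/01/11", "30/01/11"), each = 3),
  var1 = rep(c("a", "b", "c"), times = 3),
  var2 = c(1, 1, 2, 2, 2, 2, 3, 3, 2),
  num1 = c(213, 141, 867, 234, 666, 912, 417, 108, 288),
  num2 = c(71, 47, 289, 78, 222, 304, 139, 36, 96),
  stringsAsFactors = FALSE
)

# The 9-element logical vector lands in the column position; its TRUE at
# position 7 points past the 5 existing columns, hence the error
res <- try(test[!duplicated(test$date), !duplicated(test$var2)], silent = TRUE)
inherits(res, "try-error")

# Combining the two tests with & is not a two-column key either: it keeps
# only rows that are the first occurrence of date and of var2 separately
both_first <- test[!duplicated(test$date) & !duplicated(test$var2), ]

# unique() on the key columns returns the distinct (date, var2) pairs,
# but the remaining columns (var1, num1, num2) are dropped
keys <- unique(test[, c("date", "var2")])
```

Here `both_first` holds only original rows 1 and 7, while `keys` has the five distinct pairs without var1, num1 or num2, which may be why neither approach gives the Nodupkey-style result directly.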

Can anybody suggest a solution?

Thanks in advance.

Jon

