[R] Remove duplicated rows

Gustaf Rydevik gustaf.rydevik at gmail.com
Fri Apr 23 13:20:12 CEST 2010


On Fri, Apr 23, 2010 at 4:05 AM, chrisli1223
<chrisli at austwaterenv.com.au> wrote:
>
> Hi all,
>
> I have a dataset similar to the following
>
> Name    Date    Value
> A       1/01/2000       4
> A       2/01/2000       4
> A       3/01/2000       5
> A       4/01/2000       4
> A       5/01/2000       1
> B       6/01/2000       2
> B       7/01/2000       1
> B       8/01/2000       1
>
> I would like R to remove duplicates based on columns 1 and 3 only (Name and
> Value). In addition, I only want to remove a row when it duplicates the row
> directly above or below it. For example, for A, I would like to remove row 2
> only and keep rows 1, 3 and 4.
>
> I have tried unique() and duplicated(), but without much success. I
> have also tried dataset<-c(1,diff(dataset)!=0), but I don't know how to
> apply it to this multi-column situation.
>
> Any help would be greatly appreciated.
>
> Thanks in advance,
> Chris
> --
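Chris's c(1,diff(dataset)!=0) idea does extend to several columns if the numeric difference is replaced by a comparison of each row with the row directly above it, column by column. A minimal base-R sketch, assuming the data are already in a data frame with no missing values; the helper name keep_first_of_runs and the typed-in data frame dat are illustrative only:

```r
# Example data from the question, typed in so the sketch does not need the clipboard
dat <- data.frame(
  Name  = c("A", "A", "A", "A", "A", "B", "B", "B"),
  Date  = c("1/01/2000", "2/01/2000", "3/01/2000", "4/01/2000",
            "5/01/2000", "6/01/2000", "7/01/2000", "8/01/2000"),
  Value = c(4, 4, 5, 4, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep a row when any of `cols` differs from the row
# directly above it. The first row has nothing above it, so it is always kept.
# This is the c(1, diff(x) != 0) idea applied to several columns at once.
# Assumes no NA values in `cols` (an NA comparison would give an NA row).
keep_first_of_runs <- function(df, cols) {
  n <- nrow(df)
  if (n < 2) return(df)
  changed <- rep(FALSE, n - 1)
  for (col in cols) {
    # TRUE where this column differs from the previous row
    changed <- changed | df[[col]][-1] != df[[col]][-n]
  }
  df[c(TRUE, changed), , drop = FALSE]
}

keep_first_of_runs(dat, c("Name", "Value"))
```

On the example this keeps rows 1, 3, 4, 5, 6 and 7, i.e. the first row of every run of identical Name/Value pairs, so one copy of each run always survives.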



Hi,

This code is a bit ugly, but it works on your example: it drops any row
whose Name and Value match the row directly above or below it. Hope it helps.
/Gustaf

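library(zoo)
# Read the table copied from the message ("clipboard" works on Windows and X11)
test<-read.table("clipboard",header=TRUE)
# One key per row from Name and Value; the "_" keeps e.g. "A1"+"4" and "A"+"14" apart
test$code<-paste(test$Name,test$Value,sep="_")

# Flag each interior row whose key matches the row above or the row below it
drop.ndx<-rollapply(zoo(test$code),3,function(x)(x[2]%in%c(x[1],x[3])))

# The first and last rows have no full 3-row window, so they are never dropped
drop.ndx<-c(FALSE,drop.ndx,FALSE)
test[!drop.ndx,]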
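For readers without zoo, the neighbour rule above (drop an interior row whose Name/Value key equals that of the row above or below it) can be sketched in base R. This is an illustrative re-implementation, not Gustaf's code; the helper drop_neighbour_dups and the typed-in example data are hypothetical, and the "\r" separator is assumed not to occur in the data:

```r
# Same example data as in the question
dat <- data.frame(
  Name  = c("A", "A", "A", "A", "A", "B", "B", "B"),
  Date  = paste0(1:8, "/01/2000"),
  Value = c(4, 4, 5, 4, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: an interior row is dropped if its key equals the key of
# the row above or the row below; the first and last rows are always kept.
drop_neighbour_dups <- function(df, cols) {
  # One string key per row; "\r" is assumed not to appear in the data
  key <- do.call(paste, c(df[cols], sep = "\r"))
  n <- length(key)
  to_drop <- rep(FALSE, n)
  if (n >= 3) {
    mid <- 2:(n - 1)
    to_drop[mid] <- key[mid] == key[mid - 1] | key[mid] == key[mid + 1]
  }
  df[!to_drop, , drop = FALSE]
}

drop_neighbour_dups(dat, c("Name", "Value"))
```

On the example this keeps rows 1, 3, 4, 5, 6 and 8, the same rows as the zoo version. One consequence of checking both neighbours: if two equal rows sit between rows that differ from them (keys X, A, A, Y), both A rows are dropped. If one copy of every run should survive, compare each row only with the row above it instead.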



-- 
Gustaf Rydevik, M.Sci.



More information about the R-help mailing list