[R] Removing duplicates without a for loop

Rui Barradas ruipbarradas at sapo.pt
Wed Sep 26 20:23:58 CEST 2012


Hello,

If I understand it correctly, something like this will get you what you 
want.


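# Example data: four distinct dates, plus two of them repeated at random
d <- Sys.Date() + 1:4
d2 <- sample(d, 2)
dat <- data.frame(id = 1:6, date = c(d, d2), value = rnorm(6))

# Within each date, keep the last value of every column (i.e. the last row)
aggregate(dat, by = list(dat$date), FUN = tail, 1)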
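
Applied to your own data, the same idea (one row per group, no explicit loop) could be sketched as below. This is only a sketch under assumptions: the small Para.5C here is invented stand-in data, and ENTRY.DATE is a hypothetical name for your first column, which I have not seen. It sorts by tracking number and date, then uses duplicated() to keep the first, and therefore oldest, row per REQ.NR.

```r
# Hypothetical stand-in for Para.5C: column 1 is the entry date,
# REQ.NR is the tracking number (values invented for illustration).
Para.5C <- data.frame(
  ENTRY.DATE = as.Date("2012-09-01") + c(5, 0, 3, 1, 2, 4),
  REQ.NR     = c("A1", "A1", "B2", "B2", "C3", NA),
  stringsAsFactors = FALSE
)

# Drop rows with a missing tracking number, as na.omit(unique(...)) did.
x <- Para.5C[!is.na(Para.5C$REQ.NR), ]

# Sort by tracking number, then by date, so the oldest entry comes first
# within each tracking number.
x <- x[order(x$REQ.NR, x[[1]]), ]

# duplicated() flags every repeat after the first occurrence, so negating
# it keeps exactly one row (the oldest) per tracking number.
Para.5C.final <- x[!duplicated(x$REQ.NR), ]
```

One difference from the loop in your post: if two rows for the same tracking number share the same oldest date, the loop keeps both, while this sketch keeps only one of them.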

Hope this helps,

Rui Barradas
On 26-09-2012 16:19, wwreith wrote:
>   I have several thousand rows of shipment data imported into R as a data
> frame, with two columns of particular interest, col 1 is the entry date, and
> col 2 is the tracking number (colname is REQ.NR). Tracking numbers should be
> unique but on occasion aren't because they get entered more than once. This
> creates two or more rows with the same tracking number but different
> dates. I wrote a for loop that will keep the row with the oldest date but it
> is extremely slow.
>
> Any suggestions of how I should write this so that it is faster?
>
> # Creates a vector of the unique tracking numbers #
> u<-na.omit(unique(Para.5C$REQ.NR))
>
> # Create Data Frame to rbind unique rows to #
> Para.5C.final<-data.frame()
>
> # For each value in u, subset Para.5C, find the min date and rbind it to
> # Para.5C.final #
> for(i in seq_along(u))
> {
>    x<-subset(Para.5C,Para.5C$REQ.NR==u[i])
>    Para.5C.final<-rbind(Para.5C.final,x[which(x[,1]==min(x[,1])),])
> }



