[R] Removing duplicates without a for loop

Rui Barradas ruipbarradas at sapo.pt
Wed Sep 26 20:30:05 CEST 2012


Sorry, in my previous post I confused the columns. The grouping should be
by REQ.NR, not by date:

REQ.NR <- 1:4
REQ.NR <- c(REQ.NR, sample(REQ.NR, 2))
dat <- data.frame(date = Sys.Date() + 1:6, REQ.NR = REQ.NR,
                  value = rnorm(6))

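# tail(x, 1) keeps the last row of each REQ.NR group (here the latest date);
# use head instead of tail to keep the first (oldest) row.
aggregate(dat, by = list(dat$REQ.NR), FUN = tail, 1)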
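
Since the question asks for the row with the oldest date, another option that avoids both the loop and the extra Group.1 column is to sort and then drop repeats. This is only a sketch and not part of the original reply: the toy data and the names sorted and oldest are made up for illustration, and it assumes the date column is a proper Date (or otherwise sortable).

```r
# Toy data (hypothetical): REQ.NR 2 and 3 were each entered twice.
dat <- data.frame(
  date   = as.Date("2012-09-26") + c(3, 0, 5, 1, 2, 4),
  REQ.NR = c(1, 2, 3, 4, 2, 3),
  value  = c(10, 20, 30, 40, 50, 60)
)

# Sort by tracking number, then by date (oldest first within each number).
sorted <- dat[order(dat$REQ.NR, dat$date), ]

# duplicated() flags every repeat after the first occurrence, so dropping
# the flagged rows keeps the oldest row for each REQ.NR.
oldest <- sorted[!duplicated(sorted$REQ.NR), ]
oldest
```

Rows with a missing REQ.NR would need to be dropped beforehand if they should be excluded, as the original loop does with na.omit.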

Rui Barradas
On 26-09-2012 16:19, wwreith wrote:
> I have several thousand rows of shipment data imported into R as a data
> frame, with two columns of particular interest: col 1 is the entry date, and
> col 2 is the tracking number (colname is REQ.NR). Tracking numbers should be
> unique but on occasion aren't because they get entered more than once. This
> creates two or more rows with the same tracking number but different
> dates. I wrote a for loop that will keep the row with the oldest date but it
> is extremely slow.
>
> Any suggestions of how I should write this so that it is faster?
>
> # Creates a vector of the unique tracking numbers #
> u<-na.omit(unique(Para.5C$REQ.NR))
>
> # Create Data Frame to rbind unique rows to #
> Para.5C.final<-data.frame()
>
> # For each value in u, subset Para.5C, find the min date and rbind it to
> # Para.5C.final
> for(i in seq_along(u))
> {
>    x<-subset(Para.5C,Para.5C$REQ.NR==u[i])
>    Para.5C.final<-rbind(Para.5C.final,x[which(x[,1]==min(x[,1])),])
> }