[R] Removing duplicates without a for loop

Wed Sep 26 23:25:26 CEST 2012

On Sep 26, 2012, at 11:23 AM, Rui Barradas wrote:

> Hello,
> 
> If I understand it correctly, something like this will get you what you want.
> 
> 
> d <- Sys.Date() + 1:4
> d2 <- sample(d, 2)
> dat <- data.frame(id = 1:6, date = c(d, d2), value = rnorm(6))
> 
> aggregate(dat, by = list(dat$date), FUN = tail, 1)

If these are sorted by date, then the oldest date would come first and you would want:

 aggregate(dat, by = list(dat$date), FUN = head, 1)
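
The same idea can be applied to the tracking-number column directly, without aggregate(). What follows is only a sketch under stated assumptions: the small Para.5C data frame built here is a made-up stand-in for the real data (only the REQ.NR column name and "column 1 is the entry date" come from the original post), and the column name "date" is hypothetical. It sorts by tracking number and date, then uses duplicated() to keep the first (oldest) row for each tracking number.

```r
# Hypothetical stand-in for Para.5C: column 1 is the entry date,
# REQ.NR is the tracking number (one NA to mimic missing entries).
Para.5C <- data.frame(
  date   = as.Date("2012-09-01") + c(4, 0, 2, 1, 3, 5),
  REQ.NR = c("A1", "A1", "B2", "C3", "B2", NA),
  stringsAsFactors = FALSE
)

# Drop rows with a missing tracking number, mirroring the
# na.omit(unique(...)) step in the original loop.
x <- Para.5C[!is.na(Para.5C$REQ.NR), ]

# Sort by tracking number, then by date, so the oldest entry
# for each tracking number comes first within its group.
x <- x[order(x$REQ.NR, x$date), ]

# duplicated() flags every repeat after the first occurrence,
# so negating it keeps exactly one row (the oldest) per number.
Para.5C.final <- x[!duplicated(x$REQ.NR), ]

print(Para.5C.final)
```

One difference from the loop: if two rows share both the tracking number and the minimum date, the loop keeps all of them, while this keeps only one.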
-- 
David.
> 
> Hope this helps,
> 
> Rui Barradas
> Em 26-09-2012 16:19, wwreith escreveu:
>>  I have several thousand rows of shipment data imported into R as a data
>> frame, with two columns of particular interest, col 1 is the entry date, and
>> col 2 is the tracking number (colname is REQ.NR). Tracking numbers should be
>> unique but on occasion aren't because they get entered more than once. This
>> creates two or more rows with the same tracking number but different
>> dates. I wrote a for loop that will keep the row with the oldest date but it
>> is extremely slow.
>> 
>> Any suggestions of how I should write this so that it is faster?
>> 
>> # Creates a vector of the unique tracking numbers #
>> u<-na.omit(unique(Para.5C$REQ.NR))
>> 
>> # Create Data Frame to rbind unique rows to #
>> Para.5C.final<-data.frame()
>> 
>> # For each value in u, subset Para.5C, find the min date, and rbind it to
>> # Para.5C.final #
>> for(i in 1:length(u))
>> {
>>   x<-subset(Para.5C,Para.5C$REQ.NR==u[i])
>>   Para.5C.final<-rbind(Para.5C.final,x[which(x[,1]==min(x[,1])),])
>> }
>> 
-- 

David Winsemius, MD
Alameda, CA, USA