[R] Removing duplicates without a for loop
David Winsemius
dwinsemius at comcast.net
Wed Sep 26 23:25:26 CEST 2012
On Sep 26, 2012, at 11:23 AM, Rui Barradas wrote:
> Hello,
>
> If I understand it correctly, something like this will get you what you want.
>
>
> d <- Sys.Date() + 1:4
> d2 <- sample(d, 2)
> dat <- data.frame(id = 1:6, date = c(d, d2), value = rnorm(6))
>
> aggregate(dat, by = list(dat$date), FUN = tail, 1)
If these are sorted by date, then the oldest date would come first and you would want:
aggregate(dat, by = list(dat$date), FUN = head, 1)
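The same idea can be applied to the original poster's data rather than the toy example: sort so that the oldest date comes first, then keep the first row for each tracking number. The sketch below does this with order() and duplicated(). It assumes Para.5C has the entry date in its first column stored as a Date and the tracking number in REQ.NR; the data values are made up for illustration. Unlike the loop, it keeps only one row per tracking number when two entries share the oldest date.

```r
# Hypothetical stand-in for Para.5C (made-up values): column 1 is the
# entry date, REQ.NR is the tracking number, as described in the question.
Para.5C <- data.frame(
  date   = as.Date(c("2012-09-03", "2012-09-01", "2012-09-02",
                     "2012-09-05", "2012-09-04", "2012-08-30")),
  REQ.NR = c("A1", "A1", "B2", "C3", "C3", NA),
  stringsAsFactors = FALSE
)

# Drop rows with a missing tracking number, as na.omit() does in the loop.
x <- Para.5C[!is.na(Para.5C$REQ.NR), ]

# Sort so that, within each tracking number, the oldest date comes first.
sorted <- x[order(x$REQ.NR, x[, 1]), ]

# duplicated() flags every occurrence after the first, so negating it
# keeps exactly one row (the oldest) per tracking number.
oldest <- sorted[!duplicated(sorted$REQ.NR), ]
oldest
```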
--
David.
>
> Hope this helps,
>
> Rui Barradas
> On 26-09-2012 16:19, wwreith wrote:
>> I have several thousand rows of shipment data imported into R as a data
>> frame, with two columns of particular interest: col 1 is the entry date, and
>> col 2 is the tracking number (colname is REQ.NR). Tracking numbers should be
>> unique but on occasion aren't because they get entered more than once. This
>> creates two or more rows with the same tracking number but different
>> dates. I wrote a for loop that will keep the row with the oldest date, but it
>> is extremely slow.
>>
>> Any suggestions of how I should write this so that it is faster?
>>
>> # Create a vector of the unique tracking numbers #
>> u<-na.omit(unique(Para.5C$REQ.NR))
>>
>> # Create Data Frame to rbind unique rows to #
>> Para.5C.final<-data.frame()
>>
>> # For each value in u, subset Para.5C, find the min date and rbind it to Para.5C.final #
>> for(i in seq_along(u))
>> {
>>   x<-subset(Para.5C,Para.5C$REQ.NR==u[i])
>>   Para.5C.final<-rbind(Para.5C.final,x[which(x[,1]==min(x[,1])),])
>> }
>>
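A vectorized equivalent of the loop above can be sketched with ave(), which computes each group's minimum and returns it aligned with every row. The same assumptions apply: the date is in column 1 stored as a Date, the tracking number is in REQ.NR, and the data values are made up. Because each row is compared against its own group minimum, tied rows are all kept, just as which(x[,1]==min(x[,1])) does in the loop. The rows come out in their original order rather than grouped by tracking number.

```r
# Hypothetical stand-in for Para.5C (made-up values); column 1 is the
# entry date and REQ.NR is the tracking number, as in the question.
Para.5C <- data.frame(
  date   = as.Date(c("2012-09-03", "2012-09-01", "2012-09-02",
                     "2012-09-01", "2012-09-04")),
  REQ.NR = c("A1", "A1", "B2", "A1", "C3"),
  stringsAsFactors = FALSE
)

# Drop rows without a tracking number, mirroring na.omit(unique(...)).
x <- Para.5C[!is.na(Para.5C$REQ.NR), ]

# ave() returns, for every row, the minimum date of that row's group.
# Working on the numeric day count sidesteps any loss of the Date class.
days <- as.numeric(x[, 1])
group.min <- ave(days, x$REQ.NR, FUN = min)

# Keep every row whose date equals its group minimum; like the loop,
# this keeps all tied rows rather than just one.
Para.5C.final <- x[days == group.min, ]
Para.5C.final
```

Growing Para.5C.final with rbind() inside the loop copies the accumulated result on every iteration, which is a likely source of the slowness; this version builds the result in a single subsetting step.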
--
David Winsemius, MD
Alameda, CA, USA