[R] Find last row (observation) for each combination of variables

Leif Kirschenbaum leif at reflectivity.com
Tue Jan 10 22:03:48 CET 2006


Let's say I have a data.frame like
A	B	C	TS	other columns
1	1	1	12345
1	1	1	56789
1	2	1	23456
1	2	2	23457
2	4	7	23458
2	4	7	34567
2	4	7	45678

and I want the last row for each unique combination of A/B/C, where by "last" I mean greatest TS.
A	B	C	TS	other columns
1	1	1	56789
1	2	1	23456
1	2	2	23457
2	4	7	45678

I did this simply in SAS:
 proc sort data=DF;
   by A B C descending TS
 run;
 proc sort data=DF NODUPKEY;
   by A B C;
 run;

I tried using "aggregate" to find the maximum TS for each combination of A/B/C, but it's slow.
I also tried "by" but it's also slow.
My current (faster) solution is:

 DF$abc<-paste(DF$A,DF$B,DF$C,sep="")
 abclist<-unique(DF$ABC)
 numtest<-length(abclist)
 maxTS<-rep(0,numtest)
 for(i in 1:numtest){
  maxTS[i]<-max(DF$TS[DF$abc==abclist[i]],na.rm=TRUE)
 }
 maxTSdf<-data.frame(device=I(abc),maxTS=maxTS )
 DF<-merge(DF,maxTSdf,by="abc",all.x=TRUE)
 DF<-Df[DF$TS==DF$maxTS,,drop=TRUE]
 DF$maxTS<-NULL

This seems a bit lengthy for such a simple task.

Any simpler suggestions?

-Leif K.

Leif Kirschenbaum
Senior Yield Engineer
Reflectivity, Inc.
(408) 737-8100 x307
leif at reflectivity.com




More information about the R-help mailing list