[R] Subsetting data frames without a loop
Uwe Ligges
ligges at statistik.uni-dortmund.de
Wed Jan 16 09:44:28 CET 2002
Michael_Nielsen/Syd/Synergy.SYNERGY at synergy.com.au wrote:
>
> I KNOW this should be easy, but I'm stuck.
>
> My data frame consists of multiple observations from each of a number of
> stations, and what I would like to do is create another data frame that
> contains all the variables of the first, but only rows where a certain
> variable is at its maximum for the station.
>
> So, for example:
>
> > my.df
>    stn obs           v
> 1    1   1  0.26400396
> 2    2   1 -0.79194397
> 3    3   1  0.11924528
> 4    4   1  0.42596859
> 5    5   1 -0.50528235
> 6    1   2 -1.57524853
> 7    2   2  0.17762482
> 8    3   2 -0.83013770
> 9    4   2 -0.53203400
> 10   5   2 -2.71397275
> 11   1   3  0.26902053
> 12   2   3  2.01147908
> 13   3   3  0.73301643
> 14   4   3 -0.67333384
> 15   5   3 -1.36219773
> 16   1   4 -2.20342109
> 17   2   4  0.18941702
> 18   3   4  0.51492032
> 19   4   4  0.03597370
> 20   5   4 -1.43502366
> 21   1   5 -1.34589392
> 22   2   5  1.00389195
> 23   3   5 -0.21233041
> 24   4   5 -1.35141044
> 25   5   5 -0.02052348
>
> > tapply(v,factor(stn),max)
>           1           2           3           4           5
>  0.26902053  2.01147908  0.73301643  0.42596859 -0.02052348
>
> so my new data frame should contain (possibly multiple rows per station)
>
>   stn obs           v
> 1   1   3  0.26902053
> 2   2   3  2.01147908
> 3   3   3  0.73301643
> 4   4   1  0.42596859
> 5   5   5 -0.02052348
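
As a first idea (this assumes my.df is attached so that v and stn are
visible, and that each station's maximum occurs only once in v):

my.df[tapply(v,factor(stn), function(x) which(v==max(x))),]
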
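
If ties are possible (the question allows multiple rows per station), a variant built on ave() may be safer. The following is only a sketch, not a tested general solution: it rebuilds the example data from the post, and the names stn.max and res are made up for illustration.

```r
# Rebuild the example data from the question (values copied from the post).
my.df <- data.frame(
  stn = rep(1:5, times = 5),
  obs = rep(1:5, each = 5),
  v = c( 0.26400396, -0.79194397,  0.11924528,  0.42596859, -0.50528235,
        -1.57524853,  0.17762482, -0.83013770, -0.53203400, -2.71397275,
         0.26902053,  2.01147908,  0.73301643, -0.67333384, -1.36219773,
        -2.20342109,  0.18941702,  0.51492032,  0.03597370, -1.43502366,
        -1.34589392,  1.00389195, -0.21233041, -1.35141044, -0.02052348)
)

# ave() returns a vector as long as v that holds, for every row,
# the maximum of v within that row's station.
stn.max <- ave(my.df$v, my.df$stn, FUN = max)

# Keep every row whose v equals its own station's maximum.
# Tied maxima within a station are all retained.
res <- my.df[my.df$v == stn.max, ]

# Optional: sort the result by station.
res <- res[order(res$stn), ]
print(res)
```

Because each row is compared only with the maximum of its own station, a value that happens to equal another station's maximum is not picked up by mistake. For the example data this keeps the original rows 11, 12, 13, 4 and 25.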
Uwe