[R] inclusion criteria help
Thomas Lumley
tlumley at u.washington.edu
Tue Nov 27 17:24:20 CET 2001
On Tue, 27 Nov 2001, Aaron J Mackey wrote:
>
> I have a dataset that looks like this (many other variables not
> shown, including a unique row identifier "id"):
>
> > summary(hits)
>     query               lib               coverage          percid
>  Length:80664       Length:80664       Min.   :0.080   Min.   :0.2250
>  Mode  :character   Mode  :character   1st Qu.:0.980   1st Qu.:0.8160
>                                        Median :1.000   Median :0.9230
>                                        Mean   :0.946   Mean   :0.8536
>                                        3rd Qu.:1.000   3rd Qu.:0.9900
>                                        Max.   :1.000   Max.   :1.0000
>
> For any query/lib combination there may be 1 or more rows of data. I'd
> like to be able to specify only the rows for each query/lib combination
> that have the maximum (or minimum or whatever) coverage or percid or some
> other data element, and carry along the other corresponding data elements
> from that same row.
>
> I know I can do this procedurally in a loop:
>
<snip: he does it>
>
> So, how could I accomplish the same plot as above without the looping and
> creating a new dataframe?
Well, for one query you can do the subsetting like this:

    hits[hits$coverage == max(hits$coverage), ]

so for many queries you could tapply() or by() this process:

    filter <- function(this.subset) {
        this.subset[this.subset$coverage == max(this.subset$coverage), ]
    }
    filtered <- by(hits, hits$query, filter)

This produces a list of data frames, so you want to staple them back
together:

    filtered <- do.call("rbind", filtered)
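
As a minimal sketch of the filter-then-rbind idea on data small enough to check by eye: the toy `hits` values and the helper name `keep_max` below are made up for illustration, and the grouping uses split() on both query and lib (the combination the question asks about) rather than by() on query alone. Treat it as one possible arrangement, not the only one.

```r
# Toy stand-in for the real data (values invented for illustration).
hits <- data.frame(
  id       = 1:6,
  query    = c("q1", "q1", "q1", "q2", "q2", "q2"),
  lib      = c("a",  "a",  "b",  "a",  "a",  "a"),
  coverage = c(0.90, 1.00, 0.80, 0.70, 0.95, 0.95),
  percid   = c(0.80, 0.90, 0.70, 0.60, 0.85, 0.99),
  stringsAsFactors = FALSE
)

# Keep the row(s) of one group that attain the maximum coverage.
keep_max <- function(this.subset) {
  this.subset[this.subset$coverage == max(this.subset$coverage), ]
}

# One data frame per query/lib pair; drop = TRUE skips pairs with no rows.
groups <- split(hits, list(hits$query, hits$lib), drop = TRUE)

# Filter each group, then staple the pieces back together.
filtered <- do.call("rbind", lapply(groups, keep_max))
rownames(filtered) <- NULL
```

Because the comparison uses ==, tied rows are all kept: the q2/a group contributes both of its rows with coverage 0.95, each with its own id and percid carried along.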
In the case of maximum or minimum it would be faster to use the
which.max()/which.min() functions instead of the expression

    hits$coverage == max(hits$coverage)

(though these return only the first extreme position, so tied rows are
dropped).
This may or may not be faster than the loop, but it should be easier to
read (at least if you understand by()).
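
The which.max() variant mentioned above might look like the following sketch. The toy data and the name `keep_first_max` are assumptions for illustration; the by() and do.call("rbind", ...) steps are the same as in the message.

```r
# Toy stand-in for the real data (values invented for illustration).
hits <- data.frame(
  id       = 1:5,
  query    = c("q1", "q1", "q2", "q2", "q2"),
  coverage = c(0.90, 1.00, 0.70, 0.95, 0.95),
  stringsAsFactors = FALSE
)

# which.max() returns the position of the first maximum,
# so each group contributes exactly one row.
keep_first_max <- function(this.subset) {
  this.subset[which.max(this.subset$coverage), ]
}

# Apply per query, then bind the one-row pieces into a single data frame.
filtered_one <- do.call("rbind", by(hits, hits$query, keep_first_max))
```

Here q2 has two rows tied at 0.95 and only the first (id 4) survives, which is the behavioural difference from the == max() version.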
-thomas