[R] inclusion criteria help

Thomas Lumley tlumley at u.washington.edu
Tue Nov 27 17:24:20 CET 2001


On Tue, 27 Nov 2001, Aaron J Mackey wrote:

>
> I have a dataset that looks like this (many other variables not
> shown, including a unique row identifier "id"):
>
> > summary(hits)
>     query               lib               coverage         percid
>  Length:80664       Length:80664       Min.   :0.080   Min.   :0.2250
>  Mode  :character   Mode  :character   1st Qu.:0.980   1st Qu.:0.8160
>                                        Median :1.000   Median :0.9230
>                                        Mean   :0.946   Mean   :0.8536
>                                        3rd Qu.:1.000   3rd Qu.:0.9900
>                                        Max.   :1.000   Max.   :1.0000
>
> For any query/lib combination there may be 1 or more rows of data. I'd
> like to be able to specify only the rows for each query/lib combination
> that have the maximum (or minimum or whatever) coverage or percid or some
> other data element, and carry along the other corresponding data elements
> from that same row.
>
> I know I can do this procedurally in a loop:
>
<snip: he does it>
>
> So, how could I accomplish the same plot as above without the looping and
> creating a new dataframe?

Well, for one query you can do the subsetting like

	hits[hits$coverage == max(hits$coverage),]

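so for many queries you could apply the same step to each subset with
tapply() or by(). (To group on query/lib pairs rather than on query alone,
pass list(hits$query, hits$lib) as the grouping argument.)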

	filter <- function(this.subset) {
		this.subset[this.subset$coverage == max(this.subset$coverage), ]
	}
	filtered <- by(hits, hits$query, filter)

This produces a list of data frames, so you want to staple them back
together:

	filtered <- do.call("rbind", filtered)
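
Here is a minimal, self-contained sketch of the steps above on made-up data.
The toy `hits` data frame, its values, and the names `keep_max` and `groups`
are invented for illustration; they are not from the original dataset. It
groups on query and lib together, since the question asks about query/lib
combinations, and uses interaction() as one possible way to build that
grouping.

```r
# Toy stand-in for the real data; all values are invented for illustration.
hits <- data.frame(
  id       = 1:6,
  query    = c("q1", "q1", "q1", "q2", "q2", "q2"),
  lib      = c("a",  "a",  "b",  "a",  "a",  "b"),
  coverage = c(0.90, 1.00, 0.80, 0.70, 0.70, 0.95),
  percid   = c(0.50, 0.60, 0.70, 0.80, 0.90, 0.99),
  stringsAsFactors = FALSE
)

# Keep every row of a subset whose coverage equals the subset maximum.
keep_max <- function(this.subset) {
  this.subset[this.subset$coverage == max(this.subset$coverage), ]
}

# interaction() builds one grouping factor per query/lib pair;
# drop = TRUE discards pairs that never occur in the data.
groups   <- interaction(hits$query, hits$lib, drop = TRUE)
filtered <- by(hits, groups, keep_max)

# by() returns a list of data frames; bind them into one.
filtered <- do.call("rbind", filtered)
print(filtered)
```

Rows that tie for the maximum within a group are all kept, and the other
columns (id, percid) come along with each selected row.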

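In the case of maximum or minimum it would be faster to use the
which.max/which.min functions instead of the expression
    hits$coverage == max(hits$coverage)
Note that which.max/which.min return only the first matching position, so
rows that tie for the maximum or minimum within a subset are dropped.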
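
To make the difference concrete, here is a small sketch comparing the two
approaches on invented data (the toy `hits` values and the function names
`keep_all_max` and `keep_first_max` are made up for illustration). It is
only meant to show the behaviour with ties, not to measure speed.

```r
# Toy data with a tie for the maximum coverage inside query "q2".
hits <- data.frame(
  id       = 1:5,
  query    = c("q1", "q1", "q2", "q2", "q2"),
  coverage = c(0.90, 1.00, 0.70, 0.70, 0.50),
  stringsAsFactors = FALSE
)

# Logical comparison: keeps every row that attains the subset maximum.
keep_all_max <- function(this.subset) {
  this.subset[this.subset$coverage == max(this.subset$coverage), ]
}

# which.max() returns the index of the first maximum only,
# so each subset contributes exactly one row, even when values tie.
keep_first_max <- function(this.subset) {
  this.subset[which.max(this.subset$coverage), ]
}

all_max   <- do.call("rbind", by(hits, hits$query, keep_all_max))
first_max <- do.call("rbind", by(hits, hits$query, keep_first_max))

print(all_max)
print(first_max)
```

Which one you want depends on whether tied rows should all be reported or
whether one representative row per query is enough.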

This may or may not be faster than the loop, but it should be easier to
read (at least if you understand by()).


	-thomas

