[BioC] Getting first or last hit from a Hits object
Harris A. Jaffee
hj at jhu.edu
Mon Oct 21 00:17:33 CEST 2013
I needed essentially your "last_hits" for bumphunter:::fuzzy.match2
In your notation, it comes out:
last_hits_rows <- cumsum(rle(queryHits(all_hits))$lengths)
last_hits <- all_hits[last_hits_rows,]
Having expended that effort, you could do
first_hits_rows <- c(1, 1 + last_hits_rows[-length(query)])
Maybe there's a better way. I have not compared these to anything of
Hector's.
On Oct 20, 2013, at 4:32 PM, Peter Hickey <hickey at wehi.EDU.AU> wrote:
> Hi all,
>
> Suppose I've created a Hits object by
> all_hits <- findOverlaps(query, subject, select = 'all', type = 'any')
> and now I want to get the first (or last) hit for each query. Of course, I could use
> first_hits <- findOverlaps(query, subject, select = 'first', type = 'any')
> last_hits <- findOverlaps(query, subject, select = 'last', type = 'any')
> but I this isn't necessary as all the required information is in the all_hits Hits object.
>
> I was unable to find a function to do what I wanted, however, based at the source for findOverlaps-methods.R (actually Hector's branch at https://github.com/hcorrada/GenomicRanges/blob/master/R/findOverlaps-methods.R) I wrote the following simple functions
> getFirstHit <- function(all_hits){
> ans <- rep.int(NA_integer_, queryLength(all_hits))
> oo <- IRanges:::orderIntegerPairs(queryHits(all_hits), subjectHits(all_hits), decreasing = TRUE)
> ans[queryHits(all_hits)[oo]] <- subjectHits(all_hits)[oo]
> return(ans)
> }
> getLastHit <- function(all_hits){
> ans <- rep.int(NA_integer_, queryLength(all_hits))
> ans[queryHits(all_hits)] <- subjectHits(all_hits)
> return(ans)
> }
>
> These do what I want and are faster than re-running findOverlaps for my use case (millions of query intervals with tens of thousands of subject intervals)
>
> Now to my questions:
> (1) Did I just overlook built-in "getFirstHit" and "getLastHit" functions?
> (2) Would others find "getFirstHit" and "getLastHit" useful enough to justify their inclusion in GenomicRanges (or IRanges, wherever it belongs)?
> (3) I recall some discussion on this mailing list and Rdevel stating that using ":::" to access non-exported functions isn't a great idea for use in package code. If I need to include my own "getFirstHit" would someone be so kind as to point me to the best advice for avoiding calls to ":::" in my code to access IRanges:::orderIntegerPairs(queryHits)?
>
> Many thanks,
> Pete
> --------------------------------
> Peter Hickey,
> PhD Student/Research Assistant,
> Bioinformatics Division,
> Walter and Eliza Hall Institute of Medical Research,
> 1G Royal Parade, Parkville, Vic 3052, Australia.
> Ph: +613 9345 2324
>
> hickey at wehi.edu.au
> http://www.wehi.edu.au
>
>
> ______________________________________________________________________
> The information in this email is confidential and intend...{{dropped:8}}
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list