[BioC] nearly exact overlaps between GAlignmentPairs and GenomicRanges
Georg Otto
georg.otto at imm.ox.ac.uk
Fri Aug 22 19:40:06 CEST 2014
Michael Lawrence <michafla at gene.com> writes:
> But to get exactly what you want, we need to use type="within" with
> findOverlaps() and then filter the hits for cases where the overhang is
> below some tolerance (3bp):
>
> reads <- granges(galp) # simplifies things a bit
> hits <- findOverlaps(reads, ranges, type="within")
> overhang <- width(ranges[subjectHits(hits)]) - width(reads[queryHits(hits)])
> hits <- hits[overhang < 3L]
>
> Now we need the count vector:
>
> counts <- countSubjectHits(hits)
>
> Then it's just a simple exercise to insert that code into a function that
> conforms to the expectations of the "mode" parameter.
>
Thanks a lot for your help. I have problems, however, to wrap this into
a mode function. Here is what I have done:
> ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools")
> galp <- readGAlignmentPairs(ex1_file, use.names=TRUE)
> galp[3:4]
GAlignmentPairs with 2 alignment pairs and 0 metadata columns:
seqnames strand : ranges -- ranges
<Rle> <Rle> : <IRanges> -- <IRanges>
EAS54_65:3:321:311:983 seq1 + : [51, 85] -- [228, 262]
B7_591:5:42:540:501 seq1 + : [60, 95] -- [224, 259]
---
seqlengths:
seq1 seq2
1575 1584
> ranges <- GRanges(c("seq1", "seq1"), IRanges(c(50, 57), end = c(263, 260)))
> ranges
GRanges with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] seq1 [50, 263] *
[2] seq1 [57, 260] *
---
seqlengths:
seq1
NA
> summary <-summarizeOverlaps(ranges, galp[3:4], mode = "IntersectionStrict",
+ inter.feature = FALSE, fragments = FALSE)
Warning message:
In if (all(strand(reads) == "*")) ignore.strand <- TRUE :
the condition has length > 1 and only the first element will be used
> assays(summary)$counts
reads
[1,] 2
[2,] 1
However, I want the first feature only to be matched by galp[3] so I use
this function:
> intersectFun <-function (features, reads, ignore.strand = FALSE, inter.feature = TRUE) {
+
+ reads <- granges(reads)
+ hits <- findOverlaps(reads, features, type="within")
+ overhang <- width(ranges[subjectHits(hits)]) - width(reads[queryHits(hits)])
+ hits <- hits[overhang < 3L]
+ counts <- countSubjectHits(hits)
+ return(counts)
+ }
> summary <-summarizeOverlaps(ranges, galp, mode = "intersectFun",
+ inter.feature = FALSE, fragments = FALSE)
Error in (function (classes, fdef, mtable) (from #3) :
unable to find an inherited method for function ‘granges’ for signature ‘"GRangesList"’
In addition: Warning message:
In if (all(strand(reads) == "*")) ignore.strand <- TRUE :
the condition has length > 1 and only the first element will be used
Apparently, the summarizeOverlaps converts the input GAlignmentPairs
object into a GRangesList, which then can not be used by the mode
function. Sorry, I do not have a deep understanding of these data
structures. Could you help me with a workaround?
Best wishes,
Georg
More information about the Bioconductor
mailing list