[BioC] nearly exact overlaps between GAlignmentPairs and GenomicRanges
Georg Otto
georg.otto at imm.ox.ac.uk
Thu Aug 21 21:59:45 CEST 2014
Dear all,
I have a problem finding nerly exact overlaps between read pairs and
genomic regions. I try this by using GAlignmentPairs and GenomicRanges:
> library(Rsamtools) ex1_file <- system.file("extdata", "ex1.bam",
> package="Rsamtools") galp <- readGAlignmentPairs(ex1_file, use.names=TRUE)
> galp[3:4]
GAlignmentPairs with 2 alignment pairs and 0 metadata columns:
seqnames strand : ranges -- ranges
<Rle> <Rle> : <IRanges> -- <IRanges>
EAS54_65:3:321:311:983 seq1 + : [51, 85] -- [228, 262]
B7_591:5:42:540:501 seq1 + : [60, 95] -- [224, 259]
---
seqlengths:
seq1 seq2 1575 1584
> ranges <- GRanges(c("seq1", "seq1"), IRanges(c(50, 57), end = c(263, 260)))
> ranges
GRanges with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] seq1 [50, 263] *
[2] seq1 [57, 260] *
---
seqlengths:
seq1
NA
> summary <-summarizeOverlaps(ranges, galp[3:4], mode = "IntersectionStrict",
+ inter.feature = FALSE, fragments = FALSE)
Warning message:
In if (all(strand(reads) == "*")) ignore.strand <- TRUE :
the condition has length > 1 and only the first element will be used
> assays(summary)$counts
reads
[1,] 2
[2,] 1
I am not entirely sure what the warning means, but my point is a
different one: I use the "IntersectStrict" mode because I only want to count read
pairs that are completely within the interval. This is what I
get, but I want to use a more strict selection for "almost equal",
i.e. intervals that are the same size or only slightly larger (say 3 bp)
than the range of the read pairs. So ranges[1] should count galp[3], but
not galp[4]
Any hint will be appreciated....
Best wishes,
Georg
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] Rsamtools_1.14.3 Biostrings_2.30.1 GenomicRanges_1.14.4
[4] XVector_0.2.0 IRanges_1.20.7 BiocGenerics_0.8.0
loaded via a namespace (and not attached):
[1] bitops_1.0-6 compiler_3.0.1 stats4_3.0.1 tools_3.0.1 zlibbioc_1.8.0
More information about the Bioconductor
mailing list