[BioC] summarizeOverlaps question
James W. MacDonald
jmacdon at uw.edu
Wed Aug 29 18:39:39 CEST 2012
I am getting results from summarizeOverlaps(<GrangesList>,
<BamFileList>) that I don't understand. Here is the way I normally use
the function:
library(Rsamtools)
library(TxDb.Mmusculus.UCSC.mm9.knownGene)
ex <- exonsBy(TxDb.Mmusculus.UCSC.mm9.knownGene, use.names = TRUE)
samps_casava <- BamFileList(<path to bamfiles>)
olaps <- summarizeOverlaps(ex, samps_casava, mode = "IntersectionNotEmpty")
and then as an example, check the counts for uc007amp.2
> assays(olaps)$counts["uc007amp.2",, drop=F]
GG1_ATCACG_L002.uniques.sorted.bam
uc007amp.2 1
GG2_TTAGGC_L002.uniques.sorted.bam
uc007amp.2 1
GG3_ACTTGA_L002.uniques.sorted.bam
uc007amp.2 0
GG4_GATCAG_L002.uniques.sorted.bam
uc007amp.2 0
GG5_TAGCTT_L003.uniques.sorted.bam
uc007amp.2 0
GG6_GGCTAC_L003.uniques.sorted.bam
uc007amp.2 1
GG7_GTGGCC_L003.uniques.sorted.bam
uc007amp.2 0
GG8_GTTTCG_L003.uniques.sorted.bam
uc007amp.2 0
GG9_CGTACG_L004.uniques.sorted.bam
uc007amp.2 0
GG10_GAGTGG_L004.uniques.sorted.bam
uc007amp.2 0
GG11_ACTGAT_L004.uniques.sorted.bam
uc007amp.2 1
GG12_ATTCCT_L004.uniques.sorted.bam
uc007amp.2 3
However, this doesn't make any sense to me, as there are actually tons
of reads that overlap this transcript (I know this from looking at the
transcript and these data using IGV). So let's just summarize the
overlaps for that transcript alone.
> ex2 <- ex["uc007amp.2"]
> olaps2 <- summarizeOverlaps(ex2, samps_casava, mode =
"IntersectionNotEmpty")
> assays(olaps2)$counts
GG1_ATCACG_L002.uniques.sorted.bam
uc007amp.2 45
GG2_TTAGGC_L002.uniques.sorted.bam
uc007amp.2 90
GG3_ACTTGA_L002.uniques.sorted.bam
uc007amp.2 70
GG4_GATCAG_L002.uniques.sorted.bam
uc007amp.2 64
GG5_TAGCTT_L003.uniques.sorted.bam
uc007amp.2 111
GG6_GGCTAC_L003.uniques.sorted.bam
uc007amp.2 26
GG7_GTGGCC_L003.uniques.sorted.bam
uc007amp.2 34
GG8_GTTTCG_L003.uniques.sorted.bam
uc007amp.2 36
GG9_CGTACG_L004.uniques.sorted.bam
uc007amp.2 98
GG10_GAGTGG_L004.uniques.sorted.bam
uc007amp.2 153
GG11_ACTGAT_L004.uniques.sorted.bam
uc007amp.2 164
GG12_ATTCCT_L004.uniques.sorted.bam
uc007amp.2 174
Any ideas where I am going wrong?
Best,
Jim
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
More information about the Bioconductor
mailing list