[BioC] Problems Counting Reads with summarizeOverlaps
Valerie Obenchain
vobencha at fhcrc.org
Sun Dec 8 19:31:05 CET 2013
summarizeOverlaps() does count paired-end reads. Use option
singleEnd=FALSE. Details under 'singleEnd' in the 'Arguments' section on
the man page.
library(GenomicAlignments) ## devel
library(GenomicRanges) ## release
?summarizeOverlaps
Valerie
On 12/08/13 09:42, Reema Singh wrote:
> Hi Sam,
>
> It depends on whether you are using single read or paired end reads
> files. As far as I known summarizeOverlaps function only working in case
> of single read. Anyone please correct me if I am wrong.
>
> Kind Regards
>
>
> On Sun, Dec 8, 2013 at 8:25 PM, Valerie Obenchain <vobencha at fhcrc.org
> <mailto:vobencha at fhcrc.org>> wrote:
>
> Hi Sam,
>
> Are you getting any error messages related to seqlevels when you
> count? If you have confirmed that annotation seqlevels match the bam
> files next look at the maximum possible overlap in a single bam file.
>
> bf <- singleBamFile
> reads <- readGAlignments(bf) ## assuming single-end reads
> so <- summarizeOverlaps(tx, reads, mode=Union, inter.feature=FALSE)
>
> 'inter.feature=FALSE' with 'mode=Union' is countOverlaps with
> 'type=ANY'. Reads that hit multiple features will be double counted
> in this case. The idea to to see if there are any common regions at all.
>
> If there are still no counts, you could look more closely at a
> smaller chromosome region in 'reads' where you would expect counts.
> This visual inspection might give you some insight as to why there
> is no overlap.
>
>
> Valerie
>
>
> On 12/05/13 18:51, Sam McInturf wrote:
>
> Hello Bioconductors,
> I am working on an Arabidopsis RNA seq set, and after
> executing my count
> reads script, I get a count table with no reads counted (ie
> sum(colSums(mat)) = 0).
>
> I mapped my reads using Tophat2 (without supplying a gtf/gff
> file) and used
> samtools to sort and index my accepted_hits.bam files. I loaded
> these bam
> files into IGV (integrated Genome Browser) and the reads appear
> to have
> been mapped (additionally the align_summary.txt says I have good
> mapping).
> Script and session info below.
>
> Essentially I use Rsamtools to load in my bamfiles, and then
> makeTranscriptsFromGFF to make a txdb object, transcriptsBy to get
> transcripts, and then summarizeOverlaps to count the reads. My
> gff file is
> TAIR10. I am relatively sure that the genome I mapped to was
> the TAIR10
> release, but I am not 100% on that.
>
> I have also used biomaRt to do this (commented out), but I had
> the same
> results.
>
> I am currently upgrading to R 3.0.2 and re-installing
> bioconductor (maybe
> it will change something?)
>
> Thanks for any thoughts!
>
> Script:
>
> gffdir <- "/data/smcintur/Annotation/__TAIR10_GFF3_genes.gff";
> library(ShortRead)
> library(GenomicFeatures)
> library(Rsamtools)
> #library(biomaRt)
> setwd("/data/smcintur/RNASeq/__NMseq/newTophat/BamFiles/")
> bf <- list.files(pattern = ".bam$")
> bi <- list.files(pattern = "bam.bai$")
> bfl <- BamFileList(bf, bi)
>
> #mart <- makeTranscriptDbFromBiomart(__biomart =
> "Public_TAIRV10", dataset=
> "tairv10")
> tx <- transcriptsBy(mart)
> gff <- makeTranscriptDbFromGFF(__gffdir, format = "gff")
> gff
>
> cds <- cdsBy(gff)
> tx <- transcriptsBy(gff)
>
> olapTx <- summarizeOverlaps(tx, bfl)
> olapCds <- summarizeOverlaps(cds, bfl)
>
> txMat <- assays(olapTx)$counts
> cdsMat <- assays(olapCds)$counts
>
> write.table(txMat, file = "countMatTxGff.txt", sep = "\t")
> write.table(cdsMat, file = "countMatCdsGff.txt", sep = "\t")
>
> sessionInfo
> R version 3.0.0 (2013-04-03)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] BiocInstaller_1.12.0
>
> loaded via a namespace (and not attached):
> [1] AnnotationDbi_1.24.0 Biobase_2.22.0
> BiocGenerics_0.8.0
> [4] biomaRt_2.18.0 Biostrings_2.30.1 bitops_1.0-6
> [7] BSgenome_1.30.0 DBI_0.2-7
> GenomicFeatures_1.14.2
> [10] GenomicRanges_1.14.3 IRanges_1.20.6 parallel_3.0.0
> [13] RCurl_1.95-4.1 Rsamtools_1.14.2 RSQLite_0.11.4
> [16] rtracklayer_1.22.0 stats4_3.0.0 tools_3.0.0
> [19] XML_3.98-1.1 XVector_0.2.0 zlibbioc_1.8.0
>
>
> _________________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
> https://stat.ethz.ch/mailman/__listinfo/bioconductor
> <https://stat.ethz.ch/mailman/listinfo/bioconductor>
> Search the archives:
> http://news.gmane.org/gmane.__science.biology.informatics.__conductor <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>
>
>
>
> --
> Reema Singh
> PhD Scholar
> Computational Biology and Bioinformatics
> School of Computational and Integrative Sciences
> Jawaharlal Nehru University
> New Delhi-110067
> INDIA
More information about the Bioconductor
mailing list