Hi Sam,

It depends on whether you are using single read or paired end reads files.
As far as I known summarizeOverlaps function only working in case of single
read. Anyone please correct me if I am wrong.

Kind Regards


On Sun, Dec 8, 2013 at 8:25 PM, Valerie Obenchain <vobencha@fhcrc.org>wrote:

> Hi Sam,
>
> Are you getting any error messages related to seqlevels when you count? If
> you have confirmed that annotation seqlevels match the bam files next look
> at the maximum possible overlap in a single bam file.
>
>   bf <- singleBamFile
>   reads <- readGAlignments(bf)  ## assuming single-end reads
>   so <- summarizeOverlaps(tx, reads, mode=Union, inter.feature=FALSE)
>
> 'inter.feature=FALSE' with 'mode=Union' is countOverlaps with 'type=ANY'.
> Reads that hit multiple features will be double counted in this case. The
> idea to to see if there are any common regions at all.
>
> If there are still no counts, you could look more closely at a smaller
> chromosome region in 'reads' where you would expect counts. This visual
> inspection might give you some insight as to why there is no overlap.
>
>
> Valerie
>
>
> On 12/05/13 18:51, Sam McInturf wrote:
>
>> Hello Bioconductors,
>>     I am working on an Arabidopsis RNA seq set, and after executing my
>> count
>> reads script, I get a count table with no reads counted (ie
>> sum(colSums(mat)) = 0).
>>
>> I mapped my reads using Tophat2 (without supplying a gtf/gff file) and
>> used
>> samtools to sort and index my accepted_hits.bam files.  I loaded these bam
>> files into IGV (integrated Genome Browser) and the reads appear to have
>> been mapped (additionally the align_summary.txt says I have good mapping).
>>   Script and session info below.
>>
>> Essentially I use Rsamtools to load in my bamfiles, and then
>> makeTranscriptsFromGFF to make a txdb object, transcriptsBy to get
>> transcripts, and then summarizeOverlaps to count the reads.  My gff file
>> is
>> TAIR10.  I am relatively sure that the genome I mapped to was the TAIR10
>> release, but I am not 100% on that.
>>
>> I have also used biomaRt to do this (commented out), but I had the same
>> results.
>>
>> I am currently upgrading to R 3.0.2 and re-installing bioconductor (maybe
>> it will change something?)
>>
>> Thanks for any thoughts!
>>
>> Script:
>>
>> gffdir <-  "/data/smcintur/Annotation/TAIR10_GFF3_genes.gff";
>> library(ShortRead)
>> library(GenomicFeatures)
>> library(Rsamtools)
>> #library(biomaRt)
>> setwd("/data/smcintur/RNASeq/NMseq/newTophat/BamFiles/")
>> bf <- list.files(pattern = ".bam$")
>> bi <- list.files(pattern = "bam.bai$")
>> bfl <- BamFileList(bf, bi)
>>
>> #mart <- makeTranscriptDbFromBiomart(biomart = "Public_TAIRV10", dataset=
>> "tairv10")
>> tx <- transcriptsBy(mart)
>> gff <- makeTranscriptDbFromGFF(gffdir, format = "gff")
>> gff
>>
>> cds <- cdsBy(gff)
>> tx <- transcriptsBy(gff)
>>
>> olapTx <- summarizeOverlaps(tx, bfl)
>> olapCds <- summarizeOverlaps(cds, bfl)
>>
>> txMat <- assays(olapTx)$counts
>> cdsMat <- assays(olapCds)$counts
>>
>> write.table(txMat, file = "countMatTxGff.txt", sep = "\t")
>> write.table(cdsMat, file = "countMatCdsGff.txt", sep = "\t")
>>
>> sessionInfo
>> R version 3.0.0 (2013-04-03)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=C                 LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] BiocInstaller_1.12.0
>>
>> loaded via a namespace (and not attached):
>>   [1] AnnotationDbi_1.24.0   Biobase_2.22.0         BiocGenerics_0.8.0
>>   [4] biomaRt_2.18.0         Biostrings_2.30.1      bitops_1.0-6
>>   [7] BSgenome_1.30.0        DBI_0.2-7              GenomicFeatures_1.14.2
>> [10] GenomicRanges_1.14.3   IRanges_1.20.6         parallel_3.0.0
>> [13] RCurl_1.95-4.1         Rsamtools_1.14.2       RSQLite_0.11.4
>> [16] rtracklayer_1.22.0     stats4_3.0.0           tools_3.0.0
>> [19] XML_3.98-1.1           XVector_0.2.0          zlibbioc_1.8.0
>>
>>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.
> science.biology.informatics.conductor
>



-- 
Reema Singh
PhD Scholar
Computational Biology and Bioinformatics
School of Computational and Integrative Sciences
Jawaharlal Nehru University
New Delhi-110067
INDIA

	[[alternative HTML version deleted]]

