[BioC] RNASeq:- getting Zero Count
Valerie Obenchain
vobencha at fhcrc.org
Tue Aug 6 17:43:43 CEST 2013
I would do some investigating with a single bam file.
Confirm 'gnModel' and 'aln' have some common seqlevels. This call should
produce a result.
intersect(seqlevels(aln), seqlevels(gnModel))
Call countOverlaps on a single file.
co <- countOverlaps(aln, gnModel)
Evidently you only want the bam records that have exactly 5 hits. This
could be limiting. To see the distribution of hits make a table of the
counts.
table(co)
Valerie
On 08/05/2013 10:05 PM, Reema Singh wrote:
> Hi Valerie,
>
> Thank you so much for the reply.
>
> After checking the seqlevels, I am able to get rid off the error, but
> still getting the zero count entries. Is there any another way of doing
> this?
>
> KInd Regards
>
>
> On Mon, Aug 5, 2013 at 9:21 PM, Valerie Obenchain <vobencha at fhcrc.org
> <mailto:vobencha at fhcrc.org>> wrote:
>
> Hi Reema,
>
> To perform overlap or matching operations the seqlevels (chromosome
> names) of the objects must match. The error message is telling you
> that some of these do not match. It's reasonable that a few names
> may not match (maybe a chromosome is present in one object and not
> the other) but the majority should.
>
> Check the seqlevels:
> seqlevels(aln)
> seqlevels(gnModel)
>
> Which names are common to both:
> intersect(seqlevels(gnModel), seqlevels(aln))
>
> You can rename seqlevels in several different ways. See
> ?renameSeqlevels or ?seqlevels for examples.
>
> Valerie
>
>
> On 08/04/2013 06:35 AM, Reema Singh wrote:
>
> Dear All,
>
> I am trying to extract the read count from three .bam files. But
> I am
> getting Zero count entries.
>
> I am using Mycobacterium Tuberculosis H37Rv gtf file (
> ftp://ftp.ensemblgenomes.org/__pub/release-19/bacteria//gtf/__bacteria_1_collection/__mycobacterium_tuberculosis___h37rv/Mycobacterium___tuberculosis_h37rv.GCA___000277735.1.19.gtf.gz
> <ftp://ftp.ensemblgenomes.org/pub/release-19/bacteria//gtf/bacteria_1_collection/mycobacterium_tuberculosis_h37rv/Mycobacterium_tuberculosis_h37rv.GCA_000277735.1.19.gtf.gz>)
> and RNASeq data used here were downloaded from (
> http://www.ncbi.nlm.nih.gov/__geo/query/acc.cgi?acc=GSE40846
> <http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE40846>__)
> and aligned
> with bowtie2.
>
>
> library(GenomicFeatures)
> txdb <-
> makeTranscriptDbFromGFF(file="__Mycobacterium_tuberculosis___h37rv.GCA_000277735.1.19.gtf",__format="gtf")
> saveDb(txdb,file="__MycoTubeH37Rv.sqlite")
> load("MycoTubeH37Rv.sqlite")
> gnModel <- exonsBy(txdb,"gene") ### *also tried with
> "transcripts", "cds",
> but getting same *
>
>
> bamFiles <- list.files(".", "bam$", full=TRUE)
> names(bamFiles) <- sub("\\..*","",basename(__bamFiles))
> counter <- function(fl, gnModel){
> aln <- GenomicRanges::__readGappedAlignments(fl)
> strand(aln)
> hits <- countOverlaps(aln,gnModel)
> counts <- countOverlaps(gnModel,aln[__hits==5])
> names(counts) <- names(gnModel)
> counts
> }
>
> counts <- sapply(bamFiles,counter,__gnModel)
>
> Note: method with signature ‘Vector#GRangesList’ chosen for function
> ‘countOverlaps’,
> target signature ‘GappedAlignments#GRangesList’__.
> "GappedAlignments#Vector" would also be valid
> Note: method with signature ‘GRangesList#Vector’ chosen for function
> ‘countOverlaps’,
> target signature ‘GRangesList#GappedAlignments’__.
> "Vector#GappedAlignments" would also be valid
> Warning messages:
> 1: In .Seqinfo.mergexy(x, y) :
> Each of the 2 combined objects has sequence levels not in
> the other:
> - in 'x': gi|448814763|ref|NC_000962.3|
> - in 'y': Chromosome
> Make sure to always combine/compare objects based on the
> same reference
> genome (use suppressWarnings() to suppress this warning).
> 2: In .Seqinfo.mergexy(x, y) :
> Each of the 2 combined objects has sequence levels not in
> the other:
> - in 'x': Chromosome
> - in 'y': gi|448814763|ref|NC_000962.3|
> Make sure to always combine/compare objects based on the
> same reference
> genome (use suppressWarnings() to suppress this warning).
> 3: In .Seqinfo.mergexy(x, y) :
> Each of the 2 combined objects has sequence levels not in
> the other:
> - in 'x': gi|448814763|ref|NC_000962.3|
> - in 'y': Chromosome
> Make sure to always combine/compare objects based on the
> same reference
> genome (use suppressWarnings() to suppress this warning).
> 4: In .Seqinfo.mergexy(x, y) :
> Each of the 2 combined objects has sequence levels not in
> the other:
> - in 'x': Chromosome
> - in 'y': gi|448814763|ref|NC_000962.3|
> Make sure to always combine/compare objects based on the
> same reference
> genome (use suppressWarnings() to suppress this warning).
> 5: In .Seqinfo.mergexy(x, y) :
> Each of the 2 combined objects has sequence levels not in
> the other:
> - in 'x': gi|448814763|ref|NC_000962.3|
> - in 'y': Chromosome
> Make sure to always combine/compare objects based on the
> same reference
> genome (use suppressWarnings() to suppress this warning).
> 6: In .Seqinfo.mergexy(x, y) :
> Each of the 2 combined objects has sequence levels not in
> the other:
> - in 'x': Chromosome
> - in 'y': gi|448814763|ref|NC_000962.3|
> Make sure to always combine/compare objects based on the
> same reference
> genome (use suppressWarnings() to suppress this warning).
>
> head(counts)
>
> SRR568038 SRR568039 SRR568040
> RVBD_0001 0 0 0
> RVBD_0002 0 0 0
> RVBD_0003 0 0 0
> RVBD_0004 0 0 0
> RVBD_0005 0 0 0
> RVBD_0006 0 0 0
>
> sessionInfo()
>
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets
> methods
> [8] base
>
> other attached packages:
> [1] Rsamtools_1.12.3 Biostrings_2.28.0
> GenomicFeatures_1.12.3
> [4] AnnotationDbi_1.22.6 Biobase_2.20.1
> GenomicRanges_1.12.4
> [7] IRanges_1.18.2 BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] biomaRt_2.16.0 bitops_1.0-5 BSgenome_1.28.0
> DBI_0.2-6
>
> [5] RCurl_1.95-4.1 RSQLite_0.11.3 rtracklayer_1.20.4
> stats4_3.0.1
>
> [9] tools_3.0.1 XML_3.96-1.1 zlibbioc_1.6.0
>
>
> Kind regards
>
>
>
>
> _________________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
> https://stat.ethz.ch/mailman/__listinfo/bioconductor
> <https://stat.ethz.ch/mailman/listinfo/bioconductor>
> Search the archives:
> http://news.gmane.org/gmane.__science.biology.informatics.__conductor
> <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>
>
>
>
> --
> Reema Singh
> PhD Scholar
> Computational Biology and Bioinformatics
> School of Computational and Integrative Sciences
> Jawaharlal Nehru University
> New Delhi-110067
> INDIA
More information about the Bioconductor
mailing list