[BioC] RNASeq:- getting Zero Count

Tue Aug 6 17:43:43 CEST 2013

I would do some investigating with a single bam file.

Confirm 'gnModel' and 'aln' have some common seqlevels. This call should 
produce a result.

     intersect(seqlevels(aln), seqlevels(gnModel))

Call countOverlaps on a single file.

     co <- countOverlaps(aln, gnModel)

Evidently you only want the bam records that have exactly 5 hits. This 
could be limiting. To see the distribution of hits make a table of the 
counts.

     table(co)

Valerie

On 08/05/2013 10:05 PM, Reema Singh wrote:
> Hi Valerie,
>
> Thank you so much for the reply.
>
> After checking the seqlevels, I am able to get rid off the error, but
> still getting the zero count entries. Is there any another way of doing
> this?
>
> KInd Regards
>
>
> On Mon, Aug 5, 2013 at 9:21 PM, Valerie Obenchain <vobencha at fhcrc.org
> <mailto:vobencha at fhcrc.org>> wrote:
>
>     Hi Reema,
>
>     To perform overlap or matching operations the seqlevels (chromosome
>     names) of the objects must match. The error message is telling you
>     that some of these do not match. It's reasonable that a few names
>     may not match (maybe a chromosome is present in one object and not
>     the other) but the majority should.
>
>     Check the seqlevels:
>     seqlevels(aln)
>     seqlevels(gnModel)
>
>     Which names are common to both:
>     intersect(seqlevels(gnModel), seqlevels(aln))
>
>       You can rename seqlevels in several different ways. See
>     ?renameSeqlevels or ?seqlevels for examples.
>
>     Valerie
>
>
>     On 08/04/2013 06:35 AM, Reema Singh wrote:
>
>         Dear All,
>
>         I am trying to extract the read count from three .bam files. But
>         I am
>         getting Zero count entries.
>
>         I am using Mycobacterium Tuberculosis H37Rv gtf file (
>         ftp://ftp.ensemblgenomes.org/__pub/release-19/bacteria//gtf/__bacteria_1_collection/__mycobacterium_tuberculosis___h37rv/Mycobacterium___tuberculosis_h37rv.GCA___000277735.1.19.gtf.gz
>         <ftp://ftp.ensemblgenomes.org/pub/release-19/bacteria//gtf/bacteria_1_collection/mycobacterium_tuberculosis_h37rv/Mycobacterium_tuberculosis_h37rv.GCA_000277735.1.19.gtf.gz>)
>         and RNASeq data used here were downloaded from (
>         http://www.ncbi.nlm.nih.gov/__geo/query/acc.cgi?acc=GSE40846
>         <http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE40846>__)
>         and aligned
>         with bowtie2.
>
>
>         library(GenomicFeatures)
>         txdb <-
>         makeTranscriptDbFromGFF(file="__Mycobacterium_tuberculosis___h37rv.GCA_000277735.1.19.gtf",__format="gtf")
>         saveDb(txdb,file="__MycoTubeH37Rv.sqlite")
>         load("MycoTubeH37Rv.sqlite")
>         gnModel <- exonsBy(txdb,"gene") ### *also tried with
>         "transcripts", "cds",
>         but getting same *
>
>
>         bamFiles <- list.files(".", "bam$", full=TRUE)
>         names(bamFiles) <- sub("\\..*","",basename(__bamFiles))
>         counter <- function(fl, gnModel){
>         aln <- GenomicRanges::__readGappedAlignments(fl)
>         strand(aln)
>         hits <- countOverlaps(aln,gnModel)
>         counts <- countOverlaps(gnModel,aln[__hits==5])
>         names(counts) <- names(gnModel)
>         counts
>         }
>
>         counts <- sapply(bamFiles,counter,__gnModel)
>
>         Note: method with signature ‘Vector#GRangesList’ chosen for function
>         ‘countOverlaps’,
>            target signature ‘GappedAlignments#GRangesList’__.
>            "GappedAlignments#Vector" would also be valid
>         Note: method with signature ‘GRangesList#Vector’ chosen for function
>         ‘countOverlaps’,
>            target signature ‘GRangesList#GappedAlignments’__.
>            "Vector#GappedAlignments" would also be valid
>         Warning messages:
>         1: In .Seqinfo.mergexy(x, y) :
>             Each of the 2 combined objects has sequence levels not in
>         the other:
>             - in 'x': gi|448814763|ref|NC_000962.3|
>             - in 'y': Chromosome
>             Make sure to always combine/compare objects based on the
>         same reference
>             genome (use suppressWarnings() to suppress this warning).
>         2: In .Seqinfo.mergexy(x, y) :
>             Each of the 2 combined objects has sequence levels not in
>         the other:
>             - in 'x': Chromosome
>             - in 'y': gi|448814763|ref|NC_000962.3|
>             Make sure to always combine/compare objects based on the
>         same reference
>             genome (use suppressWarnings() to suppress this warning).
>         3: In .Seqinfo.mergexy(x, y) :
>             Each of the 2 combined objects has sequence levels not in
>         the other:
>             - in 'x': gi|448814763|ref|NC_000962.3|
>             - in 'y': Chromosome
>             Make sure to always combine/compare objects based on the
>         same reference
>             genome (use suppressWarnings() to suppress this warning).
>         4: In .Seqinfo.mergexy(x, y) :
>             Each of the 2 combined objects has sequence levels not in
>         the other:
>             - in 'x': Chromosome
>             - in 'y': gi|448814763|ref|NC_000962.3|
>             Make sure to always combine/compare objects based on the
>         same reference
>             genome (use suppressWarnings() to suppress this warning).
>         5: In .Seqinfo.mergexy(x, y) :
>             Each of the 2 combined objects has sequence levels not in
>         the other:
>             - in 'x': gi|448814763|ref|NC_000962.3|
>             - in 'y': Chromosome
>             Make sure to always combine/compare objects based on the
>         same reference
>             genome (use suppressWarnings() to suppress this warning).
>         6: In .Seqinfo.mergexy(x, y) :
>             Each of the 2 combined objects has sequence levels not in
>         the other:
>             - in 'x': Chromosome
>             - in 'y': gi|448814763|ref|NC_000962.3|
>             Make sure to always combine/compare objects based on the
>         same reference
>             genome (use suppressWarnings() to suppress this warning).
>
>         head(counts)
>
>                     SRR568038 SRR568039 SRR568040
>         RVBD_0001         0         0         0
>         RVBD_0002         0         0         0
>         RVBD_0003         0         0         0
>         RVBD_0004         0         0         0
>         RVBD_0005         0         0         0
>         RVBD_0006         0         0         0
>
>             sessionInfo()
>
>         R version 3.0.1 (2013-05-16)
>         Platform: x86_64-redhat-linux-gnu (64-bit)
>
>         locale:
>            [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
>            [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
>            [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8
>            [7] LC_PAPER=C                LC_NAME=C
>            [9] LC_ADDRESS=C              LC_TELEPHONE=C
>         [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>
>         attached base packages:
>         [1] parallel  stats     graphics  grDevices utils     datasets
>           methods
>         [8] base
>
>         other attached packages:
>         [1] Rsamtools_1.12.3       Biostrings_2.28.0
>           GenomicFeatures_1.12.3
>         [4] AnnotationDbi_1.22.6   Biobase_2.20.1
>         GenomicRanges_1.12.4
>         [7] IRanges_1.18.2         BiocGenerics_0.6.0
>
>         loaded via a namespace (and not attached):
>            [1] biomaRt_2.16.0     bitops_1.0-5       BSgenome_1.28.0
>           DBI_0.2-6
>
>            [5] RCurl_1.95-4.1     RSQLite_0.11.3     rtracklayer_1.20.4
>         stats4_3.0.1
>
>            [9] tools_3.0.1        XML_3.96-1.1       zlibbioc_1.6.0
>
>
>         Kind regards
>
>
>
>
>         _________________________________________________
>         Bioconductor mailing list
>         Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>         https://stat.ethz.ch/mailman/__listinfo/bioconductor
>         <https://stat.ethz.ch/mailman/listinfo/bioconductor>
>         Search the archives:
>         http://news.gmane.org/gmane.__science.biology.informatics.__conductor
>         <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>
>
>
>
> --
> Reema Singh
> PhD Scholar
> Computational Biology and Bioinformatics
> School of Computational and Integrative Sciences
> Jawaharlal Nehru University
> New Delhi-110067
> INDIA