[BioC] miRNA annotation
Steve Lianoglou
mailinglist.honeypot at gmail.com
Mon May 7 20:59:20 CEST 2012
Hi,
On Mon, May 7, 2012 at 2:15 PM, Wu, Huiyun <william.wu at vanderbilt.edu> wrote:
> Hello there,
>
> I am new in seq data analysis, and have difficulty in miRNA annotation. To clarify my question, I have the following codes.
>
>
> bamfile <- "GSE33858_1.bowtie_alignment.bam"
> aln <- readBamGappedAlignments(bamfile)
> txdb<-makeTranscriptDbFromUCSC(genome="hg19",tablename="knownGene")
> exonRangesList <- exonsBy(txdb, by = "gene")
> exonRanges <- unlist(exonRangesList)
> strand(exonRanges) <- "*"
> geneNames <- sub("\\..*$", "", names(exonRanges))
> exonRangesListNoStrand <- split(exonRanges, geneNames)
> numberOfexons <- sapply(width(exonRangesListNoStrand), length)
> geneLength <- sum(width(reduce(exonRangesListNoStrand)))
> # Counting reads for the BAM file
> counts <- countOverlaps(exonRangesListNoStrand, aln)
> names(counts) <- names(exonRangesListNoStrand)
>
> > length(counts)
>
> [1] 22932
>
> So my question is from aligned file to annotated file. Specifically, why I got 22932 features after mapping to hg19? I learned there are only less than 2000 miRNAs matched genes. I also know there are several RNA databases including miRBase for annotation. Did I do something logically wrong? It would be greatly appreciated if someone could show me how to implement this annotation in R.
What does `sum(counts == 0)` give you?
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
More information about the Bioconductor
mailing list