[BioC] IRanges, GenomicRanges, GenomicFeatures?
Steve Lianoglou
mailinglist.honeypot at gmail.com
Mon Nov 1 06:19:30 CET 2010
Hi,
On Sun, Oct 31, 2010 at 11:10 PM, Oleg Moskvin <moskvin at wisc.edu> wrote:
> Hello list members,
>
> For an RNA-seq analysis, what would you suggest for converting raw-sequence-based read coverage to annotated ORF-based coverage when the genome of interest is supported by neither UCSC nor ENSEMBL, which means that creating a TranscriptDb object in the straightforward way (i.e. following the GenomicFeatures pipeline) is impossible? What would you recommend for importing a .gff file (containing the annotation of a particular genome, from GenBank) into R/Bioconductor to eventually generate a gene-centric countTable readable by packages like DESeq?
Assuming I've understood your question and how you have your data
available to you, here is one (maybe too simple) approach:
I think I'd parse the GFF into a GRangesList object, where each item of
the list is a GRanges object that stores the exon structure of one
transcript (or gene), which I'm assuming is what's in your GFF file.
If you had your RNA-seq reads in their own GRanges object, you could then
run countOverlaps between the GRangesList of transcripts and your reads
pretty easily, and use the result to create your countTable.
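Here is a minimal sketch of that counting step, using made-up toy data in place of a real annotation and real alignments. The gene names, coordinates, the `gene_id` column and the `sample1` column name are all assumptions for illustration, and the ranges are left unstranded for simplicity.

```r
library(GenomicRanges)

# Toy exon annotation; in practice this would come from the parsed GFF.
exons <- GRanges(
  seqnames = rep("chr1", 3),
  ranges   = IRanges(start = c(1, 201, 501), end = c(100, 300, 600)),
  gene_id  = c("geneA", "geneA", "geneB")   # hypothetical gene labels
)

# One list element per gene, each holding that gene's exons.
exons_by_gene <- split(exons, exons$gene_id)

# Toy aligned reads, 36 bp each; the last one falls outside every exon.
reads <- GRanges(
  seqnames = rep("chr1", 5),
  ranges   = IRanges(start = c(10, 50, 250, 550, 700), width = 36)
)

# Number of reads overlapping any exon of each gene.
counts <- countOverlaps(exons_by_gene, reads)

# One column per sample; "sample1" is a placeholder name.
count_table <- data.frame(sample1 = counts, row.names = names(counts))
```

With these toy ranges geneA gets 3 reads and geneB gets 1. Note that a read overlapping two genes would be counted once for each, so you may want to decide how to handle ambiguous reads before handing the table to DESeq.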
Hope that helps,
-steve
ps - I think rtracklayer has some facilities to import GFF files,
which might be helpful to you.
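A rough sketch of that import route, assuming a current Bioconductor release in which rtracklayer's `import()` returns a GRanges for GFF files. The tiny GFF3 file written here is made up so that the example is self-contained, and grouping CDS features by a `locus_tag` attribute is an assumption; which feature types and attributes your GenBank GFF actually carries will need checking.

```r
library(rtracklayer)
library(GenomicRanges)

# Write a tiny made-up GFF3 file; in practice gff_file would be the
# path to the GenBank annotation.
gff_file <- tempfile(fileext = ".gff3")
rows <- list(
  c("chr1", "GenBank", "gene", "1",   "300", ".", "+", ".", "ID=gene1;locus_tag=geneA"),
  c("chr1", "GenBank", "CDS",  "1",   "100", ".", "+", "0", "ID=cds1;locus_tag=geneA"),
  c("chr1", "GenBank", "CDS",  "201", "300", ".", "+", "0", "ID=cds2;locus_tag=geneA"),
  c("chr1", "GenBank", "CDS",  "501", "600", ".", "+", "0", "ID=cds3;locus_tag=geneB")
)
writeLines(
  c("##gff-version 3", vapply(rows, paste, character(1), collapse = "\t")),
  gff_file
)

# GFF columns and attributes end up as metadata columns on the GRanges.
features <- import(gff_file)

# Keep only the coding features and group them by gene label.
cds <- features[features$type == "CDS"]
cds_by_gene <- split(cds, cds$locus_tag)
```

The resulting `cds_by_gene` is a GRangesList with one element per locus tag, which is the shape countOverlaps expects for the annotation side.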
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact