[BioC] IRanges, GenomicRanges, GenomicFeatures?

Steve Lianoglou mailinglist.honeypot at gmail.com
Mon Nov 1 06:19:30 CET 2010


Hi,

On Sun, Oct 31, 2010 at 11:10 PM, Oleg Moskvin <moskvin at wisc.edu> wrote:
> Hello list members,
>
> For an RNA-seq analysis, what would you suggest using to convert raw-sequence-based read coverage to annotated ORF-based coverage if the genome of interest is NOT supported by either UCSC or ENSEMBL, which means that creating a TranscriptDb object in a straightforward way (i.e. according to the GenomicFeatures pipeline) is impossible? What would you recommend for importing a .gff file (containing the annotation of a particular genome, from GenBank) into R/Bioconductor to eventually generate a gene-centric countTable readable by packages like DESeq?

Assuming I've understood your question and how you have your data
available to you, here is one (maybe too simple) approach:

I think I'd parse the GFF into a GRangesList object, where each item of
the list is a GRanges object that stores the exon structure of one of
your transcripts (or genes), which I'm assuming is what's in your GFF
file.

If you had your RNA-seq reads in their own GRanges object, you could then
call countOverlaps with the GRangesList of transcripts as the query and
your reads as the subject. That gives you one read count per transcript
(or gene), which you could use to create your countTable.
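
To make that concrete, here is a minimal sketch of the counting step. The
coordinates, the gene names (geneA, geneB) and the sample name (sample1)
are all made up for illustration; in practice the exons would come from
your GFF and the reads from your aligner output.

```r
library(GenomicRanges)

# Hypothetical exon annotation: two exons for geneA, one for geneB
exons <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(100, 300, 1000), end = c(200, 400, 1200)),
  strand   = "+",
  gene     = c("geneA", "geneA", "geneB")
)

# GRangesList with one GRanges (the exon structure) per gene
genes <- split(exons, exons$gene)

# Hypothetical aligned reads for a single sample (36 bp each)
reads <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(150, 350, 390, 1100, 5000), width = 36),
  strand   = "+"
)

# For each gene, the number of reads overlapping any of its exons
counts <- countOverlaps(genes, reads)

# One column per sample; more samples would be added as further columns
countTable <- data.frame(sample1 = counts, row.names = names(genes))
```

Note that a read overlapping exons of two different genes is counted
once for each of them here, so you may want to filter ambiguous reads
first, depending on your annotation.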

Hope that helps,
-steve

ps - rtracklayer's import function can read GFF files, which might be
helpful to you.
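
A rough sketch of the import step, assuming a GFF3 file whose CDS records
carry a locus_tag attribute (that attribute name and the toy records
below are assumptions; check what your GenBank file actually uses to tie
features to genes):

```r
library(rtracklayer)

# Write a tiny hypothetical GFF3 file so the example is self-contained
gff_lines <- c(
  "##gff-version 3",
  "chr1\tGenBank\tCDS\t100\t200\t.\t+\t0\tlocus_tag=geneA",
  "chr1\tGenBank\tCDS\t300\t400\t.\t+\t0\tlocus_tag=geneA",
  "chr1\tGenBank\tCDS\t1000\t1200\t.\t+\t0\tlocus_tag=geneB"
)
gff_file <- tempfile(fileext = ".gff3")
writeLines(gff_lines, gff_file)

# import() returns a GRanges with one range per GFF record;
# the GFF attributes become metadata columns
gff <- import(gff_file)

# Keep only the feature type of interest, then group by gene identifier
cds   <- gff[gff$type == "CDS"]
genes <- split(cds, cds$locus_tag)   # GRangesList, one element per gene
```

The resulting GRangesList has one element per locus_tag and can be
passed to countOverlaps together with a GRanges of reads.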

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


