[BioC] Reading GTFs

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Jun 4 05:20:16 CEST 2008


Howdy,

On Jun 3, 2008, at 7:06 PM, Sean Davis wrote:

> On Tue, Jun 3, 2008 at 6:24 PM, Steve Lianoglou
> <mailinglist.honeypot at gmail.com> wrote:
>> Hi,
>>
>> I'm wondering why I can't seem to stumble across any packages that
>> deal with parsing and mapping GTF annotation data.
>
> GTF is just tab-delimited text, yes?  read.table() should eat that up.
> You could also look at the biomaRt, rtracklayer, and GenomeGraphs
> packages.

Yes, GTF files are just tab-delimited and quite easy to read in with R's
ability to slice and dice delimited text. I was just wondering whether I
was doing this "my own way" instead of taking advantage of something
that's already there. So much great work has gone into R/Bioconductor
that it can lay claim to a "batteries included" motto; it's just that at
times the batteries feel like they are somewhere on the top shelf and
easy to miss :-)
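
For reference, here is a minimal sketch of the read.table() approach
mentioned above. The two records are made-up illustration data, the
column names are my own labels for the nine GTF fields, and the file
name in the comment is hypothetical; treat it as one possible way to do
this rather than an established Bioconductor idiom.

```r
# Sketch: read GTF-formatted text with read.table().
# The two records below are invented for illustration only.
gtf_lines <- c(
  "chr1\texample\texon\t100\t200\t.\t+\t.\tgene_id \"geneA\"; transcript_id \"geneA.1\";",
  "chr1\texample\texon\t300\t450\t.\t+\t.\tgene_id \"geneA\"; transcript_id \"geneA.1\";"
)

# Assumed labels for the nine tab-separated GTF fields, in order.
gtf_cols <- c("seqname", "source", "feature", "start", "end",
              "score", "strand", "frame", "attributes")

gtf <- read.table(
  text = gtf_lines,        # for a real file: file = "annotation.gtf" (hypothetical)
  sep = "\t",
  quote = "",              # attribute values contain double quotes; do not treat them as quoting
  comment.char = "#",      # skip any comment/header lines
  col.names = gtf_cols,
  stringsAsFactors = FALSE
)

# Pull gene_id out of the free-form attributes column.
gtf$gene_id <- sub('.*gene_id "([^"]+)".*', "\\1", gtf$attributes)
```

Setting quote = "" matters here, because the attributes column carries
its own double quotes and read.table() would otherwise try to interpret
them.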

Since there are packages concerned with setting up metadata for chip
information, probe mappings, and so on, I was wondering whether I should
approach the problem in a way that could be reused within one of those
existing frameworks, that's all.

That said, thanks for the pointers to your suggested packages, and  
I'll look through them more.

>> In order to do some analysis with tiling array data, I need to
>> incorporate annotation data for chromosome positions
>
> You might look at the tilingArray package.

Yeah ... I've been in and out of that package. It's handy to learn  
from, for sure, and I'm trying to reuse as much of it as possible.

>> I'm happy to whip up some rigged method of doing this myself, but I
>> feel like others must be doing the same thing and I'm reinventing the
>> wheel which might not be all that round by the time I'm done with it.
>>
>> Are there better ways to deal with genome annotation? I mentioned the
>> AnnotationDbi in the subject line, because I feel like it provides
>> something similar, but I don't think it's quite what I'm after.
>
> What do you actually want to do?  The specifics may be relevant.

Currently I'm trying to gather a set of probes that fit certain
criteria, such as their genomic annotation (intergenic vs. exonic,
etc.), the number of hits each probe has to the genome, and so on. I
have all of this information from a combination of re-BLASTing the
probes against the genome (as suggested by W. Huber and others) and the
GTF file, and I'm trying to store it in an environment similar to the
ones the tilingArray and Ringo packages use.
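
The kind of filtering described above might be sketched in base R
roughly as follows. The exon and probe tables, the column names
(including n_hits for the number of genome hits), and the "fully
contained in one exon" rule are all assumptions made for illustration;
this is not how tilingArray or Ringo store or classify probes.

```r
# Invented exon annotation and probe positions, for illustration only.
exons <- data.frame(
  seqname = c("chr1", "chr1", "chr2"),
  start   = c(100, 300, 50),
  end     = c(200, 450, 150),
  stringsAsFactors = FALSE
)
probes <- data.frame(
  probe_id = c("p1", "p2", "p3", "p4"),
  seqname  = c("chr1", "chr1", "chr2", "chr2"),
  start    = c(120, 210, 60, 500),
  end      = c(144, 234, 84, 524),
  n_hits   = c(1, 1, 3, 1),   # assumed: number of hits to the genome
  stringsAsFactors = FALSE
)

# Call a probe "exonic" if it lies entirely within one exon on the same
# sequence. Everything else is lumped together as "non-exonic";
# separating intronic from intergenic would need gene-level intervals.
is_exonic <- vapply(seq_len(nrow(probes)), function(i) {
  any(exons$seqname == probes$seqname[i] &
      exons$start   <= probes$start[i] &
      exons$end     >= probes$end[i])
}, logical(1))
probes$annotation <- ifelse(is_exonic, "exonic", "non-exonic")

# Keep exonic probes that hit the genome exactly once.
selected <- probes[probes$annotation == "exonic" & probes$n_hits == 1, ]
```

The row-by-row loop is fine for a small example but will be slow for a
full tiling array, so an interval-based lookup would be worth
considering at that scale.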

Later I'll probably want to go the other way: start from a set of
interesting probes and have a quick way to get the pertinent
information for them, so I can send them through some other Bioconductor
functionality, like one of the go* packages (for example).

Anyway, thanks for the reply.

-steve


