[BioC] NCBI gff3 annotation file and read.gff()

Wed Jul 16 21:44:32 CEST 2014

Yes, indeed, I would like to mine the fascicularis transcriptome, and
using makeTranscriptDbFromGFF() with my gff file is a very good
suggestion.
I tried this approach earlier, but I have to confess that I was unable to
specify an exonRankAttributeName value.
I shall try to figure it out more carefully later.
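
One way to check whether a file offers a candidate for exonRankAttributeName is to list the attribute keys that appear on its exon rows. The following is only a minimal base R sketch: the four GFF3 lines are made-up stand-ins (not taken from the NCBI file), and the variable names are hypothetical. With a real file the lines could come from readLines(gzfile(path)) after dropping the "#" comment lines.

```r
# Hypothetical example lines standing in for rows of a real GFF3 file.
gff_lines <- c(
  "chr1\tRefSeq\tgene\t100\t900\t.\t+\t.\tID=gene0;Name=LOC1",
  "chr1\tRefSeq\tmRNA\t100\t900\t.\t+\t.\tID=rna0;Parent=gene0",
  "chr1\tRefSeq\texon\t100\t300\t.\t+\t.\tID=id1;Parent=rna0;gbkey=mRNA",
  "chr1\tRefSeq\texon\t500\t900\t.\t+\t.\tID=id2;Parent=rna0;gbkey=mRNA"
)

# Split each line into its nine tab-separated GFF3 columns.
fields <- strsplit(gff_lines, "\t", fixed = TRUE)
type   <- vapply(fields, `[`, character(1), 3)  # column 3: feature type
attrs  <- vapply(fields, `[`, character(1), 9)  # column 9: attributes

# Collect the attribute keys (the part before "=") used on exon rows.
exon_attrs <- attrs[type == "exon"]
pairs      <- unlist(strsplit(exon_attrs, ";", fixed = TRUE))
exon_keys  <- unique(sub("=.*$", "", pairs))
print(exon_keys)
```

If none of the listed keys describes the order of exons within a transcript, there is nothing to pass, and exonRankAttributeName=NA (as in the call quoted below) remains the option.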

Thank you

Ugo

On 16-07-2014 10:58, Marc Carlson wrote:
> Yes you definitely can use makeTranscriptDbFromGFF if you want a
> TranscriptDb object.  The following works for example:
> 
> library("GenomicFeatures")
> txdb <- makeTranscriptDbFromGFF(
>     file="ref_Macaca_fascicularis_5.0_top_level.gff3.gz",
>     format="gff3",
>     exonRankAttributeName=NA,
>     gffGeneIdAttributeName=NA,
>     chrominfo=NA,
>     dataSource=NA,
>     species=NA,
>     circ_seqs=DEFAULT_CIRC_SEQS,
>     miRBaseBuild=NA,
>     useGenesAsTranscripts=FALSE)
> 
> But is massaging this into a transcriptome what we want here?  Ugo
> hasn't told us what he wants to do with this data.  Also I didn't look
> closely at the data itself.  It may be that you can specify a value
> for exonRankAttributeName (which is always what you should want to do
> if you can manage it).
> 
> 
>  Marc
> 
> 
> 
> On 07/16/2014 09:10 AM, Michael Lawrence wrote:
>> Is there anything makeTranscriptDbFromGFF could do to help with this?
>> Sounds like you typically want something like a TxDb, except perhaps with
>> some special considerations. Following the NCBI conventions is probably
>> worth it.
>> 
>> 
>> On Wed, Jul 16, 2014 at 8:58 AM, Chris Stubben <stubben at lanl.gov> wrote:
>> 
>>> I would also suggest using rtracklayer import or creating a genome data
>>> package. At least for microbial genomes, you often just need to return
>>> features (CDS, pseudogenes, tRNAs, etc.) that have a parent with a locus_tag
>>> key and assign that locus tag to the child (the read.gff default), so
>>> that's what is getting messed up with your large file.
>>> I'll probably use the rtracklayer import object in future versions instead
>>> and then join on Parent where locus_tag is NA to the ID where locus_tag is
>>> not NA.
>>> Chris
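
The join described above (Parent where locus_tag is NA, matched to ID where locus_tag is not NA) can be sketched in base R roughly as follows. This is an illustration only, not the read.gff implementation: the small feature table, its values, and the variable names are all hypothetical.

```r
# Hypothetical feature table; column names mirror GFF3 attributes.
feats <- data.frame(
  ID        = c("gene0", "cds0", "gene1", "rna1"),
  Parent    = c(NA, "gene0", NA, "gene1"),
  type      = c("gene", "CDS", "gene", "tRNA"),
  locus_tag = c("ABC_0001", NA, "ABC_0002", NA),
  stringsAsFactors = FALSE
)

# Rows lacking a locus_tag look up their Parent among rows that have one.
needs_tag <- is.na(feats$locus_tag)
has_tag   <- feats[!needs_tag, ]
idx       <- match(feats$Parent[needs_tag], has_tag$ID)

# Copy the parent's locus_tag onto each child (NA if no parent matched).
feats$locus_tag[needs_tag] <- has_tag$locus_tag[idx]
print(feats[feats$type != "gene", c("ID", "type", "locus_tag")])
```

This resolves a single parent level only. A feature whose locus-tagged ancestor is two levels up (gene, then mRNA, then exon) would need the lookup repeated.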
>>> 
>>> 
>>> 
>>>> I cc'd the packageMaintainer(), so that they are more likely to see this
>>>> post.
>>>> 
>>>> I don't know whether this helps in this particular case, but packages
>>>> should be using rtracklayer::import rather than creating their own readers.
>>>> Then at least whatever deficiencies are identified and corrected benefit
>>>> the entire project.
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> 
>>> Chris Stubben
>>> 
>>> Los Alamos National Lab
>>> Bioscience Division
>>> MS M888
>>> Los Alamos, NM 87545
>>> 
>>> Phone: (505) 667-3295
>>> 
>>> 
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> 