[BioC] NCBI gff3 annotation file and read.gff()
ugo.borello at inserm.fr
Wed Jul 16 21:44:32 CEST 2014
Yes, indeed, I would like to mine the Macaca fascicularis transcriptome,
and using makeTranscriptDbFromGFF() with my gff file is a very good
suggestion.
I tried this approach earlier, but I have to confess that I was unable to
specify an exonRankAttributeName value.
I shall try to work it out more carefully later.
Thank you
Ugo
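
One way to decide whether any value can be given for exonRankAttributeName is to list the attribute keys that the exon records actually carry in column 9 of the file. The following is only a minimal base-R sketch on made-up records (the `gff_lines` values and the helper `attr_keys` are hypothetical, not taken from the NCBI file); for the real file the lines would first have to be read in and the "#" comment lines dropped.

```r
# Toy GFF3 records (tab-separated, 9 columns); values are invented for illustration.
gff_lines <- c(
  "chr1\tRefSeq\tgene\t100\t900\t.\t+\t.\tID=gene0;Name=ABC1",
  "chr1\tRefSeq\tmRNA\t100\t900\t.\t+\t.\tID=rna0;Parent=gene0",
  "chr1\tRefSeq\texon\t100\t300\t.\t+\t.\tID=id1;Parent=rna0",
  "chr1\tRefSeq\texon\t600\t900\t.\t+\t.\tID=id2;Parent=rna0"
)

# Split each record into its 9 columns and keep the type and attribute columns.
fields <- strsplit(gff_lines, "\t", fixed = TRUE)
type   <- vapply(fields, `[`, character(1), 3)
attrs  <- vapply(fields, `[`, character(1), 9)

# Hypothetical helper: return the keys of a "key=value;key=value" string.
attr_keys <- function(x) {
  pairs <- strsplit(x, ";", fixed = TRUE)[[1]]
  sub("=.*$", "", pairs)
}

# Collect the distinct attribute keys seen for each feature type.
keys_by_type <- tapply(attrs, type, function(a) {
  sort(unique(unlist(lapply(a, attr_keys))))
}, simplify = FALSE)

print(keys_by_type[["exon"]])
```

If none of the keys listed for the exon records looks like an exon rank, leaving exonRankAttributeName=NA, as in the call quoted below, is the fallback.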
On 16-07-2014 10:58, Marc Carlson wrote:
> Yes you definitely can use makeTranscriptDbFromGFF if you want a
> TranscriptDb object. The following works for example:
>
> library("GenomicFeatures")
> txdb <- makeTranscriptDbFromGFF(
>     file="ref_Macaca_fascicularis_5.0_top_level.gff3.gz",
>     format="gff3",
>     exonRankAttributeName=NA,
>     gffGeneIdAttributeName=NA,
>     chrominfo=NA,
>     dataSource=NA,
>     species=NA,
>     circ_seqs=DEFAULT_CIRC_SEQS,
>     miRBaseBuild=NA,
>     useGenesAsTranscripts=FALSE)
>
> But is massaging this into a transcriptome what we want here? Ugo
> hasn't told us what he wants to do with this data. Also I didn't look
> closely at the data itself. It may be that you can specify a value
> for exonRankAttributeName (which is always what you should want to do
> if you can manage it).
>
>
> Marc
>
>
>
> On 07/16/2014 09:10 AM, Michael Lawrence wrote:
>> Is there anything makeTranscriptDbFromGFF could do to help with this?
>> Sounds like you typically want something like a TxDb, except perhaps with
>> some special considerations. Following the NCBI conventions is probably
>> worth it.
>>
>>
>> On Wed, Jul 16, 2014 at 8:58 AM, Chris Stubben <stubben at lanl.gov> wrote:
>>
>>> I would also suggest using rtracklayer import or creating a genome data
>>> package. At least for microbial genomes, you often just need to return
>>> features (CDS, pseudogenes, tRNAs, etc.) that have a parent with a
>>> locus_tag key and assign that locus tag to the child (the read.gff
>>> default), so that's what is getting messed up with your large file.
>>> I'll probably use the rtracklayer import object in future versions
>>> instead, and then join the Parent of rows where locus_tag is NA to the
>>> ID of rows where locus_tag is not NA.
>>> Chris
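
The join described above (children without a locus_tag inherit it from the parent whose ID matches their Parent) can be sketched in base R. This is only an illustration under assumptions: the `features` table and its values are invented, and a real table would come from a parsed GFF3 file rather than being typed in.

```r
# Toy feature table; values are invented for illustration.
features <- data.frame(
  ID        = c("gene0", "cds0", "gene1", "rna1"),
  Parent    = c(NA, "gene0", NA, "gene1"),
  type      = c("gene", "CDS", "gene", "tRNA"),
  locus_tag = c("XY_0001", NA, "XY_0002", NA),
  stringsAsFactors = FALSE
)

# Parents are rows that carry a locus_tag; children are rows that lack one.
parents  <- features[!is.na(features$locus_tag), c("ID", "locus_tag")]
children <- features[is.na(features$locus_tag), c("ID", "Parent", "type")]

# Join each child's Parent to the parent's ID so the child inherits the locus_tag.
tagged <- merge(children, parents, by.x = "Parent", by.y = "ID")
print(tagged)
```

Children whose Parent has no locus_tag of its own (for example an exon whose parent is an mRNA) drop out of this single join, so a deeper hierarchy would need the step repeated or a different strategy.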
>>>
>>>
>>>
>>>> I cc'd the packageMaintainer(), so that they are more likely to see
>>>> this post.
>>>>
>>>> I don't know whether this helps in this particular case, but packages
>>>> should be using rtracklayer::import rather than creating their own
>>>> readers. Then at least whatever deficiencies are identified and
>>>> corrected benefit the entire project.
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> Chris Stubben
>>>
>>> Los Alamos National Lab
>>> Bioscience Division
>>> MS M888
>>> Los Alamos, NM 87545
>>>
>>> Phone: (505) 667-3295
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>