[BioC] makeTranscriptDbFromGFF Error for UCSC GTF File
Simon Anders
anders at embl.de
Wed Jul 2 20:34:04 CEST 2014
Hi
On 02/07/14 20:17, Michael Lawrence wrote:
>> In contrast, using GTF or GFF files for making TranscriptDb objects is
>> always a little risky because many of these files will not have been
>> created with the intention of holding a transcriptome as data (which is the
>> specific thing that a TranscriptDb object is meant to hold). This is
>> because the GTF and GFF file formats were not initially intended for the
>> specific purpose of holding a transcriptome but were instead intended to be
>> something more general.
> Actually GTF (Gene Transfer Format) files are designed specifically for
> representing gene models, and we have no excuse for not parsing them
> correctly. There have been some tweaks to attribute parsing (I thought
> limited to GFF3) in devel, so there may be a difference between Herve's
> devel result and Dario's release result. I'll try to find some time to
> look into this.
The problem with GTF files produced by the UCSC Table Browser is that
they contain incorrect gene IDs: the gene_id attribute is always set to
the same value as the transcript_id, so these files cannot be used to
define gene models without manual correction (namely, removing the
transcript number suffix from the gene IDs).
A long time ago, I asked the UCSC Genome Browser help desk why this is
and was told that it is a bug in the Table Browser which, for some
reason, they cannot fix.
Hence, I usually advise against using these files.
Simon