[BioC] makeTranscriptDbFromGFF Error for UCSC GTF File

Marc Carlson mcarlson at fhcrc.org
Wed Jul 2 19:16:58 CEST 2014


Hi Dario,

That error says that some of the attributes are formatted in a way 
that the parser cannot interpret.  But what really puzzles me is why 
you want to parse this track as a GTF file at all.  The UCSC hg19 
knownGene track is already available as a package here:

http://www.bioconductor.org/packages/release/data/annotation/html/TxDb.Hsapiens.UCSC.hg19.knownGene.html

And if that is not the track you are after (you mention refGene), then 
perhaps you should just use the makeTranscriptDbFromUCSC() function 
instead.  That is the more typical tool for making UCSC tracks into 
TranscriptDb objects.
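
As a rough sketch of the two routes mentioned above (untested here; it 
assumes the Bioconductor packages are installed, that the second call 
can reach UCSC over the network, and that refGene is the table you 
want, as in your post):

```r
# Sketch only: requires GenomicFeatures and, for option 2, network access to UCSC.
library(GenomicFeatures)

# Option 1: the prebuilt hg19 knownGene annotation package.
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb_known <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Option 2: build a TranscriptDb from another UCSC table.
# "refGene" is assumed here because it is the table named in the original post.
txdb_refgene <- makeTranscriptDbFromUCSC(genome = "hg19", tablename = "refGene")

# Either object can then be queried in the usual way.
transcripts(txdb_refgene)
```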

In contrast, making TranscriptDb objects from GTF or GFF files is 
always a little risky.  These formats were designed as general-purpose 
containers for genomic features, not specifically to hold a 
transcriptome (which is the specific thing that a TranscriptDb object 
is meant to hold), so many such files were never created with a 
transcriptome in mind.
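
If you do want to stay with the GTF file, a check along these lines 
might help locate the offending lines.  It is only a sketch: 
check_gtf_attributes() is a hypothetical helper, not part of 
GenomicFeatures, and its pattern is my assumption of what 'tag value' 
format means, not the parser's actual rule.

```r
# Hypothetical helper (not part of GenomicFeatures): returns TRUE for
# attribute strings that are NOT a series of `tag "value";` or `tag value;` pairs.
check_gtf_attributes <- function(attr_col) {
  # One pair: a tag, whitespace, a quoted or bare value, then a semicolon.
  pair <- '[A-Za-z_][A-Za-z0-9_.]*[ ]+("[^"]*"|[^ ";]+)[ ]*;'
  # A whole attribute column entry: one or more pairs and nothing else.
  pattern <- paste0("^[ ]*(", pair, "[ ]*)+$")
  !grepl(pattern, attr_col)
}

# Made-up example strings, for illustration only.
attrs <- c('gene_id "NR_046018"; transcript_id "NR_046018";',
           'gene_id "NR_046018"; transcript_id "NR_046018"; ',
           'gene_id "A B"; transcript_id',
           'gene_id=NR_046018;transcript_id=NR_046018')

check_gtf_attributes(attrs)
# FALSE FALSE TRUE TRUE
```

Note that the third string splits into four space-separated tokens, so 
counting tokens alone would not flag it.  To try this on the real file, 
pass it column 9 of the table, read with quote = "" in read.table() so 
that the double quotes are kept intact.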

Hope this helps,


  Marc



On 07/02/2014 12:00 AM, Dario Strbenac wrote:
> Hello,
>
> I used :
>
>> system.time(hg19 <- makeTranscriptDbFromGFF("/home/dario/data/Annotation/hg19.gtf", format = "gtf"))
> Error in .parse_attrCol(attrCol, file, colnames) :
>    Some attributes do not conform to 'tag value' format
> Timing stopped at: 15.605 0.296 16.07
>
> I downloaded the GTF file from UCSC Table Browser. The table's name was refGene. To me, it seems that the attributes are fine :
>
>> hg19table <- read.table("/home/dario/data/Annotation/hg19.gtf", sep = '\t', stringsAsFactors=FALSE)
>> table(sapply(strsplit(hg19table[, 9], ' '), length))
>       4
> 967118
>
> I have R version 3.1.0 (2014-04-10) and GenomicFeatures 1.16.2
>
> --------------------------------------
> Dario Strbenac
> PhD Student
> University of Sydney
> Camperdown NSW 2050
> Australia


