[BioC] makeTranscriptDbFromGFF Error for UCSC GTF File
Marc Carlson
mcarlson at fhcrc.org
Wed Jul 2 19:16:58 CEST 2014
Hi Dario,
That error says that some of the attributes are formatted in a way that
the parser cannot interpret. What really puzzles me, though, is why you
want to parse this track as a GTF file at all. The UCSC hg19 knownGene
track is already available as a package here:
http://www.bioconductor.org/packages/release/data/annotation/html/TxDb.Hsapiens.UCSC.hg19.knownGene.html
And if that is not the track you want (you mention refGene), you can
use the makeTranscriptDbFromUCSC() function instead, for example
makeTranscriptDbFromUCSC(genome = "hg19", tablename = "refGene").
That is the more typical tool for turning UCSC tracks into
TranscriptDb objects.
In contrast, building TranscriptDb objects from GTF or GFF files is
always a little risky. Many of these files were not created to hold a
transcriptome (which is the specific thing a TranscriptDb object is
meant to hold), because the GTF and GFF formats were not designed for
that purpose; they were intended as more general annotation formats.
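A count of four space-separated tokens per row does not by itself show that the attributes are well formed. The minimal base-R sketch below illustrates this. The helper `conforms_to_tag_value()` is hypothetical (it is not part of GenomicFeatures) and only approximates the GTF "tag value" convention; the check that GenomicFeatures applies internally in `.parse_attrCol` may be stricter or different.

```r
# Hypothetical helper, NOT part of GenomicFeatures: an approximation of the
# GTF "tag value" rule. Returns TRUE for each attribute string that is a
# space-separated run of `tag value;` pairs, where the value is either
# double-quoted (spaces allowed inside) or a single token without spaces.
conforms_to_tag_value <- function(attrs) {
  pair <- '[^[:space:];"]+ ("[^"]*"|[^[:space:];"]+);'
  pattern <- paste0("^", pair, "( ", pair, ")*[[:space:]]*$")
  grepl(pattern, attrs)
}

attrs <- c(
  'gene_id "NM_032291"; transcript_id "NM_032291";',  # well formed
  'gene_id "NM_032291" transcript_id "NM_032291";'    # missing ';' after first value
)

# Both strings split into the same number of space-separated tokens (4)...
print(sapply(strsplit(attrs, " "), length))
# ...but only the first one passes the stricter pairwise check.
print(conforms_to_tag_value(attrs))
```

To list offending rows in a real file, something like `attrs <- read.table("/home/dario/data/Annotation/hg19.gtf", sep = "\t", quote = "", stringsAsFactors = FALSE)[[9]]` followed by `which(!conforms_to_tag_value(attrs))` could be used; `quote = ""` keeps the double quotes in column 9 as literal characters.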
Hope this helps,
Marc
On 07/02/2014 12:00 AM, Dario Strbenac wrote:
> Hello,
>
> I used:
>
>> system.time(hg19 <- makeTranscriptDbFromGFF("/home/dario/data/Annotation/hg19.gtf", format = "gtf"))
> Error in .parse_attrCol(attrCol, file, colnames) :
> Some attributes do not conform to 'tag value' format
> Timing stopped at: 15.605 0.296 16.07
>
> I downloaded the GTF file from the UCSC Table Browser; the table was refGene. The attributes seem fine to me:
>
>> hg19table <- read.table("/home/dario/data/Annotation/hg19.gtf", sep = '\t', stringsAsFactors=FALSE)
>> table(sapply(strsplit(hg19table[, 9], ' '), length))
> 4
> 967118
>
> I have R version 3.1.0 (2014-04-10) and GenomicFeatures 1.16.2
>
> --------------------------------------
> Dario Strbenac
> PhD Student
> University of Sydney
> Camperdown NSW 2050
> Australia
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor