[BioC] GenomicFeatures Reading GFF Efficiency
Marc Carlson
mcarlson at fhcrc.org
Tue Nov 20 01:34:51 CET 2012
Hi Dario,
I have found and fixed a couple of bugs in this parser, and the fix
should show up in the next couple of days.
I will work on better performance as well, but that is not in the latest
update, as I had to fix the bug first. But please be aware that much of
the slow performance comes from the fact that GTF files are not
required to encode exon ranking information. In the 800+ megabyte file
you were parsing, the only way to get exon rank information was to
deduce it from the provided coordinate positions. The fact that
this file does not provide that information should probably concern
you. Even though the parser can do the inference, it takes time
and, more importantly, it makes assumptions about your data. So it
really should not be done if you can avoid it. This is why the function
throws a warning that it is inferring the exon rankings.
So if you can get the data in another format, or at least from a GTF
file that does provide the exon ranking information, I would
strongly recommend that.
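
To make the point about assumptions concrete, here is a rough sketch (in
Python, just for illustration) of what deducing exon ranks from coordinates
involves. This is not the code GenomicFeatures uses. The function names are
hypothetical, and it assumes that an `exon_number` attribute, when present,
carries the rank, and that every exon of a transcript sits on one strand of
one chromosome. Those are exactly the kinds of assumptions that can be
wrong for your data.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs such as: transcript_id "T1"; exon_number 2;
_ATTR = re.compile(r'(\S+)\s+"?([^";]+)"?\s*;')


def parse_exons(gtf_lines):
    """Collect exon records from GTF text lines (9 tab-separated fields)."""
    exons = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon features matter here
        attrs = dict(_ATTR.findall(fields[8]))
        exons.append({
            "start": int(fields[3]),
            "end": int(fields[4]),
            "strand": fields[6],
            "transcript_id": attrs.get("transcript_id"),
            # Assumption: the rank, if encoded at all, is in "exon_number".
            "exon_number": attrs.get("exon_number"),
        })
    return exons


def count_missing_ranks(exons):
    """Number of exon records that carry no explicit rank attribute."""
    return sum(1 for ex in exons if ex["exon_number"] is None)


def deduce_exon_ranks(exons):
    """Infer ranks per transcript from coordinates alone.

    Assumption: exons are numbered in the direction of transcription, so
    by ascending start on '+' and by descending start on '-'. The strand
    of the first exon seen is taken as the strand of the whole transcript.
    """
    by_tx = defaultdict(list)
    for ex in exons:
        by_tx[ex["transcript_id"]].append(ex)

    ranks = {}
    for tx, tx_exons in by_tx.items():
        minus = tx_exons[0]["strand"] == "-"
        ordered = sorted(tx_exons, key=lambda e: e["start"], reverse=minus)
        ranks[tx] = [(e["start"], e["end"], rank)
                     for rank, e in enumerate(ordered, 1)]
    return ranks
```

Running something like `count_missing_ranks(parse_exons(open(path)))` on a
file before building anything from it tells you whether ranks would have to
be inferred at all. Note also that the grouping and sorting has to touch
every exon of every transcript, which is part of why it is slow on a very
large file.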
Marc
On 11/15/2012 06:00 PM, Dario Strbenac wrote:
> After nearly 2 days, it gave an error :
>
> Processing splicing information for gtf file.
> Error in `colnames<-`(`*tmp*`, value = c("exon_chrom", "exon_start", "exon_end", :
> 'names' attribute [9] must be the same length as the vector [6]
> In addition: Warning message:
> In .deduceExonRankings(exs) :
> Infering Exon Rankings. If this is not what you expected, then please be sure that you have provided a valid attribute for exonRankAttributeName
>
> This is the 1.10.0 version of GenomicFeatures in R 2.15.1.
>
> Meanwhile, GENCODE version 14 is released, so you wouldn't have wanted my object of version 13 annotations, in the end.