[BioC] GenomicFeatures::makeTranscriptDbFromBiomart - BioMart data anomaly: for some transcripts, the cds cumulative length inferred from the exon and UTR info doesn't match the "cds_length" attribute from BioMart
Cook, Malcolm
MEC at stowers.org
Fri Feb 3 21:32:45 CET 2012
Hi Marc, and other `library(GenomicFeatures)` users working in fly,
I have changed the Subject to keep alive one of the issues I still have, namely:
I get the following error:
> library(GenomicFeatures)
> txdb <- makeTranscriptDbFromBiomart(biomart="ensembl", dataset="dmelanogaster_gene_ensembl", circ_seqs=NULL)
Download and preprocess the 'transcripts' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Download and preprocess the 'splicings' data frame ... Error in .extractCdsRangesFromBiomartTable(bm_table) :
BioMart data anomaly: for some transcripts, the cds cumulative length inferred from the exon and UTR info doesn't match the "cds_length" attribute from BioMart
Marc, you already observed that:
> >> the data for cds ranges and total cds length (both from biomaRt) no
> >> longer agree with each other. In other words, the data from the current
> >> drosophila ranges in biomaRt seems to disagree with itself, and so the
> >> code is refusing to make a package out of this data as a result.
> >> To get the 2nd issue fixed probably involves talking to ensembl about
> >> their CDS data for fly to see if we can resolve the discrepancy.
> > I would be happy to take this to them.
I still wonder:
> Can you recommend a best way to get a more diagnostic trace from the
> attempt at txdb creation so we can correctly report to ensembl team the
> errant transcript(s) ?
I would be happy to take this up with the Ensembl team, but I need details that I don't know how to produce.
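For what it is worth, here is a rough sketch of the kind of check I imagine could pinpoint the errant transcripts. It is not the code GenomicFeatures actually runs. It assumes a data frame with one row per exon, and the function name (find_cds_mismatches) and column names (tx_id, utr5_start, utr5_end, utr3_start, utr3_end) are hypothetical stand-ins that would need to be mapped from whatever attributes are actually downloaded from BioMart. It also assumes at most one 5' UTR and one 3' UTR segment per exon row.

```r
# Hypothetical helper: flag transcripts whose exon/UTR-derived CDS length
# disagrees with the cds_length value reported for the transcript.
# Assumed input: one row per exon, with the columns used below.
find_cds_mismatches <- function(bm) {
  # Width of an interval, treating missing (NA) coordinates as width 0.
  width_or_zero <- function(start, end) {
    ifelse(is.na(start) | is.na(end), 0, end - start + 1)
  }

  exon_width <- bm$exon_chrom_end - bm$exon_chrom_start + 1
  utr_width  <- width_or_zero(bm$utr5_start, bm$utr5_end) +
                width_or_zero(bm$utr3_start, bm$utr3_end)

  # Coding bases in each exon = exon width minus any UTR overlap.
  cds_in_exon <- exon_width - utr_width

  # Sum per transcript, and take the (repeated) reported length once.
  inferred <- tapply(cds_in_exon, bm$tx_id, sum)
  reported <- tapply(bm$cds_length, bm$tx_id, function(x) x[1])

  out <- data.frame(tx_id    = names(inferred),
                    inferred = as.numeric(inferred),
                    reported = as.numeric(reported),
                    stringsAsFactors = FALSE)

  # Keep only coding transcripts (non-NA reported length) that disagree.
  out[!is.na(out$reported) & out$inferred != out$reported, , drop = FALSE]
}

# Toy data (made up for illustration): txA is consistent, txB is not.
toy <- data.frame(
  tx_id            = c("txA", "txA", "txB", "txB"),
  exon_chrom_start = c(100, 300, 1000, 1200),
  exon_chrom_end   = c(199, 399, 1099, 1299),
  utr5_start       = c(100, NA, 1000, NA),
  utr5_end         = c(149, NA, 1009, NA),
  utr3_start       = c(NA, 350, NA, 1290),
  utr3_end         = c(NA, 399, NA, 1299),
  cds_length       = c(100, 100, 183, 183),
  stringsAsFactors = FALSE
)

bad <- find_cds_mismatches(toy)
print(bad)
```

If something along these lines could be run on the real 'splicings' table (for example one fetched with biomaRt's getBM()), the resulting list of transcript IDs is exactly what I would forward to Ensembl.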
Finally, on the side, here is a tiny suggestion:
* change the default for circ_seqs in makeTranscriptDbFromBiomart to NULL, instead of an organism-specific (human) value.
Regards,
--Malcolm
R version 2.14.0 (2011-10-31)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicFeatures_1.6.7 AnnotationDbi_1.16.11 Biobase_2.14.0
[4] GenomicRanges_1.6.6 IRanges_1.12.5
loaded via a namespace (and not attached):
[1] BSgenome_1.22.0 Biostrings_2.22.0 DBI_0.2-5 RCurl_1.9-5
[5] RSQLite_0.11.1 XML_3.9-4 biomaRt_2.10.0 rtracklayer_1.14.4
[9] tools_2.14.0 zlibbioc_1.0.0
>