[BioC] GenomicFeatures::makeTranscriptDbFromBiomart - BioMart data anomaly: for some transcripts, the cds cumulative length inferred from the exon and UTR info doesn't match the "cds_length" attribute from BioMart

Cook, Malcolm MEC at stowers.org
Fri Feb 3 21:32:45 CET 2012

Hi Marc, and other `library(GenomicFeatures)` users working in fly,

I have changed the Subject line to keep one of my outstanding issues alive, namely the following error:

> library(GenomicFeatures)
> txdb <- makeTranscriptDbFromBiomart(biomart="ensembl", dataset="dmelanogaster_gene_ensembl", circ_seqs=NULL)
Download and preprocess the 'transcripts' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Download and preprocess the 'splicings' data frame ... Error in .extractCdsRangesFromBiomartTable(bm_table) :
  BioMart data anomaly: for some transcripts, the cds cumulative length inferred from the exon and UTR info doesn't match the "cds_length" attribute from BioMart

Marc, you already observed that: 

> >> the data for cds ranges and total cds length (both from biomaRt) no
> >> longer agree with each other.  In other words, the data from the current
> >> drosophila ranges in biomaRt seems to disagree with itself, and so the
> >> code is refusing to make a package out of this data as a result.
> >> To get the 2nd issue fixed probably involves talking to ensembl about
> >> their CDS data for fly to see if we can resolve the discrepancy.
> > I would be happy to take this to them.

I still wonder:

> Can you recommend a best way to get a more diagnostic trace from the
> attempt at txdb creation so we can correctly report to ensembl team the
> errant transcript(s) ?

I would be happy to take this up with the Ensembl team, but I need details that I don't know how to produce.
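
As a rough illustration of the kind of per-transcript check that could produce those details, here is a minimal sketch in base R. It is not the check that GenomicFeatures performs internally. The helper findCdsLengthMismatches() is hypothetical, and so are the column names cds_start and cds_end, which stand for the genomic start and end of the coding part of each exon (NA for exons with no coding part). The toy table and its transcript IDs are made up; a real table would first have to be fetched from BioMart and reshaped into this layout.

```r
# Hypothetical helper, not part of GenomicFeatures or biomaRt.
# 'bm' is assumed to hold one row per exon, with columns:
#   ensembl_transcript_id  transcript identifier
#   cds_start, cds_end     coding part of the exon (NA if non-coding)
#   cds_length             total CDS length reported for the transcript
findCdsLengthMismatches <- function(bm) {
    # Keep only exons that carry some coding sequence.
    coding <- bm[!is.na(bm$cds_start) & !is.na(bm$cds_end), , drop = FALSE]
    tx <- as.character(coding$ensembl_transcript_id)

    # Cumulative CDS length inferred from the per-exon coding ranges.
    inferred <- tapply(coding$cds_end - coding$cds_start + 1L, tx, sum)

    # Reported CDS length (repeated on every row, so take the first).
    reported <- tapply(coding$cds_length, tx, function(x) x[1L])

    res <- data.frame(tx_id = names(inferred),
                      inferred_cds_length = as.vector(inferred),
                      reported_cds_length = as.vector(reported[names(inferred)]),
                      stringsAsFactors = FALSE)

    # Return only the transcripts where the two lengths disagree.
    res[which(res$inferred_cds_length != res$reported_cds_length), ,
        drop = FALSE]
}

# Made-up toy data: txA is consistent, txB is not.
toy <- data.frame(
    ensembl_transcript_id = c("txA", "txA", "txB", "txB", "txB"),
    cds_start  = c(101L, 301L, NA, 1201L, 1501L),
    cds_end    = c(200L, 350L, NA, 1300L, 1560L),
    cds_length = c(150L, 150L, 165L, 165L, 165L),
    stringsAsFactors = FALSE
)

mismatches <- findCdsLengthMismatches(toy)
print(mismatches)
```

On the toy table this flags only txB (inferred length 160, reported length 165). A list of real transcript IDs produced this way is the sort of detail that could accompany a report to Ensembl.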

Finally, on the side, here is a tiny suggestion:

	* change the default for circ_seqs in makeTranscriptDbFromBiomart to NULL, rather than an organism-specific (human) value.



R version 2.14.0 (2011-10-31)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
[1] C
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
other attached packages:
[1] GenomicFeatures_1.6.7 AnnotationDbi_1.16.11 Biobase_2.14.0
[4] GenomicRanges_1.6.6   IRanges_1.12.5
loaded via a namespace (and not attached):
 [1] BSgenome_1.22.0    Biostrings_2.22.0  DBI_0.2-5          RCurl_1.9-5
 [5] RSQLite_0.11.1     XML_3.9-4          biomaRt_2.10.0     rtracklayer_1.14.4
 [9] tools_2.14.0       zlibbioc_1.0.0
