[BioC] GenomicFeatures makeTranscriptDbFromBiomart failure

Hervé Pagès hpages at fhcrc.org
Wed Nov 9 19:27:24 CET 2011


Hi,

On 11-11-09 03:33 AM, Tim Rayner wrote:
> Hi Marc,
>
> Thanks very much for looking into this, and also to Michael for
> providing the patch. I've upgraded my GenomicRanges package and the code now
> runs with a couple of warnings:
>
>> txdb.Hs2<- makeTranscriptDbFromBiomart(biomart='ensembl', dataset='hsapiens_gene_ensembl')
> Download and preprocess the 'transcripts' data frame ... OK
> Download and preprocess the 'chrominfo' data frame ... FAILED! (=>  skipped)
> Download and preprocess the 'splicings' data frame ... OK
> Download and preprocess the 'genes' data frame ... OK
> Prepare the 'metadata' data frame ... OK
> Make the TranscriptDb object ... OK
> Warning messages:
> 1: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
> else paste(labels,  :
>    duplicated levels will not be allowed in factors anymore
> 2: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
> else paste(labels,  :
>    duplicated levels will not be allowed in factors anymore
> 3: In .normargChrominfo(chrominfo, transcripts$tx_chrom, splicings$exon_chrom) :
>    chromosome lengths and circularity flags are not available for this
> TranscriptDb object

The first two warnings, plus the fact that downloading the chrominfo
failed, do not look good. It didn't use to be like that. We'll
investigate on our side and report back later.
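
When chasing warnings like these, it can help to capture them
programmatically rather than read them off the console. Below is a minimal
base-R sketch; collect_warnings() is a hypothetical helper (not part of
GenomicFeatures or biomaRt), and the braces at the bottom are a toy
stand-in for the real makeTranscriptDbFromBiomart() call.

```r
# Hypothetical helper: evaluate an expression, collect every warning message
# it raises, and return both the result and the messages.
collect_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,  # the promise is forced here, inside the handler's scope
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")  # keep going after recording it
    }
  )
  list(value = value, warnings = msgs)
}

# Toy stand-in for a multi-step build that warns twice and still succeeds.
res <- collect_warnings({
  warning("step A complained")
  warning("step B complained")
  "done"
})
print(res$warnings)
```

The collected messages can then be compared between runs, for example with
duplicated() or table(), to see which ones repeat.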

Cheers,
H.

>
> So I think the problem is basically fixed. I wonder if perhaps the
> issue was caused by truncated data transfers; I observed several
> similar failures earlier yesterday afternoon, but in each case the
> problem seemed to occur at a different point in the process.
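
If truncated transfers are suspected and the downloaded table is available
as a tab-delimited file, one way to look for short rows is base R's
count.fields(). This is only a sketch under that assumption:
find_bad_lines() is a hypothetical helper, and the temporary file is a toy
stand-in for the real download.

```r
# Hypothetical helper (not part of GenomicFeatures or biomaRt): report the
# line numbers of a delimited file whose field count differs from `expected`.
find_bad_lines <- function(path, expected, sep = "\t") {
  # quote = "" and comment.char = "" make count.fields() treat quotes and '#'
  # as ordinary characters, so only the separator decides the field count.
  n <- count.fields(path, sep = sep, quote = "", comment.char = "",
                    blank.lines.skip = FALSE)
  which(is.na(n) | n != expected)
}

# Toy file standing in for a downloaded table; its third line is cut short.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("a\tb\tc", "1\t2\t3", "4\t5", "6\t7\t8"), tmp)
bad <- find_bad_lines(tmp, expected = 3)
print(bad)
```

A bad line that moves from run to run would be consistent with an
interrupted transfer, whereas a stable position would point at the data
itself.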
>
> Thanks again,
>
> Tim
>
> On 8 November 2011 20:16, Marc Carlson<mcarlson at fhcrc.org>  wrote:
>> Hi Tim,
>>
>> There was a small bug last week in this method, caused by a decision at
>> Ensembl to start supporting pseudoautosomal regions, but it was fixed last
>> week and the fix should be in the version of GenomicFeatures reported here.
>>   I just ran your code locally 4 minutes ago and it still works here.  The
>> only difference I can see is that my GenomicRanges package is one version
>> higher than yours (GenomicRanges_1.6.2).  Please update that package and then
>> run it again and see if you have better luck with Ensembl.
>>
>> The patch that Michael mentioned actually arrived at the exact moment that I
>> was testing the bug fix above, which means that it has some conflicts I
>> will have to resolve, but it should be added to devel very soon.
>>
>>
>>   Marc
>>
>>
>>
>> On 11/08/2011 03:55 AM, Michael Lawrence wrote:
>>>
>>> On Tue, Nov 8, 2011 at 3:19 AM, Tim Rayner<tfrayner at gmail.com>    wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm trying to make a TranscriptDb from the Ensembl human Biomart, but
>>>> I've run into a problem. As shown below, the equivalent operation for
>>>> the mouse Biomart works fine:
>>>>
>>>>> # Mouse TranscriptDb created without a hitch:
>>>>> txdb.Mm<- makeTranscriptDbFromBiomart(biomart='ensembl', dataset='mmusculus_gene_ensembl')
>>>> Download and preprocess the 'transcripts' data frame ... OK
>>>> Download and preprocess the 'chrominfo' data frame ... OK
>>>> Download and preprocess the 'splicings' data frame ... OK
>>>> Download and preprocess the 'genes' data frame ... OK
>>>> Prepare the 'metadata' data frame ... OK
>>>> Make the TranscriptDb object ... OK
>>>>
>>>>> # Here's the problem:
>>>>> txdb.Hs<- makeTranscriptDbFromBiomart(biomart='ensembl', dataset='hsapiens_gene_ensembl')
>>>> Download and preprocess the 'transcripts' data frame ... OK
>>>> Download and preprocess the 'chrominfo' data frame ... FAILED! (=>  skipped)
>>>> Download and preprocess the 'splicings' data frame ... Error in
>>>> scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>>>>   line 800380 did not have 11 elements
>>>>
>>>>> sessionInfo()
>>>>
>>>> R version 2.14.0 (2011-10-31)
>>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>>
>>>> locale:
>>>> [1] C
>>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>> other attached packages:
>>>> [1] GenomicFeatures_1.6.1 AnnotationDbi_1.16.0  Biobase_2.14.0
>>>> [4] GenomicRanges_1.6.1   IRanges_1.12.1
>>>>
>>>> loaded via a namespace (and not attached):
>>>>   [1] BSgenome_1.22.0    Biostrings_2.22.0  DBI_0.2-5          RCurl_1.6-10
>>>>   [5] RSQLite_0.10.0     XML_3.4-3          biomaRt_2.10.0     rtracklayer_1.14.1
>>>>   [9] tools_2.14.0       zlibbioc_1.0.0
>>>>
>>>> I don't know if this is an issue with the Biomart instance or the
>>>> GenomicFeatures package. I was wondering if anyone had any suggestions
>>>> as to how I might work around this?
>>>>
>>>> On a related note, would it be possible to add the ability to point
>>>> makeTranscriptDbFromBiomart() at alternate Biomart hosts (as one
>>>> would, for example, by calling
>>>> biomaRt::useMart(host='www.ensembl.org', ...))?
>>>
>>> We've submitted a patch that does just this, as well as supporting an
>>> attribute prefix string for selecting alternative gene models.
>>>
>>>
>>>> It would probably be
>>>> good to be able to pass through the 'archive' argument to useMart as
>>>> well.
>>>>
>>>> Many thanks,
>>>>
>>>> Tim Rayner
>>>>
>>>> --
>>>> Bioinformatician
>>>> Smith Lab, CIMR
>>>> University of Cambridge
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


