[BioC] makeTranscriptDbFromBiomart error
Marc Carlson
mcarlson at fhcrc.org
Thu Jun 7 21:32:12 CEST 2012
One more thing:
The uswest Ensembl BioMart mirror has apparently been updated with the
fix (for reasons that are not known to me, the default host has still not
been updated). So if you look at the manual page,
?makeTranscriptDbFromBiomart
you can see an example of how to use the uswest.ensembl.org host by
specifying the biomart and host arguments.
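
A minimal sketch of such a call, for illustration only. The host is the one named above; the mart name and dataset are assumptions (the mirror may expose the mart under a different name, which biomaRt's listMarts(host = "uswest.ensembl.org") should show), and build_txdb_from_mirror is a hypothetical helper, not part of GenomicFeatures.

```r
# Arguments for the mirror. Only the host is taken from the message above;
# the mart and dataset names are assumptions to be verified locally.
mirror_args <- list(
  biomart = "ENSEMBL_MART_ENSEMBL",
  dataset = "hsapiens_gene_ensembl",
  host    = "uswest.ensembl.org"
)

# Hypothetical helper: builds the TranscriptDb only when called,
# so sourcing this snippet does not touch the network.
build_txdb_from_mirror <- function(args = mirror_args) {
  if (!requireNamespace("GenomicFeatures", quietly = TRUE))
    stop("GenomicFeatures is not installed")
  make_txdb <- getExportedValue("GenomicFeatures",
                                "makeTranscriptDbFromBiomart")
  do.call(make_txdb, args)
}

# txdb <- build_txdb_from_mirror()
```

Keeping the arguments in a list makes it easy to switch back to the default host later by dropping the host entry.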
Marc
On 06/07/2012 10:40 AM, Marc Carlson wrote:
> Hi Stefanie,
>
> This is related to a bug with the 5' and 3' starts/ends that was in
> the latest version of biomaRt. We reported it to them a couple of weeks
> ago because it immediately started to break some of our quality
> control tests for GenomicFeatures. At that time, they told us that it
> had been fixed, but that it would still take a couple of weeks for their
> correction to propagate out. In the meantime, using either
> makeTranscriptDbFromUCSC() or the stock annotation packages for human
> might be a good work-around for you.
>
> The warning that you saw for makeTranscriptDbFromUCSC() comes from another
> quality control check. When an annotation resource tells us the range
> for a CDS, we expect the cumulative length of that CDS to be divisible
> by three. When it is not, we issue the warning you were seeing
> for makeTranscriptDbFromUCSC().
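
The check described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame, its column names, and the transcript names are made up, and this is not the code GenomicFeatures itself uses.

```r
# Made-up CDS segments for two hypothetical transcripts
# (1-based, inclusive coordinates).
cds <- data.frame(
  tx_name = c("txA", "txA", "txB"),
  start   = c(101L, 201L, 501L),
  end     = c(160L, 260L, 550L),
  stringsAsFactors = FALSE
)

# Cumulative CDS length per transcript: sum of the segment widths.
cds_len <- tapply(cds$end - cds$start + 1L, cds$tx_name, sum)

# Transcripts whose cumulative CDS length is not a multiple of 3.
anomalous <- names(cds_len)[cds_len %% 3L != 0L]

for (tx in anomalous)
  message("data anomaly in transcript ", tx,
          ": the cds cumulative length is not a multiple of 3")
```

Here txA has a cumulative length of 120 and passes, while txB has a length of 50 and is reported.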
>
> Hope that this clarifies things,
>
>
> Marc
>
>
>
> On 06/07/2012 08:50 AM, Stefanie Tauber wrote:
>> Hi,
>>
>> here is my sessionInfo:
>>
>> > sessionInfo()
>> R version 2.15.0 (2012-03-30)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=C LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] GenomicFeatures_1.8.0 AnnotationDbi_1.18.0 Biobase_2.16.0
>> [4] GenomicRanges_1.8.1 IRanges_1.14.2 BiocGenerics_0.2.0
>>
>> loaded via a namespace (and not attached):
>> [1] biomaRt_2.12.0 Biostrings_2.24.0 bitops_1.0-4.1 BSgenome_1.24.0
>> [5] DBI_0.2-5 RCurl_1.91-1 Rsamtools_1.8.0 RSQLite_0.11.1
>> [9] rtracklayer_1.16.0 stats4_2.15.0 tools_2.15.0 XML_3.9-4
>> [13] zlibbioc_1.2.0
>>
>> I updated GenomicFeatures to 1.8.1, but unfortunately that did not help.
>>
>>
>> BUT: makeTranscriptDbFromUCSC did work :)
>>
>> > txdb <- makeTranscriptDbFromUCSC(genome="hg19", tablename="ensGene")
>> Download the ensGene table ... OK
>> Extract the 'transcripts' data frame ... OK
>> Extract the 'splicings' data frame ... OK
>> Download and preprocess the 'chrominfo' data frame ... OK
>> Prepare the 'metadata' data frame ... metadata: OK
>> Make the TranscriptDb object ... OK
>> There were 50 or more warnings (use warnings() to see the first 50)
>>
>> > txdb
>> TranscriptDb object:
>> | Db type: TranscriptDb
>> | Supporting package: GenomicFeatures
>> | Data source: UCSC
>> | Genome: hg19
>> | Genus and Species: Homo sapiens
>> | UCSC Table: ensGene
>> | Resource URL: http://genome.ucsc.edu/
>> | Type of Gene ID: Ensembl gene ID
>> | Full dataset: yes
>> | miRBase build ID: NA
>> | transcript_nrow: 181648
>> | exon_nrow: 541825
>> | cds_nrow: 278798
>> | Db created by: GenomicFeatures package from Bioconductor
>> | Creation time: 2012-06-07 17:48:45 +0200 (Thu, 07 Jun 2012)
>> | GenomicFeatures version at creation time: 1.8.1
>> | RSQLite version at creation time: 0.11.1
>> | DBSCHEMAVERSION: 1.0
>>
>> > warnings()
>> Warning messages:
>> 1: In .extractUCSCCdsStartEnd(cdsStart[i], cdsEnd[i],
>> exon_locs$start[[i]], ... :
>> UCSC data anomaly in transcript ENST00000513161: the cds
>> cumulative length is not a multiple of 3
>> 2: In .extractUCSCCdsStartEnd(cdsStart[i], cdsEnd[i],
>> exon_locs$start[[i]], ... :
>> UCSC data anomaly in transcript ENST00000417833: the cds
>> cumulative length is not a multiple of 3
>> 3: In .extractUCSCCdsStartEnd(cdsStart[i], cdsEnd[i],
>> exon_locs$start[[i]], ... :
>> UCSC data anomaly in transcript ENST00000450884: the cds
>> cumulative length is not a multiple of 3
>>
>>
>> Best,
>> Stefanie
>>
>> On 07.06.2012 at 16:25, Steve Lianoglou wrote:
>>
>>> Hi Stefanie,
>>>
>>> On Thu, Jun 7, 2012 at 5:16 AM, Stefanie Tauber
>>> <stefanie.tauber at univie.ac.at> wrote:
>>>> Hi
>>>>
>>>> I just tried it with R 2.15, I get the same error.
>>>>
>>>> If I follow your suggestion:
>>>>
>>>> txdb<- makeTranscriptDbFromUCSC(genome="hg19", tablename="ensGene")
>>>>
>>>>
>>>> I get:
>>>>
>>>> Download the ensGene table ... OK
>>>> Extract the 'transcripts' data frame ... OK
>>>> Extract the 'splicings' data frame ... OK
>>>> Download and preprocess the 'chrominfo' data frame ... Error in download.file(url, destfile, quiet = TRUE) :
>>>> cannot open URL 'http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz'
>>>>
>>>> In addition: There were 50 or more warnings (use warnings() to see the first 50)
>>> [snip]
>>>
>>> Strange ... I also get the same warnings you get (the "cds cumulative
>>> length is not a multiple of 3") for some transcripts, but I think this
>>> is something beyond our control. I don't get any error(s) when
>>> downloading and building the TxDB, so it completes fine for me.
>>>
>>> I'm actually running the *-devel versions of the bioc packages w/
>>> R-2.15.x so it's not very easy for me to check the current released
>>> GenomicFeatures package, but I'd be a bit surprised if the error is
>>> there.
>>>
>>> Could you paste the output of `sessionInfo()` after you call
>>> `library(GenomicFeatures)` when running your new R-2.15.x install?
>>>
>>> -steve
>>>
>>>
>>> --
>>> Steve Lianoglou
>>> Graduate Student: Computational Systems Biology
>>> | Memorial Sloan-Kettering Cancer Center
>>> | Weill Medical College of Cornell University
>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>> DI Stefanie Tauber
>>
>> Center for Integrative Bioinformatics Vienna (CIBIV)
>> (CIBIV is a joint institute of Vienna University, Medical University,
>> and University of Veterinary Medicine, Vienna, Austria)
>> Max F. Perutz Laboratories (MFPL)
>> Campus Vienna Biocenter 5 (VBC5), Ebene 1, Room 1812.2
>> Dr. Bohr Gasse 9
>> A-1030 Wien, Austria
>> Phone: ++43 +1 / 42772-4030
>> Fax: ++43 +1 / 42772-4098
>> email: stefanie.tauber at univie.ac.at
>> www.cibiv.at
>>
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor