[BioC] zero-row result breaks select() on PolyPhen.Hsapiens.* and SIFT.Hsapiens.*

Valerie Obenchain vobencha at fhcrc.org
Tue Sep 24 21:17:45 CEST 2013


I will update the SIFT and PolyPhen databases for the upcoming release.

Valerie


On 09/23/2013 02:21 PM, Robert Castelo wrote:
> hi Valerie,
>
> On 9/23/13 9:41 PM, Valerie Obenchain wrote:
>> Hi Robert,
>>
>> Thanks for reporting this. Now fixed in VariantAnnotation 1.7.47.
>>
> great! thanks for the quick fix.
>
>> Have you looked at the ensemblVEP package? It's a wrapper to Ensembl's
>> Variant Effect Predictor tool. We encourage the use of ensemblVEP
>> instead of the SIFT and PolyPhen databases because it accesses the
>> most current information. As you know, the SIFT and PolyPhen dbs are
>> becoming dated and we don't have plans to package newer versions.
>>
>> ensemblVEP requires that you download and install the script located
>> here,
>>
>> http://uswest.ensembl.org/info/docs/tools/vep/script/vep_download.html
>>
>> The variant_effect_predictor.pl executable must be in your path. Let
>> us know if you have trouble with the install/setup.
> yes, i looked at it, and i think it is a great solution for analysis of
> a few hundred variants, as it needs to access the internet to download the
> information. However, i'm working on a package that eventually needs to
> annotate a few thousand variants, and i find the dependency on an
> external perl script that the end user must install somewhat troubling.
> let me know if you have suggestions about this.
>
> for software packages that need to efficiently access SIFT and PolyPhen
> annotations from R, freezing the data regularly is, in my opinion, a
> much better solution. i was actually going to ask you if you could
> update these two packages. Just as you keep up-to-date versions of the
> SNPlocs.Hsapiens.* and TxDb.* packages, i'd do the same for SIFT
> and PolyPhen, unless there's some licensing issue that prevents this, as
> is currently the case with OMIM.
>
> cheers,
> robert.
>
>> Valerie
>>
>> On 09/20/2013 05:25 PM, Robert Castelo wrote:
>>> Dear list,
>>>
>>> querying the TxDb.Hsapiens.UCSC.hg19.knownGene package with a key that
>>> matches nothing gives the expected empty result:
>>>
>>> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>> select(TxDb.Hsapiens.UCSC.hg19.knownGene, keys="dummy",
>>> keytype="GENEID", cols="SYMBOL")
>>> [1] GENEID
>>> <0 rows> (or 0-length row.names)
>>>
>>> however, when i try the same with the annotation packages
>>> PolyPhen.Hsapiens.dbSNP131 and SIFT.Hsapiens.dbSNP132, the select()
>>> call fails with an error:
>>>
>>> library(SIFT.Hsapiens.dbSNP132)
>>> library(PolyPhen.Hsapiens.dbSNP131)
>>>
>>> select(SIFT.Hsapiens.dbSNP132, keys=c("dummy"))
>>> Error in data.frame(RSID = unlist(rsid), PROTEINID =
>>> unlist(protein_id),  :
>>>    arguments imply differing number of rows: 1, 0
>>>
>>> select(PolyPhen.Hsapiens.dbSNP131, keys="dummy")
>>> Error in `*tmp*`$RSID : $ operator is invalid for atomic vectors
>>>
>>> i guess these two annotation packages should work analogously to
>>> TxDb.Hsapiens.UCSC.hg19.knownGene, and give just a 0-row data.frame
>>> object, right?
>>>
>>> these errors reproduce also with the current devel version of BioC,
>>> please find below both sessionInfo() outputs.
>>>
>>> cheers,
>>> robert.
>>>
>>> =====RELEASE====
>>> R version 3.0.1 (2013-05-16)
>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] parallel  stats     graphics  grDevices utils     datasets
>>> methods base
>>>
>>> other attached packages:
>>>   [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.9.2 GenomicFeatures_1.12.3
>>>   [3] AnnotationDbi_1.22.6 Biobase_2.20.1
>>>   [5] PolyPhen.Hsapiens.dbSNP131_1.0.2 SIFT.Hsapiens.dbSNP132_1.0.2
>>>   [7] RSQLite_0.11.4 DBI_0.2-7
>>>   [9] VariantAnnotation_1.6.7 Rsamtools_1.12.4
>>> [11] Biostrings_2.28.0 GenomicRanges_1.12.5
>>> [13] IRanges_1.18.3 BiocGenerics_0.6.0
>>> [15] vimcom_0.9-8 setwidth_1.0-3
>>> [17] colorout_1.0-0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] biomaRt_2.16.0     bitops_1.0-6       BSgenome_1.28.0
>>> RCurl_1.95-4.1     rtracklayer_1.20.4
>>> [6] stats4_3.0.1       tools_3.0.1        XML_3.95-0.2 zlibbioc_1.6.0
>>>
>>>
>>>
>>> =====DEVEL=====
>>> R version 3.0.1 (2013-05-16)
>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] parallel  stats     graphics  grDevices utils     datasets
>>> methods base
>>>
>>> other attached packages:
>>>   [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.9.2 GenomicFeatures_1.13.40
>>>   [3] AnnotationDbi_1.23.23 Biobase_2.21.7
>>>   [5] PolyPhen.Hsapiens.dbSNP131_1.0.2 SIFT.Hsapiens.dbSNP132_1.0.2
>>>   [7] RSQLite_0.11.4 DBI_0.2-7
>>>   [9] VariantAnnotation_1.7.46 Rsamtools_1.13.41
>>> [11] Biostrings_2.29.19 GenomicRanges_1.13.44
>>> [13] XVector_0.1.4 IRanges_1.19.37
>>> [15] BiocGenerics_0.7.5 vimcom_0.9-8
>>> [17] setwidth_1.0-3 colorout_1.0-0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] biomaRt_2.17.3      bitops_1.0-6        BSgenome_1.29.1
>>> RCurl_1.95-4.1      rtracklayer_1.21.12
>>> [6] stats4_3.0.1        tools_3.0.1         XML_3.95-0.2 zlibbioc_1.7.0
>>>
>>
>
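
The first error in the report above can be reproduced in base R without
either annotation package: data.frame() refuses to combine a length-1
column with a length-0 column. The sketch below shows that, plus one way
a lookup could return a zero-row data.frame instead. It is only an
illustration, not the fix that went into VariantAnnotation 1.7.47;
rows_to_df(), the `hits` vector and the IDs in it are hypothetical names
made up for this example.

```r
# Reproduce the message from the SIFT report: one key, zero matching rows.
msg <- tryCatch(
  data.frame(RSID = "dummy", PROTEINID = character(0)),
  error = function(e) conditionMessage(e)
)

# Hypothetical helper (NOT the VariantAnnotation implementation).
# `hits` is a named character vector mapping an rsid to one protein id.
rows_to_df <- function(keys, hits) {
  found <- intersect(keys, names(hits))
  if (length(found) == 0L) {
    # No key matched: build every column with length 0 so that
    # data.frame() yields a 0-row result instead of an error.
    return(data.frame(RSID = character(0), PROTEINID = character(0),
                      stringsAsFactors = FALSE))
  }
  data.frame(RSID = found, PROTEINID = unname(hits[found]),
             stringsAsFactors = FALSE)
}

hits  <- c(rs1 = "P1", rs2 = "P2")   # made-up ids for illustration
empty <- rows_to_df("dummy", hits)   # 0-row data.frame, columns preserved
one   <- rows_to_df("rs1", hits)     # 1-row data.frame
```

Returning a data.frame with the usual columns and zero rows lets calling
code use nrow() or rbind() on the result without special-casing the
"no match" situation, which is the behaviour the TxDb example shows.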