[BioC] BiomaRt Ensembl RefSeq query error
Davis, Wade
davisjwa at health.missouri.edu
Thu Jan 23 04:15:56 CET 2014
Georg,
Using your code and calling for only "ENSMUSG00000000567" does not result in NA for me, as you can see:
library(biomaRt)
ensembl <- useMart("ensembl", dataset = 'mmusculus_gene_ensembl')
getBM(attributes = c("ensembl_gene_id", "refseq_mrna"),
      filters = "ensembl_gene_id", values = "ENSMUSG00000000567",
      mart = ensembl, uniqueRows = TRUE)
     ensembl_gene_id refseq_mrna
1 ENSMUSG00000000567   NM_011448
You are running R 3.0.1 just like me, but your biomaRt is 2.18 (I'm running 2.16, see below). biomaRt 2.18 is part of BioC 2.13, which is meant for R 3.0.2 as noted here:
http://www.bioconductor.org/install/
That version mismatch is the most likely cause.
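If updating R and Bioconductor is not possible right away, one workaround worth trying is to send the IDs in batches and stack the results, since you report that subsets of fewer than about 12000 IDs come back complete. Below is a minimal sketch under that assumption. The helper query_in_chunks and its chunk_size default are hypothetical and not part of biomaRt, and whether batching actually avoids the NA values is untested:

```r
# Hypothetical helper (not part of biomaRt): run a query function over
# batches of IDs and stack the per-batch data frames into one.
query_in_chunks <- function(ids, query_fun, chunk_size = 5000) {
  stopifnot(chunk_size >= 1)
  if (length(ids) == 0) return(NULL)
  # Assign each ID to a batch holding at most chunk_size elements.
  batch <- ceiling(seq_along(ids) / chunk_size)
  chunks <- split(ids, batch)
  # Query each batch separately, then combine the rows in batch order.
  results <- lapply(chunks, query_fun)
  out <- do.call(rbind, results)
  rownames(out) <- NULL
  out
}

# Usage with biomaRt (needs network access; ensembl.ids is your ID vector):
# library(biomaRt)
# ensembl <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
# res <- query_in_chunks(ensembl.ids, function(x)
#   getBM(attributes = c("ensembl_gene_id", "refseq_mrna"),
#         filters = "ensembl_gene_id", values = x,
#         mart = ensembl, uniqueRows = TRUE))
```

The query function is passed in as an argument so the batching logic can be checked with a stand-in function before it is pointed at the Ensembl server.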
Wade
sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomaRt_2.16.0
loaded via a namespace (and not attached):
[1] RCurl_1.95-4.1 XML_3.98-1.1
-----Original Message-----
From: Georg Otto [mailto:georg.otto at imm.ox.ac.uk]
Sent: Tuesday, January 21, 2014 6:49 AM
To: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] BiomaRt Ensembl RefSeq query error
As an amendment to my previous post, here is the sessionInfo():
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomaRt_2.18.0
loaded via a namespace (and not attached):
[1] annotate_1.40.0 AnnotationDbi_1.24.0 Biobase_2.22.0
[4] BiocGenerics_0.8.0 compiler_3.0.1 DBI_0.2-7
[7] DESeq_1.14.0 genefilter_1.44.0 geneplotter_1.40.0
[10] grid_3.0.1 IRanges_1.20.6 lattice_0.20-24
[13] parallel_3.0.1 RColorBrewer_1.0-5 RCurl_1.95-4.1
[16] RSQLite_0.11.4 splines_3.0.1 stats4_3.0.1
[19] survival_2.37-4 tools_3.0.1 XML_3.98-1.1
[22] xtable_1.7-1
Georg Otto <georg.otto at imm.ox.ac.uk> writes:
> Dear Bioconductors,
>
> I am trying to query 14005 Ensembl gene IDs for their RefSeq
> annotations using this code (I can send the gene IDs upon request):
>
> ensembl <- useMart("ensembl", dataset = 'mmusculus_gene_ensembl')
>
> getBM(attributes = c("ensembl_gene_id", "refseq_mrna"),
>       filters = "ensembl_gene_id", values = ensembl.ids,
>       mart = ensembl, uniqueRows = TRUE)
>
>
> If I query for the full gene set, many RefSeq IDs are missing (NA),
> for example for the gene ENSMUSG00000000567 (Sox9), whereas if I query
> for a subset, say ensembl.ids[1:12000], all the RefSeq IDs are there.
> It does not seem to matter which subset I use, but the size of the
> subset has to be smaller than ca. 12000 genes.
>
> Any idea what is going on?
>
> Best wishes,
>
> Georg