[BioC] BioMaRt query
Kasper Daniel Hansen
kasperdanielhansen at gmail.com
Mon Oct 4 17:08:25 CEST 2010
When you use biomaRt you are querying Ensembl.
Ensembl remaps all probesets independently of Affymetrix, whereas the
*.db package reflects the annotation that was available from
Affymetrix at the time the package was built.
So for some reason Ensembl has decided that this particular probeset
does not map to a gene. To understand why, you will need to track down
how Ensembl does the probeset->gene mapping (which is not trivial), but
my guess is that they are in some sense stricter than Affymetrix.
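
To see how many probesets are affected, one could compare the two
sources directly. The following is only a rough sketch, not a tested
recipe: it reuses the mart, dataset and attribute names from the
question (which may differ in other Ensembl releases), and the
variable names (probes, ens, affy, has_affy) are made up for
illustration.

```r
library(biomaRt)
library(ath1121501.db)

# Probesets of interest; here just the one from the question.
probes <- c("254998_at")

# Ensembl mapping. Mart and dataset names are taken from the question
# and are assumptions for any other release.
AT.db <- useMart(biomart = "plant_mart_6", dataset = "athaliana_eg_gene")
ens <- getBM(attributes = c("affy_ath1_121501", "ensembl_gene_id"),
             filters = "affy_ath1_121501", values = probes, mart = AT.db)

# Mapping from the *.db package; unmapped probesets come back as NA.
affy <- mget(probes, envir = ath1121501ACCNUM, ifnotfound = NA)
has_affy <- names(affy)[!sapply(affy, function(x) all(is.na(x)))]

# Probesets that the *.db package maps to a gene but Ensembl does not.
setdiff(has_affy, ens$affy_ath1_121501)
```

Passing all probeset IDs on the array as probes would give the full
list of probesets where the two annotations disagree in this way.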
While this is not related to Ensembl, you might want to read this
paper describing some of the problems with probe->probeset->gene
mappings:
http://nar.oxfordjournals.org/cgi/content/full/33/20/e175?ijkey=zaJMV7qU1XANIci&keytype=ref
Kasper
On Mon, Oct 4, 2010 at 4:44 AM, René Dreos <talponer at gmail.com> wrote:
> Dear BioC mailing list,
>
> I am trying to annotate Arabidopsis ATH1 genome array results using biomaRt,
> but it looks like some of the probesets are not annotated in biomaRt
> database. Here is one example:
>
>> library(biomaRt)
>> AT.db <- useMart(biomart="plant_mart_6", dataset="athaliana_eg_gene")
>> getBM(attributes = c("affy_ath1_121501","ensembl_gene_id","description"),
> filters = "affy_ath1_121501", values = "254998_at", mart = AT.db)
> [1] affy_ath1_121501 ensembl_gene_id description
> <0 rows> (or 0-length row.names)
>
> But if I use ath1121501.db library to annotate the same probeset:
>
>> library(annotate)
>> library(ath1121501.db)
>
>> mget("254998_at", env=ath1121501GENENAME)
> $`254998_at`
> [1] "encodes a choline synthase whose gene expression is induced by high
> salt and mannitol."
>
>> mget("254998_at", env=ath1121501ACCNUM)
> $`254998_at`
> [1] "AT4G09760"
>
> Why is this happening?
>
> Thank you for any advice,
> best regards
> r
>
>> sessionInfo()
> R version 2.11.1 (2010-05-31)
> x86_64-apple-darwin9.8.0
>
> locale:
> [1] C
>
> attached base packages:
> [1] grid stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] ath1121501.db_2.4.1 org.At.tair.db_2.4.3
> [3] RSQLite_0.9-2 annotate_1.26.1
> [5] ath1121501cdf_2.6.0 biomaRt_2.4.0
> [7] genefilter_1.30.0 marray_1.26.0
> [9] gplots_2.8.0 caTools_1.10
> [11] bitops_1.0-4.1 gdata_2.7.2
> [13] gtools_2.6.2 bradiar1b520742cdf_1.24.0
> [15] arrayQualityMetrics_2.6.0 affyPLM_1.24.1
> [17] gcrma_2.20.0 preprocessCore_1.10.0
> [19] matchprobes_1.20.0 Biostrings_2.16.9
> [21] IRanges_1.6.15 AnnotationDbi_1.10.2
> [23] affxparser_1.20.0 makecdfenv_1.26.0
> [25] lattice_0.18-8 RMySQL_0.7-5
> [27] DBI_0.2-5 affy_1.26.1
> [29] Biobase_2.8.0 limma_3.4.4
>
> loaded via a namespace (and not attached):
> [1] RColorBrewer_1.0-2 RCurl_1.4-2 XML_3.1-1
> [4] affyio_1.16.0 beadarray_1.16.0 hwriter_1.2
> [7] latticeExtra_0.6-14 simpleaffy_2.24.0 splines_2.11.1
> [10] stats4_2.11.1 survival_2.35-8 tools_2.11.1
> [13] vsn_3.16.0 xtable_1.5-6
>