[BioC] Incomplete EntrezID annotations for the Mouse 430 v2.0 probe-set
Martin Morgan
mtmorgan at fhcrc.org
Wed Nov 3 03:40:22 CET 2010
On 11/02/2010 11:20 AM, ANJAN PURKAYASTHA wrote:
> Hi Martin,
> Session Info:
> R version 2.11.1 (2010-05-31)
> i386-apple-darwin9.8.0
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] affy_1.26.1 GOstats_2.14.0 graph_1.28.0
> Category_2.14.0 mouse4302.db_2.4.1 org.Mm.eg.db_2.4.1
> RSQLite_0.9-2
> [8] DBI_0.2-5 AnnotationDbi_1.10.2 Biobase_2.8.0
>
> loaded via a namespace (and not attached):
> [1] affyio_1.16.0 annotate_1.26.1 genefilter_1.30.0
> GO.db_2.4.1 GSEABase_1.10.0 preprocessCore_1.10.0
> [7] RBGL_1.26.0 splines_2.11.1 survival_2.35-8
> tools_2.11.1 XML_3.1-1 xtable_1.5-6
>
>
> Commands used to create the mapping:
> library(mouse4302.db)
> id <- rownames(allMtb.rma.data.frame)
> map <- mouse4302ENTREZID
> probe_entrezid <- unlist(mget(id, map))
> p <- as.data.frame(probe_entrezid)
> p now has the probeID_entrezID mappings
With R-2-11 I see
> mouse4302()
[...snip...]
mouse4302ENTREZID has 37316 mapped keys (of 45101 keys)
[...snip...]
Date for NCBI data: 2010-Mar1
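The mapped-key count above covers the whole array; the same check can be made for just the probes in a given data set. What follows is a minimal sketch in base R, not the exact code used in this thread. It assumes a toy environment, toy_map, standing in for mouse4302ENTREZID, and every probe ID and Entrez ID in it is hypothetical. As far as I know AnnotationDbi's mget() also accepts ifnotfound=NA for annotation maps, but treat that as an assumption to verify against ?mget.

```r
# Toy stand-in for mouse4302ENTREZID: an environment keyed by probe ID.
# All probe IDs and Entrez IDs below are hypothetical.
toy_map <- new.env()
assign("probeA_at", "1001", envir = toy_map)
assign("probeB_at", "1002", envir = toy_map)
assign("probeC_at", NA_character_, envir = toy_map)  # key present, but unmapped

# probeD_at is deliberately absent from the map
id <- c("probeA_at", "probeB_at", "probeC_at", "probeD_at")

# ifnotfound = NA returns NA instead of raising an error for absent keys
probe_entrezid <- unlist(mget(id, envir = toy_map, ifnotfound = NA))

# Count the mapped probes and list the ones without an Entrez ID
n_mapped <- sum(!is.na(probe_entrezid))
unmapped <- names(probe_entrezid)[is.na(probe_entrezid)]
```

The unmapped vector is the list of probes to investigate further, for instance with a newer annotation release.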
The current version of R / Bioconductor is R-2-12, where there are 37413
mapped probes from NCBI data of 2010-Sep7. Using biomaRt I get
> library(biomaRt)
> mart = useMart("ensembl", "mmusculus_gene_ensembl")
> attrs = listAttributes(mart)
> attrs[grep("(Entrez|Affy mouse)", attrs[[2]]),]
               name      description
47       entrezgene    EntrezGene ID
95  affy_mouse430_2  Affy mouse430 2
96 affy_mouse430a_2 Affy mouse430a 2
> filts = listFilters(mart)
> filts[grep("(Entrez|Affy mouse)", filts[[2]]),]
                name                              description
52   with_entrezgene                    with EntrezGene ID(s)
84        entrezgene        EntrezGene ID(s) [e.g. 100287163]
121  affy_mouse430_2  Affy mouse430 2 ID(s) [e.g. 1426088_at]
122 affy_mouse430a_2 Affy mouse430a 2 ID(s) [e.g. 1426088_at]
> res = getBM(c("affy_mouse430_2","entrezgene"), "with_entrezgene",
TRUE, mart)
> head(res)
  affy_mouse430_2 entrezgene
1                     338371
2                     238944
3                     208431
4      1430582_at     268281
5      1458594_at     268281
6    1455882_x_at     319922
> head(table(table(res[[1]])))
    1     2     3     4     5     6
24627  1746   374    96    62    34
which tells me there are 24627 uniquely mapping probes, and some more
that could be retrieved with some work. I haven't checked my biomaRt
work very carefully here, so I could have made mistakes, and I don't
know biomaRt well enough to get the provenance of the probes I have
identified, unlike with mouse4302.db, where ?mouse4302ENTREZID is
helpful. I could also remap the probes using chromosome coordinates from
the mouse4302 package and BSgenome / Biostrings, and then use
org.Mm.eg.db to map coordinates to genes. So I think the best you can do
easily is the ~37,000 probes that are already mapped.
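To pull out the uniquely mapping probes from a two-column table like res, something along these lines might work. It is a sketch only, in base R, run on a toy data frame rather than the real getBM() result: the three real probe rows are copied from head(res) above, while toy_multi_at and its Entrez IDs are hypothetical and exist only to show a probe that maps to more than one gene.

```r
# Toy table with the same two columns as the getBM() result.
# Rows with real probe IDs are copied from head(res); "toy_multi_at"
# and its Entrez IDs (111, 222) are hypothetical.
res <- data.frame(
  affy_mouse430_2 = c("", "", "1430582_at", "1458594_at", "1455882_x_at",
                      "toy_multi_at", "toy_multi_at"),
  entrezgene = c(338371, 238944, 268281, 268281, 319922, 111, 222),
  stringsAsFactors = FALSE
)

# Drop genes with no probe on the array and any repeated (probe, gene) pairs
res <- unique(res[res$affy_mouse430_2 != "", ])

# Number of distinct Entrez IDs per probe
n_genes <- table(res$affy_mouse430_2)

# Keep only probes that map to exactly one Entrez ID
unique_probes <- names(n_genes)[n_genes == 1]
probe2entrez <- res[res$affy_mouse430_2 %in% unique_probes, ]
```

Dropping the rows with an empty probe ID first matters, because otherwise the empty string is tabulated as if it were a probe.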
Martin
>
> Thanks,
> Anjan
>
>
> On Tue, Nov 2, 2010 at 2:16 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>
> On 11/02/2010 11:14 AM, ANJAN PURKAYASTHA wrote:
> > Hi,
> > I have run into the following problem. I created a probeID-EntrezID
> > mapping for the Affy mouse array from the cognate annotation file
> > Mouse4302.db.
> > Unfortunately about 10000 genes do not have a corresponding EntrezID.
> > Many of these are genes with known functions. If I cannot map an
> > EntrezID to these then I cannot retrieve GO annotations and
> > consequently I cannot do a Gene Set Enrichment analysis using GOstats.
> > Does anyone have an updated annotation file?
>
> Hi Anjan
>
> What is your sessionInfo() (else how could we know what an 'updated'
> annotation file is?) and how did you perform the mapping (short,
> hopefully reproducible, code)?
>
> Martin
>
> > Many thanks in advance,
> > Anjan
> >
>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>
>
>
>
> --
> ===================================
> anjan purkayastha, phd.
> research associate
> fas center for systems biology,
> harvard university
> 52 oxford street
> cambridge ma 02138
> phone-703.740.6939
> ===================================