[BioC] hgu133a annotation and discontinued Entrez IDs
David Fermin
ferm0007 at umn.edu
Thu Jun 12 04:56:51 CEST 2008
I discovered a problem in annotating some expression analysis
results. I am using CEL files from the HG-U133A and, as an initial
step after creating an rmaset, I filter out the probesets without
Entrez gene IDs as follows:
arrayset <- ReadAffy()
rmaset <- rma(arrayset)
entrezIds <- mget(featureNames(rmaset), envir = hgu133aENTREZID)
haveEntrezId <- names(entrezIds)[sapply(entrezIds, function(x) !is.na(x))]
numNoEntrezId <- length(featureNames(rmaset)) - length(haveEntrezId)
rmaset <- rmaset[haveEntrezId, ]
After running my limma analysis I use aafTableAnn() to pull the
annotation. I expected every probeset in the resulting table to have
annotation information. However, when I manually scanned the annotated
table I found a number of probesets with an Entrez ID but no other
annotation. Most of these, it turns out, were discontinued as of a 2005
build of Entrez; some were mapped to other IDs and others were dropped
altogether. See sessionInfo() below for the versions I am using. When I
query "?hgu133a" I get the following:
hgu133a {hgu133a}
The annotation package was built using a downloadable R package -
AnnBuilder (download and build your own) from www.bioconductor.org
using the following public data sources:
Entrez Gene:ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/. Built: Source data
downloaded from Entrez Gene on Fri Aug 24 18:20:19 2007...
[Package hgu133a version 2.0.1 Index]
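For illustration, here is a minimal base-R sketch of how probesets with discontinued Entrez IDs could be flagged and split into "replaced" versus "dropped". It is only a sketch under stated assumptions: the probeset names and IDs are made up, and geneHistory is a toy stand-in for NCBI's gene_history table, which as far as I understand lists each discontinued ID next to its replacement, with "-" where there is none.

```r
# Hypothetical probeset -> Entrez ID mapping, standing in for the result of
# mget(featureNames(rmaset), envir = hgu133aENTREZID). All IDs are made up.
entrezIds <- list(
  probe_A = "1001",
  probe_B = "1002",
  probe_C = "1003",
  probe_D = NA_character_
)

# Toy stand-in for a gene history table (assumed layout):
#   Discontinued_GeneID = retired ID
#   GeneID              = replacement ID, or "-" if the record was dropped
geneHistory <- data.frame(
  GeneID              = c("2002", "-"),
  Discontinued_GeneID = c("1002", "1003"),
  stringsAsFactors    = FALSE
)

# Flatten to a named character vector; NA entries are kept.
ids <- unlist(entrezIds)

# Row of geneHistory for each ID, NA if the ID is not listed as discontinued.
hit <- match(ids, geneHistory$Discontinued_GeneID)

# Classify every probeset.
status <- ifelse(is.na(ids), "no_id",
          ifelse(is.na(hit), "current",
          ifelse(geneHistory$GeneID[hit] == "-", "dropped", "replaced")))

# Replacement ID where one exists, NA otherwise.
replacement <- ifelse(status == "replaced", geneHistory$GeneID[hit], NA)

report <- data.frame(
  probeset    = names(ids),
  entrez      = unname(ids),
  status      = unname(status),
  replacement = unname(replacement),
  stringsAsFactors = FALSE
)
print(report)

# Probesets that would survive a stricter filter than !is.na() alone.
keep <- report$probeset[report$status == "current"]
```

With a filter like this, the subsetting step would use keep instead of haveEntrezId, so probesets whose IDs have been retired are removed along with those that have no ID at all.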
My question is, why am I getting probesets with discontinued Entrez
IDs? Thank you for your help.
Best Regards,
David
> sessionInfo()
R version 2.6.2 (2008-02-08)
i386-apple-darwin8.10.1
locale:
en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] splines tools stats graphics grDevices utils datasets methods
[9] base
other attached packages:
[1] GOstats_2.4.0 Category_2.4.0 RBGL_1.14.0 GO.db_2.0.2
[5] graph_1.16.1 genefilter_1.16.0 survival_2.34 annotate_1.16.1
[9] xtable_1.5-2 AnnotationDbi_1.0.6 RSQLite_0.6-8 DBI_0.2-4
[13] ALL_1.4.3 hgu133a_2.0.1 annaffy_1.10.1 KEGG_2.0.1
[17] GO_2.0.1 limma_2.12.0 affy_1.16.0 preprocessCore_1.0.0
[21] affyio_1.6.1 Biobase_1.16.3
loaded via a namespace (and not attached):
[1] cluster_1.11.9