[BioC] hgu133a annotation and discontinued Entrez IDs

David Fermin ferm0007 at umn.edu
Thu Jun 12 04:56:51 CEST 2008


I discovered a problem in annotating some expression analysis  
results. I am using CEL files from the HG-U133A and, as an initial  
step after creating an rmaset, I filter out the probesets without  
Entrez gene IDs as follows:

arrayset <- ReadAffy()
rmaset <- rma(arrayset)
entrezIds <- mget(featureNames(rmaset), envir = hgu133aENTREZID)
haveEntrezId <- names(entrezIds)[sapply(entrezIds, function(x) !is.na(x))]
numNoEntrezId <- length(featureNames(rmaset)) - length(haveEntrezId)
rmaset <- rmaset[haveEntrezId, ]
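
To make the behaviour of this filter concrete, here is a minimal sketch on toy data in base R. The probeset names and ID values are illustrative placeholders standing in for the result of mget() on hgu133aENTREZID; they are not taken from the real annotation. The sketch shows that the filter tests only whether an ID is present, so a probeset whose ID has since been discontinued would pass unchanged.

```r
# Toy stand-in for mget(featureNames(rmaset), envir = hgu133aENTREZID):
# a named list holding one Entrez ID (or NA) per probeset.
# Names and values are illustrative assumptions, not real annotation data.
entrezIds <- list(probeA = "1001", probeB = "1002", probeC = NA)

# Same test as in the post: keep probesets whose ID is not NA.
# Nothing here checks whether the ID is still current in Entrez Gene.
haveEntrezId <- names(entrezIds)[vapply(entrezIds,
                                        function(x) !is.na(x),
                                        logical(1))]
numNoEntrezId <- length(entrezIds) - length(haveEntrezId)

print(haveEntrezId)
print(numNoEntrezId)
```

vapply() is used in place of sapply() only so that the result is guaranteed to be a logical vector; the selection is the same.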

After doing my limma analysis I use aafTableAnn() to grab the data. I  
expect to get a list of probesets which all have annotation  
information. However, when I manually scanned the annotated table I  
discovered a number of probesets with an Entrez ID but no other  
annotation. Most of these, it turns out, were discontinued as of
a 2005 build of Entrez; some have been mapped to other IDs and
some have been dropped altogether. See sessionInfo() below for the
versions I am using. When I query "? hgu133a" I get the following:

hgu133a {hgu133a}

The annotation package was built using a downloadable R package -  
AnnBuilder (download and build your own) from www.bioconductor.org  
using the following public data sources:
Entrez Gene:ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/. Built: Source data  
downloaded from Entrez Gene on Fri Aug 24 18:20:19 2007...

[Package hgu133a version 2.0.1 Index]
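
One possible way to list the affected probesets without scanning the table by hand is to compare the Entrez ID map against a second map and flag probesets that have an ID but nothing else. The sketch below uses toy lists in base R. In a real session the two lists would presumably come from mget() on hgu133aENTREZID and on another map such as hgu133aSYMBOL, and it is an assumption that discontinued IDs appear as NA in that second map. The probeset names, IDs and symbols are hypothetical.

```r
# Toy stand-ins for two annotation maps keyed by probeset.
# All names and values are hypothetical, for illustration only.
entrezIds <- list(probeA = "1001", probeB = "1002", probeC = NA)
symbols   <- list(probeA = "GENE1", probeB = NA, probeC = NA)

# TRUE where the map holds at least one non-NA value for the probeset.
hasValue <- function(m) vapply(m, function(x) !all(is.na(x)), logical(1))

hasEntrez <- hasValue(entrezIds)
hasSymbol <- hasValue(symbols[names(entrezIds)])  # align on probeset names

# Probesets with an Entrez ID but no symbol: candidates for a closer look.
suspect <- names(entrezIds)[hasEntrez & !hasSymbol]
print(suspect)
```

A probeset flagged this way is only a candidate; whether its ID was really discontinued would still need to be checked against Entrez Gene itself.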

My question is, why am I getting probesets with discontinued Entrez  
IDs? Thank you for your help.

Best Regards,
David

 > sessionInfo()
R version 2.6.2 (2008-02-08)
i386-apple-darwin8.10.1

locale:
en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines   tools     stats     graphics  grDevices utils     datasets  methods
[9] base

other attached packages:
 [1] GOstats_2.4.0        Category_2.4.0       RBGL_1.14.0          GO.db_2.0.2
 [5] graph_1.16.1         genefilter_1.16.0    survival_2.34        annotate_1.16.1
 [9] xtable_1.5-2         AnnotationDbi_1.0.6  RSQLite_0.6-8        DBI_0.2-4
[13] ALL_1.4.3            hgu133a_2.0.1        annaffy_1.10.1       KEGG_2.0.1
[17] GO_2.0.1             limma_2.12.0         affy_1.16.0          preprocessCore_1.0.0
[21] affyio_1.6.1         Biobase_1.16.3

loaded via a namespace (and not attached):
[1] cluster_1.11.9


