Hello

I recently had to review an analysis I did some time ago on Affymetrix hgu133plus2 chips with R 2.4 and BioC 1.9. I have re-run the analysis using R 2.9 and BioC 2.4 (sessionInfo below).
I was surprised by the changes in the annotations: many probesets that used to have an annotation are now NA, whereas others have changed both their symbol and their Entrez Gene ID.

To be specific, I illustrate my question with the top genes of my list.

The list I obtained 2 years ago is:

probeset    locuslink    symbol
238900_at 3123 HLA-DRB1
232583_at 8440 NCK2
236307_at 60468 BACH2
223620_at 2857 GPR34
219759_at 64167 LRAP
201702_s_at 5514 PPP1R10
232882_at 2308 FOXO1A
213446_s_at 8826 IQGAP1
234033_at 9693 RAPGEF2
243006_at 2534 FYN
244648_at 54520 CCDC93
243691_at 23142 DCUN1D4
239264_at 60412 EXOC4
243546_at 143686 SESN3
205239_at 374 AREG
1565703_at 55520 ELAC1
244061_at 55843 ARHGAP15
230505_at 26037 SIPA1L1
242688_at 9320 TRIP12
1556474_a_at 285097 FLJ38379
232614_at 596 BCL2
1565689_at 3839 KPNA3
236685_at NA NA
225173_at 93663 ARHGAP18
241893_at 4249 MGAT5

I used the following code to reproduce the issue with the annotations:


#####################################################################
## Verification using R 2.9 & BioC 2.4
#####################################################################

> probes<-c("238900_at" , "232583_at", "236307_at" ,"223620_at" , "219759_at" ,
+  "201702_s_at" , "232882_at"  ,  "213446_s_at",  "234033_at",    "243006_at" ,  
+  "244648_at" ,   "243691_at" ,   "239264_at" ,   "243546_at" ,   "205239_at" ,
+  "1565703_at" ,  "244061_at"  ,  "230505_at" ,   "242688_at" ,   "1556474_a_at",
+  "232614_at"  ,  "1565689_at" ,  "236685_at"  ,  "225173_at" ,   "241893_at")
> 
> library(hgu133plus2.db)
> library(annotate)
> 
> entrezs<- getEG(probes, "hgu133plus2")
> symbols<- getSYMBOL(probes, "hgu133plus2")
> sel2<- cbind(probes, entrezs, symbols)
> sel2
             probes         entrezs     symbols       
238900_at    "238900_at"    "100133484" "LOC100133484"
232583_at    "232583_at"    NA          NA           
236307_at    "236307_at"    NA          NA           
223620_at    "223620_at"    "2857"      "GPR34"       
219759_at    "219759_at"    "64167"     "ERAP2"       
201702_s_at  "201702_s_at"  "5514"      "PPP1R10"     
232882_at    "232882_at"    NA          NA           
213446_s_at  "213446_s_at"  "8826"      "IQGAP1"      
234033_at    "234033_at"    NA          NA           
243006_at    "243006_at"    NA          NA           
244648_at    "244648_at"    NA          NA           
243691_at    "243691_at"    NA          NA           
239264_at    "239264_at"    NA          NA           
243546_at    "243546_at"    NA          NA           
205239_at    "205239_at"    "374"       "AREG"        
1565703_at   "1565703_at"   "4089"      "SMAD4"       
244061_at    "244061_at"    NA          NA           
230505_at    "230505_at"    "145474"    "LOC145474"   
242688_at    "242688_at"    NA          NA           
1556474_a_at "1556474_a_at" "285097"    "FLJ38379"    
232614_at    "232614_at"    NA          NA           
1565689_at   "1565689_at"   NA          NA           
236685_at    "236685_at"    NA          NA           
225173_at    "225173_at"    "93663"     "ARHGAP18"    
241893_at    "241893_at"    NA          NA           
> sessionInfo()
R version 2.9.0 (2009-04-17) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] annotate_1.22.0       hgu133plus2.db_2.2.11 RSQLite_0.7-1         DBI_0.2-4             AnnotationDbi_1.6.0   Biobase_2.4.1       

loaded via a namespace (and not attached):
[1] xtable_1.5-5
#############################################

Many probesets seem to have changed.
Can someone explain to me what is happening (or what I may be doing wrong)?
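
To put a number on "many", here is a rough tally of the list from two years ago against the R 2.9 output above. It is only a sketch in base R: the IDs are retyped by hand from the two listings, and the names old_id, new_id and status are just labels I chose for this comparison. It compares the LocusLink/Entrez IDs only, not the symbols.

```r
# Probesets in the order used throughout this message.
probes <- c("238900_at", "232583_at", "236307_at", "223620_at", "219759_at",
            "201702_s_at", "232882_at", "213446_s_at", "234033_at", "243006_at",
            "244648_at", "243691_at", "239264_at", "243546_at", "205239_at",
            "1565703_at", "244061_at", "230505_at", "242688_at", "1556474_a_at",
            "232614_at", "1565689_at", "236685_at", "225173_at", "241893_at")

# LocusLink IDs from the list obtained two years ago (retyped by hand).
old_id <- c("3123", "8440", "60468", "2857", "64167",
            "5514", "2308", "8826", "9693", "2534",
            "54520", "23142", "60412", "143686", "374",
            "55520", "55843", "26037", "9320", "285097",
            "596", "3839", NA, "93663", "4249")

# Entrez IDs from the R 2.9 / BioC 2.4 output (retyped by hand).
new_id <- c("100133484", NA, NA, "2857", "64167",
            "5514", NA, "8826", NA, NA,
            NA, NA, NA, NA, "374",
            "4089", NA, "145474", NA, "285097",
            NA, NA, NA, "93663", NA)

# Classify each probeset by what happened to its ID.
status <- rep("unchanged", length(probes))
status[!is.na(old_id) & !is.na(new_id) & old_id != new_id] <- "changed"
status[!is.na(old_id) & is.na(new_id)] <- "lost (now NA)"
status[is.na(old_id) & !is.na(new_id)] <- "gained"
status[is.na(old_id) & is.na(new_id)] <- "NA in both"

# Overall tally, then the probesets whose ID was replaced by another one.
print(table(status))
print(data.frame(probes, old_id, new_id)[status == "changed", ])
```

If I have retyped the IDs correctly, 14 of the 25 probesets lose their annotation, 3 map to a different ID, 7 are unchanged and 1 is NA in both.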

The same code does not work with R 2.4, but if I replace hgu133plus2.db with hgu133plus2 and getEG with getLL, I obtain the original results:

###############################################
### Review of annotations with R 2.4 and BioC 1.9
###############################################

### This code is executed in a clean new session with R 2.4 and BioC 1.9

> probes<-c("238900_at" , "232583_at", "236307_at" ,"223620_at" , "219759_at" ,
+  "201702_s_at" , "232882_at"  ,  "213446_s_at",  "234033_at",    "243006_at" ,  
+  "244648_at" ,   "243691_at" ,   "239264_at" ,   "243546_at" ,   "205239_at" ,
+  "1565703_at" ,  "244061_at"  ,  "230505_at" ,   "242688_at" ,   "1556474_a_at",
+  "232614_at"  ,  "1565689_at" ,  "236685_at"  ,  "225173_at" ,   "241893_at")
> 
> library(hgu133plus2)
> library(annotate)
> 
> LLs<- getLL(probes, "hgu133plus2")
> symbols<- getSYMBOL(probes, "hgu133plus2")
> sel1<- cbind(probes, LLs, symbols)
> sel1
             probes         LLs      symbols    
238900_at    "238900_at"    "3123"   "HLA-DRB1" 
232583_at    "232583_at"    "8440"   "NCK2"     
236307_at    "236307_at"    "60468"  "BACH2"    
223620_at    "223620_at"    "2857"   "GPR34"    
219759_at    "219759_at"    "64167"  "ERAP2"    
201702_s_at  "201702_s_at"  "5514"   "PPP1R10"  
232882_at    "232882_at"    "2308"   "FOXO1"    
213446_s_at  "213446_s_at"  "8826"   "IQGAP1"   
234033_at    "234033_at"    "9693"   "RAPGEF2"  
243006_at    "243006_at"    "2534"   "FYN"      
244648_at    "244648_at"    "54520"  "CCDC93"   
243691_at    "243691_at"    "23142"  "DCUN1D4"  
239264_at    "239264_at"    "60412"  "EXOC4"    
243546_at    "243546_at"    "143686" "SESN3"    
205239_at    "205239_at"    "374"    "AREG"     
1565703_at   "1565703_at"   "4089"   "SMAD4"    
244061_at    "244061_at"    "55843"  "ARHGAP15" 
230505_at    "230505_at"    "145474" "LOC145474"
242688_at    "242688_at"    "9320"   "TRIP12"   
1556474_a_at "1556474_a_at" "285097" "FLJ38379" 
232614_at    "232614_at"    "596"    "BCL2"     
1565689_at   "1565689_at"   "3839"   "KPNA3"    
236685_at    "236685_at"    NA       NA         
225173_at    "225173_at"    "93663"  "ARHGAP18" 
241893_at    "241893_at"    "4249"   "MGAT5"    

> sessionInfo()
R version 2.4.1 (2006-12-18) 
i386-pc-mingw32 

locale:
LC_COLLATE=Spanish_Spain.1252;LC_CTYPE=Spanish_Spain.1252;LC_MONETARY=Spanish_Spain.1252;LC_NUMERIC=C;LC_TIME=Spanish_Spain.1252

attached base packages:
[1] "tools"     "stats"     "graphics"  "grDevices"
[5] "utils"     "datasets"  "methods"   "base"     

other attached packages:
   annotate     Biobase hgu133plus2 
   "1.12.1"    "1.12.2"    "1.14.0" 

########################################################

In summary: if I use R 2.4/BioC 1.9 I obtain the same results I obtained 2 years ago, but if I follow the same steps using R 2.9/BioC 2.4 the results change dramatically.
I have repeated the analyses using BioC 2.0 in R 2.7 and BioC 2.2 in R 2.8 (results not shown here). BioC 2.0 yields the same results as 1.9, and BioC 2.2 the same as 2.4.
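
Since the annotations seem to change somewhere between BioC 2.0 and BioC 2.2, it may help to record which versions of the annotation packages each installation actually has. The following is a minimal sketch using only base R. The helper name annotation_versions is my own, and its default package list is simply the set of packages that appear in the sessionInfo outputs above.

```r
# Report the installed version of each package, or NA if it is not installed.
annotation_versions <- function(pkgs = c("annotate", "AnnotationDbi",
                                         "hgu133plus2.db", "hgu133plus2")) {
  ip <- installed.packages()             # one row per installed package
  found <- pkgs %in% rownames(ip)        # which requested packages exist here
  versions <- rep(NA_character_, length(pkgs))
  versions[found] <- ip[pkgs[found], "Version"]
  names(versions) <- pkgs
  versions
}

print(annotation_versions())
```

Running this in each R installation and keeping the output next to the results should make it easier to say which annotation package release produced which mapping.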

Any help in understanding what is happening would be appreciated.

Alex Sanchez

-----------------------------------------------------------------------------------------------------
Dr. Alex  Sánchez. Statistics Department. University of Barcelona.
Facultat de Biologia UB. Avda Diagonal 645. 08028 Barcelona. Spain
asanchez_at_ub.edu
Statistics and Bioinformatics Unit
Institut de Recerca. Hospital Universitari Vall d'Hebron
Passeig Vall d'Hebron 112-119. 08034 Barcelona
asanchez_at_ir.vhebron.net
----------------------------------------------------------------------------------------------------





