[BioC] Affymetrix probeset ids to gene symbols

Wed Jul 9 12:47:13 CEST 2008

Hi Kurt,

There is not a one-to-one mapping between Affy probesets and genes, and
there can be many reasons for this. For instance, there may be splice
variants that are interrogated by different probesets (not that likely
IMO, since the probesets target the 3'-most 600 bp of the transcript).
Another possibility is that different transcripts, originally considered
to be separate ESTs, have subsequently been mapped to the same gene. I am
sure there are other reasons for the many-to-one mapping of probesets to
genes as well.
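A minimal base-R sketch of what this looks like in practice. The probeset-to-symbol pairs below are a hand-copied excerpt of the getSYMBOL() output in Kurt's appendix, hard-coded so that no annotation package is needed to run it; table() counts how many probesets map to each symbol and split() lists which ones they are. How to combine several probesets for one gene (averaging them, keeping one, or analysing them separately) is an analysis choice this sketch does not make.

```r
# Excerpt of getSYMBOL() output: a named character vector whose names are
# probeset IDs and whose values are gene symbols (pairs copied from the
# appendix of Kurt's message).
listtemp <- c("225471_s_at" = "AKT2", "226156_at"   = "AKT2",
              "236664_at"   = "AKT2", "225858_s_at" = "XIAP",
              "225859_at"   = "XIAP", "237522_at"   = "FAS",
              "232660_at"   = "BAD")

# Number of probesets mapping to each gene symbol.
probesPerGene <- table(listtemp)
print(probesPerGene)

# Probeset IDs grouped by gene symbol (one list element per gene).
byGene <- split(names(listtemp), listtemp)
print(byGene)

# The distinct genes represented, in order of first appearance.
genes <- unname(unique(listtemp))
print(genes)
```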

Best,

Jim

Kurt Vanhoutte wrote:
> Dear Tom & co,
> 
> I used getSYMBOL but retrieved a limited and variable number (1-5) of
> probesets with the same gene symbol. What could be the reason for this?
> (This is in the context of >10 mismatch/perfect-match probes for each gene.)
> 
> Some background:
> We are applying a contrast analysis to an Affymetrix microarray dataset
> of pathological samples. The dataset is available in GEO as a series
> matrix text file (22645 probesets, 35 samples;
> http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE2240).
> 
> We interrogated the set with the open-access R/Bioconductor packages
> hgu133b.db, KEGG.db and annotate.
> 
> Short code:
> Loading the libraries:
>           library(hgu133b.db)
>           library(KEGG.db)
>           library(annotate)
> In particular, we wanted to analyse the genes in the KEGG apoptosis
> pathway:
>          xx <- as.list(hgu133bPATH2PROBE)  # alternatives?
>          xx$'04210'  # probesets in KEGG pathway 04210 (apoptosis)
> However, when we retrieve the gene symbols,
>           listtemp <- getSYMBOL(xx$'04210', "hgu133b.db")
> we get a variable number (1-5) of probesets with the same symbol (see the
> appendix below), and we do not retrieve all genes in the KEGG pathway.
> Although the probesets all map to apoptosis genes, I did not expect to
> find, for example, 5 XIAP probesets.
> 
> Any suggestions for resolving this issue (in this case, the differences
> between the probesets)?
> 
> 
> Kind regards,
> Kurt
> 
> Appendix:
>   > listtemp <- getSYMBOL(xx$'04210', "hgu133b.db")
>   > listtemp
>   225471_s_at    226156_at    236664_at  225858_s_at    225859_at
>        "AKT2"       "AKT2"       "AKT2"       "XIAP"       "XIAP"
>     228363_at  235222_x_at  243026_x_at    237522_at    232660_at
>        "XIAP"       "XIAP"       "XIAP"        "FAS"        "BAD"
>     231228_at    232012_at    231218_at    223518_at    228465_at
>      "BCL2L1"      "CAPN1"      "CASP8"       "DFFA"      "IRAK1"
>     244383_at    231779_at    231699_at    235980_at  229392_s_at
>       "IRAK1"      "IRAK2"     "NFKBIA"     "PIK3CA"     "PIK3R2"
>     229606_at    231304_at    244782_at    235780_at    225000_at
>      "PPP3CA"     "PPP3R2"     "PPP3R2"     "PRKACB"    "PRKAR2A"
>     225011_at    230202_at    241325_at    226551_at    227345_at
>     "PRKAR2A"       "RELA"     "PIK3R3"      "RIPK1"  "TNFRSF10D"
>     231775_at  237367_x_at    239629_at    222880_at  224229_s_at
>   "TNFRSF10A"      "CFLAR"      "CFLAR"       "AKT3"       "AKT3"
>     242876_at    227553_at    227645_at    229415_at    244546_at
>        "AKT3"     "PIK3R5"     "PIK3R5"       "CYCS"       "CYCS"
> 
> 
>  > sessionInfo()
> R version 2.7.1 (2008-06-23)
> i386-pc-mingw32
> 
> locale:
> LC_COLLATE=Dutch_Belgium.1252;LC_CTYPE=Dutch_Belgium.1252;LC_MONETARY=Dutch_Belgium.1252;LC_NUMERIC=C;LC_TIME=Dutch_Belgium.1252
> 
> attached base packages:
> [1] tools     stats     graphics  grDevices utils     datasets  methods
> [8] base
> 
> other attached packages:
>   [1] annotate_1.18.0      xtable_1.5-2         KEGG.db_2.2.0
>   [4] hgu133b.db_2.2.0     AnnotationDbi_1.2.2  RSQLite_0.6-9
>   [7] DBI_0.2-4            affy_1.18.2          preprocessCore_1.2.0
> [10] affyio_1.8.0         Biobase_2.0.1
> 
> //////////////////// Archive postings on the subject, July 2008
> getSYMBOL in package annotate is a nice way to handle this.
> 
> I found it easier, at least.
> 
> Cheers
> 
> Tom
> On Jul 3, 2008, at 4:31 PM, Peter Robinson wrote:
> 
>  > Dear all,
>  >
>  > I have a list of Affymetrix probeset IDs from another program and
>  > would like to use annaffy to extract the corresponding gene names.
>  > I am still something of a novice at R and am probably doing
>  > something silly, but found no answer in the package vignette. My
>  > script:
>  >
>  >
>  > library(annaffy)
>  >
>  > dat <- read.table('sign.txt.cdt', header = TRUE)
>  > psets <- dat[, 3]  # third column of the file holds the probeset IDs
>  > symbols <- aafSymbol(as.character(psets), "moe430b.db")
>  > s <- as.character(symbols)
>  >
>  > I was surprised that so few of the probeset IDs were identified by
>  > this script. What am I doing wrong?
>  >
>  > Thanks, Peter
>  >
>  > > s <- as.character(symbols)
>  > > s
>  >  [1] "character(0)"       "character(0)"       "character(0)"
>  >  [4] "character(0)"       "character(0)"       "character(0)"
>  >  [7] "character(0)"       "character(0)"       "character(0)"
>  > [10] "character(0)"       "character(0)"       "character(0)"
>  > [13] "character(0)"       "character(0)"       "Egr3"
>  > [16] "character(0)"       "character(0)"       "character(0)"
>  > [19] "character(0)"       "character(0)"       "character(0)"
>  > [22] "character(0)"       "character(0)"       "character(0)"
>  > [25] "Irak2"              "character(0)"       "Coq10b"
>  > [28] "character(0)"       "BC063749"           "character(0)"
>  > [31] "4631422O05Rik"      "character(0)"       "Coq10b"
>  > [34] "character(0)"       "character(0)"       "AI452195"
>  > [37] "character(0)"       "character(0)"       "character(0)"
>  > [40] "Mobkl2a"            "character(0)"       "character(0)"
>  >
>  > (...snip....)
>  >
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
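
In the archived posting quoted above, Peter's output shows the string "character(0)" for every probeset ID that received no symbol. A minimal base-R sketch, assuming the lookup result has first been turned into a plain named list (the IDs ps1 to ps5 are hypothetical placeholders; the symbols are taken from his output), maps empty results to NA and reports the fraction of IDs that were annotated. When that fraction is very low, one thing worth checking is whether the IDs come from the same array as the annotation package used (there, moe430b.db); this is a suggestion, not a diagnosis of his data.

```r
# Hypothetical lookup results: one list element per probeset ID, where an
# unmapped ID is a zero-length character vector, i.e. character(0).
res <- list(ps1 = character(0), ps2 = "Egr3", ps3 = character(0),
            ps4 = "Irak2", ps5 = "Coq10b")

# Keep the first symbol, or NA when nothing was found, so every ID keeps
# exactly one slot and the result stays aligned with the input IDs.
firstOrNA <- function(x) if (length(x) == 0) NA_character_ else x[[1]]
sym <- vapply(res, firstOrNA, character(1))
print(sym)

# Fraction of IDs that received a symbol.
hitRate <- mean(!is.na(sym))
print(hitRate)
```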

-- 
James W. MacDonald, MS
Biostatistician
UMCCC cDNA and Affymetrix Core
University of Michigan
1500 E Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623