[BioC] Affymetrix probeset ids to gene symbols
James MacDonald
jmacdon at med.umich.edu
Wed Jul 9 12:47:13 CEST 2008
Hi Kurt,
There is not a one-to-one mapping between Affy probeset and gene. There
can be many reasons for this. For instance, there may be splice variants
that could be interrogated by different probesets (not that likely IMO,
since the probes target roughly the 600 bp nearest the 3' end of the transcript). Another
possibility could be different transcripts that were originally
considered to be ESTs that have subsequently been mapped to the same
gene. I am sure there are other reasons for the many-to-one mapping of
probesets to genes as well.
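
If it helps to see which probesets share a symbol, here is a rough sketch in base R. It assumes a named character vector shaped like your getSYMBOL() result; the values are copied by hand from part of your appendix rather than looked up in hgu133b.db, and the object names are made up for illustration.

```r
# Hand-copied subset of the getSYMBOL() output from the appendix:
# names are probeset IDs, values are gene symbols.
symbols <- c("225858_s_at" = "XIAP", "225859_at"   = "XIAP",
             "228363_at"   = "XIAP", "235222_x_at" = "XIAP",
             "243026_x_at" = "XIAP", "237522_at"   = "FAS",
             "232660_at"   = "BAD")

# Group the probeset IDs by the symbol they map to.
probesets_by_gene <- split(names(symbols), symbols)

# Number of probesets annotated to each gene.
n_per_gene <- lengths(probesets_by_gene)

print(probesets_by_gene)
print(n_per_gene)
```

Whether you then keep all probesets for a gene or pick one of them is an analysis decision that the annotation package will not make for you.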
Best,
Jim
Kurt Vanhoutte wrote:
> Dear Tom & co,
>
> I used getSYMBOL but retrieved a limited and variable number of
> probes (1-5) with the same name. What could be the reason for this?
> (in the context of >10 MisMatch/PerfectMatch probes for each gene)
>
> Some background:
> We are applying a contrast analysis to a pathological Affy
> micro-array dataset.
> The dataset is available in GEO as a series matrix txt file (22645
> probes / 35 samples): http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE2240
>
> We interrogated the set with the open access R/Bioconductor
> packages (hgu133b.db, KEGG.db and annotate).
>
> Short code:
> Loading the libraries:
> library(hgu133b.db)
> library(KEGG.db)
> library(annotate)
> In particular we wanted to analyse the apoptosis genes from the KEGG
> apoptosis pathway.
> xx <- as.list(hgu133bPATH2PROBE)  # Alternatives?
> xx$'04210' #genes
> However when we retrieve the gene names,
> listtemp<-getSYMBOL(xx$'04210',"hgu133b.db")
> we get a variable number of probes (1-5) with the same name (see the
> appendix below), and we do not retrieve all genes from the KEGG pathway.
> Though the probes are all apoptosis genes, I did not anticipate
> finding 5 XIAP probes for example.
>
> Any suggestions to resolve this issue (namely, the difference between the probes)?
>
>
> Kind regards,
> Kurt
>
> Appendix:
> listtemp<-getSYMBOL(xx$'04210',"hgu133b.db")
> > listtemp
> 225471_s_at   226156_at   236664_at 225858_s_at   225859_at
>      "AKT2"      "AKT2"      "AKT2"      "XIAP"      "XIAP"
>   228363_at 235222_x_at 243026_x_at   237522_at   232660_at
>      "XIAP"      "XIAP"      "XIAP"       "FAS"       "BAD"
>   231228_at   232012_at   231218_at   223518_at   228465_at
>    "BCL2L1"     "CAPN1"     "CASP8"      "DFFA"     "IRAK1"
>   244383_at   231779_at   231699_at   235980_at 229392_s_at
>     "IRAK1"     "IRAK2"    "NFKBIA"    "PIK3CA"    "PIK3R2"
>   229606_at   231304_at   244782_at   235780_at   225000_at
>    "PPP3CA"    "PPP3R2"    "PPP3R2"    "PRKACB"   "PRKAR2A"
>   225011_at   230202_at   241325_at   226551_at   227345_at
>   "PRKAR2A"      "RELA"    "PIK3R3"     "RIPK1" "TNFRSF10D"
>   231775_at 237367_x_at   239629_at   222880_at 224229_s_at
> "TNFRSF10A"     "CFLAR"     "CFLAR"      "AKT3"      "AKT3"
>   242876_at   227553_at   227645_at   229415_at   244546_at
>      "AKT3"    "PIK3R5"    "PIK3R5"      "CYCS"      "CYCS"
>
>
> > sessionInfo()
> R version 2.7.1 (2008-06-23)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=Dutch_Belgium.1252;LC_CTYPE=Dutch_Belgium.1252;LC_MONETARY=Dutch_Belgium.1252;LC_NUMERIC=C;LC_TIME=Dutch_Belgium.1252
>
> attached base packages:
> [1] tools stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] annotate_1.18.0 xtable_1.5-2 KEGG.db_2.2.0
> [4] hgu133b.db_2.2.0 AnnotationDbi_1.2.2 RSQLite_0.6-9
> [7] DBI_0.2-4 affy_1.18.2 preprocessCore_1.2.0
> [10] affyio_1.8.0 Biobase_2.0.1
>
> /////////////////////////////////////////Archive postings on the
> subject July 2008
> getSYMBOL in package annotate is a nice way to handle this.
>
> I found it easier, at least.
>
> Cheers
>
> Tom
> On Jul 3, 2008, at 4:31 PM, Peter Robinson wrote:
>
> > Dear all,
> >
> > I have a list of affymetrix probeset ids from another program and
> > would like to use annaffy to extract the corresponding gene names.
> > I am still something of a novice at R and am probably doing
> > something silly, but found no answer in the package vignette. My
> > script:
> >
> >
> > library(annaffy)
> >
> > dat <- read.table('sign.txt.cdt',header=T)
> > psets<-dat[,3]
> > symbols<-aafSymbol(as.character(psets),"moe430b.db")
> > s<-as.character(symbols)
> >
> > I was surprised that so few of the probeset ids got identified by
> > this script. What am I doing wrong?
> >
> > Thanks, Peter
> > s<-as.character(symbols)
> > > s
> > [1] "character(0)" "character(0)" "character(0)"
> > [4] "character(0)" "character(0)" "character(0)"
> > [7] "character(0)" "character(0)" "character(0)"
> > [10] "character(0)" "character(0)" "character(0)"
> > [13] "character(0)" "character(0)" "Egr3"
> > [16] "character(0)" "character(0)" "character(0)"
> > [19] "character(0)" "character(0)" "character(0)"
> > [22] "character(0)" "character(0)" "character(0)"
> > [25] "Irak2" "character(0)" "Coq10b"
> > [28] "character(0)" "BC063749" "character(0)"
> > [31] "4631422O05Rik" "character(0)" "Coq10b"
> > [34] "character(0)" "character(0)" "AI452195"
> > [37] "character(0)" "character(0)" "character(0)"
> > [40] "Mobkl2a" "character(0)" "character(0)"
> >
> > (...snip....)
> >
--
James W. MacDonald, MS
Biostatistician
UMCCC cDNA and Affymetrix Core
University of Michigan
1500 E Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623