[BioC] How to go from affymetrix to Ensembl transcript IDs
Peter Robinson
peter.robinson at t-online.de
Thu Apr 9 23:40:39 CEST 2009
Hi all,
sorry if this is a dumb question, but reading the documentation has not helped so far.
I would like to get the Ensembl transcript IDs that correspond to
Affymetrix probeset IDs using biomaRt. As a test case, I am using the
ALL data set from Bioconductor. My code:
library("biomaRt")
library("ALL")
data("ALL")  ## Note: this dataset uses the hgu95av2 Affymetrix chip
dat <- exprs(ALL)
affyids <- rownames(dat)
## get mapping data from Ensembl via biomaRt
ensembl <- useMart("ensembl")
ensembl <- useDataset("hsapiens_gene_ensembl", mart = ensembl)
mapping <- getBM(attributes = c("affy_hg_u95av2", "ensembl_transcript_id"),
                 filters = "affy_hg_u95av2",
                 values = affyids, mart = ensembl)
Here is where the problem is: "mapping" looks like a random collection
of transcript IDs. A single probeset, for example 32337_at, comes back
with 27 different transcript IDs:
> which(mapping=="32337_at")
[1] 8 46 139 155 203 267 320 327 7385 8701 18769 20533
[13] 23728 23969 23972 24241 24242 24243 24244 25236 26157 26204 26218 26231
[25] 26240 26321 26404
> mapping[which(mapping=="32337_at"),]
affy_hg_u95av2 ensembl_transcript_id
8 32337_at ENST00000404812
46 32337_at ENST00000393574
139 32337_at ENST00000403842
155 32337_at ENST00000397467
203 32337_at ENST00000407990
267 32337_at ENST00000399007
320 32337_at ENST00000404500
327 32337_at ENST00000399891
7385 32337_at ENST00000396599
8701 32337_at ENST00000403916
18769 32337_at ENST00000334328
20533 32337_at ENST00000377603
23728 32337_at ENST00000401418
23969 32337_at ENST00000046640
23972 32337_at ENST00000381870
24241 32337_at ENST00000326092
24242 32337_at ENST00000319826
24243 32337_at ENST00000272274
24244 32337_at ENST00000311549
25236 32337_at ENST00000404512
26157 32337_at ENST00000404609
26204 32337_at ENST00000402713
26218 32337_at ENST00000401464
26231 32337_at ENST00000407389
26240 32337_at ENST00000406161
26321 32337_at ENST00000402658
26404 32337_at ENST00000401595
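To see how widespread this one-to-many behaviour is, the number of distinct transcript IDs per probeset can be counted. The following is only a rough sketch on a toy data frame that stands in for the getBM() result: "probe_B" and "ENST_EXAMPLE" are invented placeholders, and only the two 32337_at rows are taken from the output above.

```r
## Toy stand-in for the data frame returned by getBM().
## "probe_B" and "ENST_EXAMPLE" are invented placeholders.
mapping <- data.frame(
  affy_hg_u95av2        = c("32337_at", "32337_at", "probe_B"),
  ensembl_transcript_id = c("ENST00000404812", "ENST00000393574", "ENST_EXAMPLE"),
  stringsAsFactors = FALSE
)

## Number of distinct transcript IDs per probeset (named integer vector)
n_tx <- sapply(split(mapping$ensembl_transcript_id, mapping$affy_hg_u95av2),
               function(x) length(unique(x)))

## Probesets that map to more than one transcript
multi <- names(n_tx)[n_tx > 1]
print(n_tx)
print(multi)
```

On the real "mapping" object, the same two lines after the toy data would show whether 32337_at is an exception or the rule.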
At the end of the day, I would like to write the data matrix as a CSV
file for further analysis, whereby the affy ID is replaced by an Ensembl
ID.
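To make the goal concrete, here is a minimal sketch of the kind of output I have in mind, again on toy data. Everything in it is an assumption for illustration: the probe names, the transcript IDs, the ";" separator and the file name are placeholders, and joining all transcript IDs into one string is just one possible way to handle the one-to-many mapping, not necessarily the right one.

```r
## Toy expression matrix and mapping; all values are invented for illustration
dat <- matrix(c(1.0, 2.0, 3.0, 4.0), nrow = 2,
              dimnames = list(c("probe_A", "probe_B"), c("s1", "s2")))
mapping <- data.frame(
  affy_hg_u95av2        = c("probe_A", "probe_A", "probe_B"),
  ensembl_transcript_id = c("ENST_1", "ENST_2", "ENST_3"),
  stringsAsFactors = FALSE
)

## Join all transcript IDs of a probeset into one ";"-separated string
collapsed <- sapply(split(mapping$ensembl_transcript_id, mapping$affy_hg_u95av2),
                    function(x) paste(sort(unique(x)), collapse = ";"))

## Look up each row of the matrix; probesets without a mapping become NA
ids <- unname(collapsed[rownames(dat)])

## Replace the affy ID by the Ensembl ID(s) and write the CSV file
out <- data.frame(ensembl_transcript_id = ids, dat,
                  check.names = FALSE, stringsAsFactors = FALSE)
csv_file <- file.path(tempdir(), "ALL_ensembl.csv")  # placeholder file name
write.csv(out, file = csv_file, row.names = FALSE)
```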
Thanks,
Peter