[BioC] How to go from affymetrix to Ensembl transcript IDs
Steve Lianoglou
mailinglist.honeypot at gmail.com
Fri Apr 10 00:01:03 CEST 2009
Hi Peter,
On Apr 9, 2009, at 5:40 PM, Peter Robinson wrote:
> Hi all,
>
> sorry if this is a dumb question, but rtfm has not helped so far.
>
> I would like to get the Ensembl transcript IDs that correspond to
> affymetrix probeset ids using biomaRt. As a test case, I am using
> the ALL data set from bioconductor. My code:
>
>
> library("biomaRt")
> library("ALL")
> data("ALL") ## Note this dataset uses hgu95av2 Affymetrix chip
>
> dat <- exprs(ALL)
> affyids = rownames(dat)
>
>
> ## get mapping data from Ensembl via bioMaRt
> ensembl <- useMart("ensembl")
> ensembl = useDataset("hsapiens_gene_ensembl",mart=ensembl)
>
> mapping <- getBM(attributes = c("affy_hg_u95av2",
> "ensembl_transcript_id"), filters = "affy_hg_u95av2",
> values = affyids, mart = ensembl)
>
>
>
> Here is where the problem is. The "mapping" seems to be a random
> collection of transcript IDs.
Your query is right, so ... your results are not random. You can
double check by trying the small example in the ?getBM help.
Anyway, that probe (32337_at) looks like a weird one. Even Affy maps it
to several locations. See:
https://www.affymetrix.com/analysis/netaffx/fullrecord.affx?pk=HG-U95AV2%3A32337_AT#a_ensembl
You will need an Affy NetAffx account to see that. The relevant detail
from that page is that the probe maps to 6 different Ensembl IDs.
It even aligns to two different places in the genome:
chr13:26725913-26728689(+)
chr10:122104175-122104685(-)
You'll probably find this for many probes, so you'll need some policy
to deal with that.
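As a rough sketch of what such a policy could look like, here is some base R
run on a made-up stand-in for the data frame that getBM() returns. The probe
and transcript IDs below are placeholders, not real annotations, and the
variable names are only for illustration. It counts the transcripts per
probeset, then either keeps only the unambiguous probesets or keeps everything
as a one-to-many list:

```r
# Toy stand-in for the result of getBM(); all IDs here are invented.
mapping <- data.frame(
  affy_hg_u95av2        = c("probe_A", "probe_A", "probe_B",
                            "probe_C", "probe_C", "probe_C"),
  ensembl_transcript_id = c("ENST_1", "ENST_2", "ENST_3",
                            "ENST_4", "ENST_5", "ENST_6"),
  stringsAsFactors = FALSE
)

# Number of distinct transcripts each probeset maps to
n_tx <- tapply(mapping$ensembl_transcript_id, mapping$affy_hg_u95av2,
               function(x) length(unique(x)))

# Policy 1: keep only probesets that map to exactly one transcript
unique_probes  <- names(n_tx)[n_tx == 1]
mapping_unique <- mapping[mapping$affy_hg_u95av2 %in% unique_probes, ]

# Policy 2: keep every probeset, storing its transcripts as a vector
mapping_list <- split(mapping$ensembl_transcript_id, mapping$affy_hg_u95av2)
```

Which of the two is appropriate depends on what you do downstream. Dropping
ambiguous probesets is the conservative choice, while the list form keeps all
the information and leaves the decision for later.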
Hope that helps,
-steve
--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University
http://cbio.mskcc.org/~lianos