[BioC] Affymetrix mouse 430_2 array - annotation problem
Rao,Xiayu
XRao at mdanderson.org
Tue Jul 22 18:15:44 CEST 2014
Hi, Jim
Thanks a lot for your previous help! I now have some annotation problems.
I used select() to annotate, as you suggested.
> fData(eset) <- select(mouse4302.db, featureNames(eset),c("SYMBOL","GENENAME","ENTREZID"))
Warning message:
In .generateExtraRows(tab, keys, jointype) :
'select' resulted in 1:many mapping between keys and return rows
(1) Regarding the warning message, I read on the forum that you suggested either removing the duplicates or collapsing them into comma-separated vectors before incorporating them. So in my case, should I do
fData(eset) <- fData(eset)[!duplicated(fData(eset)$PROBEID),]
OR
eset2 <- tapply(fData(eset)$ENTREZID, fData(eset)[,1], paste, collapse = ",")
OR
Can I just ignore the warning and do nothing, since I want to keep everything as generated by select()?
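For illustration, the two options in (1) can be sketched on a toy data frame in base R. This is only a sketch under stated assumptions: the probe IDs, Entrez IDs, and the names `ann`, `keys`, and `collapse_ids` are invented stand-ins for the output of select() and for featureNames(eset); nothing here comes from mouse4302.db.

```r
# Toy stand-in for the data frame returned by select(); all values are invented.
ann <- data.frame(
  PROBEID  = c("p1", "p2", "p2", "p3"),
  ENTREZID = c("101", "202", "203", NA),
  stringsAsFactors = FALSE
)
keys <- c("p1", "p2", "p3")  # stands in for featureNames(eset)

# Option 1: keep only the first row returned for each probe,
# then order the rows to match the keys.
first_hit <- ann[!duplicated(ann$PROBEID), ]
rownames(first_hit) <- first_hit$PROBEID
first_hit <- first_hit[keys, ]

# Option 2: collapse all IDs for a probe into one comma-separated string.
# Hypothetical helper: drops NAs so an unannotated probe stays NA
# instead of becoming the string "NA".
collapse_ids <- function(x) {
  x <- unique(x[!is.na(x)])
  if (length(x) == 0) NA_character_ else paste(x, collapse = ",")
}
collapsed <- tapply(ann$ENTREZID, ann$PROBEID, collapse_ids)[keys]

print(first_hit)
print(collapsed)
```

Either way the result has one entry per probe, in the same order as the keys, which is what a per-feature annotation table needs. Leaving the 1:many rows in place gives a table with more rows than there are probes.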
(2) It is strange to see that for the topTable, the row names and the first column (PROBEID) do not match. As you can see below, 1436717_x_at and 1435289_at are different for the 1st row. Why?
> topTableF(fit2, adjust="BH")
                PROBEID SYMBOL                          GENENAME ENTREZID M129.15-M129.13
1436717_x_at 1435289_at Engase endo-beta-N-acetylglucosaminidase   217364       -1.946299
1436823_x_at 1435390_at   Eri2                 exoribonuclease 2    71151       -1.975441
             M129.17-M129.15   AveExpr         F      P.Value    adj.P.Val
1436717_x_at     -6.32963614 11.009177 3145.6769 8.379499e-17 3.499204e-12
1436823_x_at     -6.46817108 10.999412 2832.7874 1.551719e-16 3.499204e-12
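A quick way to check whether an annotation table is still in step with its row names is sketched below in base R. It rests on an assumption not confirmed in this message: that the mismatch in (2) arises because the 1:many rows from select() push later probes out of position. The data and the names `ann`, `keys`, `misaligned`, and `fixed` are invented for illustration.

```r
# Invented toy data: probe "p2" maps to two genes, so select()-style
# output has four rows for three probes.
keys <- c("p1", "p2", "p3")  # stands in for featureNames(eset)
ann <- data.frame(
  PROBEID = c("p1", "p2", "p2", "p3"),
  SYMBOL  = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Mimic rows drifting out of step: label the first three rows with the keys.
misaligned <- ann[seq_along(keys), ]
rownames(misaligned) <- keys
identical(rownames(misaligned), misaligned$PROBEID)  # FALSE: row "p3" holds "p2"

# Realign by looking up each key explicitly; match() returns the
# position of the first hit for each key.
fixed <- ann[match(keys, ann$PROBEID), ]
rownames(fixed) <- keys
identical(rownames(fixed), fixed$PROBEID)  # TRUE
```

The same identical() check can be run on a real annotation table before fitting the model, using featureNames(eset) in place of `keys`.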
Thanks,
Xiayu
-----Original Message-----
From: James W. MacDonald [mailto:jmacdon at uw.edu]
Sent: Monday, July 21, 2014 11:43 AM
To: Rao,Xiayu; 'bioconductor at r-project.org'
Subject: Re: [BioC] Affymetrix mouse 430_2 array - gene expression and annotation
Hi Xiayu,
> 2) and add annotation thereafter? For the transcript level annotation,
> I have used the following code before. But not sure for this mouse
> array, is there a similar way or similar transcript database to do
> such? I know there is a database called mouse4302.db.
> ID <- featureNames(geneCore2)
> Symbol <- getSYMBOL(ID,"hugene10sttranscriptcluster.db")
> fData(geneCore2) <- data.frame(ID=ID,Symbol=Symbol)
This is an old way of annotating things, and has been superseded (for like five years now) by a more compact API:
fData(geneCore2) <- select(mouse4302.db, featureNames(geneCore2), "SYMBOL")
And note you can add in other more useful things like the Gene ID as well (while biologists tend to like HUGO symbols, they are not, as advertised, actually unique things, so you always run the risk of thinking you have <a gene you care about> when in fact you are looking at the data for <some other gene with the same HUGO symbol>).
fData(geneCore2) <- select(mouse4302.db, featureNames(geneCore2),
c("SYMBOL","GENENAME","ENTREZID"))
Best,
Jim