[BioC] GO annotation in R
Marc Carlson
mcarlson at fhcrc.org
Tue Nov 6 20:57:17 CET 2012
Hi Priya,
If I assume that your initial extraction worked (and that you didn't
just slice out a column of scores), then you should have something
like this:
ids <- c("244901","244902","244903")
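On the factor question: a column read in with read.table() can come back as a factor (stringsAsFactors = TRUE was the default before R 4.0.0), and a factor is not a safe set of keys. Here is a minimal sketch of pulling the IDs out as a plain character vector. The object scr is a hypothetical stand-in for your data, built from the first three rows of the table you quoted. It also covers the case where the IDs are row names, in which scr[,1] would give you a column of scores instead:

```r
## Hypothetical stand-in for the poster's 'scr': the probe column is
## deliberately a factor, as read.table() would often return it.
scr <- data.frame(probes = factor(c("244901", "244902", "244903")),
                  GSM362180 = c(5.094871713, 5.194528083, 5.412329253))

ids <- scr[, 1]
is.factor(ids)   # TRUE: not what you want to hand to select() as keys

## as.character() returns the labels; as.numeric() on a factor would
## instead return the internal level codes, which is a common trap.
ids <- as.character(scr[, 1])

## If the IDs were row names of a numeric matrix instead, scr[, 1] would
## be the first column of scores, so take the row names explicitly.
m <- as.matrix(scr[, -1, drop = FALSE])
rownames(m) <- ids
ids2 <- rownames(m)
```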
Now, right off the bat, those IDs are not probe IDs, and they are not
standard TAIR IDs either. So what are they? I can't assume that they
are Entrez Gene IDs, because even though they look like Entrez Gene
IDs, they actually map to mouse (or at least the first three do). So
it's hard for me to help you with these IDs. But for the sake of giving
you some kind of answer that may help, I will continue on.
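To make the point about ID types a bit more concrete, here is a rough, pattern-based sketch. The helper guess_id_type() is hypothetical and not part of any package. It assumes that ATH1 probe set IDs end in "_at" (like 261585_at) and that TAIR locus IDs look like AT1G01010. A pattern match is only a hint. A more reliable check is whether your IDs appear in keys(ath1121501.db, keytype = "PROBEID").

```r
## Hypothetical helper: a rough guess at what kind of Arabidopsis
## identifier each string is, based only on its pattern.
guess_id_type <- function(ids) {
  ids <- as.character(ids)
  type <- rep("unknown", length(ids))
  ## ATH1 probe set IDs such as "261585_at" end in "_at"
  type[grepl("_at$", ids)] <- "probe"
  ## TAIR locus IDs: "AT", chromosome (1-5, C or M), "G", five digits
  type[grepl("^AT[1-5CM]G[0-9]{5}$", ids, ignore.case = TRUE)] <- "tair"
  ## Bare integers might be Entrez Gene IDs, but the organism is unknown
  type[grepl("^[0-9]+$", ids)] <- "numeric (organism unknown)"
  type
}

guess_id_type(c("244901", "261585_at", "AT1G01010"))
```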
## So let's suppose that you did have some real probe IDs, and that, since
## you are working on Arabidopsis, you have probe IDs like this:
ids <- c("261585_at","261568_at","261584_at")
## And then let's suppose that you wanted to get the GO IDs, the TAIR IDs
## and gene names. Well, then I could just use select() like this:
library(ath1121501.db)
res1 <- select(ath1121501.db, keys = ids, cols = c("GO", "TAIR", "GENENAME"),
               keytype = "PROBEID")
res1
## And then separately, I could use the GO.db package to also look up the
## term names for these GO IDs
library(GO.db)
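## drop missing and repeated GO IDs so each term is looked up only once
goids <- unique(res1$GO[!is.na(res1$GO)])
res2 <- select(GO.db, keys = goids, cols = "TERM", keytype = "GOID")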
res2
## And then you could merge the two results together like this
## (please note that whenever you use merge() you should try to specify
## the merge columns)
res3 <- merge(res1, res2, by.x="GO", by.y="GOID")
res3
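If you want to see how the merge step behaves without loading the annotation packages, here is a sketch using made-up data frames shaped like res1 and res2. None of the IDs or terms below come from a real package; they are placeholders. It shows why naming the key columns matters ("GO" vs "GOID"), what happens to probes with no GO annotation, and how repeated rows in res2 multiply the result:

```r
## Made-up stand-ins shaped like res1 (probe -> GO, TAIR) and
## res2 (GOID -> TERM); all values are placeholders.
res1 <- data.frame(PROBEID = c("p1", "p1", "p2", "p3"),
                   GO      = c("GO:fake1", "GO:fake2", "GO:fake1", NA),
                   TAIR    = c("t1", "t1", "t2", "t3"),
                   stringsAsFactors = FALSE)
res2 <- data.frame(GOID = c("GO:fake1", "GO:fake2"),
                   TERM = c("term one", "term two"),
                   stringsAsFactors = FALSE)

## The key columns have different names, so name both explicitly.
res3 <- merge(res1, res2, by.x = "GO", by.y = "GOID")
nrow(res3)   # 3: p3 (GO = NA) is dropped, since merge() keeps only matches

## all.x = TRUE keeps probes without a GO annotation, with TERM = NA.
res3_all <- merge(res1, res2, by.x = "GO", by.y = "GOID", all.x = TRUE)

## Repeated rows in res2 repeat every matching res1 row; unique() fixes it.
res2_dup <- rbind(res2, res2[1, ])
nrow(merge(res1, res2_dup, by.x = "GO", by.y = "GOID"))          # 5
nrow(merge(res1, unique(res2_dup), by.x = "GO", by.y = "GOID"))  # 3
```

Whether to use all.x = TRUE depends on whether you want unannotated probes to stay in the final table.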
Anyhow, I hope this helps; please let me know if it doesn't.
Marc
On 11/06/2012 06:15 AM, priya [guest] wrote:
> I have the matrix as follows :
>
> probes GSM362180 GSM362181 GSM362188 GSM362189 GSM362192
> 244901 5.094871713 4.626623079 4.554272515 4.748604391 4.759221647
> 244902 5.194528083 4.985930299 4.817426064 5.151654407 4.838741605
> 244903 5.412329253 5.352970877 5.06250609 5.305709079 8.365082403
> 244904 5.529220594 5.28134657 5.467445095 5.62968933 5.458388909
> 244905 5.024052699 4.714631878 4.792865831 4.843975286 4.657188246
> 244906 5.786557533 5.242403911 5.060605782 5.458148567 5.890061836
>
> I would like to extract only the first column as follows :
> ids<- scr[,1]
> and then
>
> biocLite("GO.db")
> library("AnnotationDbi")
> biocLite("org.At.tair.db")
> biocLite("ath1121501.db")
> library("ath1121501.db")
> genenames<- org.At.tairGENENAME[ids]
> number<-org.At.tairENTREZID[ids]
> xx<-toTable(entrez)
> yy<-toTable(number)
> complete<-merge(xx,yy)
>
> I get an error at this step and am unable to proceed further. Is it because ids <- scr[,1] is a factor?
>
> I am not sure how to store the ID names to carry out the annotation correctly. I would like to use GO.db to find the terms associated with the GO IDs, displaying the result as a data frame with my probes and their corresponding TAIR ID, TAIR gene name and annotation.
>
>
> -- output of sessionInfo():
>
> R version 2.15
> Linux.
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor