[BioC] GO annotation in R

Tue Nov 6 20:57:17 CET 2012

Hi Priya,

If I assume that your initial extraction worked (and that you didn't 
just slice out a column of scores), then you should have something 
like this:

ids <- c("244901","244902","244903")

Now, right off the bat, those IDs are not probe IDs, and they are not 
standard TAIR IDs either.  So what are they?  I can't assume that they 
are Entrez Gene IDs, because even though they look like Entrez Gene 
IDs, they actually map to mouse (or at least the first three do).  So 
it's hard for me to help you with these IDs.  But for the sake of giving 
you some kind of answer that may help, I will continue on.

## So let's suppose that you did have some real probe IDs, and that since 
## you are working on Arabidopsis, you have probe IDs like this:

ids <- c("261585_at","261568_at","261584_at")
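On the factor question from the original post: a column pulled out of a data frame can be a factor, and annotation lookups are keyed on character strings, so converting explicitly is the safe route. The lines below are a minimal sketch using a small stand-in for scr built from the first three rows of the posted matrix. The last step is only an assumption: it guesses that the numeric IDs are ATH1 probe set IDs that lost their "_at" suffix, which would need to be checked against the platform annotation before relying on it.

```r
# Small stand-in for scr, using the first three rows and first sample column
# of the matrix shown in the original post.
scr <- data.frame(probes = factor(c(244901, 244902, 244903)),
                  GSM362180 = c(5.094871713, 5.194528083, 5.412329253))

# A factor stores integer codes plus labels; as.character() recovers the labels.
ids <- as.character(scr[, 1])

# ASSUMPTION (unverified): the numbers are ATH1 probe set IDs missing "_at".
probe_ids <- paste0(ids, "_at")
probe_ids
```

If the suffix guess is wrong, the character conversion on its own is still worth keeping.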

## And then let's suppose that you wanted to get the GO IDs, the TAIR IDs 
## and gene names.  Well then I could just use select() like this:

library(ath1121501.db)
res1 <- select(ath1121501.db, keys=ids, cols=c("GO","TAIR","GENENAME"), 
               keytype="PROBEID")
res1

## And then separately, I could use the GO.db package to also look up the 
## term names for these GO IDs

library(GO.db)
res2 <- select(GO.db, keys = res1$GO, cols="TERM", keytype="GOID")
res2
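The GO column of a result like res1 can repeat the same GO ID across several rows and can hold NA for probes with no GO annotation, so it may be worth tidying the keys before the lookup. This is a minimal sketch on a placeholder vector; the name go_col and the IDs in it are illustrative only, not actual lookup results.

```r
# Placeholder stand-in for res1$GO; the IDs are illustrative, not real results.
go_col <- c("GO:0000001", "GO:0000002", NA, "GO:0000001")

# Drop missing values, then duplicates, so each GO ID is looked up only once.
go_keys <- unique(go_col[!is.na(go_col)])
go_keys
```

A vector like go_keys could then be passed as the keys argument, which avoids depending on how select() treats missing or repeated keys.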

## And then you could merge the two results together like this
## (please note that whenever you use merge you should try to specify 
## the merge columns)

res3 <- merge(res1, res2, by.x="GO", by.y="GOID")
res3
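To see why naming the merge columns matters, and what happens to probes that have no GO ID, here is a small sketch with toy data frames standing in for res1 and res2. Every value in it is a made-up placeholder. By default merge() keeps only rows that match in both tables; adding all.x=TRUE keeps every row from the first table and fills the unmatched columns with NA.

```r
# Toy stand-ins for res1 and res2; every value here is a made-up placeholder.
res1 <- data.frame(PROBEID = c("p1_at", "p1_at", "p2_at", "p3_at"),
                   GO = c("GO:0000001", "GO:0000002", "GO:0000001", NA),
                   stringsAsFactors = FALSE)
res2 <- data.frame(GOID = c("GO:0000001", "GO:0000002"),
                   TERM = c("term one", "term two"),
                   stringsAsFactors = FALSE)

# Default (inner) join: the probe with no GO ID drops out of the result.
inner <- merge(res1, res2, by.x = "GO", by.y = "GOID")

# all.x = TRUE keeps every row of res1; TERM is NA where nothing matched.
left <- merge(res1, res2, by.x = "GO", by.y = "GOID", all.x = TRUE)

nrow(inner)
nrow(left)
```

Which of the two you want depends on whether unannotated probes should appear in the final table.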

Anyhow, I hope this helps; please let me know if it doesn't.

   Marc

On 11/06/2012 06:15 AM, priya [guest] wrote:
> I have the matrix as follows :
>
> probes GSM362180    GSM362181  GSM362188    GSM362189  GSM362192
> 244901 5.094871713 4.626623079 4.554272515 4.748604391 4.759221647
> 244902 5.194528083 4.985930299 4.817426064 5.151654407 4.838741605
> 244903 5.412329253 5.352970877 5.06250609  5.305709079 8.365082403
> 244904 5.529220594 5.28134657  5.467445095 5.62968933  5.458388909
> 244905 5.024052699 4.714631878 4.792865831 4.843975286 4.657188246
> 244906 5.786557533 5.242403911 5.060605782 5.458148567 5.890061836
>
> I would like to extract only the first column as follows :
> ids<- scr[,1]
> and then
>
> biocLite("GO.db")
> library("AnnotationDbi")
> biocLite("org.At.tair.db")
> biocLite("ath1121501.db")
> library("ath1121501.db")
> genenames<-  org.At.tairGENENAME[ids]
> number<-org.At.tairENTREZID[ids]
> xx<-toTable(entrez)
> yy<-toTable(number)
> complete<-merge(xx,yy)
>
> I get an error in this step and am unable to proceed further. Is it because ids <- scr[,1] is a factor?
>
> I am not sure how to store the ID names to carry out the annotation correctly. I would like to use GO.db to find the terms associated with the GO IDs, displaying the result as a data frame with my probes and their corresponding TAIR ID, TAIR gene name, and annotation.
>
>
>   -- output of sessionInfo():
>
> R version 2.15
> Linux.