[BioC] How to cluster genes based on their GO term in Arabidopsis?

Fri Jun 25 15:31:32 CEST 2010

Hi Rae,

Xiaohui Wu wrote:
> Hi all,
> 
> I want to cluster some genes based on their GO term annotation information in Arabidopsis and plot the profile. I tried some packages in Bioconductor including 'goProfiles', 'org.Hs.eg.db', but it says "No ancestors found for this GOTermsList ".
> 
> The code I tested is as follows:
> ----------------------
> require(goProfiles)
> require(org.At.tair.db)
> 
> #first, transform tair ID to entrez ID
> x <- org.At.tairENTREZID
> mapped_genes <- mappedkeys(x) 
> tairENID <- as.list(x[mapped_genes])
> 
> #then, plot the profile
> arab.MF <- basicProfile(genelist=tairENID, onto = "MF", level = 2, orgPackage = "org.At.tair.db")
> ---------------------
> 
> For the example of 'basicProfile' in goProfiles, it works, but it is for human, the code is:
> ----------------------
> require(goProfiles)
> data(prostateIds)
> welsh.MF <- basicProfile(welsh01EntrezIDs, onto = "MF", level = 2, orgPackage = "org.Hs.eg.db")
> singh.MF <- basicProfile(singh01EntrezIDs, onto = "MF",level = 2, orgPackage = "org.Hs.eg.db")
> welsh.singh.MF <- mergeProfilesLists(welsh.MF,singh.MF, profNames = c("Welsh", "Singh"))
> printProfiles(welsh.singh.MF, percentage = TRUE)
> plotProfiles(welsh.MF, aTitle = "Welsh (2001). Prostate cancer data")
> ----------------------
> 
> Can you see what the problem is? Or is there any other way to cluster the genes and plot the profile? 

Yes. The help page for this function is misleading. The argument 
'genelist' isn't actually supposed to be a list, even though the 
argument name suggests it and the argument description seems to 
confirm it:

genelist: List of genes on which the Profile has to be based

If you look at the example you show above

 > class(singh01EntrezIDs)
[1] "character"

and the example for basicProfile

 > class(CD4LLids)
[1] "character"
 > head(CD4LLids)
[1] "10160" "10461" "10579" "10611" "10629" "10786"

You can see that the input is actually supposed to be a character 
vector. Or maybe a list of character vectors (not really sure, and don't 
have the time or inclination to dig deeper), but certainly not a list of 
character vectors of length one.
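
To make the distinction concrete, here is a small base R sketch. The IDs 
are placeholder values typed in by hand to mimic what as.list() returns 
for the mapping, so treat them as illustrative only, not as output from 
the annotation package:

```r
# Hypothetical stand-in for as.list(x[mapped_genes]): a named list in
# which every element is a character vector of length one.
# (Placeholder TAIR -> Entrez pairs, for illustration only.)
tairENID <- list(AT1G01010 = "839580",
                 AT1G01020 = "839569",
                 AT1G01030 = "839321")

class(tairENID)                          # a list, not a character vector
all(vapply(tairENID, length, integer(1)) == 1L)

# Two ways to get back to a plain character vector:
tair_ids   <- names(tairENID)                      # the TAIR keys
entrez_ids <- unlist(tairENID, use.names = FALSE)  # the Entrez values

class(tair_ids)
class(entrez_ids)
```

Either vector has the same shape as singh01EntrezIDs or CD4LLids; which 
kind of ID is appropriate depends on the orgPackage you pass.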

If I replace tairENID with mapped_genes from your code above (a 
character vector of TAIR IDs), I get

 > print(arab.MF)
$MF
                         Description       GOID Frequency
7              antioxidant activity GO:0016209       132
4                           binding GO:0005488     10036
1                catalytic activity GO:0003824      7767
8        channel regulator activity GO:0016247         0
13         chemoattractant activity GO:0042056         0
15          chemorepellent activity GO:0045499         0
5         electron carrier activity GO:0009055       484
10        enzyme regulator activity GO:0030234       332
9         metallochaperone activity GO:0016530         4
17    molecular transducer activity GO:0060089       374
16      nutrient reservoir activity GO:0045735        58
6     proteasome regulator activity GO:0010860         0
12                      protein tag GO:0031386         4
2      structural molecule activity GO:0005198       499
11 transcription regulator activity GO:0030528      1886
14   translation regulator activity GO:0045182         5
3              transporter activity GO:0005215      1216
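
For reference, the modified call would look roughly like the sketch 
below. It only reuses the functions and arguments already shown in your 
code; the requireNamespace() guard and the have_pkgs name are my own 
additions so the snippet does nothing if the packages are not installed. 
The counts you get will depend on the annotation package version.

```r
# Check that both Bioconductor packages are available before running.
have_pkgs <- requireNamespace("goProfiles", quietly = TRUE) &&
  requireNamespace("org.At.tair.db", quietly = TRUE)

if (have_pkgs) {
  library(goProfiles)
  library(org.At.tair.db)

  # mappedkeys() returns a character vector of TAIR IDs
  x <- org.At.tairENTREZID
  mapped_genes <- mappedkeys(x)
  stopifnot(is.character(mapped_genes))

  # Pass the character vector itself, not as.list() of the mapping
  arab.MF <- basicProfile(genelist = mapped_genes, onto = "MF",
                          level = 2, orgPackage = "org.At.tair.db")
  print(arab.MF)
}
```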

Best,

Jim

> 
> Thanking you,
> 
> Rae
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826