[BioC] How to cluster genes based on their GO term in Arabidopsis?
James W. MacDonald
jmacdon at med.umich.edu
Fri Jun 25 15:31:32 CEST 2010
Hi Rae,
Xiaohui Wu wrote:
> Hi all,
>
> I want to cluster some genes based on their GO term annotation information in Arabidopsis and plot the profile. I tried some packages in Bioconductor including 'goProfiles', 'org.Hs.eg.db', but it says "No ancestors found for this GOTermsList ".
>
> The code I tested is as follows:
> ----------------------
> require(goProfiles)
> require(org.At.tair.db)
>
> #first, transform tair ID to entrez ID
> x <- org.At.tairENTREZID
> mapped_genes <- mappedkeys(x)
> tairENID <- as.list(x[mapped_genes])
>
> #then, plot the profile
> arab.MF <- basicProfile(genelist=tairENID, onto = "MF", level = 2, orgPackage = "org.At.tair.db")
> ---------------------
>
> For the example of 'basicProfile' in goProfiles, it works, but it is for human, the code is:
> ----------------------
> require(goProfiles)
> data(prostateIds)
> welsh.MF <- basicProfile(welsh01EntrezIDs, onto = "MF", level = 2, orgPackage = "org.Hs.eg.db")
> singh.MF <- basicProfile(singh01EntrezIDs, onto = "MF",level = 2, orgPackage = "org.Hs.eg.db")
> welsh.singh.MF <- mergeProfilesLists(welsh.MF,singh.MF, profNames = c("Welsh", "Singh"))
> printProfiles(welsh.singh.MF, percentage = TRUE)
> plotProfiles(welsh.MF, aTitle = "Welsh (2001). Prostate cancer data")
> ----------------------
>
> Can you see what the problem is? Or is there any other way to cluster the genes and plot the profile?
Yes. The help page for this function is in error. The argument
'genelist' isn't actually supposed to be a list, even though the
argument name suggests it and the argument description seems to
confirm it:
genelist: List of genes on which the Profile has to be based
If you look at the example you show above
> class(singh01EntrezIDs)
[1] "character"
and the example for basicProfile
> class(CD4LLids)
[1] "character"
> head(CD4LLids)
[1] "10160" "10461" "10579" "10611" "10629" "10786"
You can see that the input is actually supposed to be a character
vector. It might also accept a list of character vectors (I'm not really
sure, and don't have the time or inclination to dig deeper), but
certainly not a list of character vectors of length one, which is
essentially what your tairENID is.
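To make the difference in shape concrete, here is a minimal sketch in base R. It assumes a toy stand-in for tairENID (the IDs and the names entrez_ids and tair_ids are illustrative, not taken from the real annotation package), and the basicProfile call is left commented out because it needs goProfiles and org.At.tair.db installed:

```r
# Toy stand-in for as.list(org.At.tairENTREZID[mapped_genes]): a named list
# whose elements are length-one character vectors. IDs are illustrative only.
tairENID <- list(AT1G01010 = "839580",
                 AT1G01020 = "839569",
                 AT1G01030 = "839321")

class(tairENID)   # a list, not a character vector

# Option 1: flatten the list values into a plain character vector
entrez_ids <- unlist(tairENID, use.names = FALSE)

# Option 2: take the keys themselves (what mapped_genes already holds)
tair_ids <- names(tairENID)

class(entrez_ids)
class(tair_ids)

# With goProfiles and org.At.tair.db installed, the call would then look
# something like this (not run here):
# arab.MF <- basicProfile(genelist = tair_ids, onto = "MF", level = 2,
#                         orgPackage = "org.At.tair.db")
```

Either option gives a plain character vector; which kind of ID the function expects for a given orgPackage is something to check against the package itself.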
If I replace tairENID with mapped_genes from your code above, I get
> print(arab.MF)
$MF
                        Description       GOID Frequency
7              antioxidant activity GO:0016209       132
4                           binding GO:0005488     10036
1                catalytic activity GO:0003824      7767
8        channel regulator activity GO:0016247         0
13         chemoattractant activity GO:0042056         0
15          chemorepellent activity GO:0045499         0
5         electron carrier activity GO:0009055       484
10        enzyme regulator activity GO:0030234       332
9         metallochaperone activity GO:0016530         4
17    molecular transducer activity GO:0060089       374
16      nutrient reservoir activity GO:0045735        58
6     proteasome regulator activity GO:0010860         0
12                      protein tag GO:0031386         4
2      structural molecule activity GO:0005198       499
11 transcription regulator activity GO:0030528      1886
14   translation regulator activity GO:0045182         5
3              transporter activity GO:0005215      1216
Best,
Jim
>
> Thanking you,
>
> Rae
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826