[BioC] clustering genes in GO categories
James MacDonald
jmacdon at med.umich.edu
Thu Jan 6 22:09:50 CET 2011
Hi Assa,
I don't think you need a package for that. A call to tapply() followed by a call to do.call() should get you where you want to go.
Say you read your table into R, and call it 'dat'.
thelist <- tapply(1:nrow(dat), dat$GOMF, function(x) dat[x, 3])
Then you will have a list whose names are the GOMF terms and whose items are the gene ids (column 3 of 'dat') for each term. Collapsing that to a matrix is awkward because each term has a different number of genes, so the rows would have different numbers of columns. So you can either collapse each list item into a single comma-separated string, or write directly out to a file. Collapsing with commas is easy:
commalist <- lapply(thelist, paste, collapse = ",")
avector <- do.call("c", commalist)
names(avector) <- names(commalist)
or you could just write out to a file using something like
con <- file("mydata.txt", "w")
for(i in seq_along(commalist)) cat(names(commalist)[i], "\t", commalist[[i]], "\n", sep = "", file = con)
close(con)
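As a quick sanity check, the grouping and collapsing steps above can be run on a tiny made-up table. This is only a sketch: the data frame below is hypothetical (it is not Assa's data), and it assumes one GO term per row in a single GOMF column, with the gene ids in column 3.

```r
# Hypothetical stand-in for 'dat': one GO term per row, gene ids in column 3
dat <- data.frame(
  id               = c("p1", "p2", "p3", "p4"),
  flybasename_gene = c("geneA", "geneB", "geneC", "geneD"),
  flybase_gene_id  = c("FBgn01", "FBgn02", "FBgn03", "FBgn04"),
  GOMF             = c("protein binding", "ATP binding",
                       "protein binding", "ATP binding"),
  stringsAsFactors = FALSE
)

# Group the gene ids (column 3) by GO term
thelist <- tapply(seq_len(nrow(dat)), dat$GOMF, function(x) dat[x, 3])

# Collapse each group into one comma-separated string
commalist <- lapply(thelist, paste, collapse = ",")
avector <- do.call("c", commalist)
names(avector) <- names(commalist)

print(avector)
```

The result is a named character vector with one element per GO term, which is the shape the file-writing loop expects.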
All untested, so you might have to fiddle a bit to get the results you want.
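One bit of fiddling that is probably needed: the code above assumes a single GOMF column with one term per row, whereas the table in the question spreads the GO terms over a varying number of columns. A minimal sketch of one way to handle that is to stack the GO columns into long format first and then group. The table 'wide' and the column names GO1 and GO2 are hypothetical, and split() is used here in place of tapply() simply because it returns a plain named list.

```r
# Hypothetical wide table: GO terms spread over several columns, NA-padded
wide <- data.frame(
  flybase_gene_id = c("FBgn0001128", "FBgn0053057", "FBgn0034454"),
  GO1 = c("hydrolase activity", "protein binding", "protein binding"),
  GO2 = c("protein binding", "ATP binding", NA),
  stringsAsFactors = FALSE
)

# Names of the columns that hold GO terms (assumed to start with "GO")
go_cols <- grep("^GO", names(wide), value = TRUE)

# Stack into long format: one row per (gene, GO term) pair.
# unlist() walks the columns in order, so the gene ids are repeated
# once per GO column to line up with it.
long <- data.frame(
  gene = rep(wide$flybase_gene_id, times = length(go_cols)),
  GOMF = unlist(wide[go_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop the padding left by genes with fewer GO terms
long <- long[!is.na(long$GOMF) & long$GOMF != "", ]

# Group gene ids by GO term, then collapse each group with commas
genes_by_go <- split(long$gene, long$GOMF)
collapsed <- vapply(genes_by_go, paste, character(1), collapse = ",")
```

From here 'collapsed' can be written out one GO term per line in the same way as 'commalist' above.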
Best,
Jim
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
>>> Assa Yeroslaviz 01/06/11 1:02 PM >>>
Hi, everybody,
I was wondering whether there is a package to cluster a list of genes into
different GO categories.
My problem is as follows:
I have a list of genes (a tab-delimited file):
id    flybasename_gene    flybase_gene_id    entrezgene    GOMF
1616608_a_at    Gpdh    FBgn0001128    33824    carboxylesterase activity    hydrolase activity    3',5'-cyclic-nucleotide phosphodiesterase activity    protein binding
1622892_s_at    CG33057    FBgn0053057    318833    nucleotide binding    protein binding    ATP binding    chaperone binding    ammonium transmembrane transporter activity
1622892_s_at    mkg-p    FBgn0035889    38955    nucleotide binding    protein binding    ATP binding    chaperone binding    ammonium transmembrane transporter activity
1622893_at    IM3    FBgn0040736    50209    aminopeptidase activity    metalloexopeptidase activity    hydrolase activity    manganese ion bindin
1622894_at    CG15120    FBgn0034454    37248    protein binding
I would like to group the genes into the various GO categories, which are
listed here in the last columns. The GO categories take up more than one
column, and the number of columns differs from line to line, depending on
the depth of the annotation for each gene.
Is there a way of transforming the table so that the first column holds my
GO categories and each line then lists the gene IDs for that category? (Which
type of ID is used is not important, as I can change them as I wish.)
I would like to have something like that:
GO genes
protein binding FBgn0001128 FBgn0053057 FBgn0035889 etc.
ammonium transmembrane transporter activity FBgn0053057 FBgn0035889
hydrolase activity FBgn0040736 FBgn0001128
I would appreciate any kind of help or ideas.
Thanks
Assa