[BioC] Bioconductor Digest, Vol 83, Issue 4
Lavinia Gordon
lavinia.gordon at mcri.edu.au
Thu Jan 7 02:49:39 CET 2010
Hi Waverley
In your input you also have Ensembl protein ids (e.g. ENSP00000231338). You
could extract these, use them as your input list, use biomaRt to match these
up with molecular function GO ids, calculate the frequency of the ids, e.g.
molecular_function%binding%protein binding 10
molecular_function%molecular transducer activity 6
then:
pie.mf <- c(10, 6, ...)
names(pie.mf) <- c("binding%protein binding", "molecular transducer activity", ...)
pie(pie.mf)
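As a minimal sketch of the counting step, assuming the annotations have already been retrieved into a table with one row per (protein, GO term) pair: the data frame `annot`, its column names, and the protein labels below are made up for illustration and are not from the original input.

```r
## Hypothetical annotation table: one row per (protein, GO term) pair,
## roughly what an annotation query might return. Values are made up.
annot <- data.frame(
  protein = c("P1", "P1", "P2", "P3", "P3", "P4"),
  term    = c("protein binding", "molecular transducer activity",
              "protein binding", "protein binding",
              "catalytic activity", "molecular transducer activity"),
  stringsAsFactors = FALSE
)

## Drop duplicated pairs so that a protein counts once per term.
annot <- unique(annot)

## Frequency of each term, largest first.
tab <- sort(table(annot$term), decreasing = TRUE)

## Convert to a plain named vector, as in the pie.mf example above.
pie.mf <- setNames(as.vector(tab), names(tab))

pie(pie.mf, main = "GO molecular function")
```

Using table() avoids typing the counts and names by hand, which matters once there are more than a handful of terms.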
You could also use your SWISS-PROT or TREMBL ids (again, via biomaRt). Note
that a gene often has several GO terms associated with it.
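The extraction and lookup steps might be sketched as follows. The regular expressions assume the header layout shown in the question. The function name `lookup_go` is hypothetical, and the biomaRt attribute and filter names are assumptions based on recent Ensembl releases; they differ between releases, so check listAttributes() and listFilters() against your mart before relying on them.

```r
## Two complete header lines taken from the question.
headers <- c(
  "IPI:IPI00291878.4|SWISS-PROT:P35247|ENSEMBL:ENSP00000361366|REFSEQ:NP_003010|H-INV:HIT000039466|VEGA:OTTHUMP00000019944",
  "IPI:IPI00019922.5|SWISS-PROT:Q8N0Y2-1|TREMBL:Q53F81|ENSEMBL:ENSP00000338860;ENSP00000375594|REFSEQ:NP_060807|H-INV:HIT000028861|VEGA:OTTHUMP00000078377"
)

## Every Ensembl protein id: "ENSP" followed by 11 digits.
ensp <- unique(unlist(regmatches(headers, gregexpr("ENSP[0-9]{11}", headers))))

## SWISS-PROT accessions; the pattern stops before an isoform suffix like "-1".
sp <- unlist(regmatches(headers, gregexpr("SWISS-PROT:[A-Z0-9]+", headers)))
sp <- sub("^SWISS-PROT:", "", sp)

## Sketch of the lookup (hypothetical helper). Needs the biomaRt package and
## network access. Attribute/filter names are assumptions, see above.
lookup_go <- function(ids) {
  mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  biomaRt::getBM(
    attributes = c("ensembl_peptide_id", "go_id", "name_1006", "namespace_1003"),
    filters    = "ensembl_peptide_id",
    values     = ids,
    mart       = mart
  )
}

## Not run here:
## go    <- lookup_go(ensp)
## go.mf <- go[go$namespace_1003 == "molecular_function", ]
```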
Have a look at some of the other Bioconductor GO packages
([1]http://bioconductor.org/packages/2.5/GO.html and
http://www.bioconductor.org/packages/release/bioc/html/topGO.html), which
suggest some other ways of visualizing GO terms.
regards
Lavinia Gordon.
Waverley @ Palo Alto wrote:
> Hi,
>
> I have a list of IPI gene IDs. I want to find out whether there is a
> package which can map the gene ontology to these IPIs, and plot the
> pie chart to demonstrate the molecular function distributions.
>
> The input is like the following gene IPI IDs:
>
> IPI:IPI00008860.1|SWISS-PROT:Q9BXJ4-1|TREMBL:Q542Y2|ENSEMBL:ENSP00000231338;EN
>
> IPI:IPI00019922.5|SWISS-PROT:Q8N0Y2-1|TREMBL:Q53F81|ENSEMBL:ENSP00000338860;ENSP00000375594|REFSEQ:NP_060807|H-INV:HIT000028861|VEGA:OTTHUMP00000078377 Tax_Id=9606 Gene_Symbol=ZN
>
> IPI:IPI00647423.2|SWISS-PROT:Q8N819-1|REFSEQ:NP_001073870|VEGA:OTTHUMP00000076687 Tax_Id=9606 Gene_Symbol=FLJ40125 Isoform 1 of
>
> IPI:IPI00219000.2|SWISS-PROT:P27658|TREMBL:Q53XI6|ENSEMBL:ENSP00000261037|REFS
>
> IPI:IPI00291878.4|SWISS-PROT:P35247|ENSEMBL:ENSP00000361366|REFSEQ:NP_003010|H-INV:HIT000039466|VEGA:OTTHUMP00000019944
>
> IPI:IPI00013945.1|SWISS-PROT:P07911-1|TREMBL:Q8NHW8|ENSEMBL:ENSP00000306279|RE
>
> IPI:IPI00000634.1|SWISS-PROT:Q16204|TREMBL:Q6GSG7|ENSEMBL:ENSP00000263102|REFS
>
> I want to plot the distribution of these genes across GO
> molecular function categories as a pie chart. An example is shown in the
> following link
> [2]http://www.proteomesci.com/content/7/1/6/figure/F2?highres=y
>
>
> Can some one help?
Not sure that it is this easy. The IPI ids are protein identifiers. GO
categories classify genes. Neither the mapping from protein to gene nor
the mapping from gene to GO category is 1:1. GO categories form a
hierarchy. So there are significant decisions to be made in representing
IPI identifiers in a pie chart of GO terms.
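One of those decisions, how to count a protein that maps to several terms, can be illustrated with a small sketch. The table `annot` and its contents are made up; neither weighting scheme is prescribed by the original discussion, and the GO hierarchy is ignored here.

```r
## Hypothetical protein-to-term assignments (made up for illustration).
annot <- data.frame(
  protein = c("P1", "P1", "P2", "P3"),
  term    = c("protein binding", "catalytic activity",
              "protein binding", "protein binding"),
  stringsAsFactors = FALSE
)

## Scheme 1: every (protein, term) pair counts as 1.
## The slices sum to the number of pairs, not the number of proteins.
full <- tapply(rep(1, nrow(annot)), annot$term, sum)

## Scheme 2: each protein carries a total weight of 1, split evenly
## over its terms, so the slices sum to the number of proteins.
n.terms <- as.vector(table(annot$protein)[annot$protein])
frac <- tapply(1 / n.terms, annot$term, sum)

full  # pair counts per term
frac  # protein-weighted counts per term
```

The two schemes give different slice sizes for the same data, so the pie chart should state which one was used.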
Bioconductor maintains 'org' and 'GO' database packages that provide the
necessary link between IPI protein ids and GO gene ontology categories,
via ENTREZ gene ids. Code might look like
 ## once only, to install packages
 source('http://bioconductor.org/biocLite.R')
 biocLite(c('org.Hs.eg.db', 'GO.db'))
 ## from IPI to ENTREZ id, not 1:1
 library(org.Hs.eg.db)
 ipi2eg = revmap(eapply(org.Hs.eg.db, names)) ## NOT 1:1 map
 ## Assume ipiIds is, e.g., c('IPI00008860', 'IPI00019922')
 egIds = revmap(ipi2eg[ipiIds])
 ## get GO terms, also not 1:1
 goIds = eapply(org.Hs.egGO[names(egIds)], names)
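A comparable lookup can be sketched with the select() interface of AnnotationDbi. This is an assumption about more recent releases of org.Hs.eg.db rather than the method shown above: it starts from Ensembl protein ids (keytype "ENSEMBLPROT") instead of IPI ids, and the function name `go_for_ensp` is hypothetical. Check keytypes(org.Hs.eg.db) and columns(org.Hs.eg.db) in your installation first.

```r
## Hypothetical helper: molecular function GO ids for Ensembl protein ids.
## Needs the AnnotationDbi and org.Hs.eg.db packages to be installed.
go_for_ensp <- function(ids) {
  db <- org.Hs.eg.db::org.Hs.eg.db

  ## Keep only ids the package knows about; select() stops with an
  ## error if none of the keys are valid.
  ids <- intersect(ids, AnnotationDbi::keys(db, keytype = "ENSEMBLPROT"))
  if (length(ids) == 0) {
    return(NULL)
  }

  res <- AnnotationDbi::select(db, keys = ids, keytype = "ENSEMBLPROT",
                               columns = c("ENTREZID", "GO", "ONTOLOGY"))

  ## Mappings are not 1:1, so expect several rows per input id.
  res[!is.na(res$ONTOLOGY) & res$ONTOLOGY == "MF", ]
}

## Not run here:
## mf <- go_for_ensp(c("ENSP00000361366", "ENSP00000338860"))
```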
You're still left with the problem of resolving multiple mappings and
the hierarchical relationship between GO terms. Asking on the
Bioconductor mailing list
 [3]http://bioconductor.org/docs/mailList.html
is likely to lead to helpful answers.
Martin
Lavinia Gordon
Research Officer
Bioinformatics
Murdoch Childrens Research Institute
Royal Children's Hospital
Flemington Road Parkville Victoria 3052 Australia
telephone: +61 3 8341 6221
[4]www.mcri.edu.au
References
1. http://bioconductor.org/packages/2.5/GO.html
2. http://www.proteomesci.com/content/7/1/6/figure/F2?highres=y
3. http://bioconductor.org/docs/mailList.html
4. http://www.mcri.edu.au/