[BioC] goTools: ontoCompare question
Paquet, Agnes
apaquet at medsfgh.ucsf.edu
Sat Jun 30 04:34:04 CEST 2007
Hi Dave,
The current algorithm in ontoCompare is as follows:
- For each probe id in your list, retrieve all GO ids corresponding to that probe id.
- Map these GO ids up to the end nodes passed as an argument to the function (or to the default ones).
- Once the mapping is finished, add 1 to the count of each end node that was reached at least once. We count whether a node was reached, not the number of times it was hit, which explains the discrepancy in your example.
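The three steps above can be sketched in plain R. This is not the goTools source: the probe-to-GO table `probe2go`, the GO-to-end-node table `go2end` and the helper `countEndNodes` are all hypothetical toy stand-ins for the real annotation lookup and ontology walk. The sketch only illustrates the counting rule, where each probe adds at most 1 to each end node it reaches.

```r
# Hypothetical toy annotation: probe id -> GO ids
probe2go <- list(
  probeA = c("GO:A1", "GO:A2", "GO:A3"),
  probeB = "GO:B1",
  probeC = character(0)  # unannotated probe
)

# Hypothetical toy ontology walk: GO id -> end node(s) reached from it
go2end <- list(
  "GO:A1" = "molecular_function",
  "GO:A2" = "molecular_function",
  "GO:A3" = c("molecular_function", "binding"),
  "GO:B1" = "binding"
)

countEndNodes <- function(probes, probe2go, go2end, endNodes) {
  counts <- setNames(integer(length(endNodes) + 1L), c(endNodes, "NotFound"))
  for (p in probes) {
    # Step 1: all GO ids for this probe
    gos <- probe2go[[p]]
    # Step 2: map them up to the end nodes; intersect() keeps each
    # reached end node only once, so repeated hits collapse
    reached <- intersect(unlist(go2end[gos], use.names = FALSE), endNodes)
    # Step 3: add 1 to every end node reached at least once
    if (length(reached) == 0L) {
      counts["NotFound"] <- counts["NotFound"] + 1L
    } else {
      counts[reached] <- counts[reached] + 1L
    }
  }
  counts
}

res <- countEndNodes(names(probe2go), probe2go, go2end,
                     endNodes = c("binding", "molecular_function"))
print(res)
```

Here probeA contributes only 1 to molecular_function even though three of its GO ids reach that node, so the result is binding = 2, molecular_function = 1, NotFound = 1.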
For example, if I use only one Affy probe and restrict everything to MF to simplify your example, ontoCompare gives the following results:
1) using the default end nodes:
> ontoCompare(list("1415670_at"),probeType="mouse4302",goType="MF",method="none")
[1] "Starting ontoCompare..."
[1] "Number of lists = 1"
[1] "Using method: none"
binding structural molecule activity transporter activity NotFound
      1                            1                    1        0
(we have 1 count for each end node which was reached at least once)
2) Using your endlist
> ontoCompare(list("1415670_at"),probeType="mouse4302",goType="MF",method="none",endnode=endlist)
[1] "Starting ontoCompare..."
[1] "Number of lists = 1"
[1] "Using method: none"
molecular_function NotFound
                 1        0
(same here, only 1 count for MF, and not 3)
We made this choice because some probes and nodes are more heavily annotated than others. Counting every hit would make two lists of probes look more different because of differences in the availability of annotations, not because of true biological differences. You can also use the other methods to get the number of hits relative to the number of probes or to the number of GO ids in your list.
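To illustrate the annotation-depth effect, here is a small sketch with two hypothetical one-probe lists (toy data, not real mouse4302 annotations). Both probes reach the end node "binding", but one probe carries five GO ids that reach it and the other carries only one. The helpers `perHit` and `perProbe` are invented for this sketch; `perProbe` follows the "reached at least once" rule described above.

```r
# Hypothetical: for each probe, the end node reached by each of its GO ids
listHeavy <- list(probe1 = rep("binding", 5))  # 5 GO ids, all reaching "binding"
listLight <- list(probe2 = "binding")          # 1 GO id reaching "binding"

# Count every hit (one per GO id that reaches "binding")
perHit <- function(lst) sum(unlist(lst, use.names = FALSE) == "binding")

# Count each probe at most once if it reaches "binding"
perProbe <- function(lst) sum(vapply(lst, function(h) "binding" %in% h, logical(1)))

print(c(heavy = perHit(listHeavy), light = perHit(listLight)))      # 5 vs 1
print(c(heavy = perProbe(listHeavy), light = perProbe(listLight)))  # 1 vs 1
```

Counting hits makes the two lists look five-fold different, although each contains exactly one probe reaching "binding"; counting probes does not.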
I hope this helps. Don't hesitate to email me again if you have more questions.
Best,
Agnes
________________________________
From: bioconductor-bounces at stat.math.ethz.ch on behalf of davidl at unr.nevada.edu
Sent: Fri 6/29/2007 8:01 AM
To: Bioconductor
Subject: [BioC] goTools: ontoCompare question
Hello,
I ran ontoCompare on the full list of probes on the mouse4302 GeneChip, both
with the default EndNodeList() and with a custom end node list containing only
the antioxidant activity, biological_process, cellular_component, and
molecular_function GO terms, and found what appears to be a discrepancy:
> length(sviData$svi$ID)
[1] 45101
> sviData$svi$ID[1:5]
[1] "1452670_at" "1422340_a_at" "1452114_s_at" "1422644_at" "1423359_at"
> listall<-list("allprobes"=sviData$svi$ID)
> endlist<-c("GO:0003674", "GO:0005575", "GO:0008150", "GO:0016209")
> totalAnnotations<-ontoCompare(listall, probeType="mouse4302", method="none")
> write.table(totalAnnotations, file="totalAnnotations.txt")
> totalAnnotations2<-ontoCompare(listall, probeType="mouse4302", method="none",
+     endnode=endlist)
> write.table(totalAnnotations2, file="totalAnnotations_reduced.txt")
When tallying the total possible number of annotations for the top-level GO terms
(BP, MF, CC), I got different numbers from the two approaches, but the
same numbers for "NotFound" and "antioxidant activity":
from totalAnnotations.txt
antioxidant activity 127
biological_process 2594
cellular_component 2365
molecular_function 2414
NotFound 11120
...others
from totalAnnotations_reduced.txt
antioxidant activity 127
biological_process 28020
cellular_component 28509
molecular_function 30875
NotFound 11120
I was just wondering if anyone knew why this might happen, since it affects the
interpretation of a comparison I was planning. The same numbers appear in
the histogram output from ontoPlot (so I don't think it's an R -> txt -> Excel
issue). With method="none", is the output the total number of times all probes
are annotated at the end node, or at a child of the end node? Does it have
something to do with the "isa" values in EndNodeList() or with the way I created
endlist?
R v.2.5.0
goTools v1.8.0
Cheers,
Dave
-- And thank you, Dick, for recommending topGO. I found what I needed through that
package.
_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor