[BioC] Gene Ontology: Shortest path from root to node
Marc Carlson
mcarlson at fhcrc.org
Tue Jan 15 20:35:31 CET 2013
A few more comments on this. To answer your question, you can also use
some of the nice graph tools in the project:
## So you can load a few useful libraries
library(GO.db) ## for GO
library(graph) ## for basic graph containers etc.
library(RBGL) ## for algorithms to compute distance etc.
## First, let's turn the PARENTS object into a nice tabular format with
## the toTable() method, like this:
xx = toTable(GOBPPARENTS)
## For now let's assume that we don't care what kind of relationship is
## represented, and just make that whole table into a graph with all edge
## weights set to 1.
## The reason I am setting the weights to 1 is so that later on the
## shortest-path distance is simply the number of edges.
## 'from / to' columns, with weights 'W' all set to 1
gg = ftM2graphNEL(as.matrix(xx[, 1:2]), W=rep(1,dim(xx)[1]))
## And if you want to visualize your graph, you can use Rgraphviz
library(Rgraphviz) ## for plotting graph objects
## But let's not draw everything. Instead let's grab a subgraph
## containing only the nodes that we care about...
sg <- subGraph(c(get("GO:0006955", GOBPANCESTOR), "GO:0006955"), gg)
## Then plot it
plot(sg)
## And then we can use tools from RBGL to compute distance in terms of
## the number of edges...
dijkstra.sp(sg, "GO:0006955")$distances["GO:0008150"]
dijkstra.sp(sg, "GO:0006955")$distances["all"]
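If you end up doing this for many terms, the steps above could be wrapped in a small function. Here is a rough sketch. The helper name shortestToRoot and the toy edge table are made up for illustration; only ftM2graphNEL() and dijkstra.sp() come from the graph and RBGL packages loaded above. The toy table stands in for the first two columns of toTable(GOBPPARENTS), so it runs without GO.db.

```r
library(graph)  ## for ftM2graphNEL()
library(RBGL)   ## for dijkstra.sp()

## Toy child -> parent table (hypothetical IDs, not real GO terms).
## "D" has two routes to "root": D -> B -> A -> root and D -> C -> root.
toy <- data.frame(
  child  = c("A", "B", "C", "D", "D"),
  parent = c("root", "A", "root", "B", "C"),
  stringsAsFactors = FALSE
)

## Directed graph with every edge weight set to 1, as in the steps above
tg <- ftM2graphNEL(as.matrix(toy[, 1:2]), W = rep(1, nrow(toy)),
                   edgemode = "directed")

## Hypothetical helper: number of edges on the shortest path from
## 'term' up to 'root', following the child -> parent edges
shortestToRoot <- function(g, term, root) {
  unname(dijkstra.sp(g, term)$distances[root])
}

shortestToRoot(tg, "D", "root")  ## shortest route is D -> C -> root
shortestToRoot(tg, "B", "root")  ## only route is B -> A -> root
```

With the real graph built earlier, the equivalent call would be shortestToRoot(gg, "GO:0006955", "GO:0008150"), assuming gg is still in your session.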
Hope this helps,
Marc
On 01/14/2013 11:19 AM, Marc Carlson wrote:
> Also I forgot to mention,
>
> You can get the immediate parent to any term by using the appropriate
> PARENTS object. So for example:
>
> get("GO:0006955", GOBPPARENTS)
>
> Will tell you that there are TWO immediate parents for this term.
> Which is the situation I was describing earlier.
>
>
> Marc
>
>
> On 01/14/2013 11:09 AM, Marc Carlson wrote:
>> Hi Nicos,
>>
>> You could use the GO.db package to get at this. In there you will
>> find an object called GOBPANCESTOR which acts like a classic R
>> environment object and can be used with the get() method to pull out
>> the ancestor terms of a given term all the way back to the root.
>>
>> So for your example you could have done this:
>>
>> library(GO.db)
>> get("GO:0008150", GOBPANCESTOR)
>>
>> And you can see that the only ancestor to this term is in fact the
>> root node: "all"
>>
>>
>> What about terms further down? Well the same trick works for all the
>> terms to get their ancestor terms:
>> get("GO:0006955", GOBPANCESTOR)
>>
>>
>>
>> So you probably want to do something a bit like this:
>>
>> length(get("GO:0006955", GOBPANCESTOR))
>>
>> And (for example) compare that to:
>>
>> length(get("GO:0008150", GOBPANCESTOR))
>>
>> etc.
>>
>>
>> Of course it's all a little bit more complicated than that because
>> the gene ontologies are actually DAGs (so terms can have more than
>> one route back to the root node), and so your ancestors list may be
>> longer than just the simple path back to the "all" node. And in fact
>> in the example I gave above this is true for the deeper term
>> "GO:0006955", which has two routes back to the root node, and hence
>> its "distance" (as hinted at by length) has been inflated by one in
>> this case.
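To make the point about DAGs concrete, here is a small base R sketch on a made-up ontology. The term names and the helpers ancestors() and depth() are hypothetical and not part of GO.db; they only mimic the idea of an ancestor list and a shortest route to the root. It compares the size of the ancestor set with the number of edges on the shortest path.

```r
## Toy parent lists (hypothetical terms): "D" reaches "root" by two
## routes, D -> B -> A -> root and D -> C -> root
parents <- list(
  root = character(0),
  A = "root",
  B = "A",
  C = "root",
  D = c("B", "C")
)

## All ancestors of a term: walk upward level by level, collecting
## every term seen (similar in spirit to get(term, GOBPANCESTOR))
ancestors <- function(term) {
  seen <- character(0)
  frontier <- parents[[term]]
  while (length(frontier) > 0) {
    seen <- union(seen, frontier)
    frontier <- setdiff(unique(unlist(parents[frontier])), seen)
  }
  seen
}

## Shortest number of edges from a term to the root: breadth-first
## search upward, counting levels until the root shows up
depth <- function(term, root = "root") {
  frontier <- term
  visited <- term
  steps <- 0L
  while (!(root %in% frontier)) {
    frontier <- setdiff(unique(unlist(parents[frontier])), visited)
    if (length(frontier) == 0) return(NA_integer_)  ## root unreachable
    visited <- c(visited, frontier)
    steps <- steps + 1L
  }
  steps
}

length(ancestors("D"))  ## counts A, B, C and root
depth("D")              ## follows D -> C -> root
```

In this toy case the ancestor list has four entries while the shortest path has only two edges, which is why the length of the ancestor list is at best a rough hint at depth.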
>>
>>
>> Anyhow, I hope this helps,
>>
>>
>> Marc
>>
>>
>>
>>
>>
>> On 01/14/2013 07:47 AM, WoA [guest] wrote:
>>> Given some GO BP terms for a gene, I wish to find out which of the
>>> terms has the more specific meaning. I wish to find the length of
>>> the shortest path between the BP root term (GO:0008150) and the given
>>> term. Is there any suitable way to do that using any R package?
>>>
>>> Like something equivalent to:
>>> my $length = $node->lengthOfShortestPathToRoot;
>>>
>>> in Perl's "GO-TermFinder" package.
>>>
>>> Thanks in advance
>>>
>>> -- output of sessionInfo():
>>>
>>>> sessionInfo()
>>> R version 2.13.1 (2011-07-08)
>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>
>>> locale:
>>> [1] LC_COLLATE=English_United States.1252
>>> [2] LC_CTYPE=English_United States.1252
>>> [3] LC_MONETARY=English_United States.1252
>>> [4] LC_NUMERIC=C
>>> [5] LC_TIME=English_United States.1252
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> --
>>> Sent via the guest posting facility at bioconductor.org.
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor