[BioC] Gene Ontology: Shortest path from root to node

Marc Carlson mcarlson at fhcrc.org
Tue Jan 15 20:35:31 CET 2013


A few more comments on this. To answer your question, you can also use
some of the nice graph tools in the project:

## First, load a few useful libraries
library(GO.db)  ## for GO
library(graph)  ## for basic graph containers etc.
library(RBGL)  ## for algorithms to compute distance etc.


## Next, let's turn the PARENTS object into a tabular format with the
## toTable() method:

xx = toTable(GOBPPARENTS)

## For now, let's assume that we don't care what kind of relationship is
## represented, and just turn the whole table into a graph with all edge
## weights set to 1.
## Setting the weights to 1 means that a shortest-path distance computed
## later on is simply the number of edges between two terms.

## The first two columns of xx are the 'from' / 'to' columns; the edge
## weights 'W' are all set to 1
gg = ftM2graphNEL(as.matrix(xx[, 1:2]), W=rep(1, nrow(xx)))

## If you want to visualize your graph, you can use Rgraphviz
library(Rgraphviz)  ## for plotting graph objects

## But let's not draw everything. Instead, let's grab a subgraph
## containing only the nodes we care about (the term and its ancestors):

sg <- subGraph(c(get("GO:0006955", GOBPANCESTOR), "GO:0006955"), gg)

## Then plot it

plot(sg)
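The node set passed to subGraph() is the term itself plus everything GOBPANCESTOR lists for it. To make that concrete, here is a minimal base-R sketch of the same kind of ancestor closure, computed directly from a child/parent table. The toy term IDs and the ancestors() helper are hypothetical; on real data you would build the table from the first two columns of xx (assuming you rename them to child and parent first).

```r
## Hypothetical toy child/parent table, standing in for the first two
## columns of toTable(GOBPPARENTS). The term IDs are made up.
edges <- data.frame(
  child  = c("T4", "T4", "T3", "T2", "T1"),
  parent = c("T3", "T2", "T1", "T1", "all"),
  stringsAsFactors = FALSE
)

## Hypothetical helper: collect every ancestor of 'term' by repeatedly
## following child -> parent edges until no new terms turn up.
ancestors <- function(edges, term) {
  found    <- character(0)
  frontier <- term
  while (length(frontier) > 0) {
    parents  <- unique(edges$parent[edges$child %in% frontier])
    frontier <- setdiff(parents, found)   # keep only terms not seen yet
    found    <- c(found, frontier)
  }
  found
}

## Node set for a subgraph: the term plus all of its ancestors.
nodes_of_interest <- c(ancestors(edges, "T4"), "T4")
```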


## Then we can use tools from RBGL to compute the distance in terms of the
## number of edges, either to the BP root term or to the top "all" node:

dijkstra.sp(sg, "GO:0006955")$distances["GO:0008150"]

dijkstra.sp(sg, "GO:0006955")$distances["all"]
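Because every edge weight is 1, dijkstra.sp() is effectively counting edges, so a plain breadth-first search gives the same number. The sketch below is a base-R stand-in for illustration only (it does not use RBGL); the toy table and the shortest_path_length() helper are hypothetical. It also ranks a few terms by depth, which is one way to answer the original question of which term is most specific.

```r
## Hypothetical toy child/parent table (child first, parent second).
edges <- data.frame(
  child  = c("T4", "T4", "T3", "T2", "T1"),
  parent = c("T3", "T2", "T1", "T1", "all"),
  stringsAsFactors = FALSE
)

## Hypothetical helper: breadth-first search from 'term' along
## child -> parent edges. Each loop iteration moves one level up, so the
## level at which 'target' first appears is the shortest-path edge count.
## Returns NA if 'target' cannot be reached.
shortest_path_length <- function(edges, term, target) {
  visited  <- term
  frontier <- term
  level    <- 0
  while (length(frontier) > 0) {
    if (target %in% frontier) return(level)
    parents  <- setdiff(unique(edges$parent[edges$child %in% frontier]), visited)
    visited  <- c(visited, parents)
    frontier <- parents
    level    <- level + 1
  }
  NA
}

## Depth of several terms below "all"; larger values mean more specific terms.
depths <- sapply(c("T1", "T3", "T4"), shortest_path_length,
                 edges = edges, target = "all")
most_specific <- names(which.max(depths))
```

BFS is used here instead of Dijkstra simply because, with unit weights, the two give the same distances and BFS needs no priority queue.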



Hope this helps,


   Marc



On 01/14/2013 11:19 AM, Marc Carlson wrote:
> Also I forgot to mention,
>
> You can get the immediate parents of any term by using the appropriate
> PARENTS object. For example:
>
> get("GO:0006955", GOBPPARENTS)
>
> This will tell you that there are TWO immediate parents for this term,
> which is the situation I was describing earlier.
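A quick way to see how common this is: count the distinct immediate parents per term in the parent table. Below is a minimal base-R sketch on a hypothetical toy table (not the real GOBPPARENTS data); the variable names are made up for illustration.

```r
## Hypothetical toy child/parent table (child first, parent second).
edges <- data.frame(
  child  = c("T4", "T4", "T3", "T2", "T1"),
  parent = c("T3", "T2", "T1", "T1", "all"),
  stringsAsFactors = FALSE
)

## Immediate parents of one term (the question get(term, GOBPPARENTS) answers).
parents_T4 <- edges$parent[edges$child == "T4"]

## Count distinct immediate parents for every term; terms with more than
## one are where the DAG offers several routes back to the root.
n_parents <- tapply(edges$parent, edges$child, function(p) length(unique(p)))
multi     <- names(n_parents)[n_parents > 1]
```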
>
>
>   Marc
>
>
> On 01/14/2013 11:09 AM, Marc Carlson wrote:
>> Hi Nicos,
>>
>> You can use the GO.db package to get at this. In there you will
>> find an object called GOBPANCESTOR, which acts like a classic R
>> environment object and can be used with the get() method to pull out
>> the ancestor terms of a given term all the way back to the root.
>>
>> So for your example you could have done this:
>>
>> library(GO.db)
>> get("GO:0008150", GOBPANCESTOR)
>>
>> And you can see that the only ancestor of this term is in fact the
>> root node: "all".
>>
>>
>> What about terms further down? The same trick retrieves the ancestor
>> terms of any term:
>> get("GO:0006955", GOBPANCESTOR)
>>
>>
>>
>> So you probably want to do something a bit like this:
>>
>> length(get("GO:0006955", GOBPANCESTOR))
>>
>> And (for example) compare that to:
>>
>> length(get("GO:0008150", GOBPANCESTOR))
>>
>> etc.
>>
>>
>> Of course it's all a little more complicated than that, because the
>> gene ontologies are actually DAGs (directed acyclic graphs), so a
>> term can have more than one route back to the root node. The ancestor
>> list contains every term on every route, so its length can be larger
>> than the number of steps on the shortest path back to the "all" node.
>> In fact this is true for the example term "GO:0006955" above, which
>> has two routes back to the root, so its "distance" (as estimated by
>> length()) is inflated by one in this case.
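To see the inflation concretely, here is a small base-R sketch on a hypothetical toy DAG in which term T4 reaches the root through two parents, T3 and T2. It compares the number of ancestors with the shortest-path depth, computed here by a simple recursion (depth of a term = 1 + the smallest depth among its parents). Both helpers are hypothetical and not part of GO.db.

```r
## Hypothetical toy DAG as a child/parent table: T4 has two parents.
edges <- data.frame(
  child  = c("T4", "T4", "T3", "T2", "T1"),
  parent = c("T3", "T2", "T1", "T1", "all"),
  stringsAsFactors = FALSE
)

## All ancestors of 'term', gathered recursively (duplicates removed).
ancestors <- function(edges, term) {
  parents <- unique(edges$parent[edges$child == term])
  unique(c(parents, unlist(lapply(parents, function(p) ancestors(edges, p)))))
}

## Shortest-path depth below 'root': 0 at the root, otherwise one more than
## the shallowest parent. Inf means the root is unreachable from 'term'.
## (No memoization, so this is only suitable for small graphs.)
depth <- function(edges, term, root = "all") {
  if (term == root) return(0)
  parents <- unique(edges$parent[edges$child == term])
  if (length(parents) == 0) return(Inf)
  1 + min(vapply(parents, function(p) depth(edges, p, root), numeric(1)))
}

n_ancestors <- length(ancestors(edges, "T4"))  # counts T3, T2, T1 and "all"
shortest    <- depth(edges, "T4")              # path T4 -> T3 -> T1 -> all
```

Because T2 and T3 sit side by side on the two routes, the ancestor count exceeds the shortest-path depth by one, mirroring the "GO:0006955" case.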
>>
>>
>> Anyhow, I hope this helps,
>>
>>
>>   Marc
>>
>>
>>
>>
>>
>> On 01/14/2013 07:47 AM, WoA [guest] wrote:
>>> Given some GO BP terms for a gene, I wish to find out which of the
>>> terms is the most specific. I wish to find the length of the shortest
>>> path between the BP root term (GO:0008150) and each given term. Is
>>> there a suitable way to do that using an R package?
>>>
>>> Like something equivalent to:
>>> my $length = $node->lengthOfShortestPathToRoot;
>>>
>>> in Perl's "GO-TermFinder" package.
>>>
>>> Thanks in advance
>>>
>>>   -- output of sessionInfo():
>>>
>>>> sessionInfo()
>>> R version 2.13.1 (2011-07-08)
>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>
>>> locale:
>>> [1] LC_COLLATE=English_United States.1252
>>> [2] LC_CTYPE=English_United States.1252
>>> [3] LC_MONETARY=English_United States.1252
>>> [4] LC_NUMERIC=C
>>> [5] LC_TIME=English_United States.1252
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> -- 
>>> Sent via the guest posting facility at bioconductor.org.
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: 
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>


