[Bioc-devel] GSEABase::getOBOCollection() missing children

Robert Castelo robert.castelo at upf.edu
Fri Jun 5 17:51:10 CEST 2015


hi,

When importing an OBO file with GSEABase::getOBOCollection(), I observed
missing children in the imported ontology. Here is an example with the
Sequence Ontology:

library(GSEABase)

oboSOXP <- getOBOCollection("http://sourceforge.net/p/song/svn/HEAD/tree/trunk/so-xp.obo")
Warning message:
In readLines(src) :
  incomplete final line found on 'http://sourceforge.net/p/song/svn/HEAD/tree/trunk/so-xp.obo'
gSOXP <- as(oboSOXP, "graphNEL")
edges(gSOXP)[["SO:0001622"]]
[1] "SO:0001968"

So the term SO:0001622 apparently has only one child term, SO:0001968.
However, a free-text search for this entry in the OBO file shows the
following:

[Term]
id: SO:0001622
name: UTR_variant
def: "A transcript variant that is located within the UTR." [SO:ke]
synonym: "UTR variant" EXACT []
synonym: "UTR_" EXACT ebi_variants [http://ensembl.org/info/docs/variation/index.html]
is_a: SO:0001791 ! exon_variant
is_a: SO:0001968 ! coding_transcript_variant
created_by: kareneilbeck
creation_date: 2010-03-23T11:22:58Z

That is, it has two children, not just one. The child SO:0001791 is
missing. In fact, looking at the distribution of the number of children
per term, every term has at most one child:

nchild <- sapply(edges(gSOXP), length)
table(nchild)
nchild
    0    1
  206 2072

I have not found anything in the manual page of getOBOCollection()
saying that this function cannot import more than one child per term, so
I guess this is either a bug or an oversight.
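
As a cross-check that does not depend on GSEABase, here is a minimal
sketch in base R that counts the is_a lines per stanza directly from the
OBO text. The helper name countIsA() and the inline excerpt oboLines are
made up for illustration (the excerpt is a trimmed copy of the stanza
quoted above), and the parsing assumes one tag per line and stanza
headers such as [Term] on their own line:

```r
# Trimmed copy of the SO:0001622 stanza quoted above, kept inline so the
# sketch runs without network access (illustrative excerpt only).
oboLines <- c(
  "[Term]",
  "id: SO:0001622",
  "name: UTR_variant",
  "is_a: SO:0001791 ! exon_variant",
  "is_a: SO:0001968 ! coding_transcript_variant",
  "created_by: kareneilbeck"
)

# Hypothetical helper: number of is_a lines in the stanza of each id.
countIsA <- function(lines) {
  # Stanza index: increases at every header line such as "[Term]".
  # Only alphabetic names are accepted, so bracketed text such as a
  # dbxref list on a wrapped line is not mistaken for a header.
  stanza <- cumsum(grepl("^\\[[A-Za-z]+\\][[:space:]]*$", lines))
  isId   <- grepl("^id:", lines)
  isIsA  <- grepl("^is_a:", lines)
  ids    <- trimws(sub("^id:", "", lines[isId]))
  # For each id line, count the is_a lines sharing its stanza index.
  counts <- vapply(stanza[isId],
                   function(s) sum(isIsA & stanza == s),
                   integer(1))
  names(counts) <- ids
  counts
}

isaCounts <- countIsA(oboLines)
isaCounts[["SO:0001622"]]   # 2 is_a lines for this term
```

On the full file one could, under the same assumptions, pass the result
of readLines() to countIsA() and compare the counts with
sapply(edges(gSOXP), length) from above; any term with a larger count in
the text than in the graph would have lost an edge on import.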

cheers,

robert.


