[Bioc-devel] GSEABase::getOBOCollection() missing children

Martin Morgan mtmorgan at fredhutch.org
Fri Jun 5 20:05:34 CEST 2015


On 06/05/2015 08:51 AM, Robert Castelo wrote:
> hi,
>
> importing an OBO file with GSEABase::getOBOCollection() I have observed missing
> children in the imported ontology. Here is an example with the Sequence Ontology:

Thanks Robert, the import went OK, but the coercion to graphNEL was flawed. This
is fixed in GSEABase 1.31.2 in devel, and will be ported to release and available via
biocLite tomorrow afternoon (all being well...)


Martin

>
> library(GSEABase)
>
> oboSOXP <-
> getOBOCollection("http://sourceforge.net/p/song/svn/HEAD/tree/trunk/so-xp.obo")
> Warning message:
> In readLines(src) :
>    incomplete final line found on
> 'http://sourceforge.net/p/song/svn/HEAD/tree/trunk/so-xp.obo'
> gSOXP <- as(oboSOXP, "graphNEL")
> edges(gSOXP)[["SO:0001622"]]
> [1] "SO:0001968"
>
> so the term SO:0001622 apparently has only one child term, SO:0001968. However,
> a free-text search for this entry in the OBO file shows the following:
>
> [Term]
> id: SO:0001622
> name: UTR_variant
> def: "A transcript variant that is located within the UTR." [SO:ke]
> synonym: "UTR variant" EXACT []
> synonym: "UTR_" EXACT ebi_variants [http://ensembl.org/info/docs/variation/index.html]
> is_a: SO:0001791 ! exon_variant
> is_a: SO:0001968 ! coding_transcript_variant
> created_by: kareneilbeck
> creation_date: 2010-03-23T11:22:58Z
>
> that is, it has two children, not just one. The child SO:0001791 is missing.
> In fact, looking at the distribution of the number of children per term, every
> term has at most one child:
>
> nchild <- sapply(edges(gSOXP), length)
> table(nchild)
> nchild
>     0    1
>   206 2072
>
> The manual page of getOBOCollection() does not say that this function
> cannot import more than one child per term, so I guess this is either a bug or
> an oversight.
>
> cheers,
>
> robert.
>
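A quick way to cross-check the coerced graph independently of GSEABase is to count the `is_a:` lines in each `[Term]` stanza of the OBO text itself; each such line declares one is_a relationship, so, assuming the coercion builds edges only from is_a, the counts should match `sapply(edges(g), length)` for the corresponding terms. This is a minimal base-R sketch, not GSEABase code; the function name `count_is_a` is hypothetical, and the inline OBO text is an abridged copy of the SO:0001622 stanza quoted above.

```r
# Hypothetical helper (not part of GSEABase): count is_a lines per [Term] stanza.
count_is_a <- function(lines) {
  header <- grepl("^\\[", lines)        # stanza headers: [Term], [Typedef], ...
  stanza <- cumsum(header)              # 0 = file header, 1..n = stanza index
  term_idx <- stanza[header & trimws(lines) == "[Term]"]
  counts <- vapply(term_idx, function(s) {
    sum(grepl("^is_a:", lines[stanza == s]))          # is_a lines in this stanza
  }, integer(1))
  ids <- vapply(term_idx, function(s) {
    sub("^id: *", "", grep("^id:", lines[stanza == s], value = TRUE)[1])
  }, character(1))
  setNames(counts, ids)
}

# Abridged copy of the stanza quoted in the message above.
obo_lines <- c(
  "[Term]",
  "id: SO:0001622",
  "name: UTR_variant",
  'def: "A transcript variant that is located within the UTR." [SO:ke]',
  'synonym: "UTR variant" EXACT []',
  "is_a: SO:0001791 ! exon_variant",
  "is_a: SO:0001968 ! coding_transcript_variant",
  "created_by: kareneilbeck"
)

counts <- count_is_a(obo_lines)
print(counts)   # SO:0001622 has 2 is_a lines
```

On a local copy of the full file, `table(count_is_a(readLines("so-xp.obo")))` (file path assumed) gives a distribution that can be compared directly with `table(nchild)` above.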


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


