[BioC] rBiopaxParser, Reactome and namespaces
Paul Shannon
paul.thurmond.shannon at gmail.com
Wed May 22 05:08:12 CEST 2013
Hi Frank,
I am most happily using the rBiopaxParser package, and your vignette, to extract detailed (but topologically simple) interaction data from the latest Reactome "Homosapiens.owl". Your package offers great power and convenience.
However, I run into difficulty with namespaces.
For a simple example, consider this one line from the method listInstances, found in the file R/selectBiopax.R:
sel = sel & (tolower(biopax$df$class) %in% tolower(stripns(class)))
As parsed from Homosapiens.owl, the class column of biopax$df has values like these, always containing a namespace prefix:
head(unique(biopax$df$class))
"bp:BiochemicalReaction" "bp:Protein"
"bp:CellularLocationVocabulary" "bp:UnificationXref"
"bp:ProteinReference" "bp:BioSource"
Stripping the namespace from "bp:Protein" (the right-hand side of the %in% clause) leaves "Protein", which cannot match the biopax$df$class value as parsed from the owl file, because that value still carries the "bp:" prefix.
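To make the mismatch concrete, here is a minimal sketch on toy data. The helper strip_ns is a hypothetical stand-in written for this illustration; it is an assumption about what the package's stripns does (drop everything up to the first colon), not the package's actual code.

```r
# Hypothetical stand-in for the package's namespace-stripping helper
# (an assumption for illustration, not the real implementation).
strip_ns <- function(x) sub("^[^:]*:", "", x)

# Toy class column mimicking values parsed from the Reactome file.
df_class <- c("bp:BiochemicalReaction", "bp:Protein", "bp:ProteinReference")
query <- "bp:Protein"

# Stripping only the query: nothing matches, since the column keeps "bp:".
one_sided <- tolower(df_class) %in% tolower(strip_ns(query))

# Stripping both sides: the comparison finds the Protein row.
two_sided <- tolower(strip_ns(df_class)) %in% tolower(strip_ns(query))

print(one_sided)
print(two_sided)
```

On this toy input the one-sided comparison selects no rows, while the two-sided comparison selects only the "bp:Protein" row.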
I believe I see similar logic in other places, with these methods specifically encountered so far:
selectInstances
listPathwayComponents
Namespaces are used with the "property" column as well:
head(table(biopax$df$property), n=3)
          bp:author bp:cellularLocation          bp:comment
              55654               23838              123750
Speaking from the nickel seats, and not claiming to understand all of the implications: perhaps these could be neatly avoided if your readBiopax method could optionally eliminate namespaces when reading in an owl file?
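As a rough sketch of what I have in mind, the prefixes could be removed from the "class" and "property" columns right after parsing. Both function names below are hypothetical, and the toy data frame only imitates the shape of biopax$df; I am not suggesting that readBiopax currently has such an option.

```r
# Hypothetical helper: drop everything up to and including the first colon.
strip_ns <- function(x) sub("^[^:]*:", "", x)

# Hypothetical post-processing step: strip prefixes from selected columns,
# silently skipping any requested column that is not present.
strip_df_namespaces <- function(df, cols = c("class", "property")) {
  for (col in intersect(cols, names(df))) {
    df[[col]] <- strip_ns(as.character(df[[col]]))
  }
  df
}

# Toy stand-in for the parsed data, using values shown earlier in this message.
toy <- data.frame(
  class = c("bp:Protein", "bp:BioSource"),
  property = c("bp:comment", "bp:author"),
  stringsAsFactors = FALSE
)

cleaned <- strip_df_namespaces(toy)
print(cleaned)
```

If biopax$df behaves like an ordinary data frame, the same call might be applied to it after reading the owl file, though I have not checked what else in the package relies on the prefixed values.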
Thanks,
- Paul