[BioC] "graphite" Biocarta 'native' graphs different from Biocarta web site?
Hamid Bolouri
hbolouri at fhcrc.org
Wed Jun 6 02:55:39 CEST 2012
Graphite's native Biocarta pathways seem to have a different node list from the one given by the Biocarta "PROTEIN LIST" link on each Biocarta pathway page (presumably what the pathway authors consider the 'true' pathway membership).
There seem to be two categories of difference:
(1) Some genes listed by Biocarta are absent from graphite's version (see ??? marks in the example below).
(2) Because nodes in the native format are annotated with mixed identifier types (Entrez Gene IDs, EC numbers, and plain names such as "STAT5"), a node conversion is necessary. In particular, Biocarta's "PROTEIN LIST" gives _specific_ members of enzyme families, whereas graphite seems to replace each EC number with all members of that family. However, I have trouble explaining why some enzymes are on or off the list (see --- marks in the example below).
Am I misinterpreting things? If not, is there any way to get pathway graphs whose node lists more closely match what Biocarta lists online?
Thanks,
Hamid Bolouri
--
http://labs.fhcrc.org/bolouri
Example:
> biocarta[["epo signaling pathway"]]
"epo signaling pathway" pathway from BioCarta
Number of nodes = 10
Number of edges = 24
Type of identifiers = native
Retrieved on = 2011-05-12
> nodes(biocarta[["epo signaling pathway"]])
[1] "EntrezGene:2056" "EntrezGene:2057"
[3] "EntrezGene:2885" "EntrezGene:3265"
[5] "EntrezGene:6464" "EntrezGene:6654"
[7] "EnzymeConsortium:2.7.1.112" "EnzymeConsortium:3.1.3.48"
[9] "EnzymeConsortium:3.1.4.11" "STAT5"
> PE <- convertIdentifiers(biocarta[["epo signaling pathway"]],type="entrez")
> nodes(PE)
[1] "2056" "2057" "2885" "3265" "6464" "6654" "52" "993"
[9] "994" "995" "1843" "1844" "1845" "1846" "1847" "1848"
[17] "1849" "1850" "1852" "5770" "5777" "5778" "5781" "5787"
[25] "5788" "5792" "5795" "5797" "5798" "5799" "5801" "5803"
[33] "8555" "8556" "11072" "11221" "56940" "80824" "84867" "5330"
[41] "5331" "5332" "5333" "5335" "5336" "23236" "84812" "113026"
> PS <- convertIdentifiers(biocarta[["epo signaling pathway"]],type="symbol")
> nodes(PS)
[1] "EPO" "EPOR" "GRB2" "HRAS" "SHC1" "SOS1" "ACP1" "CDC25A"
[9] "CDC25B" "CDC25C" "DUSP1" "DUSP2" "DUSP3" "DUSP4" "DUSP5" "DUSP6"
[17] "DUSP7" "DUSP8" "DUSP9" "PTPN1" "PTPN6" "PTPN7" "PTPN11" "PTPRB"
[25] "PTPRC" "PTPRF" "PTPRJ" "PTPRM" "PTPRN" "PTPRN2" "PTPRR" "PTPRZ1"
[33] "CDC14B" "CDC14A" "DUSP14" "DUSP10" "DUSP22" "DUSP16" "PTPN5" "PLCB2"
[41] "PLCB3" "PLCB4" "PLCD1" "PLCG1" "PLCG2" "PLCB1" "PLCD4" "PLCD3"
Compare the above with what I get from:
http://www.biocarta.com/pathfiles/PathwayProteinList.asp?showPFID=69
<NB The header is mine & I reordered the table to group similar cases>
<geneDescription EntrezID ***==HBcomment>
erythropoietin 2056 ***
erythropoietin receptor 2057 ***
growth factor receptor-bound protein 2 2885 ***
son of sevenless homolog 1 (Drosophila) 6654 ***
v-Ha-ras Harvey rat sarcoma viral oncogene homolog 3265 ***
signal transducer and activator of transcription 5A 6776 ***
signal transducer and activator of transcription 5B 6777 ***
SHC (Src homology 2 domain containing) transforming protein 1 6464 ***
v-fos FBJ murine osteosarcoma viral oncogene homolog 2353 ???
v-raf-1 murine leukemia viral oncogene homolog 1 5894 ???
ELK1, member of ETS oncogene family 2002 ???
jun oncogene 3725 ???
casein kinase 2, alpha 1 polypeptide 1457 ???
Janus kinase 2 (a protein tyrosine kinase) 3717 ???
mitogen-activated protein kinase 3 5595 ---
mitogen-activated protein kinase 8 5599 ---
mitogen-activated protein kinase kinase 1 5604 ---
phospholipase C, gamma 1 5335 ok
protein tyrosine phosphatase, non-receptor type 6 5777 ok
HBcomment: *** == in graphite, ??? == missing from graphite,
--- == specific enzymes in Biocarta are mapped to large (& unrelated?) families in graphite
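To make the comparison reproducible, the two lists can be treated as sets of Entrez Gene IDs. The following is a minimal base-R sketch: the IDs are copied by hand from the nodes(PE) output and from the Biocarta table above, and the variable names are illustrative only, not part of graphite. Note that STAT5A/STAT5B (6776, 6777) come out as "Biocarta only" here because, in the listings above, the native "STAT5" node does not appear among the converted Entrez IDs.

```r
# Sketch: compare the two node lists quoted in this post as sets.
# IDs are copied by hand; variable names are illustrative only.

# Entrez Gene IDs from nodes(PE) after convertIdentifiers(..., type = "entrez")
graphite_ids <- c(2056, 2057, 2885, 3265, 6464, 6654, 52, 993,
                  994, 995, 1843, 1844, 1845, 1846, 1847, 1848,
                  1849, 1850, 1852, 5770, 5777, 5778, 5781, 5787,
                  5788, 5792, 5795, 5797, 5798, 5799, 5801, 5803,
                  8555, 8556, 11072, 11221, 56940, 80824, 84867, 5330,
                  5331, 5332, 5333, 5335, 5336, 23236, 84812, 113026)

# Entrez Gene IDs from the Biocarta "PROTEIN LIST" table (showPFID=69)
biocarta_ids <- c(2056, 2057, 2885, 6654, 3265, 6776, 6777, 6464,
                  2353, 5894, 2002, 3725, 1457, 3717,
                  5595, 5599, 5604, 5335, 5777)

shared <- intersect(biocarta_ids, graphite_ids)               # on both lists
missing_from_graphite <- setdiff(biocarta_ids, graphite_ids)  # Biocarta only
extra_in_graphite <- setdiff(graphite_ids, biocarta_ids)      # graphite only

cat("shared:", length(shared),
    "| Biocarta only:", length(missing_from_graphite),
    "| graphite only:", length(extra_in_graphite), "\n")
print(missing_from_graphite)
```

Running it on these lists reports 8 shared IDs, 11 Biocarta-only IDs (the ??? and --- rows plus the two STAT5 genes) and 40 graphite-only IDs, which are the enzyme-family members introduced by the EC-number expansion.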
###
> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] graphite_1.2.0 AnnotationDbi_1.18.1 Biobase_2.16.0
[4] BiocGenerics_0.2.0 RSQLite_0.11.1 DBI_0.2-5
[7] graph_1.34.0
loaded via a namespace (and not attached):
[1] IRanges_1.14.3 org.Hs.eg.db_2.7.1 stats4_2.15.0 tools_2.15.0
###