[BioC] pathview puzzle
Luo Weijun
luo_weijun at yahoo.com
Thu Aug 22 23:01:49 CEST 2013
Hi Oleg,
You are right, the problem is due to ID type inconsistency.
You have to specify gene.idtype when calling the pathview function if your gene ID type is not Entrez Gene. I don’t think b-numbers are recognized. For your gene name example, if by “gene name” you mean official gene symbols, you should specify gene.idtype="SYMBOL" (lower case is fine):
eco2.out <- pathview(gene.data = T2.CEBF095.crt115.ASCH.DROP3.rel.gn, pathway.id = "02010", gene.idtype="SYMBOL", out.suffix = "T2ACSH", species = "eco", kegg.native=TRUE)
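Before choosing a gene.idtype value, it may help to check which kind of IDs the row names of your data actually carry. The following is only a base-R sketch: the toy matrix, the helper name guess_id_type and the regular expressions are illustrative assumptions, not part of pathview.

```r
# Hypothetical toy data standing in for the real expression matrix.
gene.data <- matrix(c(0.93, -4.20, -3.67, 1.20),
                    ncol = 1,
                    dimnames = list(c("nhaA", "carA", "carB", "rluA"),
                                    "ACSH_vs_synH"))

# Guess the ID type from the row names (illustrative patterns only).
guess_id_type <- function(ids) {
  if (all(grepl("^b[0-9]{4}$", ids))) {
    "b-number"            # E. coli locus tags such as b0019
  } else if (all(grepl("^[0-9]+$", ids))) {
    "entrez-like"         # purely numeric IDs
  } else {
    "other"               # symbols, mixed IDs, or anything else
  }
}

id.type <- guess_id_type(rownames(gene.data))
print(id.type)
```

If the result is anything other than purely numeric IDs, gene.idtype most likely needs to be set explicitly, as in the call above.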
You may want to check the help info on pathview function for details:
?pathview
Pathview supports 10 different common ID types for model organisms (plus KEGG orthology IDs). For the supported common ID types, type:
gene.idtype.list
For external IDs not in the list of supported common ID types, you can use the mol.sum function to do the ID and data mapping explicitly. Check the example on page 14 of the vignette or the help info on the function:
?mol.sum
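To illustrate what explicit ID and data mapping means, here is a base-R sketch of the idea. It does not call mol.sum itself; the data values, the two-column map and the target IDs "geneA" and "geneB" are hypothetical placeholders, and summing is just one possible way to combine values that map to the same target ID.

```r
# Hypothetical expression data keyed by an unsupported external ID type.
mol.data <- c(b0019 = 0.93, b0032 = -4.20, b0033 = -3.67, b9999 = 1.00)

# Hypothetical two-column map: external ID -> target ID (placeholder values).
id.map <- cbind(external = c("b0019", "b0032", "b0033"),
                target   = c("geneA", "geneB", "geneB"))

# Keep only mapped IDs present in the data, then sum values per target ID.
hit <- id.map[, "external"] %in% names(mol.data)
mapped <- tapply(mol.data[id.map[hit, "external"]],
                 id.map[hit, "target"],
                 sum)
print(mapped)
```

IDs without a mapping (b9999 here) are dropped, and the result is keyed by the target IDs, which is the form the downstream plotting step expects.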
HTH.
Weijun
--------------------------------------------
On Wed, 8/21/13, Oleg Moskvin <moskvin at wisc.edu> wrote:
Subject: pathview: problem with coloring
Date: Wednesday, August 21, 2013, 6:12 PM
Hi Weijun,
Your pathview is a very attractive package. While I can
reproduce the results with the human data provided in the
example, I am getting coloring problems with E.coli data.
This seems to be a gene ID mismatch that comes from an
inconsistency in the ID handling by the package.
The KEGG pathways for E. coli contain "b-numbers" as gene
IDs.
If I supply an expression set based on b-numbers, it is not
recognized; if I supply an expression set based on gene names,
it is (!) recognized, but the resulting coloring is all white
(#FFFFFF).
Details:
###### 1. Using b-numbers:
head(T2.CEBF095.crt115.ASCH.DROP3.rel)
ACSH_vs_synH
EKO11_2926 -1.3362079
b0019 0.9265879
b0032 -4.2007218
b0033 -3.6678436
b0058 1.1996750
b0060 0.8624787
eco.out <- pathview(gene.data =
T2.CEBF095.crt115.ASCH.DROP3.rel, pathway.id = "02010",
out.suffix = "T2ACSH", species = "eco", kegg.native=TRUE)
[1] "Downloading xml files for eco02010, 1/1 pathways.."
[1] "Downloading png files for eco02010, 1/1 pathways.."
Error in mol.data[as.character(items[hit]), ] : subscript
out of bounds
In addition: Warning messages:
1: In node.map(gene.data, node.data, node.types =
gene.node.type, node.sum = node.sum) :
NAs introduced by coercion
2: In FUN(1:153[[1L]], ...) : NAs introduced by coercion
###### 2. Using gene names:
head(T2.CEBF095.crt115.ASCH.DROP3.rel.gn)
ACSH_vs_synH
nhaA 0.9265879
carA -4.2007218
carB -3.6678436
caiF -1.4380677
folA -0.8914105
rluA 1.1996750
eco2.out <- pathview(gene.data =
T2.CEBF095.crt115.ASCH.DROP3.rel.gn, pathway.id = "02010",
out.suffix = "T2ACSH", species = "eco", kegg.native=TRUE)
Loading required package: org.EcK12.eg.db
Working in directory
/mnt/omdir/omoskvin/Projects/Ecoli/cMonkey
Writing image file eco02010.T2ACSH.png
There were 50 or more warnings (use warnings() to see the
first 50)
> head(eco2.out[[1]])
kegg.names labels type x y width height ACSH_vs_synH mol.col
4 b1513 gene 339 1882 46 17 NA #FFFFFF
5 b1515 gene 293 1890 46 17 NA #FFFFFF
6 b1514 gene 293 1873 46 17 NA #FFFFFF
7 b1516 gene 247 1882 46 17 NA #FFFFFF
18 b4087 gene 339 1823 46 17 NA #FFFFFF
19 b4086 gene 293 1823 46 17 NA #FFFFFF
So, b-numbers cause an early "out of bounds" error, while
gene names let the run proceed further but give no coloring in
the result!
Please help.
Thank you,
Oleg