[BioC] Why are there different number of pathways in pathway2gene and in pathway2name (KEGG.db)?
Peng Yu
pengyu.ut at gmail.com
Wed Oct 12 23:39:02 CEST 2011
Hi,
After stripping the three-letter species prefix, pathway2gene contains
292 distinct pathway IDs, but pathway2name contains 390 distinct
pathway IDs. Why do these two numbers differ?
> library(KEGG.db)
> pathway2gene=dbGetQuery(KEGG_dbconn(), "SELECT * FROM pathway2gene")
>
> species=unique(substr(unique(pathway2gene$pathway_id),1,3))
> species
[1] "hsa" "ath" "dme" "mmu" "rno" "sce" "pfa" "dre" "eco" "ecs" "cfa" "bta"
[13] "cel" "ssc" "gga" "mcc" "xla" "aga" "ptr"
>
> tmp=lapply(
+ species
+ , function(x) {
+ unique(pathway2gene$pathway_id[grep(paste('^', x, sep=''),
+ pathway2gene$pathway_id)])
+ }
+ )
>
> sapply(tmp, length)
[1] 229 123 127 225 225 99 80 155 105 107 224 225 125 225 152 225 150 126 225
>
> tmp1=unique(
+ unlist(
+ lapply(
+ tmp
+ , function(x) {
+ substr(x, 4, 8)
+ }
+ )
+ )
+ )
>
> length(tmp1)
[1] 292
> pathway2name=dbGetQuery(KEGG_dbconn(), 'SELECT * FROM pathway2name')
> length(unique(pathway2name$path_id))
[1] 390
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] smart.source_1.0 KEGG.db_2.5.0 RSQLite_0.9-4
[4] DBI_0.2-5 AnnotationDbi_1.14.1 Biobase_2.12.2
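To see which IDs account for the gap, a set difference between the two ID vectors might help. The following is only a minimal sketch: it uses small toy data frames in place of the real query results so that it runs without KEGG.db. The rows and the `path_name` column are illustrative assumptions; only the `pathway_id` and `path_id` columns are taken from the queries above.

```r
# Toy stand-ins for the two query results (made-up rows, not real KEGG.db output).
pathway2gene <- data.frame(
  pathway_id = c("hsa00010", "mmu00010", "hsa00020"),
  stringsAsFactors = FALSE
)
pathway2name <- data.frame(
  path_id   = c("00010", "00020", "01100"),
  path_name = c("name A", "name B", "name C"),  # hypothetical column and values
  stringsAsFactors = FALSE
)

# Strip the three-letter species prefix to get the bare pathway number.
gene_ids <- unique(substr(pathway2gene$pathway_id, 4, 8))
name_ids <- unique(pathway2name$path_id)

# IDs that have a name but no gene mapping in any species, and the reverse.
only_in_name <- setdiff(name_ids, gene_ids)
only_in_gene <- setdiff(gene_ids, name_ids)

# Show the rows of pathway2name that have no counterpart in pathway2gene.
print(pathway2name[pathway2name$path_id %in% only_in_name, ])
```

With the real tables, replacing the two toy data frames with the `dbGetQuery()` results should list the pathway IDs that appear in only one of them.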
--
Regards,
Peng