[BioC] Why are there different numbers of pathways in pathway2gene and in pathway2name (KEGG.db)?
Marc Carlson
mcarlson at fhcrc.org
Tue Nov 1 01:13:07 CET 2011
Hi Peng,
It's because of the way the database was built. The data in
pathway2gene is limited to the organisms that we produce annotation
packages for here, so pathways found only in other organisms show up
in pathway2name but not in pathway2gene.
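
To see which pathways account for the gap, one could compare the two
sets of IDs directly. The following is only a minimal sketch: it uses
small made-up data frames in place of the real KEGG.db tables, keeps
only the pathway_id and path_id columns mentioned in this thread, and
the names gene_paths and name_only are hypothetical.

```r
# Toy stand-ins for the two KEGG.db tables (made-up rows, not real KEGG data).
# Only the columns used in this thread are modelled.
pathway2gene <- data.frame(
  pathway_id = c("hsa00010", "mmu00010", "hsa00020", "sce00020"),
  stringsAsFactors = FALSE
)
pathway2name <- data.frame(
  path_id = c("00010", "00020", "00030"),
  stringsAsFactors = FALSE
)

# Drop the three-letter organism prefix to get the bare pathway number.
gene_paths <- unique(substr(pathway2gene$pathway_id, 4, 8))

# Pathways that have a name but no gene mapping in the toy pathway2gene table.
name_only <- setdiff(pathway2name$path_id, gene_paths)

length(gene_paths)                    # distinct pathways in pathway2gene
length(unique(pathway2name$path_id))  # distinct pathways in pathway2name
name_only                             # IDs present only in pathway2name
```

Against the real database, the two toy data frames would be replaced
by the dbGetQuery() results from the quoted message below.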
Marc
On 10/12/2011 02:39 PM, Peng Yu wrote:
> Hi,
>
> There are 292 pathways according to pathway2gene, but there are 390
> pathways according to pathway2name. I'm wondering why these two
> numbers are not the same.
>
>> library(KEGG.db)
>> pathway2gene=dbGetQuery(KEGG_dbconn(), "SELECT * FROM pathway2gene")
>>
>> species=unique(substr(unique(pathway2gene$pathway_id),1,3))
>> species
> [1] "hsa" "ath" "dme" "mmu" "rno" "sce" "pfa" "dre" "eco" "ecs" "cfa" "bta"
> [13] "cel" "ssc" "gga" "mcc" "xla" "aga" "ptr"
>> tmp=lapply(
> + species
> + , function(x) {
> + unique(pathway2gene$pathway_id[grep(paste('^', x,sep=''), pathway2gene$pathway_id)])
> + }
> + )
>> sapply(tmp, length)
> [1] 229 123 127 225 225 99 80 155 105 107 224 225 125 225 152 225 150 126 225
>> tmp1=unique(
> + unlist(
> + lapply(
> + tmp
> + , function(x) {
> + substr(x, 4, 8)
> + }
> + )
> + )
> + )
>> length(tmp1)
> [1] 292
>
>
>> pathway2name=dbGetQuery(KEGG_dbconn(), 'SELECT * FROM pathway2name')
>> length(unique(pathway2name$path_id))
> [1] 390
>
>> sessionInfo()
> R version 2.13.0 (2011-04-13)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices datasets utils methods base
>
> other attached packages:
> [1] smart.source_1.0 KEGG.db_2.5.0 RSQLite_0.9-4
> [4] DBI_0.2-5 AnnotationDbi_1.14.1 Biobase_2.12.2
>