[BioC] domainsignatures with non-human KEGG pathways
Robert Castelo
robert.castelo at upf.edu
Tue Dec 15 12:49:07 CET 2009
dear list and, particularly, dear domainsignatures package maintainers
(Florian?),
i was trying to use the package domainsignatures from the current
BioC-devel version (see my sessionInfo at the end of this message) to
test for the enrichment of a gene list throughout the collection of
available KEGG pathways in mouse and found that the main function that
collects the KEGG data is tailored to be employed with human data only.
more concretely, the function 'getKEGGdata' contains the following
hardcoded line in its source:
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
since this function already lets the user restrict the set of pathways
to be tested through the 'pathways' argument, i guess that it is not the
intention of the package to restrict itself to human.
so, i'd like to suggest that the maintainers make the function general
for any organism for which KEGG and ensembl provide the necessary data.
to get going immediately i've made a quick and dirty fix, which i paste
below just in case it may be useful.
btw, the package function 'gseDomain' outputs in my R-devel installation
the following warning after being called:
Warning message:
In progress(message = mess, sub = sub) : Need tcltk for the status bar
which i guess has to do with the fact that i'm missing some software
component on my linux box, because loading 'tcltk' gives the following
message:
library(tcltk)
Error in firstlib(which.lib.loc, package) :
Tcl/Tk support is not available on this system
Error in library(tcltk) : .First.lib failed for 'tcltk'
searching for documentation about how to properly install 'tcltk' i've
found out that this package seems to have been removed from CRAN, see
http://cran.r-project.org/web/packages/tcltk/index.html
and i've seen another package called 'tcltk2' which sounds like a
replacement for 'tcltk'. i just wanted to mention this in case it is
an issue for the package maintainers to consider.
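as a side note, a quick way to check whether a given R build has Tcl/Tk support at all, before trying to load 'tcltk', might be the base function capabilities(). this is only a minimal sketch; the variable name 'hasTcltk' and the message text are made up for illustration:

```r
## capabilities("tcltk") reports whether this R build has Tcl/Tk support
hasTcltk <- capabilities("tcltk")

if (hasTcltk) {
  message("Tcl/Tk support is available in this R build")
} else {
  ## without it, library(tcltk) fails and no status bar can be drawn
  message("Tcl/Tk support is NOT available in this R build")
}
```

if this reports that support is missing, the problem presumably lies in how R was built or installed on the machine rather than in a missing add-on package.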
thanks!!!
robert.
## quick fix of getKEGGdata; assumes domainsignatures, biomaRt and KEGG.db
## are attached (see the sessionInfo below)
myGetKEGGdata <- function(universe = NULL, pathways = NULL,
                          ensemblMart = NULL) { ## add ensemblMart argument
  op <- options(warn = -1)
  on.exit(options(op))
  ## check for an internet connection before querying ensembl
  if (inherits(try(readLines("http://www.bioconductor.org"), silent = TRUE),
               "try-error"))
    stop("Active internet connection needed for this function")
  options(op)
  ## note: without 'pathways' the default is still the human ("hsa") set
  if (!is.null(pathways))
    hKEGGids <- pathways
  else
    hKEGGids <- grep("^hsa", ls(KEGGPATHID2EXTID), value = TRUE)
  path2Genes <- mget(hKEGGids, KEGGPATHID2EXTID)
  hKEGGgenes <- union(universe, unique(unlist(path2Genes, use.names = FALSE)))
  hKEGGgenes <- hKEGGgenes[!is.na(hKEGGgenes)]
  ## if no specific ensembl mart is provided then use human
  if (is.null(ensemblMart))
    ensemblMart <- "hsapiens_gene_ensembl"
  ensembl <- useMart("ensembl", dataset = ensemblMart)
  tmp <- getBM(attributes = c("entrezgene", "interpro"),
               filters = "entrezgene",
               values = hKEGGgenes, mart = ensembl)
  gene2Domains <- split(tmp$interpro, tmp$entrezgene, drop = FALSE)
  ## genes without any annotated domain get an empty entry
  missing <- setdiff(hKEGGgenes, names(gene2Domains))
  gene2Domains[missing] <- ""
  hKEGGdomains <- unique(unlist(gene2Domains))
  hKEGGdomains <- hKEGGdomains[!is.na(hKEGGdomains)]
  path2Domains <- lapply(path2Genes, function(x, gene2Domains)
                         unique(unlist(gene2Domains[x], use.names = FALSE)),
                         gene2Domains)
  dims <- c(pathway = length(hKEGGids), gene = length(hKEGGgenes),
            domain = length(hKEGGdomains))
  return(new("ipDataSource", genes = hKEGGgenes, pathways = hKEGGids,
             domains = hKEGGdomains, gene2Domains = gene2Domains,
             path2Domains = path2Domains, dims = dims, type = "KEGG"))
}
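since the fix above still falls back to the human ("hsa") pathways when 'pathways' is not given, a mouse run needs both arguments. the sketch below shows one way the mouse pathway IDs could be picked out by their KEGG organism prefix. the helper 'pathwaysFor' and the toy ID vector are hypothetical and only stand in for ls(KEGGPATHID2EXTID); the actual call is left commented out because it needs the packages from the sessionInfo below and an internet connection:

```r
## toy pathway IDs standing in for ls(KEGGPATHID2EXTID); not real query output
allIds <- c("hsa00010", "hsa04110", "mmu00010", "mmu04110", "rno00010")

## hypothetical helper: keep only the pathways of one KEGG organism prefix
pathwaysFor <- function(ids, org) {
  grep(paste0("^", org), ids, value = TRUE)
}

mmuPaths <- pathwaysFor(allIds, "mmu")
print(mmuPaths)

## with domainsignatures, biomaRt and KEGG.db attached, the mouse call
## would then presumably look like this (not run here):
## mmuData <- myGetKEGGdata(pathways = pathwaysFor(ls(KEGGPATHID2EXTID), "mmu"),
##                          ensemblMart = "mmusculus_gene_ensembl")
```

any gene 'universe' passed along would of course have to consist of mouse Entrez gene IDs as well.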
sessionInfo()
R version 2.11.0 Under development (unstable) (2009-10-06 r49948)
x86_64-unknown-linux-gnu
locale:
[1] C
attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods
[8] base
other attached packages:
 [1] domainsignatures_1.7.0 biomaRt_2.3.0          prada_1.23.0
 [4] rrcov_1.0-00           pcaPP_1.7              mvtnorm_0.9-8
 [7] robustbase_0.5-0-1     RColorBrewer_1.0-2     KEGG.db_2.3.5
[10] RSQLite_0.7-3          DBI_0.2-4              AnnotationDbi_1.9.2
[13] Biobase_2.7.2
loaded via a namespace (and not attached):
[1] MASS_7.3-4 RCurl_1.3-0 XML_2.6-0 stats4_2.11.0