[BioC] goseq KEGG testing?
Jenny Drnevich
drnevich at illinois.edu
Tue Mar 22 21:06:38 CET 2011
Hi,
I'm learning how to use goseq for RNA-Seq data. The vignette says
that in addition to GO BP, MF and CC testing, there is native support
for KEGG category testing. When I look at the man page for goseq, it says:
"test.cats A vector specifying which categories to test for over
representation amongst DE genes. See details for vaild options.
...
Valid options for the test.cats arguement are any combination of
"GO:CC", "GO:BP", "GO:MF" & "KEGG". The three GO terms refer to the
Cellular Component, Biological Process and Molecular Function
respectively. "KEGG" refers to KEGG pathways. "
However, when I try to do just test.cats = "KEGG" or
test.cats=c("GO:MF","KEGG"), I get an error. I'm using the demo data
from the vignette, all codes and sessionInfo below. I'm trying to
include this in a workshop I'm teaching on Thursday, and hope to find
a quick answer!
Thanks,
Jenny
> library(goseq)
>
> library(edgeR)
> table.summary <- read.table(system.file("extdata", "Li_sum.txt",
package = "goseq"),
+ sep = "\t", header = TRUE, stringsAsFactors = FALSE)
> counts <- table.summary[, -1]
> rownames(counts) <- table.summary[, 1]
> grp <- factor(rep(c("Control", "Treated"), times = c(4, 3)))
> summarized <- DGEList(counts, lib.size = colSums(counts), group = grp)
> disp <- estimateCommonDisp(summarized)
> tested <- exactTest(disp)
Comparison of groups: Treated - Control
> topTags(tested)
Comparison of groups: Treated-Control
logConc logFC PValue FDR
ENSG00000127954 -31.02991 37.972297 6.800102e-79 3.366458e-74
ENSG00000151503 -12.96052 5.404687 9.143635e-66 2.263324e-61
ENSG00000096060 -11.77715 4.899691 4.866396e-60 8.030527e-56
ENSG00000091879 -15.35999 5.771018 5.469701e-55 6.769575e-51
ENSG00000132437 -14.14850 -5.896416 3.487845e-52 3.453385e-48
ENSG00000166451 -12.61713 4.567569 4.410990e-52 3.626363e-48
ENSG00000131016 -14.80016 5.274233 5.127568e-52 3.626363e-48
ENSG00000163492 -17.28041 7.296034 1.231576e-44 7.621299e-41
ENSG00000113594 -12.24687 4.053165 6.002790e-44 3.301935e-40
ENSG00000116285 -13.01524 4.112220 4.005898e-43 1.983160e-39
> genes.Seq <-
as.integer(p.adjust(tested$table$p.value[tested$table$logFC != 0],
+ method = "BH") < 0.05)
> names(genes.Seq) = row.names(tested$table[tested$table$logFC != 0, ])
> length(genes.Seq)
[1] 22743
> #This is the number of genes in the universe
> table(genes.Seq)
genes.Seq
0 1
19344 3399
> #This shows how many have 1s, and are "selected"
>
> pwf.Seq <- nullp(genes.Seq, "hg18", "ensGene")
Loading hg18 length data...
>
> GO.MF.Seq = goseq(pwf.Seq, "hg18", "ensGene", test.cats = c("GO:MF"))
Fetching GO annotations...
Calculating the p-values...
> dim(GO.MF.Seq)
[1] 2688 3
> names(GO.MF.Seq)
[1] "category" "upval" "dpval"
> GO.MF.Seq[1:10,]
category upval dpval
1034 GO:0005515 2.881933e-50 1.0000000
18 GO:0000166 2.493016e-15 1.0000000
1042 GO:0005524 5.321059e-13 1.0000000
1628 GO:0016787 1.557386e-09 1.0000000
2328 GO:0046872 8.558029e-09 1.0000000
1611 GO:0016740 2.732780e-07 1.0000000
122 GO:0003700 3.179382e-07 1.0000000
147 GO:0003735 3.793941e-06 0.9999985
1033 GO:0005509 4.306285e-06 0.9999978
155 GO:0003779 5.667785e-06 0.9999976
>
> ?goseq
starting httpd help server ... done
> KEGG.Seq = goseq(pwf.Seq, "hg18", "ensGene", test.cats = c("KEGG"))
Fetching GO annotations...
Error in getgo(names(DEgenes), genome, id, fetch.cats = test.cats) :
object 'go' not found
>
> KEGG.Seq = goseq(pwf.Seq, "hg18", "ensGene", test.cats = c("GO:MF","KEGG"))
Fetching GO annotations...
Error in getgo(names(DEgenes), genome, id, fetch.cats = test.cats) :
object 'go' not found
>
> sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
States.1252
[3] LC_MONETARY=English_United States.1252
LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1]
org.Hs.eg.db_2.4.6 edgeR_2.0.4 goseq_1.2.0
geneLenDataBase_0.99.5
[5]
BiasedUrn_1.03 GOstats_2.16.0 graph_1.28.0
Category_2.16.1
[9]
drosgenome1.db_2.4.5 org.Dm.eg.db_2.4.6 biomaRt_2.6.0
limma_3.6.9
[13]
affycoretools_1.22.0 KEGG.db_2.4.5 GO.db_2.4.5
RSQLite_0.9-4
[17]
DBI_0.2-5 AnnotationDbi_1.12.0 affy_1.28.0
Biobase_2.10.0
loaded via a namespace (and not attached):
[1]
affyio_1.18.0 affyPLM_1.26.1 annaffy_1.22.0
annotate_1.28.1
[5]
Biostrings_2.18.4 BSgenome_1.18.3 gcrma_2.22.0
genefilter_1.32.0
[9] GenomicFeatures_1.2.3
GenomicRanges_1.2.3 grid_2.12.2 GSEABase_1.12.2
[13]
IRanges_1.8.9 lattice_0.19-17 Matrix_0.999375-46
mgcv_1.7-3
[17] nlme_3.1-98 preprocessCore_1.12.0
RBGL_1.26.0 RCurl_1.5-0.1
[21]
rtracklayer_1.10.6 splines_2.12.2 survival_2.36-5
tools_2.12.2
[25] XML_3.2-0.2 xtable_1.5-6
More information about the Bioconductor
mailing list