[BioC] Error in extracting genes from Goterms using GOStats package

James W. MacDonald jmacdon at uw.edu
Tue Jul 30 16:57:50 CEST 2013


Hi Sandy,

Unfortunately, the GOstats package expects that the IDs used will be 
Entrez Gene IDs, and this is hard-coded in a few places, one being 
probeSetSummary(). Luckily, it doesn't take much to do an end-around on 
this problem. Paste the following function into your R session, after 
generating your hgOver object, and it should work (it did for me).

You can then call probeSetSummary2(hgOver).

probeSetSummary2 <- function (result, pvalue, categorySize, sigProbesets)
{
if (!is(result, "GOHyperGResult"))
stop("result must be a GOHyperGResult instance (or subclass)")
ps2egEnv <- annotate:::getAnnMap("ACCNUM", annotation(result))
psids <- ls(ps2egEnv)
havePsids <- eapply(ps2egEnv, function(ids) {
if (length(ids) == 1 && is.na(ids))
FALSE
else TRUE
})
ord <- order(names(havePsids))
psids <- psids[unlist(havePsids)[ord]]
eg2ps <- split(psids, unlist(mget(psids, ps2egEnv)))
if (missing(pvalue))
pvalue <- pvalueCutoff(result)
if (missing(categorySize))
categorySize <- NULL
summary <- Category:::XXX_getSummaryGeneric_XXX()
goids <- summary(result, pvalue, categorySize)[, 1]
sigegids <- geneIds(result)
egids <- geneIdUniverse(result)[goids]
psetids <- lapply(egids, function(ids) {
ids <- as.character(ids)
have <- ids %in% sigegids
ans <- eg2ps[ids[have]]
if (length(ans))
unlist(ans)
else NULL
})
psetidsNULL <- sapply(psetids, is.null)
psetids <- psetids[!psetidsNULL]
if (missing(sigProbesets)) {
selectedProbeSetIDs <- names(geneIds(result))
if (is.null(selectedProbeSetIDs))
warning(paste("The vector of geneIds used to create the GOHyperGParams",
"object was not a named vector.\nIf you want to know the",
"probesets that contributed to this result\neither use",
"a named vector for geneIds, or pass a vector of probeset IDs\n",
"via sigProbesets.\n", sep = ""), call. = FALSE)
}
else {
selectedProbeSetIDs <- sigProbesets
}
selectedInd <- lapply(psetids, function(ids) {
ids %in% selectedProbeSetIDs
})
egids <- lapply(psetids, function(ids) unlist(mget(ids, ps2egEnv)))
out <- vector(mode = "list", length = length(psetids))
for (i in seq(along = psetids)) {
out[[i]] <- data.frame(EntrezID = egids[[i]], ProbeSetID = psetids[[i]],
selected = as.integer(selectedInd[[i]]), row.names = NULL,
stringsAsFactors = FALSE)
}
names(out) <- names(psetids)
out
}

Best,

Jim



On 7/28/2013 8:55 AM, Sandy [guest] wrote:
> Hi,
>
> I used the GOStats package on the following data:
>
> locus<-c("261166_s_at", "263208_at", "263204_at", "263203_at",
> +          "263206_at", "260998_at", "264320_at", "264165_at", "264162_at",
> +          "263786_at", "262347_at", "261407_at", "263869_at", "260562_at",
> +          "260801_at", "258988_at", "258983_at", "262407_at", "260315_at",
> +          "259125_at", "261371_at", "261418_at", "255821_at", "255784_at",
> +          "255498_at", "259191_at", "259137_at", "262360_at", "262071_at",
> +          "255684_at", "261511_at", "263729_at", "264413_s_at", "264381_at",
> +          "264532_at", "260860_at", "263233_at", "263959_at", "255868_at",
> +          "262780_at", "246214_at", "260782_at", "260756_at", "260875_at",
> +          "259247_at", "263943_at", "260112_at", "262010_at", "255851_at",
> +          "246557_at", "261444_at", "264562_at", "264556_at", "259257_at",
> +          "261731_s_at", "260217_at", "260192_at", "255861_at", "264582_at",
> +          "261639_at", "261669_at", "255928_at", "255920_at", "259250_at",
> +          "255734_at", "263950_at", "264625_at", "264623_at", "259935_at",
> +          "261892_at", "261498_at", "256086_at", "263993_at", "261859_at",
> +          "260033_at", "259999_at", "261309_at", "259375_at", "260374_at",
> +          "260346_at", "262693_at", "263827_at", "262453_at", "262505_at",
> +          "265220_at", "265221_s_at", "265338_at", "266167_at", "263030_at",
> +          "262897_at", "265511_at", "265956_at", "263146_at", "264850_at",
> +          "264107_s_at", "264838_at", "266163_at", "264787_at", "264758_at",
> +          "263552_x_at", "263550_at", "263284_at", "263304_at", "263093_at",
> +          "263412_at", "262978_at", "266292_at", "265783_at", "265723_at",
> +          "265674_at", "263599_at", "265761_at", "266744_at", "266900_at",
> +          "266596_at", "266594_at", "262806_at", "265159_at", "267027_at",
> +          "245166_at", "245162_at", "245167_s_at", "266885_at", "266472_at",
> +          "266456_at", "266101_at", "262838_at", "263458_at", "263461_at",
> +          "264711_at", "263376_at", "265575_at", "265881_at", "266431_at",
> +          "260101_at", "267060_at", "267064_at", "265913_at", "260151_at",
> +          "260159_at", "266377_at", "245173_at", "258895_at", "258586_s_at",
> +          "258491_at", "258120_at", "258123_at", "258253_at", "258446_at",
> +          "267084_at", "267075_at", "258223_at", "267076_at", "267077_at",
> +          "267083_at", "258723_at", "258722_at", "258739_s_at", "258528_at",
> +          "258481_at", "258190_at", "258873_at", "258312_at", "258634_at",
> +          "258637_at", "258607_at", "258346_at", "258291_at", "258283_at",
> +          "258565_at", "258773_at", "255957_at", "258407_at", "255949_at",
> +          "257351_at", "257262_at", "257260_at", "257855_at", "257853_at",
> +          "256885_at", "256887_at", "256882_at", "257871_at", "257449_at",
> +          "258949_at", "258948_at", "258079_at", "258056_at", "257808_at",
> +          "257801_at", "257798_at", "257400_s_at", "257585_at", "256321_at",
> +          "256796_at", "257325_at", "256451_s_at", "257983_at", "256518_at",
> +          "256582_at", "257732_at", "256819_at", "252668_at", "252500_at",
> +          "252510_at", "252398_at", "252357_at", "252241_at", "252208_at",
> +          "252135_at", "252132_at", "252070_at", "252074_at", "246254_at",
> +          "256372_at", "256369_at", "251900_at", "251858_at", "251867_at",
> +          "251839_at", "251784_at", "251692_s_at", "251686_at", "251642_at",
> +          "251633_at", "251647_at", "251583_at", "251516_s_at", "251449_at",
> +          "251387_at", "251251_at", "251197_at", "251178_at", "251125_at",
> +          "255402_at", "255357_at", "255298_at", "255280_at", "255203_at",
> +          "255186_at", "254997_s_at", "254827_at", "254844_at", "254718_at",
> +          "254663_at", "254606_at", "254588_at", "254449_at", "245350_at",
> +          "245309_at", "245264_at", "245285_s_at", "245386_at", "245287_at",
> +          "245293_at", "254465_at", "254228_at", "254012_at", "254002_at",
> +          "253953_at", "253935_at", "253922_at", "253868_at", "253693_at",
> +          "253660_at", "253503_at", "253482_at", "253498_at", "253407_at",
> +          "253326_at", "253291_at", "253242_at", "253121_at", "253087_at",
> +          "253045_at", "253062_at", "253035_at", "252990_at", "252978_at",
> +          "252833_at", "252818_at", "252793_at", "252707_at", "252655_at",
> +          "252648_at", "250853_s_at", "250854_at", "250832_at", "250791_at",
> +          "250689_at", "250530_at", "250445_at", "245996_at", "250399_at",
> +          "250367_s_at", "267074_s_at", "267073_at", "267080_at", "267072_at",
> +          "250323_at", "267082_at", "267081_at", "250305_at", "250250_at",
> +          "250242_at", "250199_at", "250182_at", "250101_at", "250103_at",
> +          "250110_at", "250121_at", "250077_at", "250074_at", "250012_x_at",
> +          "246485_at", "250028_at", "246450_at", "246477_at", "246442_at",
> +          "249994_at", "249925_at", "245912_at", "245902_at", "246031_at",
> +          "246029_at", "245965_at", "246140_at", "246136_at", "249786_at",
> +          "249701_at", "249688_at", "249689_at", "246843_at", "246746_at",
> +          "246775_at", "246751_at", "246661_at", "246656_at", "246547_at",
> +          "249583_at", "249550_at", "249503_at", "249393_at", "249394_at",
> +          "249349_at", "249346_at", "249225_at", "249155_at", "249118_at",
> +          "249068_at", "249062_at", "249073_at", "248960_at", "248936_at",
> +          "248712_at", "248609_at", "248618_at", "248573_at", "248501_at",
> +          "248372_at", "248262_at", "248192_at", "248187_at", "248171_at",
> +          "248122_at", "247954_at", "247571_at", "247520_at", "247407_at",
> +          "247378_at", "247349_at", "247297_at", "247246_at", "247215_at",
> +          "247128_at", "247098_at", "246994_at", "246976_s_at", "246978_at",
> +          "246925_at", "244943_at", "244925_at"))
>
>
> seltn<- seltpn<-c("263208_at", "263204_at", "260998_at", "264320_at", "264165_at",
> +                             "264162_at", "263786_at", "261407_at", "263869_at", "260801_at",
> +                              "258988_at", "258983_at", "262407_at", "259125_at", "261371_at",
> +                              "261418_at", "255821_at", "259191_at", "259137_at", "262360_at",
> +                              "262071_at", "255684_at", "261511_at", "264413_s_at", "264381_at",
> +                              "260860_at", "263233_at", "263959_at", "246214_at", "260782_at",
> +                              "259247_at", "262010_at", "246557_at", "261444_at", "264562_at",
> +                              "259257_at", "261731_s_at", "255861_at", "255928_at", "259250_at",
> +                              "255734_at", "264625_at", "264623_at", "261892_at", "261498_at",
> +                              "259999_at", "261309_at", "259375_at", "260346_at", "263827_at",
> +                              "265220_at", "265221_s_at", "265338_at", "266167_at", "263030_at",
> +                              "262897_at", "265511_at", "263146_at", "264850_at", "264107_s_at")
>
>
> seltxxn<-unlist(mget(seltn,ath1121501ACCNUM,ifnotfound=NA))
> locus<-unlist(mget(locus,ath1121501ACCNUM,ifnotfound=NA))
> seltpbn<- new("GOHyperGParams", geneIds = seltxxn, universeGeneIds =locus, annotation="ath1121501",
>                  ontology = "BP", pvalueCutoff = 0.05, conditional = FALSE, testDirection = "over")
> hgOver<- hyperGTest(seltpbn)
>
>
> When I use ne to GO BP  test for over-representation
> 677 GO BP ids tested (30 have p<  0.05)
> Selected gene set size: 46
>      Gene universe size: 315
>      Annotation package: ath1121501
>
>
> But when I would like to extract the genes associated with a particular GO term, I get the following error:
>
> probeSetSummary(hgOver)
>
> Error in (function (classes, fdef, mtable)  :
>    unable to find an inherited method for function ‘cols’ for signature ‘"function"’
>
> I do not understand where am going wrong. I just need to extract those genes that are associated with the GO Term from hgOver. I tried using
>
> geneIdsByCategory(hgOver, 'GO id') but this does not give me the list of genes from the selected list ( seltn) but instead that of the universe that are associated with the term
>
>
>
>   -- output of sessionInfo():
>
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=C                 LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] grid      parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>   [1] gplots_2.11.3        MASS_7.3-27          KernSmooth_2.23-10   caTools_1.14         gdata_2.13.2
>   [6] gtools_3.0.0         plotrix_3.5          GO.db_2.9.0          ath1121501.db_2.9.0  org.At.tair.db_2.9.0
> [11] GOstats_2.26.0       RSQLite_0.11.4       DBI_0.2-5            graph_1.38.3         Category_2.26.0
> [16] AnnotationDbi_1.22.1 affy_1.38.1          Biobase_2.20.0       BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
>   [1] affyio_1.28.0         annotate_1.38.0       AnnotationForge_1.2.2 BiocInstaller_1.10.3  bitops_1.0-5
>   [6] genefilter_1.42.0     GSEABase_1.22.0       IRanges_1.18.0        preprocessCore_1.22.0 RBGL_1.36.2
> [11] splines_3.0.1         stats4_3.0.1          survival_2.37-4       tools_3.0.1           XML_3.98-1.1
> [16] xtable_1.7-1          zlibbioc_1.6.0
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



More information about the Bioconductor mailing list