[BioC] Extracting genes associated to GO terms
James W. MacDonald
jmacdon at uw.edu
Fri Oct 25 16:56:08 CEST 2013
Hi Paul,
On Friday, October 25, 2013 9:22:06 AM, paul [guest] wrote:
>
> I went through the following post https://stat.ethz.ch/pipermail/bioconductor/2013-July/054125.html. I have a similar problem extracting the genes associated with GO terms, but probeSetSummary2 does not work for me either.
>
>
> This is what I did:
>
> #### selected list of genes
> structure(c("245062_at", "245107_at", "245137_at", "245178_at",
> "245189_at", "245305_at", "245339_at", "245381_at", "245392_at",
> "245411_at", "245412_at", "245414_at", "245504_at", "245505_at",
> "245516_at", "245558_at", "245603_at", "245611_at", "245705_at",
> "245749_at", "245758_at", "245863_s_at", "245877_at", "246002_at",
> "246119_at", "246203_at", "246216_at", "246252_s_at", "246301_at",
> "246312_at", "246338_s_at", "246448_at", "246499_at", "246535_at",
> "246627_s_at", "246796_at", "246811_at", "246818_at", "246947_at",
> "246984_at", "247033_at", "247132_at", "247134_at", "247210_at",
> "247354_at", "247418_at", "247427_at", "247544_at", "247620_at",
> "247637_at", "247797_at", "247950_at", "248040_at", "248071_at",
> "248094_at", "248104_at", "248167_at", "248178_at", "248364_at",
> "248424_at", "248428_at", "248462_at", "248472_at", "248604_at",
> "248722_at", "248761_at", "248770_at", "248848_at", "249098_at",
> "249102_at", "249104_at", "249199_at", "249359_at", "249601_at",
> "249672_at", "249741_at", "249767_at", "249791_at", "249925_at",
> "249995_at", "250046_at", "250094_at", "250107_at", "250182_at",
> "250200_at", "250230_at", "250239_at", "250253_at", "250285_at",
> "250330_at", "250363_at", "250380_at", "250395_at", "250458_s_at",
> "250474_at", "250512_at", "250570_at", "250602_s_at", "250691_at",
> "250697_at", "250717_at", "250770_at", "250781_at", "250828_at",
> "250906_at", "250919_at", "250975_at", "251008_at", "251015_at",
> "251042_at", "251259_at", "251301_at", "251328_at", "251479_at",
> "251509_at", "251531_at", "251578_at", "251605_at", "251704_at",
> "251824_at", "251862_at", "251866_at", "251918_at", "252209_at",
> "252375_at", "252453_at", "252638_at", "252639_at", "252817_at",
> "252950_at", "252983_at", "252984_at", "253040_at", "253075_at",
> "253142_at", "253160_at", "253224_at", "253255_at", "253386_at",
> "253401_at", "253468_at", "253494_at", "253572_at", "253589_at",
> "253614_at", "253758_at", "253827_at", "253851_at", "253957_at",
> "254015_at", "254048_at", "254121_at", "254155_at", "254197_at",
> "254292_at", "254342_at", "254425_at", "254457_at", "254527_at",
> "254536_at", "254556_at", "254560_at", "254667_at", "254693_at",
> "254715_at", "254729_at", "254774_at", "254797_at", "254847_at",
> "254867_at", "254914_at", "255053_at", "255120_x_at", "255297_x_at",
> "255307_at", "255311_at", "255365_at", "255528_at", "255602_at",
> "255625_at", "255649_at", "255684_at", "255723_at", "255739_at",
> "255866_at", "255873_at", "255880_at", "255898_at", "255925_at",
> "255994_at", "255995_at", "256119_at", "256217_at", "256326_at",
> "256351_at", "256392_at", "256458_at", "256534_at", "256590_at",
> "256724_at", "256746_at", "256823_at", "256906_at", "256919_at",
> "256940_at", "256964_at", "256981_at", "256989_at", "257045_at",
> "257088_at", "257171_at", "257205_at", "257268_at", "257335_at",
> "257371_at", "257476_at", "257592_at", "257633_at", "257735_at",
> "257738_at", "257765_at", "257772_at", "257858_at", "257972_at",
> "258005_at", "258119_at", "258156_at", "258181_at", "258227_at",
> "258269_at", "258289_at", "258557_at", "258565_at", "258574_at",
> "258625_at", "258627_at", "258640_at", "258664_at", "258673_at",
> "258743_s_at", "258793_at", "258794_at", "259036_at", "259182_at",
> "259198_at", "259273_s_at", "259318_at", "259331_at", "259345_s_at",
> "259359_at", "259379_at", "259390_at", "259409_at", "259414_at",
> "259439_at", "259447_s_at", "259463_at", "259485_at", "259498_at",
> "259500_at", "259518_at", "259574_at", "259668_at", "259682_at",
> "259780_at", "259814_at", "259834_at", "259992_at", "260210_at",
> "260328_at", "260334_at", "260335_at", "260347_at", "260393_at",
> "260419_at", "260432_at", "260470_at", "260580_at", "260643_at",
> "260903_at", "260936_at", "261107_at", "261110_at", "261127_at",
> "261154_at", "261161_at", "261229_at", "261269_at", "261378_at",
> "261426_at", "261481_at", "261498_at", "261577_at", "261640_at",
> "261778_at", "261928_at", "261938_at", "261960_at", "261995_at",
> "262130_at", "262202_at", "262220_at", "262225_at", "262250_at",
> "262317_at", "262342_at", "262406_at", "262408_at", "262464_at",
> "262522_at", "262643_at", "262708_at", "262770_at", "262788_at",
> "262821_at", "262825_at", "262904_at", "263005_at", "263063_s_at",
> "263075_at", "263096_at", "263127_at", "263212_at", "263225_at",
> "263240_s_at", "263241_at", "263323_at", "263336_x_at", "263387_at",
> "263570_at", "263586_at", "263593_at", "263805_at", "263946_at",
> "263991_at", "264010_at", "264122_at", "264127_at", "264146_at",
> "264200_at", "264202_at", "264266_at", "264524_at", "264560_at",
> "264571_at", "264587_at", "264633_at", "264656_at", "264698_at",
> "264794_at", "264832_at", "264852_at", "264937_at", "264979_s_at",
> "265014_at", "265238_s_at", "265376_at", "265793_at", "266044_s_at",
> "266048_at", "266065_at", "266126_at", "266207_at", "266268_at",
> "266344_at", "266366_at", "266382_at", "266441_at", "266547_at",
> "266597_at", "266607_at", "266651_at", "266694_at", "266776_at",
> "266816_at", "266881_at", "266911_at", "266967_at", "267009_at",
> "267010_at", "267136_at", "267330_at", "267375_at", "267457_at",
> "267477_at", "267555_at", "267639_at", "AFFX-r2-At-Actin-M_s_at"
> ), .Names = c("245062_at", "245107_at", "245137_at", "245178_at",
> "245189_at", "245305_at", "245339_at", "245381_at", "245392_at",
> "245411_at", "245412_at", "245414_at", "245504_at", "245505_at",
> "245516_at", "245558_at", "245603_at", "245611_at", "245705_at",
> "245749_at", "245758_at", "245863_s_at", "245877_at", "246002_at",
> "246119_at", "246203_at", "246216_at", "246252_s_at", "246301_at",
> "246312_at", "246338_s_at", "246448_at", "246499_at", "246535_at",
> "246627_s_at", "246796_at", "246811_at", "246818_at", "246947_at",
> "246984_at", "247033_at", "247132_at", "247134_at", "247210_at",
> "247354_at", "247418_at", "247427_at", "247544_at", "247620_at",
> "247637_at", "247797_at", "247950_at", "248040_at", "248071_at",
> "248094_at", "248104_at", "248167_at", "248178_at", "248364_at",
> "248424_at", "248428_at", "248462_at", "248472_at", "248604_at",
> "248722_at", "248761_at", "248770_at", "248848_at", "249098_at",
> "249102_at", "249104_at", "249199_at", "249359_at", "249601_at",
> "249672_at", "249741_at", "249767_at", "249791_at", "249925_at",
> "249995_at", "250046_at", "250094_at", "250107_at", "250182_at",
> "250200_at", "250230_at", "250239_at", "250253_at", "250285_at",
> "250330_at", "250363_at", "250380_at", "250395_at", "250458_s_at",
> "250474_at", "250512_at", "250570_at", "250602_s_at", "250691_at",
> "250697_at", "250717_at", "250770_at", "250781_at", "250828_at",
> "250906_at", "250919_at", "250975_at", "251008_at", "251015_at",
> "251042_at", "251259_at", "251301_at", "251328_at", "251479_at",
> "251509_at", "251531_at", "251578_at", "251605_at", "251704_at",
> "251824_at", "251862_at", "251866_at", "251918_at", "252209_at",
> "252375_at", "252453_at", "252638_at", "252639_at", "252817_at",
> "252950_at", "252983_at", "252984_at", "253040_at", "253075_at",
> "253142_at", "253160_at", "253224_at", "253255_at", "253386_at",
> "253401_at", "253468_at", "253494_at", "253572_at", "253589_at",
> "253614_at", "253758_at", "253827_at", "253851_at", "253957_at",
> "254015_at", "254048_at", "254121_at", "254155_at", "254197_at",
> "254292_at", "254342_at", "254425_at", "254457_at", "254527_at",
> "254536_at", "254556_at", "254560_at", "254667_at", "254693_at",
> "254715_at", "254729_at", "254774_at", "254797_at", "254847_at",
> "254867_at", "254914_at", "255053_at", "255120_x_at", "255297_x_at",
> "255307_at", "255311_at", "255365_at", "255528_at", "255602_at",
> "255625_at", "255649_at", "255684_at", "255723_at", "255739_at",
> "255866_at", "255873_at", "255880_at", "255898_at", "255925_at",
> "255994_at", "255995_at", "256119_at", "256217_at", "256326_at",
> "256351_at", "256392_at", "256458_at", "256534_at", "256590_at",
> "256724_at", "256746_at", "256823_at", "256906_at", "256919_at",
> "256940_at", "256964_at", "256981_at", "256989_at", "257045_at",
> "257088_at", "257171_at", "257205_at", "257268_at", "257335_at",
> "257371_at", "257476_at", "257592_at", "257633_at", "257735_at",
> "257738_at", "257765_at", "257772_at", "257858_at", "257972_at",
> "258005_at", "258119_at", "258156_at", "258181_at", "258227_at",
> "258269_at", "258289_at", "258557_at", "258565_at", "258574_at",
> "258625_at", "258627_at", "258640_at", "258664_at", "258673_at",
> "258743_s_at", "258793_at", "258794_at", "259036_at", "259182_at",
> "259198_at", "259273_s_at", "259318_at", "259331_at", "259345_s_at",
> "259359_at", "259379_at", "259390_at", "259409_at", "259414_at",
> "259439_at", "259447_s_at", "259463_at", "259485_at", "259498_at",
> "259500_at", "259518_at", "259574_at", "259668_at", "259682_at",
> "259780_at", "259814_at", "259834_at", "259992_at", "260210_at",
> "260328_at", "260334_at", "260335_at", "260347_at", "260393_at",
> "260419_at", "260432_at", "260470_at", "260580_at", "260643_at",
> "260903_at", "260936_at", "261107_at", "261110_at", "261127_at",
> "261154_at", "261161_at", "261229_at", "261269_at", "261378_at",
> "261426_at", "261481_at", "261498_at", "261577_at", "261640_at",
> "261778_at", "261928_at", "261938_at", "261960_at", "261995_at",
> "262130_at", "262202_at", "262220_at", "262225_at", "262250_at",
> "262317_at", "262342_at", "262406_at", "262408_at", "262464_at",
> "262522_at", "262643_at", "262708_at", "262770_at", "262788_at",
> "262821_at", "262825_at", "262904_at", "263005_at", "263063_s_at",
> "263075_at", "263096_at", "263127_at", "263212_at", "263225_at",
> "263240_s_at", "263241_at", "263323_at", "263336_x_at", "263387_at",
> "263570_at", "263586_at", "263593_at", "263805_at", "263946_at",
> "263991_at", "264010_at", "264122_at", "264127_at", "264146_at",
> "264200_at", "264202_at", "264266_at", "264524_at", "264560_at",
> "264571_at", "264587_at", "264633_at", "264656_at", "264698_at",
> "264794_at", "264832_at", "264852_at", "264937_at", "264979_s_at",
> "265014_at", "265238_s_at", "265376_at", "265793_at", "266044_s_at",
> "266048_at", "266065_at", "266126_at", "266207_at", "266268_at",
> "266344_at", "266366_at", "266382_at", "266441_at", "266547_at",
> "266597_at", "266607_at", "266651_at", "266694_at", "266776_at",
> "266816_at", "266881_at", "266911_at", "266967_at", "267009_at",
> "267010_at", "267136_at", "267330_at", "267375_at", "267457_at",
> "267477_at", "267555_at", "267639_at", "AFFX-r2-At-Actin-M_s_at"
> ))
I assume the above is the object called 'new' that you pass to mget
below?
So if I run your code, I get the same warning. But then I _read_ the
warning, in particular the part that says
"If you want to know the probesets that contributed to this result
either use a named vector for geneIds, or pass a vector of probeset IDs
via sigProbesets."
I think that is pretty clear (but I would, since I wrote it). Is
there something about that sentence that is confusing? The probesets
that contributed to your result were not returned, so you have to
pass the vector of probeset IDs yourself. Note that this is an altered
version of some code that I posted on the list, so no guarantees that it
will work the same.
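As a rough illustration of where the probeset IDs live, here is a minimal
base-R sketch. The environment acc_env and the locus IDs in it are made up
for this example and merely stand in for ath1121501ACCNUM. The only point is
that unlist(mget(...)) returns a vector whose names are the probeset IDs,
and that vector of names is what can be handed to sigProbesets.

```r
# Hypothetical stand-in for the ath1121501ACCNUM map (probeset ID -> locus ID).
# The locus IDs below are invented for illustration only.
acc_env <- list2env(list(
  "245062_at" = "AT0G00001",
  "245107_at" = "AT0G00002"
))

# A toy version of the 'new' vector; the last ID is deliberately not in the map.
probes <- c("245062_at", "245107_at", "AFFX-r2-At-Actin-M_s_at")

# Same pattern as in the original post: look up each ID, NA if not found.
pos8_toy <- unlist(mget(probes, envir = acc_env, ifnotfound = NA))

# The values are the locus IDs; the names are the probeset IDs.
print(pos8_toy)

# This is the vector one would pass along as sigProbesets.
sig_probesets <- names(pos8_toy)
print(sig_probesets)
```

With the real annotation package loaded, names(pos8) plays the role of
sig_probesets here.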
Anyway, if I run the code exactly as you did, I can then do some
testing:
sumry2 <- probeSetSummary2(posss8)
xmry2 <- summary(posss8)
Note that htmlReport() calls summary() and then writes the results to
an HTML page, so I am doing exactly what you have done, only without
making the HTML page.
And now
> length(sumry2)
[1] 89
> dim(xmry2)
[1] 89 7
> all(xmry2[,1] %in% names(sumry2))
[1] TRUE
So you are wrong - all the GO terms in the summary table are also in
the output from probeSetSummary2(). However:
> table(sapply(sumry2, function(x) sum(x$selected)))
 0
89
So as the warning says, we don't know which of the probesets were
selected. But if we follow the prescription outlined by the warning,
sumry <- probeSetSummary2(posss8, sigProbesets = names(pos8))
> table(sapply(sumry, function(x) sum(x$selected)))
  1   2   3   4   6   7   8   9  11  12  15  16  18  19  21  33  64 122
 35  13   7   5   3   2   1   4   3   1   8   1   1   1   1   1   1   1
We do get results as advertised.
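To make the table(sapply(...)) check concrete, here is a small sketch on
mock data. It assumes only what the code above implies, namely that each
element of the probeSetSummary2() result has a 'selected' component that can
be summed. The term labels, the probeset names and the ProbeSetID column
name are all invented for the example.

```r
# Mock stand-in for the list returned by probeSetSummary2():
# one data.frame per GO term, with a 0/1 'selected' column.
mock_sumry <- list(
  term_A = data.frame(ProbeSetID = c("p1_at", "p2_at", "p3_at"),
                      selected = c(1, 0, 1),
                      stringsAsFactors = FALSE),
  term_B = data.frame(ProbeSetID = c("p4_at", "p5_at"),
                      selected = c(0, 1),
                      stringsAsFactors = FALSE),
  term_C = data.frame(ProbeSetID = "p6_at",
                      selected = 1,
                      stringsAsFactors = FALSE)
)

# Number of selected probesets per term, then how many terms share each count.
n_selected <- sapply(mock_sumry, function(x) sum(x$selected))
print(table(n_selected))

# Pull out the selected probeset IDs for each term.
selected_by_term <- lapply(mock_sumry,
                           function(x) x$ProbeSetID[x$selected == 1])
print(selected_by_term[["term_A"]])
```

A table consisting of a single 0 column, as in the first attempt above,
would mean no element has any probeset flagged as selected.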
Best,
Jim
>
>
> #### Universe is the entire set of genes from the ATH1 chip (22810 genes)
> pos8<-unlist(mget(new,ath1121501ACCNUM,ifnotfound=NA))
> poss8 <- new("GOHyperGParams", geneIds = pos8, universeGeneIds =locus, annotation="ath1121501",
> ontology = "BP", pvalueCutoff = 0.05, conditional = FALSE, testDirection = "over")
> posss8 <- hyperGTest(poss8)
> htmlReport(posss8, file = "pos8.html")
>
> When I run this and do probeSetSummary2(posss8)
>
> I get the warning message:
>
> Warning message:
> The vector of geneIds used to create the GOHyperGParams object was not a named vector.
> If you want to know the probesets that contributed to this result either use a named vector for geneIds, or pass a vector of probeset IDs via sigProbesets.
>
> And the result has GO terms which were not present in the HTML table.
>
>
>
>
> -- output of sessionInfo():
>
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
> [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
> [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] ath1121501.db_2.9.0 org.At.tair.db_2.9.0 GO.db_2.9.0 GOstats_2.26.0
> [5] RSQLite_0.11.4 DBI_0.2-5 graph_1.38.3 Category_2.26.0
> [9] AnnotationDbi_1.22.6 Biobase_2.20.1 BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] annotate_1.38.0 AnnotationForge_1.2.2 genefilter_1.42.0 GSEABase_1.22.0
> [5] IRanges_1.18.4 RBGL_1.36.2 splines_3.0.1 stats4_3.0.1
> [9] survival_2.37-4 tools_3.0.1 XML_3.98-1.1 xtable_1.7-1
>
> --
> Sent via the guest posting facility at bioconductor.org.
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099