[BioC] Extracting genes associated to GO terms

James W. MacDonald jmacdon at uw.edu
Fri Oct 25 16:56:08 CEST 2013


Hi Paul,

On Friday, October 25, 2013 9:22:06 AM, paul [guest] wrote:
>
> I went through the following post: https://stat.ethz.ch/pipermail/bioconductor/2013-July/054125.html. I have a similar problem extracting the genes associated with GO terms, and probeSetSummary2 does not work for me either.
>
>
> This is what I did:
>
> #### selected list of genes
> structure(c("245062_at", "245107_at", "245137_at", "245178_at",
> "245189_at", "245305_at", "245339_at", "245381_at", "245392_at",
> "245411_at", "245412_at", "245414_at", "245504_at", "245505_at",
> "245516_at", "245558_at", "245603_at", "245611_at", "245705_at",
> "245749_at", "245758_at", "245863_s_at", "245877_at", "246002_at",
> "246119_at", "246203_at", "246216_at", "246252_s_at", "246301_at",
> "246312_at", "246338_s_at", "246448_at", "246499_at", "246535_at",
> "246627_s_at", "246796_at", "246811_at", "246818_at", "246947_at",
> "246984_at", "247033_at", "247132_at", "247134_at", "247210_at",
> "247354_at", "247418_at", "247427_at", "247544_at", "247620_at",
> "247637_at", "247797_at", "247950_at", "248040_at", "248071_at",
> "248094_at", "248104_at", "248167_at", "248178_at", "248364_at",
> "248424_at", "248428_at", "248462_at", "248472_at", "248604_at",
> "248722_at", "248761_at", "248770_at", "248848_at", "249098_at",
> "249102_at", "249104_at", "249199_at", "249359_at", "249601_at",
> "249672_at", "249741_at", "249767_at", "249791_at", "249925_at",
> "249995_at", "250046_at", "250094_at", "250107_at", "250182_at",
> "250200_at", "250230_at", "250239_at", "250253_at", "250285_at",
> "250330_at", "250363_at", "250380_at", "250395_at", "250458_s_at",
> "250474_at", "250512_at", "250570_at", "250602_s_at", "250691_at",
> "250697_at", "250717_at", "250770_at", "250781_at", "250828_at",
> "250906_at", "250919_at", "250975_at", "251008_at", "251015_at",
> "251042_at", "251259_at", "251301_at", "251328_at", "251479_at",
> "251509_at", "251531_at", "251578_at", "251605_at", "251704_at",
> "251824_at", "251862_at", "251866_at", "251918_at", "252209_at",
> "252375_at", "252453_at", "252638_at", "252639_at", "252817_at",
> "252950_at", "252983_at", "252984_at", "253040_at", "253075_at",
> "253142_at", "253160_at", "253224_at", "253255_at", "253386_at",
> "253401_at", "253468_at", "253494_at", "253572_at", "253589_at",
> "253614_at", "253758_at", "253827_at", "253851_at", "253957_at",
> "254015_at", "254048_at", "254121_at", "254155_at", "254197_at",
> "254292_at", "254342_at", "254425_at", "254457_at", "254527_at",
> "254536_at", "254556_at", "254560_at", "254667_at", "254693_at",
> "254715_at", "254729_at", "254774_at", "254797_at", "254847_at",
> "254867_at", "254914_at", "255053_at", "255120_x_at", "255297_x_at",
> "255307_at", "255311_at", "255365_at", "255528_at", "255602_at",
> "255625_at", "255649_at", "255684_at", "255723_at", "255739_at",
> "255866_at", "255873_at", "255880_at", "255898_at", "255925_at",
> "255994_at", "255995_at", "256119_at", "256217_at", "256326_at",
> "256351_at", "256392_at", "256458_at", "256534_at", "256590_at",
> "256724_at", "256746_at", "256823_at", "256906_at", "256919_at",
> "256940_at", "256964_at", "256981_at", "256989_at", "257045_at",
> "257088_at", "257171_at", "257205_at", "257268_at", "257335_at",
> "257371_at", "257476_at", "257592_at", "257633_at", "257735_at",
> "257738_at", "257765_at", "257772_at", "257858_at", "257972_at",
> "258005_at", "258119_at", "258156_at", "258181_at", "258227_at",
> "258269_at", "258289_at", "258557_at", "258565_at", "258574_at",
> "258625_at", "258627_at", "258640_at", "258664_at", "258673_at",
> "258743_s_at", "258793_at", "258794_at", "259036_at", "259182_at",
> "259198_at", "259273_s_at", "259318_at", "259331_at", "259345_s_at",
> "259359_at", "259379_at", "259390_at", "259409_at", "259414_at",
> "259439_at", "259447_s_at", "259463_at", "259485_at", "259498_at",
> "259500_at", "259518_at", "259574_at", "259668_at", "259682_at",
> "259780_at", "259814_at", "259834_at", "259992_at", "260210_at",
> "260328_at", "260334_at", "260335_at", "260347_at", "260393_at",
> "260419_at", "260432_at", "260470_at", "260580_at", "260643_at",
> "260903_at", "260936_at", "261107_at", "261110_at", "261127_at",
> "261154_at", "261161_at", "261229_at", "261269_at", "261378_at",
> "261426_at", "261481_at", "261498_at", "261577_at", "261640_at",
> "261778_at", "261928_at", "261938_at", "261960_at", "261995_at",
> "262130_at", "262202_at", "262220_at", "262225_at", "262250_at",
> "262317_at", "262342_at", "262406_at", "262408_at", "262464_at",
> "262522_at", "262643_at", "262708_at", "262770_at", "262788_at",
> "262821_at", "262825_at", "262904_at", "263005_at", "263063_s_at",
> "263075_at", "263096_at", "263127_at", "263212_at", "263225_at",
> "263240_s_at", "263241_at", "263323_at", "263336_x_at", "263387_at",
> "263570_at", "263586_at", "263593_at", "263805_at", "263946_at",
> "263991_at", "264010_at", "264122_at", "264127_at", "264146_at",
> "264200_at", "264202_at", "264266_at", "264524_at", "264560_at",
> "264571_at", "264587_at", "264633_at", "264656_at", "264698_at",
> "264794_at", "264832_at", "264852_at", "264937_at", "264979_s_at",
> "265014_at", "265238_s_at", "265376_at", "265793_at", "266044_s_at",
> "266048_at", "266065_at", "266126_at", "266207_at", "266268_at",
> "266344_at", "266366_at", "266382_at", "266441_at", "266547_at",
> "266597_at", "266607_at", "266651_at", "266694_at", "266776_at",
> "266816_at", "266881_at", "266911_at", "266967_at", "267009_at",
> "267010_at", "267136_at", "267330_at", "267375_at", "267457_at",
> "267477_at", "267555_at", "267639_at", "AFFX-r2-At-Actin-M_s_at"
> ), .Names = c("245062_at", "245107_at", "245137_at", "245178_at",
> "245189_at", "245305_at", "245339_at", "245381_at", "245392_at",
> "245411_at", "245412_at", "245414_at", "245504_at", "245505_at",
> "245516_at", "245558_at", "245603_at", "245611_at", "245705_at",
> "245749_at", "245758_at", "245863_s_at", "245877_at", "246002_at",
> "246119_at", "246203_at", "246216_at", "246252_s_at", "246301_at",
> "246312_at", "246338_s_at", "246448_at", "246499_at", "246535_at",
> "246627_s_at", "246796_at", "246811_at", "246818_at", "246947_at",
> "246984_at", "247033_at", "247132_at", "247134_at", "247210_at",
> "247354_at", "247418_at", "247427_at", "247544_at", "247620_at",
> "247637_at", "247797_at", "247950_at", "248040_at", "248071_at",
> "248094_at", "248104_at", "248167_at", "248178_at", "248364_at",
> "248424_at", "248428_at", "248462_at", "248472_at", "248604_at",
> "248722_at", "248761_at", "248770_at", "248848_at", "249098_at",
> "249102_at", "249104_at", "249199_at", "249359_at", "249601_at",
> "249672_at", "249741_at", "249767_at", "249791_at", "249925_at",
> "249995_at", "250046_at", "250094_at", "250107_at", "250182_at",
> "250200_at", "250230_at", "250239_at", "250253_at", "250285_at",
> "250330_at", "250363_at", "250380_at", "250395_at", "250458_s_at",
> "250474_at", "250512_at", "250570_at", "250602_s_at", "250691_at",
> "250697_at", "250717_at", "250770_at", "250781_at", "250828_at",
> "250906_at", "250919_at", "250975_at", "251008_at", "251015_at",
> "251042_at", "251259_at", "251301_at", "251328_at", "251479_at",
> "251509_at", "251531_at", "251578_at", "251605_at", "251704_at",
> "251824_at", "251862_at", "251866_at", "251918_at", "252209_at",
> "252375_at", "252453_at", "252638_at", "252639_at", "252817_at",
> "252950_at", "252983_at", "252984_at", "253040_at", "253075_at",
> "253142_at", "253160_at", "253224_at", "253255_at", "253386_at",
> "253401_at", "253468_at", "253494_at", "253572_at", "253589_at",
> "253614_at", "253758_at", "253827_at", "253851_at", "253957_at",
> "254015_at", "254048_at", "254121_at", "254155_at", "254197_at",
> "254292_at", "254342_at", "254425_at", "254457_at", "254527_at",
> "254536_at", "254556_at", "254560_at", "254667_at", "254693_at",
> "254715_at", "254729_at", "254774_at", "254797_at", "254847_at",
> "254867_at", "254914_at", "255053_at", "255120_x_at", "255297_x_at",
> "255307_at", "255311_at", "255365_at", "255528_at", "255602_at",
> "255625_at", "255649_at", "255684_at", "255723_at", "255739_at",
> "255866_at", "255873_at", "255880_at", "255898_at", "255925_at",
> "255994_at", "255995_at", "256119_at", "256217_at", "256326_at",
> "256351_at", "256392_at", "256458_at", "256534_at", "256590_at",
> "256724_at", "256746_at", "256823_at", "256906_at", "256919_at",
> "256940_at", "256964_at", "256981_at", "256989_at", "257045_at",
> "257088_at", "257171_at", "257205_at", "257268_at", "257335_at",
> "257371_at", "257476_at", "257592_at", "257633_at", "257735_at",
> "257738_at", "257765_at", "257772_at", "257858_at", "257972_at",
> "258005_at", "258119_at", "258156_at", "258181_at", "258227_at",
> "258269_at", "258289_at", "258557_at", "258565_at", "258574_at",
> "258625_at", "258627_at", "258640_at", "258664_at", "258673_at",
> "258743_s_at", "258793_at", "258794_at", "259036_at", "259182_at",
> "259198_at", "259273_s_at", "259318_at", "259331_at", "259345_s_at",
> "259359_at", "259379_at", "259390_at", "259409_at", "259414_at",
> "259439_at", "259447_s_at", "259463_at", "259485_at", "259498_at",
> "259500_at", "259518_at", "259574_at", "259668_at", "259682_at",
> "259780_at", "259814_at", "259834_at", "259992_at", "260210_at",
> "260328_at", "260334_at", "260335_at", "260347_at", "260393_at",
> "260419_at", "260432_at", "260470_at", "260580_at", "260643_at",
> "260903_at", "260936_at", "261107_at", "261110_at", "261127_at",
> "261154_at", "261161_at", "261229_at", "261269_at", "261378_at",
> "261426_at", "261481_at", "261498_at", "261577_at", "261640_at",
> "261778_at", "261928_at", "261938_at", "261960_at", "261995_at",
> "262130_at", "262202_at", "262220_at", "262225_at", "262250_at",
> "262317_at", "262342_at", "262406_at", "262408_at", "262464_at",
> "262522_at", "262643_at", "262708_at", "262770_at", "262788_at",
> "262821_at", "262825_at", "262904_at", "263005_at", "263063_s_at",
> "263075_at", "263096_at", "263127_at", "263212_at", "263225_at",
> "263240_s_at", "263241_at", "263323_at", "263336_x_at", "263387_at",
> "263570_at", "263586_at", "263593_at", "263805_at", "263946_at",
> "263991_at", "264010_at", "264122_at", "264127_at", "264146_at",
> "264200_at", "264202_at", "264266_at", "264524_at", "264560_at",
> "264571_at", "264587_at", "264633_at", "264656_at", "264698_at",
> "264794_at", "264832_at", "264852_at", "264937_at", "264979_s_at",
> "265014_at", "265238_s_at", "265376_at", "265793_at", "266044_s_at",
> "266048_at", "266065_at", "266126_at", "266207_at", "266268_at",
> "266344_at", "266366_at", "266382_at", "266441_at", "266547_at",
> "266597_at", "266607_at", "266651_at", "266694_at", "266776_at",
> "266816_at", "266881_at", "266911_at", "266967_at", "267009_at",
> "267010_at", "267136_at", "267330_at", "267375_at", "267457_at",
> "267477_at", "267555_at", "267639_at", "AFFX-r2-At-Actin-M_s_at"
> ))

I assume the above is the object called 'new' that you pass to mget 
below?

So if I run your code, I get the same warning. But then I _read_ the 
warning, in particular the part that says

"If you want to know the probesets that contributed to this result 
either use a named vector for geneIds, or pass a vector of probeset IDs 
via sigProbesets."

Which I think is pretty clear (but I should, since I wrote that). Is 
there something about that sentence that is confusing? You didn't get 
the probesets that contributed to your result returned, so you have to 
pass the vector of probeset IDs. Note that this is an altered version 
of some code that I posted on the list, so no guarantees that it will 
work the same.
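
To make the distinction between probeset IDs and gene IDs concrete, here is 
a minimal base R sketch. It assumes a toy environment, acc_map, standing in 
for ath1121501ACCNUM, and the gene IDs (GENE_A and so on) are placeholders 
rather than real annotations. It only shows that unlist(mget(...)) keeps the 
probeset IDs as names, which is the vector one could pass via sigProbesets.

```r
# Hypothetical stand-in for the ath1121501ACCNUM map (probeset ID -> gene ID).
# The gene IDs are placeholders, not real annotations.
acc_map <- new.env()
assign("245062_at", "GENE_A", envir = acc_map)
assign("245107_at", "GENE_B", envir = acc_map)
assign("245137_at", "GENE_C", envir = acc_map)

probes <- c("245062_at", "245107_at", "245137_at")

# Same pattern as in the original post: look up each probeset, then flatten.
gene_ids <- unlist(mget(probes, envir = acc_map, ifnotfound = NA))

# The values are gene IDs; the names are the probeset IDs they came from.
probeset_ids <- names(gene_ids)
gene_values <- unname(gene_ids)
```

Here probeset_ids plays the role of names(pos8) in the real analysis.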

Anyway, if I run the code exactly as you did, I can then do some 
testing:

sumry2 <- probeSetSummary2(posss8)
xmry2 <- summary(posss8)

Note that htmlReport() calls summary() and then writes the results to 
an HTML page, so I am doing exactly what you have done, only without 
making the HTML page.

And now

> length(sumry2)
[1] 89
> dim(xmry2)
[1] 89  7
> all(xmry2[,1] %in% names(sumry2))
[1] TRUE
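
The same comparison can be run in both directions with setdiff(), which 
lists any GO terms present in one result but not the other. This is a sketch 
on toy objects: summary_table stands in for summary(posss8), per_term for the 
output of probeSetSummary2(posss8), and the GO IDs are placeholders.

```r
# Toy stand-ins; the GO IDs are placeholders, not real terms.
summary_table <- data.frame(
  GOBPID = c("GO:A", "GO:B", "GO:C"),
  stringsAsFactors = FALSE
)
per_term <- list("GO:A" = 1, "GO:B" = 2, "GO:C" = 3)

# Every term in the summary table is also a name in the per-term list?
all_present <- all(summary_table[, 1] %in% names(per_term))

# Terms in one object but not the other (both empty when the sets match).
missing_terms <- setdiff(summary_table[, 1], names(per_term))
extra_terms <- setdiff(names(per_term), summary_table[, 1])
```

If extra_terms is empty, the per-term output has no GO terms beyond those in 
the summary table.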

So you are wrong - all the GO terms in the summary table are also in 
the output from probeSetSummary2(). However:

> table(sapply(sumry2, function(x) sum(x$selected)))

 0
89

So as the warning says, we don't know which of the probesets were 
selected. But if we follow the prescription outlined by the warning,

sumry <- probeSetSummary2(posss8, sigProbesets = names(pos8))

> table(sapply(sumry, function(x) sum(x$selected)))

  1   2   3   4   6   7   8   9  11  12  15  16  18  19  21  33  64 122
 35  13   7   5   3   2   1   4   3   1   8   1   1   1   1   1   1   1

We do get results as advertised.
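
For anyone unfamiliar with the table(sapply(...)) idiom used above, here is 
a small sketch on toy data. It assumes each list element is a data frame 
with a logical 'selected' column, as the calls above suggest; the GO and 
probe IDs are placeholders.

```r
# Toy stand-in for the list returned by probeSetSummary2(): one data frame
# per GO term. IDs are placeholders, not real annotations.
per_term <- list(
  "GO:A" = data.frame(probe = c("p1", "p2", "p3"),
                      selected = c(TRUE, FALSE, TRUE)),
  "GO:B" = data.frame(probe = c("p4", "p5"),
                      selected = c(FALSE, FALSE)),
  "GO:C" = data.frame(probe = c("p1", "p6"),
                      selected = c(TRUE, TRUE))
)

# Number of selected probesets per GO term (sum() counts the TRUE values).
n_selected <- sapply(per_term, function(x) sum(x$selected))

# How many GO terms have 0, 1, 2, ... selected probesets.
counts <- table(n_selected)

# GO terms for which no probeset was flagged as selected.
empty_terms <- names(n_selected)[n_selected == 0]
```

A table with a single column labelled 0 means no probeset was flagged for any 
term, which is the situation the warning describes.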

Best,

Jim



>
>
> #### Universe is the entire set of genes from the ATH1 chip (22810 genes)
> pos8<-unlist(mget(new,ath1121501ACCNUM,ifnotfound=NA))
> poss8 <- new("GOHyperGParams", geneIds = pos8, universeGeneIds =locus, annotation="ath1121501",
>               ontology = "BP", pvalueCutoff = 0.05, conditional = FALSE, testDirection = "over")
> posss8 <- hyperGTest(poss8)
> htmlReport(posss8, file = "pos8.html")
>
> When I run this and do probeSetSummary2(posss8)
>
> I get the warning message:
>
> Warning message:
> The vector of geneIds used to create the GOHyperGParams object was not a named vector.
> If you want to know the probesets that contributed to this result either use a named vector for geneIds, or pass a vector of probeset IDs via sigProbesets.
>
> And the result has GO terms which were not present in the HTML table.
>
>
>
>
>   -- output of sessionInfo():
>
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
>   [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C
> [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>   [1] ath1121501.db_2.9.0  org.At.tair.db_2.9.0 GO.db_2.9.0          GOstats_2.26.0
>   [5] RSQLite_0.11.4       DBI_0.2-5            graph_1.38.3         Category_2.26.0
>   [9] AnnotationDbi_1.22.6 Biobase_2.20.1       BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
>   [1] annotate_1.38.0       AnnotationForge_1.2.2 genefilter_1.42.0     GSEABase_1.22.0
>   [5] IRanges_1.18.4        RBGL_1.36.2           splines_3.0.1         stats4_3.0.1
>   [9] survival_2.37-4       tools_3.0.1           XML_3.98-1.1          xtable_1.7-1
>
> --
> Sent via the guest posting facility at bioconductor.org.

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


