[BioC] goseq analysis - extracting list of my genes which are DE in the enriched GO category
Helen Wright
hlwright.uk at gmail.com
Fri Sep 9 15:30:23 CEST 2011
Hello everyone
This is my first message to Bioconductor, I hope someone out there can
help me. I have been working through the goseq vignette and adapting
it to my own dataset. I have got my list of enriched go categories,
which looks like this:
> head(GO.wall)
category over_represented_pvalue
under_represented_pvalue
3217 GO:0006954 1.496746e-10 1
3548 GO:0008009 4.502492e-10 1
9081 GO:0042379 1.010877e-09 1
2160 GO:0005126 1.033292e-09 1
2159 GO:0005125 1.534272e-09 1
4251 GO:0009611 1.983311e-09 1
I can get a list of the genes which are contained in each GO category using :
> ensembl.gene.ids=get("GO:0006954", org.Hs.egGO2ALLEGS)
> unlist(mget(unique(ensembl.gene.ids), org.Hs.egGENENAME))
but this just gives me a list of ALL the genes attached to this GO
category. I would like to find out which of my DE genes are in this
GO category.
I have referred to a previous Bioconductor post
<https://stat.ethz.ch/pipermail/bioconductor/attachments/20110308/92b27df4/attachment.pl>
which shows how to get a list of all the genes in each GO category,
like this:
> data(genes)
> genes2go=getgo(names(genes),'hg19','ensGene')
> go2genes=goseq:::reversemapping(genes2go)
Here is an extract from go2genes:
> go2genes$'GO:0006954'
[1] "ENSG00000088812" "ENSG00000078747" "ENSG00000172216"
"ENSG00000196839" "ENSG00000243646" "ENSG00000128271"
"ENSG00000100014" "ENSG00000128335" "ENSG00000100292"
"ENSG00000128284"
[11] "ENSG00000240972" "ENSG00000117215" "ENSG00000050628"
"ENSG00000187554" "ENSG00000188257" "ENSG00000159377"
"ENSG00000117525" "ENSG00000116288" "ENSG00000158769"
"ENSG00000160712"
[21] "ENSG00000135744" "ENSG00000117335" "ENSG00000163435"
"ENSG00000177807" "ENSG00000117586" "ENSG00000078900"
"ENSG00000178585" "ENSG00000168297" "ENSG00000134070"
"ENSG00000144802"
[31] "ENSG00000113889" "ENSG00000172936" "ENSG00000074416"
"ENSG00000085276" "ENSG00000121807" "ENSG00000157017"
"ENSG00000233276" "ENSG00000175040" "ENSG00000072274"
"ENSG00000132170"
> class(go2genes)
[1] "list"
I have my list of genes in a data.frame called "pwf" which looks like this"
> head(pwf)
DEgenes bias.data pwf
ENSG00000000003 0 1563.5 0.003486092
ENSG00000000005 0 1353.0 0.003287926
ENSG00000000419 0 1079.5 0.002955906
ENSG00000000457 0 3128.5 0.003943449
ENSG00000000460 0 3126.0 0.003943445
ENSG00000000938 0 2437.0 0.003896895
The number 1 in pwf$DEgenes indicates a differentially expressed gene.
So what (I think) I need to do is:
1) get the GOTERM (e.g. GO:0006954) from GO.wall$category
2) look in go2genes for $'GO:0006954' and get the list of all genes
associated with this GOTERM
3) look for these genes in rownames(pwf), and if pwf$DEgenes==1 then
print GO.wall$category, GO.wall$over_represented_pvalue, rownames(pwf)
I have no idea if this is possible. The sort of output I would expect
would be something like (GOTERM, p-value, list of genes from this
category enriched in my dataset):
GO:0006954 1.496746e-10
"ENSG00000078747" "ENSG00000177807" "ENSG00000117586"
"ENSG00000078900" "ENSG00000175040" "ENSG00000072274"
"ENSG00000132170"
Thank you for taking the time to read my email. I hope someone out
there will be able to help me.
Many thanks and have a nice weekend everybody.
Helen Wright
University of Liverpool, UK
> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] C/en_US.UTF-8/C/C/C/C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] GO.db_2.5.0 org.Hs.eg.db_2.5.0 RSQLite_0.9-4
DBI_0.2-5 AnnotationDbi_1.14.1 edgeR_2.2.5
goseq_1.4.0 geneLenDataBase_0.99.7
[9] BiasedUrn_1.04 DESeq_1.4.1 locfit_1.5-6
lattice_0.19-30 akima_0.5-4 Biobase_2.12.2
Rsamtools_1.4.3 Biostrings_2.20.3
[17] GenomicFeatures_1.4.4 GenomicRanges_1.4.8 IRanges_1.10.6
loaded via a namespace (and not attached):
[1] BSgenome_1.20.0 Matrix_0.999375-50 RColorBrewer_1.0-5
RCurl_1.6-10 XML_3.4-2 annotate_1.30.1 biomaRt_2.8.1
genefilter_1.34.0 geneplotter_1.30.0
[10] grid_2.13.1 limma_3.8.3 mgcv_1.7-6
nlme_3.1-101 rtracklayer_1.12.4 splines_2.13.1
survival_2.36-9 tools_2.13.1 xtable_1.5-6
More information about the Bioconductor
mailing list