[BioC] Which genes are in the GO count column?
James W. MacDonald
jmacdon at med.umich.edu
Tue Aug 21 15:56:15 CEST 2007
Hi Ingrid,
Ingrid H. G. Østensen wrote:
> Hi
>
> Thanks for the tips, but I am still a bit lost. This is what I have done:
>
>
> # Lots of QC and limma things.
> :-)
>
> probe <- top2[,1] #top2 is from topTable
>
> sigLL <- unique(unlist(mget(probe, env=illuminaHumanv2ENTREZID,
> ifnotfound=NA)))
It appears that unique() strips off the names, so you should probably
substitute something like this:
sigLL <- unlist(mget(probe, illuminaHumanv2ENTREZID))
sigLL <- sigLL[!duplicated(sigLL)]
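
A small base-R sketch of the difference, in case it helps. The probe names and Entrez Gene IDs below are made up and simply stand in for what unlist(mget(...)) would return. It also shows that as.character() drops names in the same way unique() does, so that call is best left out once the vector is named:

```r
# Mock stand-in for unlist(mget(probe, illuminaHumanv2ENTREZID)):
# names are probe IDs, values are Entrez Gene IDs (all values hypothetical).
ids <- c(probe_1 = "1001", probe_2 = "1002", probe_3 = "1001", probe_4 = NA)

names(unique(ids))              # NULL: unique() drops the names

sigLL <- ids[!duplicated(ids)]  # logical subsetting keeps the names
sigLL <- sigLL[!is.na(sigLL)]   # drop probes without an Entrez Gene ID
names(sigLL)                    # probe IDs are still attached

names(as.character(sigLL))      # NULL again: as.character() also drops names
```

A vector built this way is what geneIds expects if probeSetSummary() is to report the contributing probes.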
> sigLL <- as.character(sigLL[!is.na(sigLL)])
>
> params <- new("GOHyperGParams", geneIds= sigLL,
> annotation="illuminaHumanv2", ontology="CC", pvalueCutoff= 0.05,
> conditional=FALSE, testDirection="under")
> hgOver <- hyperGTest(params)
> res_filNavn <- paste(1, "_GO_summary_CC_under.html", sep = "")
> htmlReport(hgOver,file=res_filNavn)
>
>
> summary(hgOver)
>                GOCCID      Pvalue OddsRatio   ExpCount Count  Size Term
> GO:0044422 GO:0044422 0.002652398 0.6469686  65.993365    46  2345 organelle part
> GO:0044446 GO:0044446 0.002652398 0.6469686  65.993365    46  2345 intracellular organelle part
> GO:0005634 GO:0005634 0.002722435 0.7098244 112.259076    88  3989 nucleus
> GO:0030529 GO:0030529 0.002771273 0.2913123  13.114246     4   466 ribonucleoprotein complex
> GO:0044428 GO:0044428 0.008533045 0.5194381  23.920836    13   850 nuclear part
> GO:0005840 GO:0005840 0.028727650 0.2791575   6.922971     2   246 ribosome
> GO:0005623 GO:0005623 0.037820314 0.7152750 340.717129   331 12107 cell
> GO:0044464 GO:0044464 0.038306007 0.7160755 340.688987   331 12106 cell part
> GO:0031981 GO:0031981 0.046628778 0.5403759  14.352502     8   510 nuclear lumen
>
>
> # Find the IDs in the Count column
> probeSetSummary(hgOver)
>
> # This gives me all the genes (some Entrez IDs are duplicated because
> of their linkage to different probes), but I get a
> warning message:
>
> Warning message:
> The vector of geneIds used to create the GOHyperGParams object was
> not a named vector.
> If you want to know the probesets that contributed to this result
> you need to pass a named vector for geneIds.
>
>
>
>
> I have tried to make a named vector, but apparently I do not understand
> what it is. How can I make it work?
> And how can I get the probeSetSummary into a file? Any suggestions?
Sure. As I mentioned in my first email, you can use hyperG2annaffy() in
affycoretools. Alternatively you can always use write.table().
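
For the write.table() route, a rough sketch follows. It assumes the result is a list with one data frame per GO term; the column names and values in the mock object below are hypothetical, so check str() on your own probeSetSummary() output and adjust:

```r
# Mock of a probeSetSummary()-style result: one data frame per GO term.
# Structure, column names and values are assumptions for illustration only.
pss <- list(
  "GO:0005840" = data.frame(EntrezID = c("1001", "1002"),
                            ProbeSetID = c("probe_1", "probe_2"),
                            stringsAsFactors = FALSE),
  "GO:0030529" = data.frame(EntrezID = "1003",
                            ProbeSetID = "probe_5",
                            stringsAsFactors = FALSE)
)

# Stack the per-term tables, recording which GO term each row came from.
flat <- do.call(rbind, lapply(names(pss), function(go) {
  data.frame(GOID = go, pss[[go]], stringsAsFactors = FALSE)
}))

# Write a single tab-delimited file (a temporary path is used here).
out_file <- file.path(tempdir(), "probeSetSummary.txt")
write.table(flat, file = out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Writing one stacked table keeps everything in a single file that opens cleanly in a spreadsheet; one file per GO term would work just as well.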
Best,
Jim
>
> Regards,
> Ingrid
>
>
> "James W. MacDonald" <jmacdon at med.umich.edu> writes:
>
> > Hi Ingrid,
> >
> > Ingrid H. G. Østensen wrote:
> >> Hi
> >>
> >> I am testing for GO in my dataset and I am able to make html pages
> >> that contains different type of information. But I was wondering if
> >> there is some way to find out which genes are in the Count column? It
> >> might say 2, but not which 2 genes.
> >
> > See probeSetSummary() in GOstats and hyperG2annaffy() in affycoretools.
> > Note that for probeSetSummary() to work correctly you have to pass in a
> > *named* vector of Entrez Gene IDs, which you can get by using unlist():
> >
> > my.named.probeids <- unlist(mget(probeID.vector,
> > "chip.annotation.package.name"))
>
> So assuming the OP is using GOstats, R-2.5.x, and the latest available
> version installed using biocLite...
>
> Please try
>
> help("HyperGResult-accessors")
>
> + seth
>
> --
> Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
> BioC: http://bioconductor.org/
> Blog: http://userprimary.net/user/
>
--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623