[BioC] Which genes are in the GO count column?
James W. MacDonald
jmacdon at med.umich.edu
Tue Aug 21 17:48:33 CEST 2007
Hi Ingrid,
Ingrid H. G. Østensen wrote:
> Hi
>
> I tried:
>
> sigLL <- unlist(mget(probe, illuminaHumanv2ENTREZID))
> sigLL <- sigLL[!duplicated(sigLL)]
>
> insted of:
>
> sigLL <- unique(unlist(mget(probe,
> env=illuminaHumanv2ENTREZID,ifnotfound=NA)))
>
> and then:
>
> probeSetSummary(hgOver, sigLL)
>
> Now I get (all?) the GO, not only the ones found in summary(hgOver), and
> I still get the error message regrading the named vector.
Did you read the help page for probeSetSummary()? If so, you should have
seen this:
Value:
A 'list' of 'data.frame'. Each element of the list corresponds to
one of the GO terms (the term is provides as the name of the
element). Each 'data.frame' has three columns: the Entrez Gene ID
('EntrezID'), the probe set ID ('ProbeSetID'), and a 0/1 indicator
of whether the probe set ID was provided as part of the initial
input ('selected')
Which should have helped explain the output you get. Anyway, I don't see
the same problems you do. Are you using a current version of R/BioC (the
posting guide _does_ ask you to give the output of sessionInfo()).
> sel <- unlist(mget(ls(illuminaHumanv2ENTREZID)[sample(1:48000,
300)],illuminaHumanv2ENTREZID, ifnotfound=NA))
> sel <- sel[!is.na(sel)]
> sel <- sel[!duplicated(sel)]
> univ <- unique(getLL(ls(illuminaHumanv2ENTREZID), "illuminaHumanv2"))
> p <- new("GOHyperGParams", geneIds=sel, universeGeneIds=univ,
ontology="BP", annotation="illuminaHumanv2", conditional=T)
> hyp <- hyperGTest(p)
> ps <- probeSetSummary(hyp, categorySize=10)
> ps[[1]]
EntrezID ProbeSetID selected
ILMN_2110 841 ILMN_2110 0
ILMN_29186 841 ILMN_29186 1
ILMN_29639 841 ILMN_29639 0
ILMN_5213 133396 ILMN_5213 1
> sessionInfo()
R version 2.5.1 (2007-06-27)
i386-pc-mingw32
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] "splines" "tools" "stats"
[4] "graphics" "grDevices" "datasets"
[7] "utils" "methods" "base"
other attached packages:
illuminaHumanv2 hgu133plus2 GOstats
"1.2.0" "1.16.0" "2.2.6"
Category Matrix lattice
"2.2.3" "0.999375-0" "0.15-11"
genefilter survival KEGG
"1.14.1" "2.32" "1.16.0"
RBGL annotate Biobase
"1.12.0" "1.14.1" "1.14.0"
GO graph rcompgen
"1.16.0" "1.14.2" "0.1-13"
Best,
Jim
>
> And writing probeSetSummary object to file with write or write.table
> does not work.
>
> Maybe I have to look into the affycoretool package to morrow. :-)
>
> Regards,
> Ingrid
> Hi Ingrid,
>
> Ingrid H. G. Østensen wrote:
> > Hi
> >
> > Thanks for tips but I still a bit lost. This is what I have done:
> >
> >
> > # Lots of QC and limma things.
> >
> :-)
> >
> > probe <- top2[,1] #top2 is from topTable
> >
> > sigLL <- unique(unlist(mget(probe, env=illuminaHumanv2ENTREZID,
> > ifnotfound=NA)))
>
> It appears that unique() strips off the names, so you should probably
> substitute something like this:
>
> sigLL <- unlist(mget(probe, illuminaHumanv2ENTREZID))
> sigLL <- sigLL[!duplicated(sigLL)]
>
> > sigLL <- as.character(sigLL[!is.na(sigLL)])
> >
> > params <- new("GOHyperGParams", geneIds= sigLL,
> > annotation="illuminaHumanv2", ontology="CC", pvalueCutoff= 0.05,
> > conditional=FALSE, testDirection="under")
> > hgOver <- hyperGTest(params)
> > res_filNavn <- paste(1, "_GO_summary_CC_under.html", sep = "")
> > htmlReport(hgOver,file=res_filNavn)
> >
> >
> > summary(hgOver)
> > GOCCID Pvalue OddsRatio ExpCount Count
> > Size Term
> > GO:0044422 GO:0044422 0.002652398 0.6469686 65.993365 46
> > 2345 organelle part
> > GO:0044446 GO:0044446 0.002652398 0.6469686 65.993365 46 2345
> > intracellular organelle part
> > GO:0005634 GO:0005634 0.002722435 0.7098244 112.259076 88
> > 3989 nucleus
> > GO:0030529 GO:0030529 0.002771273 0.2913123 13.114246 4 466
> > ribonucleoprotein complex
> > GO:0044428 GO:0044428 0.008533045 0.5194381 23.920836 13
> > 850 nuclear part
> > GO:0005840 GO:0005840 0.028727650 0.2791575 6.922971 2
> > 246 ribosome
> > GO:0005623 GO:0005623 0.037820314 0.7152750 340.717129 331
> > 12107 cell
> > GO:0044464 GO:0044464 0.038306007 0.7160755 340.688987 331
> > 12106 cell part
> > GO:0031981 GO:0031981 0.046628778 0.5403759 14.352502 8
> > 510 nuclear lumen
> >
> >
> > # Find the ID in the count colunm
> > probeSetSummary(hgOver)
> >
> > # This gives me all the genes (some entrez id are dublicatet because
> > of their linkage to different probes) but I get a
> > warning message:
> >
> > Warning message:
> > The vector of geneIds used to create the GOHyperGParams object was
> > not a named vector.
> > If you want to know the probesets that contributed to this result
> > you need to pass a named vector for geneIds.
> >
> >
> >
> >
> > I have tried to make a named vector but apparently I do not understand
> > what it is, how can I make it work?
> > And how can I get the probeSetSummary into a file? Any suggestions?
>
> Sure. As I mentioned in my first email, you can use hyperG2annaffy() in
> affycoretools. Alternatively you can always use write.table().
>
> Best,
>
> Jim
>
>
> >
> > Regards,
> > Ingrid
> >
> >
> > "James W. MacDonald" <jmacdon at med.umich.edu> writes:
> >
> > > Hi Ingrid,
> > >
> > > Ingrid H. G. Østensen wrote:
> > >> Hi
> > >>
> > >> I am testing for GO in my dataset and I am able to make html pages
> > >> that contains different type of information. But I was wondering if
> > >> there is some way to find out which genes are in the Count column? It
> > >> might say 2, but not which 2 genes.
> > >
> > > See probeSetSummary() in GOstats and hyperG2annaffy() in
> affycoretools.
> > > Note that for probeSetSummary() to work correctly you have to pass
> in a
> > > *named* vector of Entrez Gene IDs, which you can get by using
> unlist():
> > >
> > > my.named.probeids <- unlist(mget(probeID.vector,
> > > "chip.annotation.package.name"))
> >
> > So assuming the OP is using GOstats, R-2.5.x, and the latest available
> > version installed using biocLite...
> >
> > Please try
> >
> > help("HyperGResult-accessors")
> >
> > + seth
> >
> > --
> > Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research
> Center
> > BioC: http://bioconductor.org/
> > Blog: http://userprimary.net/user/
> >
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> Affymetrix and cDNA Microarray Core
> University of Michigan Cancer Center
> 1500 E. Medical Center Drive
> 7410 CCGC
> Ann Arbor MI 48109
> 734-647-5623
>
--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
More information about the Bioconductor
mailing list