[BioC] Which genes are in the GO count column?
Cei Abreu-Goodger
cei at sanger.ac.uk
Tue Aug 21 15:34:30 CEST 2007
Hi Ingrid,
You're losing the "named" part of your vector when you use unique here:
sigLL <- unique(unlist(mget(probe, env=illuminaHumanv2ENTREZID, ifnotfound=NA)))
Try looking at the object (head(sigLL), maybe) with and without unique applied and you'll see what the named vector looks like.
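To see the effect without the annotation package, here is a minimal base-R sketch. The probe and Entrez IDs are made up, and the vector only stands in for what unlist(mget(...)) would return:

```r
# Toy stand-in for unlist(mget(...)): probe IDs as names, Entrez IDs as values.
# All IDs here are invented for illustration.
ids <- c(probe_a = "1001", probe_b = "1002", probe_c = "1001")

# unique() keeps the distinct values but drops the names.
names(unique(ids))

# Indexing with !duplicated() keeps the names of the retained elements.
deduped <- ids[!duplicated(ids)]
names(deduped)
```

The first call should give NULL, while the second keeps one probe name per Entrez ID.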
How about something like this instead:
uniqGenes <- findLargest(top2$ID, top2$t, "illuminaHumanv2")
top2 <- top2[top2$ID %in% uniqGenes,]
Check ?findLargest, which is part of the genefilter package.
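As far as I understand it, the idea behind findLargest is to keep, for each gene, the probe with the largest value of the test statistic. A rough base-R sketch of that idea, with invented probe names, gene IDs and statistics (this is not the genefilter code itself):

```r
# Made-up probe-level data: probe_a and probe_c map to the same gene.
probes <- c("probe_a", "probe_b", "probe_c")
genes  <- c("1001", "1002", "1001")
tstat  <- c(2.1, -0.5, 3.4)
names(tstat) <- probes

# Group the statistics by gene, then keep the name of the probe
# with the largest statistic within each group.
by_gene <- split(tstat, genes)
best <- sapply(by_gene, function(x) names(which.max(x)))
best
```

Note that this picks the largest value, not the largest absolute value, so with signed t-statistics you may prefer to pass abs(t) if strongly negative probes should win.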
Or maybe something along the lines of:
myEntrez <- unlist(mget(top2$ID, env=illuminaHumanv2ENTREZID))
myEntrez <- myEntrez[!duplicated(myEntrez)]
You can now use this named vector as the geneIds input for your GOHyperGParams object.
In either case I would probably subset the top2 table first, removing any probe that doesn't have an Entrez ID:
entrezIds <- mget(top2$ID, env=illuminaHumanv2ENTREZID,ifnotfound=NA)
withEntrez <- top2$ID[!is.na(entrezIds)]
top2 <- top2[top2$ID %in% withEntrez,]
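The same filtering step can be tried on toy data in base R. In this sketch the list stands in for the result of mget(..., ifnotfound=NA), the table stands in for top2, and I am assuming each probe maps to a single ID; all values are made up:

```r
# Toy stand-in for mget(..., ifnotfound = NA): one entry per probe.
entrez_list <- list(probe_a = "1001", probe_b = NA, probe_c = "1003")

# Toy stand-in for the topTable output.
tab <- data.frame(ID = names(entrez_list),
                  t = c(2.1, -0.5, 3.4),
                  stringsAsFactors = FALSE)

# Keep only the rows whose probe has an Entrez ID.
has_entrez <- !is.na(unlist(entrez_list))
tab_kept <- tab[has_entrez, ]
tab_kept$ID
```

After this, every remaining probe has a mapping, so a later mget on tab_kept$ID should not run into missing entries.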
Hope it helps (note that I didn't test the code),
Cei
Ingrid H. G. Østensen wrote:
> Hi
>
> Thanks for the tips, but I am still a bit lost. This is what I have done:
>
>
> # Lots of QC and limma things. :-)
>
> probe <- top2[,1] #top2 is from topTable
>
> sigLL <- unique(unlist(mget(probe, env=illuminaHumanv2ENTREZID, ifnotfound=NA)))
> sigLL <- as.character(sigLL[!is.na(sigLL)])
>
> params <- new("GOHyperGParams", geneIds= sigLL, annotation="illuminaHumanv2", ontology="CC", pvalueCutoff= 0.05,
> conditional=FALSE, testDirection="under")
> hgOver <- hyperGTest(params)
> res_filNavn <- paste(1, "_GO_summary_CC_under.html", sep = "")
> htmlReport(hgOver,file=res_filNavn)
>
>
> summary(hgOver)
> GOCCID Pvalue OddsRatio ExpCount Count Size Term
> GO:0044422 GO:0044422 0.002652398 0.6469686 65.993365 46 2345 organelle part
> GO:0044446 GO:0044446 0.002652398 0.6469686 65.993365 46 2345 intracellular organelle part
> GO:0005634 GO:0005634 0.002722435 0.7098244 112.259076 88 3989 nucleus
> GO:0030529 GO:0030529 0.002771273 0.2913123 13.114246 4 466 ribonucleoprotein complex
> GO:0044428 GO:0044428 0.008533045 0.5194381 23.920836 13 850 nuclear part
> GO:0005840 GO:0005840 0.028727650 0.2791575 6.922971 2 246 ribosome
> GO:0005623 GO:0005623 0.037820314 0.7152750 340.717129 331 12107 cell
> GO:0044464 GO:0044464 0.038306007 0.7160755 340.688987 331 12106 cell part
> GO:0031981 GO:0031981 0.046628778 0.5403759 14.352502 8 510 nuclear lumen
>
>
> # Find the IDs in the Count column
> probeSetSummary(hgOver)
>
> # This gives me all the genes (some Entrez IDs are duplicated because they are linked to different probes) but I get a
> # warning message:
>
> Warning message:
> The vector of geneIds used to create the GOHyperGParams object was not a named vector.
> If you want to know the probesets that contributed to this result
> you need to pass a named vector for geneIds.
>
>
>
>
> I have tried to make a named vector but apparently I do not understand what it is, how can I make it work?
> And how can I get the probeSetSummary into a file? Any suggestions?
>
> Regards,
> Ingrid
>
>
> "James W. MacDonald" <jmacdon at med.umich.edu> writes:
>
>
>> Hi Ingrid,
>>
>> Ingrid H. G. Østensen wrote:
>>
>>> Hi
>>>
>>> I am testing for GO in my dataset and I am able to make html pages
>>> that contains different type of information. But I was wondering if
>>> there is some way to find out which genes are in the Count column? It
>>> might say 2, but not which 2 genes.
>>>
>> See probeSetSummary() in GOstats and hyperG2annaffy() in affycoretools.
>> Note that for probeSetSummary() to work correctly you have to pass in a
>> *named* vector of Entrez Gene IDs, which you can get by using unlist():
>>
>> my.named.probeids <- unlist(mget(probeID.vector,
>> "chip.annotation.package.name"))
>>
>
> So assuming the OP is using GOstats, R-2.5.x, and the latest available
> version installed using biocLite...
>
> Please try
>
> help("HyperGResult-accessors")
>
> + seth
>
>
--
Cei Abreu-Goodger, PhD
Wellcome Trust Sanger Institute
Computational and Functional Genomics
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SA, UK