[BioC] Which genes are in the GO count column?

Tue Aug 21 15:34:30 CEST 2007

Hi Ingrid,

You're losing the "named" part of your vector when you use unique() here:

  sigLL <- unique(unlist(mget(probe, env=illuminaHumanv2ENTREZID, ifnotfound=NA)))

Try looking at the object (head(sigLL), maybe) with and without unique() applied and you'll see what the named vector looks like.
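
To illustrate the point with plain base R, here is a small sketch. The probe names and Entrez Gene IDs are made up for the example; only unique() and duplicated() are real functions here.

```r
# Hypothetical probe IDs (names) mapped to made-up Entrez Gene IDs (values);
# two probes map to the same gene.
ids <- c(probe_A = "1001", probe_B = "1002", probe_C = "1001")

noNames   <- unique(ids)            # duplicates removed, names dropped
withNames <- ids[!duplicated(ids)]  # duplicates removed, names kept

names(noNames)    # NULL
names(withNames)  # "probe_A" "probe_B"
```

The second form is what a function expecting a named vector needs, since each Entrez ID still carries the probe it came from.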

How about something like this instead:

   uniqGenes <- findLargest(top2$ID, top2$t, "illuminaHumanv2")
   top2 <- top2[top2$ID %in% uniqGenes,]

Check ?findLargest, which is part of the genefilter package.
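
For intuition only, the idea of keeping one probe per gene (the one with the largest statistic) can be sketched in base R roughly as follows. This is not presented as genefilter's implementation; the probe names, Entrez IDs and statistics are invented for the example.

```r
# Made-up probes, their (made-up) Entrez Gene IDs, and test statistics.
probeIds <- c("probe_A", "probe_B", "probe_C")
entrez   <- c("1001", "1002", "1001")
tStat    <- c(5.2, -3.1, 4.0)
names(tStat) <- probeIds

# Group the statistics by gene, then keep the probe with the largest value.
byGene <- split(tStat, entrez)
best   <- sapply(byGene, function(x) names(which.max(x)))

best  # one probe ID per Entrez ID
```

Note that which.max() picks the largest signed value; if you care about magnitude in either direction, passing abs() of the statistic may be closer to what you want.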

Or maybe something along the lines of:

   myEntrez <- unlist(mget(top2$ID, env=illuminaHumanv2ENTREZID))
   myEntrez <- myEntrez[!duplicated(myEntrez)]

You can now use this as input for your GOHyperGParams object; unlike unique(), subsetting with !duplicated() keeps the probe names.

In either case I would probably subset the top2 table first, removing any probe that doesn't have an Entrez ID:

   entrezIds <- mget(top2$ID, env=illuminaHumanv2ENTREZID, ifnotfound=NA)
   withEntrez <- top2$ID[!is.na(entrezIds)]
   top2 <- top2[top2$ID %in% withEntrez,]
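
The same filtering step can be tried out without any annotation package. In this sketch a plain environment (toyMap, a made-up name) stands in for illuminaHumanv2ENTREZID, and the table and all IDs are invented.

```r
# Toy stand-in for the annotation map: probe ID -> Entrez Gene ID.
toyMap <- new.env()
assign("probe_A", "1001", envir = toyMap)
assign("probe_B", "1002", envir = toyMap)
assign("probe_C", NA_character_, envir = toyMap)  # probe with no Entrez ID

# Toy stand-in for the topTable output; probe_D is not in the map at all.
top2 <- data.frame(ID = c("probe_A", "probe_B", "probe_C", "probe_D"),
                   t  = c(5.2, -3.1, 4.0, 2.7),
                   stringsAsFactors = FALSE)

# Look up every probe, mark the unmapped ones as NA, and drop them.
entrezIds  <- mget(top2$ID, envir = toyMap, ifnotfound = NA)
withEntrez <- top2$ID[!is.na(entrezIds)]
top2       <- top2[top2$ID %in% withEntrez, ]

top2$ID  # "probe_A" "probe_B"
```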

Hope it helps (though I haven't tested the code),

Cei

Ingrid H. G. Østensen wrote:
> Hi
>
> Thanks for the tips, but I am still a bit lost. This is what I have done:
>
>
>  # Lots of QC and limma things. :-)                                                                              
>   
>   probe <- top2[,1] #top2 is from topTable
>
>   sigLL <- unique(unlist(mget(probe, env=illuminaHumanv2ENTREZID, ifnotfound=NA)))
>   sigLL <- as.character(sigLL[!is.na(sigLL)])
>       
>   params <- new("GOHyperGParams", geneIds= sigLL, annotation="illuminaHumanv2", ontology="CC", pvalueCutoff= 0.05,   
>   conditional=FALSE, testDirection="under")
>   hgOver <- hyperGTest(params)
>   res_filNavn <- paste(1, "_GO_summary_CC_under.html", sep = "") 
>   htmlReport(hgOver,file=res_filNavn)
>
>
>   summary(hgOver)
>                GOCCID      Pvalue OddsRatio   ExpCount Count  Size                         Term
>   GO:0044422 GO:0044422 0.002652398 0.6469686  65.993365    46  2345               organelle part
>   GO:0044446 GO:0044446 0.002652398 0.6469686  65.993365    46  2345 intracellular organelle part
>   GO:0005634 GO:0005634 0.002722435 0.7098244 112.259076    88  3989                      nucleus
>   GO:0030529 GO:0030529 0.002771273 0.2913123  13.114246     4   466    ribonucleoprotein complex
>   GO:0044428 GO:0044428 0.008533045 0.5194381  23.920836    13   850                 nuclear part
>   GO:0005840 GO:0005840 0.028727650 0.2791575   6.922971     2   246                     ribosome
>   GO:0005623 GO:0005623 0.037820314 0.7152750 340.717129   331 12107                         cell
>   GO:0044464 GO:0044464 0.038306007 0.7160755 340.688987   331 12106                    cell part
>   GO:0031981 GO:0031981 0.046628778 0.5403759  14.352502     8   510                nuclear lumen
>
>
>    # Find the IDs in the Count column
>    probeSetSummary(hgOver)
>    
>    # This gives me all the genes (some Entrez IDs are duplicated because they are linked to different probes), but I get a
>    # warning message:
>    
>    Warning message:
>    The vector of geneIds used to create the GOHyperGParams object was not a named vector.
>    If you want to know the probesets that contributed to this result
>    you need to pass a named vector for geneIds.  
>   
>
>
>
> I have tried to make a named vector, but apparently I do not understand what it is. How can I make it work?
> And how can I get the probeSetSummary into a file? Any suggestions?
>   
> Regards,
> Ingrid
>
>  
> "James W. MacDonald" <jmacdon at med.umich.edu> writes:
>
>   
>> Hi Ingrid,
>>
>> Ingrid H. G. Østensen wrote:
>>     
>>> Hi
>>>
>>> I am testing for GO in my dataset and I am able to make html pages
>>> that contains different type of information. But I was wondering if
>>> there is some way to find out which genes are in the Count column? It
>>> might say 2, but not which 2 genes.
>>>       
>> See probeSetSummary() in GOstats and hyperG2annaffy() in affycoretools. 
>> Note that for probeSetSummary() to work correctly you have to pass in a 
>> *named* vector of Entrez Gene IDs, which you can get by using unlist():
>>
>> my.named.probeids <- unlist(mget(probeID.vector, 
>> "chip.annotation.package.name"))
>>     
>
> So assuming the OP is using GOstats, R-2.5.x, and the latest available
> version installed using biocLite...
>
>   Please try
>
>        help("HyperGResult-accessors")
>
> + seth

-- 
Cei Abreu-Goodger, PhD

Wellcome Trust Sanger Institute
Computational and Functional Genomics
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SA, UK
