[BioC] HGNC annotation for use in GOstats

James W. MacDonald jmacdon at uw.edu
Tue Dec 10 15:33:42 CET 2013


Hi Boel,

On Tuesday, December 10, 2013 5:02:50 AM, Boel Brynedal wrote:
>
> Dear All,
>
> I am attempting a fairly simple thing: performing a hypergeometric test for gene sets using GOstats. My gene set is in HGNC symbols, as is my 'gene universe' vector. But GOstats seems to require Entrez IDs. Could anyone point me to an HGNC annotation package that includes Entrez IDs? Or is there any other way to run GOstats using HGNC symbols?

You can convert using the org.Hs.eg.db package.

library(org.Hs.eg.db)
genemap <- select(org.Hs.eg.db, geneset, "ENTREZID", "SYMBOL")
univmap <- select(org.Hs.eg.db, universe, "ENTREZID", "SYMBOL")

And you will probably get a warning like this:

Warning message:
In .generateExtraRows(tab, keys, jointype) :
  'select' resulted in 1:many mapping between keys and return rows

indicating that some of the Hugo symbols mapped to multiple Entrez Gene 
IDs, which you will then need to resolve in some fashion. Since this 
usually involves many genes, and I am A Hack (tm), I usually do 
something super naive like

geneset <- genemap[!duplicated(genemap[,1]), 2]
universe <- univmap[!duplicated(univmap[,1]), 2]

assuming (obviously) that the first instance of an HGNC -> EntrezID 
mapping is as good as any other. This also assumes that a given HGNC 
-> EntrezID mapping comes back in the same order for both genemap and 
univmap, so you end up with the same EntrezID for a given Hugo symbol 
in each. There are more sophisticated ways to do this, I am sure.
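
As a minimal sketch of that first-match approach, here it is in base R on 
a made-up mapping table standing in for real select() output. GENE_A, 
GENE_B and the ID 11111 are hypothetical, and I am assuming that symbols 
without an Entrez ID come back as NA, so check that against your own 
output:

```r
# Toy stand-in for the data.frame returned by
# select(org.Hs.eg.db, symbols, "ENTREZID", "SYMBOL").
# GENE_A, GENE_B and the ID "11111" are hypothetical placeholders.
genemap <- data.frame(
  SYMBOL   = c("HBD", "HBD", "GENE_A", "GENE_B"),
  ENTREZID = c("3045", "100187828", "11111", NA),
  stringsAsFactors = FALSE
)

# Drop symbols that have no Entrez ID at all
mapped <- genemap[!is.na(genemap$ENTREZID), ]

# Keep only the first row for each symbol (the naive resolution)
first_hit <- mapped[!duplicated(mapped$SYMBOL), ]

geneset_ids <- first_hit$ENTREZID
print(geneset_ids)
```

Running the same steps on the universe table should give matching IDs for 
shared symbols, as long as both tables list the mappings in the same order.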

Note also that although HGNC attempts to assign unique gene symbols, 
there are lots of non-unique symbols in the wild, so there is always 
the possibility that a symbol -> EntrezID mapping is not only a 
multiple map, but points to two (or more) completely different genes. 
As an example:

> select(org.Hs.eg.db, "HBD", c("ENTREZID","GENENAME"), "SYMBOL")
  SYMBOL  ENTREZID                      GENENAME
1    HBD      3045             hemoglobin, delta
2    HBD 100187828 hypophosphatemic bone disease

So you have the added wrinkle of not necessarily knowing which HBD you 
might be after.
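
If you would rather look at the ambiguous symbols by hand than silently 
take the first hit, something like the following rough sketch would list 
them. It again uses a toy table in place of real select() output; GENE_A 
and its ID are hypothetical:

```r
# Toy stand-in for select() output; GENE_A and "11111" are hypothetical
genemap <- data.frame(
  SYMBOL   = c("HBD", "HBD", "GENE_A"),
  ENTREZID = c("3045", "100187828", "11111"),
  stringsAsFactors = FALSE
)

# Count the distinct Entrez IDs seen for each symbol
n_ids <- tapply(genemap$ENTREZID, genemap$SYMBOL,
                function(x) length(unique(x)))

# Symbols with more than one Entrez ID need a manual decision
ambiguous <- names(n_ids)[n_ids > 1]
print(genemap[genemap$SYMBOL %in% ambiguous, ])
```

Adding "GENENAME" to the columns requested from select() makes it easier 
to tell which of the candidate genes is the one you meant.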

Best,

Jim


>
> Thank you,
> Bo
>
> params <- new("GOHyperGParams", geneIds=geneset, universeGeneIds=universe, ontology="BP", pvalueCutoff=0.05, conditional=TRUE, testDirection="over")
>> hgOver <- hyperGTest(params)
> Error in eapply(ID2GO(datPkg), function(goids) { :
>    error in evaluating the argument 'env' in selecting a method for function 'eapply': Error in function (classes, fdef, mtable)  :
>    unable to find an inherited method for function ‘cols’ for signature ‘"function"’
>

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


