[BioC] HGNC annotation for use in GOstats
James W. MacDonald
jmacdon at uw.edu
Tue Dec 10 15:33:42 CET 2013
Hi Boel,
On Tuesday, December 10, 2013 5:02:50 AM, Boel Brynedal wrote:
>
> Dear All,
>
> I am attempting a fairly simple thing: performing a hypergeometric test for gene sets using GOstats. My gene set is in HGNC symbols, as is my 'gene universe' vector. But GOstats seems to require Entrez IDs. Could anyone point me to an HGNC annotation package that includes Entrez IDs? Or any other way to run GOstats using HGNC symbols?
You can convert using the org.Hs.eg.db package:
library(org.Hs.eg.db)
genemap <- select(org.Hs.eg.db, geneset, "ENTREZID", "SYMBOL")
univmap <- select(org.Hs.eg.db, universe, "ENTREZID", "SYMBOL")
And you will probably get a warning like this:
Warning message:
In .generateExtraRows(tab, keys, jointype) :
'select' resulted in 1:many mapping between keys and return rows
indicating that some of the Hugo symbols mapped to multiple Entrez Gene
IDs, which you will then need to resolve in some fashion. Since this
usually involves many genes, and I am A Hack (tm), I usually do
something super naive like
geneset <- genemap[!duplicated(genemap[,1]), 2]
universe <- univmap[!duplicated(univmap[,1]), 2]
assuming (obviously) that the first instance of an HGNC -> EntrezID
mapping is as good as any other. That also assumes that a given HGNC
-> EntrezID mapping comes back in the same order for both genemap and
univmap, so you end up with consistent EntrezIDs for a given Hugo
symbol. There are more sophisticated ways to do this, I am sure.
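A minimal sketch of that naive first-match approach, run on a made-up mapping table rather than real select() output (the data.frame below is a toy stand-in, and the names first_hit and geneset_ids are only for illustration). It also drops symbols with no Entrez Gene ID, on the assumption that unmapped symbols come back as NA:

```r
# Toy stand-in for the data.frame returned by select(); illustrative values only.
genemap <- data.frame(
  SYMBOL   = c("HBD", "HBD", "TP53", "NOTAGENE"),
  ENTREZID = c("3045", "100187828", "7157", NA),
  stringsAsFactors = FALSE
)

# Keep the first row seen for each symbol.
first_hit <- genemap[!duplicated(genemap$SYMBOL), ]

# Drop symbols that did not map to any Entrez Gene ID.
geneset_ids <- first_hit$ENTREZID[!is.na(first_hit$ENTREZID)]
print(geneset_ids)
```

Applying the same two steps to the universe mapping should keep the gene set and the universe consistent with each other.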
Note also that although HGNC attempts to come up with unique gene
symbols, there are lots of non-unique symbols in the wild, so there is
always the possibility that you will get a symbol -> EntrezID mapping
that is not only a multiple map, but that points to two (or more)
completely different genes. As an example:
> select(org.Hs.eg.db, "HBD", c("ENTREZID","GENENAME"), "SYMBOL")
  SYMBOL  ENTREZID                      GENENAME
1    HBD      3045             hemoglobin, delta
2    HBD 100187828 hypophosphatemic bone disease
So you have the added wrinkle of not necessarily knowing which HBD you
might be after.
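One way to at least see which symbols are affected before picking one is to count distinct Entrez Gene IDs per symbol. This is only a sketch on a toy table (again a stand-in for real select() output; n_ids and ambiguous are names made up for the example):

```r
# Toy stand-in for the data.frame returned by select(); illustrative values only.
genemap <- data.frame(
  SYMBOL   = c("HBD", "HBD", "TP53"),
  ENTREZID = c("3045", "100187828", "7157"),
  stringsAsFactors = FALSE
)

# Count the distinct Entrez Gene IDs attached to each symbol.
n_ids <- tapply(genemap$ENTREZID, genemap$SYMBOL,
                function(x) length(unique(x)))

# Symbols with more than one ID need a manual decision.
ambiguous <- names(n_ids)[n_ids > 1]
print(ambiguous)
```

The flagged symbols can then be checked by hand (for instance against GENENAME) instead of silently taking the first hit.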
Best,
Jim
>
> Thank you,
> Bo
>
> params <- new("GOHyperGParams", geneIds=geneset, universeGeneIds=universe, ontology="BP", pvalueCutoff=0.05, conditional=TRUE, testDirection="over")
>> hgOver <- hyperGTest(params)
> Error in eapply(ID2GO(datPkg), function(goids) { :
> error in evaluating the argument 'env' in selecting a method for function 'eapply': Error in function (classes, fdef, mtable) :
> unable to find an inherited method for function ‘cols’ for signature ‘"function"’
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099