[BioC] HGNC annotation for use in GOstats
Boel Brynedal
Boel.Brynedal at ki.se
Wed Dec 11 09:05:56 CET 2013
Hi Jim,
Worked like a charm. Thanks!
On 10 Dec 2013, at 15:33, James W. MacDonald <jmacdon at uw.edu> wrote:
> Hi Boel,
>
> On Tuesday, December 10, 2013 5:02:50 AM, Boel Brynedal wrote:
>>
>> Dear All,
>>
>> I am attempting a fairly simple thing: performing a hypergeometric test for gene sets using GOstats. My gene set is in HGNC symbols, as is my 'gene universe' vector, but GOstats seems to require Entrez IDs. Could anyone point me to an HGNC annotation package that includes Entrez IDs, or to any other way to run GOstats using HGNC symbols?
>
> You can convert using the org.Hs.eg.db package.
>
> library(org.Hs.eg.db)
> genemap <- select(org.Hs.eg.db, geneset, "ENTREZID", "SYMBOL")
> univmap <- select(org.Hs.eg.db, universe, "ENTREZID", "SYMBOL")
>
> And you will probably get a warning like this:
>
> Warning message:
> In .generateExtraRows(tab, keys, jointype) :
> 'select' resulted in 1:many mapping between keys and return rows
>
> indicating that some of the Hugo symbols mapped to multiple Entrez Gene IDs, which you will then need to resolve in some fashion. Since this usually involves many genes, and I am A Hack (tm), I usually do something super naive like
>
> geneset <- genemap[!duplicated(genemap[,1]), 2]
> universe <- univmap[!duplicated(univmap[,1]), 2]
>
> assuming (obviously) that the first instance of an HGNC -> EntrezID mapping is as good as any other. That also assumes that a given HGNC -> EntrezID mapping comes back in the same order for both genemap and univmap, so that you end up with the same EntrezID for a given Hugo symbol in both. There are more sophisticated ways to do this, I am sure.
>
> Note also that although HGNC attempts to come up with unique gene symbols, there are lots of non-unique symbols in the wild, so there is always the possibility that you will get a symbol -> EntrezID mapping that is not only a multiple map, but that points to two (or more) completely different genes. As an example:
>
> > select(org.Hs.eg.db, "HBD", c("ENTREZID","GENENAME"), "SYMBOL")
>   SYMBOL  ENTREZID                      GENENAME
> 1    HBD      3045             hemoglobin, delta
> 2    HBD 100187828 hypophosphatemic bone disease
>
> So you have the added wrinkle of not necessarily knowing which HBD you might be after.
>
> Best,
>
> Jim
>
>
>>
>> Thank you,
>> Bo
>>
>> > params <- new("GOHyperGParams", geneIds=geneset, universeGeneIds=universe, ontology="BP", pvalueCutoff=0.05, conditional=TRUE, testDirection="over")
>> > hgOver <- hyperGTest(params)
>> Error in eapply(ID2GO(datPkg), function(goids) { :
>> error in evaluating the argument 'env' in selecting a method for function 'eapply': Error in function (classes, fdef, mtable) :
>> unable to find an inherited method for function ‘cols’ for signature ‘"function"’
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
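
The first-match resolution described in the thread can be sketched in base R on a toy data frame, so that it runs without org.Hs.eg.db installed. This is only an illustration under stated assumptions: the `genemap` rows stand in for what `select()` would return, `GENEA`/`11111` and `NOTAGENE` are made-up entries, and `resolve_symbols` is a hypothetical helper name, not part of any package. Besides keeping the first Entrez ID per symbol, it drops unmapped symbols (which come back as NA) and reports the symbols that had a 1:many mapping so they can be checked by hand.

```r
# Toy stand-in for the result of
#   select(org.Hs.eg.db, symbols, "ENTREZID", "SYMBOL")
# Only the two HBD IDs are taken from the thread; GENEA/11111 and
# NOTAGENE are made-up placeholders for illustration.
genemap <- data.frame(
  SYMBOL   = c("GENEA", "HBD", "HBD", "NOTAGENE"),
  ENTREZID = c("11111", "3045", "100187828", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: naive first-match resolution plus a report of
# which symbols were ambiguous.
resolve_symbols <- function(map) {
  # Symbols without an Entrez Gene ID are returned as NA; drop them.
  mapped <- map[!is.na(map$ENTREZID), ]
  # A symbol that appears more than once has a 1:many mapping.
  ambiguous <- unique(mapped$SYMBOL[duplicated(mapped$SYMBOL)])
  # Keep only the first row for each symbol.
  first <- mapped[!duplicated(mapped$SYMBOL), ]
  list(ids = first$ENTREZID, ambiguous = ambiguous)
}

res <- resolve_symbols(genemap)
res$ids        # Entrez IDs to use as geneIds / universeGeneIds
res$ambiguous  # symbols with more than one Entrez ID
```

Applying the same helper to both the gene set map and the universe map keeps the two consistent, as long as `select()` returns the rows for a given symbol in the same order in both calls.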