[BioC] Can my problem be addressed with Bioconductor?

James W. MacDonald jmacdon at uw.edu
Tue Jun 17 20:39:51 CEST 2014


Hi Michela,

On 6/17/2014 2:14 PM, Michela Leonardi [guest] wrote:
> Dear All, I am very new to Bioconductor and to Gene Ontology
> analyses, so please forgive me if my question is trivial. I have as
> "universe" a list of SNPs (not all of them) from the Affymetrix 6.0
> SNPchip. After some population genetics analyses I defined a subset
> of particular interest to me (i.e. showing signal of selection). I
> would like to analyze the subset of SNPs (or, better, associated
> genes) in order to test for gene enrichment for gene ontology
> categories.
>
> My first question is: are GOstats and topGO the right tools to
> perform this analysis on the kind of data I have (lists of genes as
> text files)?
>
> And if yes... I started "playing around" with Bioconductor and I got
> stuck with the association: I could not find a way to tell the
> program that I used the Affymetrix 6.0 SNPchip. Could you point me
> towards some link or document that would help me go through all the
> steps needed to do the analyses I need?

You are doing something unconventional, so you are unlikely to find
anything that shows exactly what to do.

But note that GOstats (at least) is based on Gene IDs, so you need to
map your SNPs to their 'associated' genes, and then get the Gene IDs
(what used to be known as Entrez Gene IDs).
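
As a rough illustration of that mapping step, here is a minimal base-R
sketch. It assumes you already have a table linking each SNP to a gene;
the data frame snp2gene, its column names, the helper genes_for() and
all the IDs in it are made up for the example and do not come from any
package.

```r
# Hypothetical SNP-to-gene table (IDs invented for illustration). A SNP may
# map to no gene (NA) or to more than one gene (repeated rows).
snp2gene <- data.frame(
  snp  = c("SNP_A-1", "SNP_A-2", "SNP_A-2", "SNP_A-3", "SNP_A-4", "SNP_A-5"),
  gene = c("BRCA1",   "TP53",    "WRAP53",  NA,        "BRCA1",   "EGFR"),
  stringsAsFactors = FALSE
)

# SNPs showing a signal of selection (again, made up).
selected_snps <- c("SNP_A-2", "SNP_A-4")

# Collapse a set of SNPs to unique genes, dropping SNPs with no gene.
genes_for <- function(snps, map) {
  g <- map$gene[map$snp %in% snps]
  sort(unique(g[!is.na(g)]))
}

universe_genes <- genes_for(snp2gene$snp, snp2gene)  # genes behind all SNPs
selected_genes <- genes_for(selected_snps, snp2gene) # genes behind the subset
```

The point is only that both gene sets come from the same SNP-to-gene
table, so the selected genes are always a subset of the universe.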

Your universe will be the set of Gene IDs with which your universe of
SNPs is associated. I have no idea how you are associating SNPs with
genes, but the org.Hs.eg.db package is your friend. Say you have gene
symbols (you shouldn't be relying on such things, but bear with me).

symbols <- <some code to get symbols goes here>
library(org.Hs.eg.db)
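## all Gene IDs in org.Hs.eg.db (keys() defaults to ENTREZID);
## subset to the genes behind your SNP universe
univ <- unique(keys(org.Hs.eg.db))
## map symbols/aliases to Gene IDs: select(x, keys, columns, keytype)
egs <- select(org.Hs.eg.db, symbols, "ENTREZID", "ALIAS")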

You may get a warning that you have one or more one-to-many mappings, 
which you may or may not decide to resolve.
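
If you do want to resolve them, here is a minimal base-R sketch of two
possible policies. The data frame egs_toy is an invented stand-in with
the same two columns as the select() result above; the aliases and IDs
in it are not real, and which policy is appropriate is your call.

```r
# Toy stand-in for the data frame returned by select(); values are invented.
egs_toy <- data.frame(
  ALIAS    = c("GENE1", "GENE2", "GENE2", "GENE3"),
  ENTREZID = c("101",   "202",   "303",   NA),
  stringsAsFactors = FALSE
)

# Drop aliases that did not map at all, and any exact duplicate rows.
mapped <- unique(egs_toy[!is.na(egs_toy$ENTREZID), ])

# Option 1: keep every Gene ID an alias maps to.
ids_all <- unique(mapped$ENTREZID)

# Option 2: keep only aliases that map to exactly one Gene ID.
n_ids <- table(mapped$ALIAS)
unambiguous <- names(n_ids)[n_ids == 1]
ids_strict <- unique(mapped$ENTREZID[mapped$ALIAS %in% unambiguous])
```

Option 1 is more inclusive but can pull in unrelated genes through an
ambiguous alias; option 2 is stricter but loses some genes.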

Then you just do the 'usual':

library(GOstats)
p <- new("GOHyperGParams",
         geneIds = unique(as.character(egs$ENTREZID)),
         universeGeneIds = univ, ontology = "BP",
         annotation = "org.Hs.eg.db")

hyp <- hyperGTest(p)
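
For intuition about what is being tested, here is a base-R sketch of a
hypergeometric over-representation test for a single GO term. The counts
are made up, and this is only my understanding of the unconditional
case, not a copy of what GOstats does internally.

```r
# Made-up counts for a single GO term.
n_universe      <- 1000  # genes in the universe
n_term          <- 50    # universe genes annotated to the term
n_selected      <- 100   # genes in the selected set
n_selected_term <- 12    # selected genes annotated to the term

# P(X >= n_selected_term) when drawing n_selected genes without replacement.
p_over <- phyper(n_selected_term - 1, n_term, n_universe - n_term,
                 n_selected, lower.tail = FALSE)

# The same tail probability via a one-sided Fisher's exact test on the
# 2x2 table (rows: selected / not selected, columns: in term / not in term).
tab <- matrix(c(n_selected_term, n_term - n_selected_term,
                n_selected - n_selected_term,
                n_universe - n_term - (n_selected - n_selected_term)),
              nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value
```

With the real result object, summary(hyp) should give you a data frame
of GO terms with their p-values and counts.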

Best,

Jim


>
> Thanks a lot for your help
>
> Michela Leonardi
>
> -- output of sessionInfo():
>
> R version 3.1.0 (2014-04-10) Platform: x86_64-apple-darwin13.1.0
> (64-bit)
>
> locale: [1]
> it_IT.UTF-8/it_IT.UTF-8/it_IT.UTF-8/C/it_IT.UTF-8/it_IT.UTF-8
>
> attached base packages: [1] stats     graphics  grDevices utils
> datasets  methods   base
>
> loaded via a namespace (and not attached): [1] tools_3.1.0
>
> -- Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________ Bioconductor mailing
> list Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the
> archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


