[BioC] Fwd: GO terms: Annotation for HumanMethylation450
Marc Carlson
mcarlson at fhcrc.org
Wed Apr 3 20:45:15 CEST 2013
Hi Jinyan,
First of all, please do the following to update some of your very old
packages:
source("http://bioconductor.org/biocLite.R")
biocLite(c("AnnotationDbi","GO.db"))
## Then you can load the libraries like this:
library(GO.db)
## and use the cols method to see what you can ask for like this:
cols(GO.db)
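
On the IEA question: GO.db describes the terms themselves, so the columns it offers should be along the lines of GOID, TERM, ONTOLOGY and DEFINITION. Evidence codes are carried by the probe-to-GO map entries (the 'Evidence' field used in the script quoted below), so that is where filtering would happen. A minimal sketch in base R, assuming the nested list shape that mget() returns on such a map. The probe names and evidence assignments are made up for illustration, and drop_iea is a hypothetical helper:

```r
## Toy stand-in for what mget() returns on a probe-to-GO map:
## one entry per probe, each a list of annotations with GOID, Evidence
## and Ontology fields. All values below are illustrative only.
res <- list(
  probe_A = list(
    list(GOID = "GO:0008150", Evidence = "IEA", Ontology = "BP"),
    list(GOID = "GO:0001869", Evidence = "IDA", Ontology = "BP")
  ),
  probe_B = list(
    list(GOID = "GO:0008150", Evidence = "TAS", Ontology = "BP")
  )
)

## Hypothetical helper: keep only annotations whose evidence code is not IEA.
drop_iea <- function(annots) {
  keep <- vapply(annots, function(a) a[["Evidence"]] != "IEA", logical(1))
  annots[keep]
}
res_no_iea <- lapply(res, drop_iea)

## Collect the distinct GO IDs that survive the filter.
go_ids <- unique(unlist(lapply(res_no_iea, function(annots) {
  vapply(annots, function(a) a[["GOID"]], character(1))
})))
```

The resulting go_ids vector is what would then be passed as keys to select() on GO.db.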
For more explanations, please look at this vignette here:
http://www.bioconductor.org/packages/2.11/bioc/vignettes/AnnotationDbi/inst/doc/IntroToAnnotationPackages.pdf
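
As a rough sketch of how the pieces suggested in this thread could fit together: look each distinct GOID up once, then merge() the flat result back onto the probe-to-GOID pairs. The data frames below are hand-written stand-ins so the example runs without GO.db; the probe names and term labels are placeholders, not real annotation.

```r
## Hypothetical probe-to-GOID pairs in "long" form (one row per annotation).
probe2go <- data.frame(
  probe = c("probe_A", "probe_B", "probe_C"),
  GOID  = c("GO:0001869", "GO:0008150", "GO:0008150"),
  stringsAsFactors = FALSE
)

## Look each distinct GOID up once instead of once per probe.
k <- unique(probe2go$GOID)

## With GO.db loaded, the lookup table would come from something like
##   terms <- select(GO.db, keys = k, cols = c("TERM"))
## (argument names as used in this thread). A placeholder table stands in here.
terms <- data.frame(
  GOID = k,
  TERM = paste("placeholder term for", k),
  stringsAsFactors = FALSE
)

## Attach the term information to every probe row; all.x = TRUE keeps
## probes whose GOID has no match in the lookup table.
merged <- merge(probe2go, terms, by = "GOID", all.x = TRUE)
print(merged)
```

Because the lookup table has one row per distinct GOID and everything stays in flat data frames, this avoids building a nested list of term objects for every probe, which is presumably what exhausted memory in the lapply()/mget() version.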
Thanks,
Marc
On 04/03/2013 11:36 AM, Jinyan Huang wrote:
> Thank you. It works now.
>
> result = select(GO.db, keys =k, cols=c("DEFINITION","TERM"))
>
> Which columns can I select here? For example, what if I do not want GO
> terms whose evidence code is IEA?
>
> In the help page, I cannot find such information.
>
> On Wed, Apr 3, 2013 at 2:11 PM, Tim Triche, Jr. <tim.triche at gmail.com> wrote:
>> May need to do
>>
>> require(AnnotationDbi)
>> require(Homo.sapiens) ## or GO.db, or whatever
>>
>> in order for that to work.
>>
>>
>>
>> On Wed, Apr 3, 2013 at 11:07 AM, Jinyan Huang <jhuang at hsph.harvard.edu>
>> wrote:
>>> Marc,
>>>
>>> After updating R to 2.15.2, I still get the error.
>>>
>>> R
>>>
>>> R version 2.15.2 (2012-10-26) -- "Trick or Treat"
>>>
>>>> ids = c( "GO:0008150", "GO:0001869")
>>>> result = select(GO.db, keys =ids, cols=c("DEFINITION","TERM"))
>>> Error: could not find function "select"
>>>> sessionInfo()
>>> R version 2.15.2 (2012-10-26)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>> [7] LC_PAPER=C LC_NAME=C
>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> On Wed, Apr 3, 2013 at 1:40 PM, Marc Carlson <mcarlson at fhcrc.org> wrote:
>>>> Hi Jinyan,
>>>>
>>>> The code I showed you before will get all the GO terms and their
>>>> definitions into a single data frame (without using too much RAM):
>>>>
>>>> library(GO.db)
>>>> ## k is now all the GOIDs that we actually have terms for.
>>>> k = keys(GOTERM)
>>>> ## If you use another source of GOIDs, you might want to call unique()
>>>> ## on that first, in order to save time.
>>>> ## Then just call select like I showed you before
>>>> result = select(GO.db, keys =k, cols=c("DEFINITION","TERM"))
>>>>
>>>> ## Then you can use merge() to attach that onto your gene IDs later on.
>>>>
>>>> I hope this helps,
>>>>
>>>>
>>>> Marc
>>>>
>>>>
>>>>
>>>> On 04/03/2013 08:28 AM, Tim Triche, Jr. wrote:
>>>>> Probably so. I will look into it. Thanks for the report
>>>>>
>>>>> --t
>>>>>
>>>>> On Apr 3, 2013, at 8:21 AM, Jinyan Huang <jhuang at hsph.harvard.edu>
>>>>> wrote:
>>>>>
>>>>>> Is there a more efficient way to do this? I thought there might be
>>>>>> a problem in my code.
>>>>>>
>>>>>> On Wed, Apr 3, 2013 at 11:14 AM, Tim Triche, Jr.
>>>>>> <tim.triche at gmail.com> wrote:
>>>>>>> Buy more RAM :-)
>>>>>>>
>>>>>>> --t
>>>>>>>
>>>>>>> On Apr 3, 2013, at 6:59 AM, Jinyan Huang <jhuang at hsph.harvard.edu>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> When I try to get all GO terms for IlluminaHumanMethylation450k,
>>>>>>>> I run into a memory problem: it uses more than 10 GB of memory.
>>>>>>>>
>>>>>>>> GOids <- lapply(res2, function(x) unlist(lapply(x, function(y)
>>>>>>>> y$GOID)))
>>>>>>>> GOterms <- lapply(GOids, function(x) mget(x, GOTERM, ifnotfound=NA))
>>>>>>>> Error: memory exhausted (limit reached?)
>>>>>>>> Execution halted
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------------------Get_all_GO.R----------------------------------------------
>>>>>>>>
>>>>>>>> library(IlluminaHumanMethylation450k.db)
>>>>>>>> ## allow both singly- and multiply-mapped probes (e.g. for SYMBOL)
>>>>>>>> IlluminaHumanMethylation450kGOall <-
>>>>>>>>   toggleProbes(IlluminaHumanMethylation450kGO, 'all')
>>>>>>>> ## now let's look at the differences that result from toggleProbes()
>>>>>>>> mapped_probes_toggled <-
>>>>>>>> mappedkeys(IlluminaHumanMethylation450kGOall)
>>>>>>>> res <- mget(mapped_probes_toggled,
>>>>>>>> IlluminaHumanMethylation450kGOall,
>>>>>>>> ifnotfound=NA)
>>>>>>>> res2 <- lapply(res, function(x) x[sapply(x, function(y)
>>>>>>>> y['Evidence']!='IEA')])
>>>>>>>> ## fetch the GOIDs from the unencumbered toggled map, to get terms
>>>>>>>> ## for them
>>>>>>>> library(GO.db)
>>>>>>>> GOids <- lapply(res2, function(x) unlist(lapply(x, function(y)
>>>>>>>> y$GOID)))
>>>>>>>> GOterms <- lapply(GOids, function(x) mget(x, GOTERM, ifnotfound=NA))
>>>>>>>>
>>>>>>>> d <- lapply(GOterms, function(x) do.call(rbind, lapply(x, function(y)
>>>>>>>>   data.frame(y@Term, y@GOID, y@Ontology))))
>>>>>>>> df<-do.call(rbind,d)
>>>>>>>> len <- sapply(d,function(x)length(x[,1]))
>>>>>>>> probes <- rep(names(d),len)
>>>>>>>> df.out<-data.frame(probes=probes,df)
>>>>>>>> names(df.out)<-c("probe","GoTerm","GOID","GOCategory")
>>>>>>>>
>>>>>>>> write.table(df.out,"GO_all.txt",quote=F,row.names=F,col.names=T,sep="\t")
>>>>>>>>
>>>>>>>>
>>>>>>>> ----------------------------------------------------------------------------------------------------------------
>>>>>>>>
>>>>>>>> On Tue, Apr 2, 2013 at 7:29 PM, Tim Triche, Jr.
>>>>>>>> <tim.triche at gmail.com> wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> Not sure how I managed not to cc: the list on this initially.
>>>>>>>>> Here's some GO.db code with a sort of "moral" to it ;-)
>>>>>>>>>
>>>>>>>>> --t
>>>>>>>>>
>>>>>>>>> Begin forwarded message:
>>>>>>>>>
>>>>>>>>> library(IlluminaHumanMethylation450k.db)
>>>>>>>>>
>>>>>>>>> ## allow both singly- and multiply-mapped probes (e.g. for SYMBOL)
>>>>>>>>> IlluminaHumanMethylation450kGOall <-
>>>>>>>>>   toggleProbes(IlluminaHumanMethylation450kGO, 'all')
>>>>>>>>>
>>>>>>>>> ## now let's look at the differences that result from
>>>>>>>>> ## toggleProbes()
>>>>>>>>> mapped_probes_default <- mappedkeys(IlluminaHumanMethylation450kGO)
>>>>>>>>> mapped_probes_toggled <-
>>>>>>>>> mappedkeys(IlluminaHumanMethylation450kGOall)
>>>>>>>>> multimapped <- setdiff( mapped_probes_toggled,
>>>>>>>>> mapped_probes_default )
>>>>>>>>>
>>>>>>>>> res0 <- mget(head(multimapped), IlluminaHumanMethylation450kGO,
>>>>>>>>> ifnotfound=NA)
>>>>>>>>> res <- mget(head(multimapped), IlluminaHumanMethylation450kGOall,
>>>>>>>>> ifnotfound=NA)
>>>>>>>>>
>>>>>>>>> ## fetch the GOIDs from the unencumbered toggled map, to get terms
>>>>>>>>> ## for them
>>>>>>>>>
>>>>>>>>> library(GO.db)
>>>>>>>>> GOids <- lapply(res, function(x) unlist(lapply(x, function(y)
>>>>>>>>> y$GOID)))
>>>>>>>>> GOterms <- lapply(GOids, function(x) mget(x, GOTERM,
>>>>>>>>> ifnotfound=NA))
>>>>>>>>> head(GOterms)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> I'll add this to the docs (next release)
>>>>>>>>>>
>>>>>>>>>> thanks,
>>>>>>>>>>
>>>>>>>>>> --t
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Mar 29, 2013 at 11:24 AM, Fabrice Tourre
>>>>>>>>>> <fabrice.ciup at gmail.com> wrote:
>>>>>>>>>>> Tim,
>>>>>>>>>>>
>>>>>>>>>>> Thank you very much for your reply.
>>>>>>>>>>> I have a list of probes.
>>>>>>>>>>> Do you have an example script for getting the GO terms, instead
>>>>>>>>>>> of the GO IDs?
>>>>>>>>>>>
>>>>>>>>>>> The Documentation is not very clear for this.
>>>>>>>>>>>
>>>>>>>>>>> http://www.bioconductor.org/packages/2.11/data/annotation/html/IlluminaHumanMethylation450k.db.html
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Mar 29, 2013 at 12:29 PM, Tim Triche, Jr.
>>>>>>>>>>> <tim.triche at gmail.com> wrote:
>>>>>>>>>>>> Oddly enough, the paper from UCSD with Illumina's folks on it
>>>>>>>>>>>> (*) used the
>>>>>>>>>>>> IlluminaHumanMethylation450k.db package (which I am currently
>>>>>>>>>>>> rebuilding to
>>>>>>>>>>>> have a startup message about toggleProbes()) to annotate both
>>>>>>>>>>>> CpG islands
>>>>>>>>>>>> and GO terms.
>>>>>>>>>>>>
>>>>>>>>>>>> (*)
>>>>>>>>>>>>
>>>>>>>>>>>> http://idekerlab.ucsd.edu/publications/Documents/Hannum_MolCell_2012.pdf
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Mar 29, 2013 at 8:49 AM, Fabrice Tourre
>>>>>>>>>>>> <fabrice.ciup at gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Dear list,
>>>>>>>>>>>>>
>>>>>>>>>>>>> In the annotation file of Infinium HumanMethylation450
>>>>>>>>>>>>> BeadChip,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://support.illumina.com/documents/MyIllumina/b78d361a-def5-4adb-ab38-e8990625f053/HumanMethylation450_15017482_v.1.2.csv
>>>>>>>>>>>>>
>>>>>>>>>>>>> there is no annotation of GO terms or pathways for each probe,
>>>>>>>>>>>>> unlike in the annotation file HG-U133_Plus_2.na32.annot.csv.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is there a Bioconductor package to annotate the Infinium
>>>>>>>>>>>>> HumanMethylation450 probes, that is, one that returns the GO
>>>>>>>>>>>>> terms and pathways for a given probe?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you very much in advance.
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> Bioconductor mailing list
>>>>>>>>>>>>> Bioconductor at r-project.org
>>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>>>>>>> Search the archives:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> A model is a lie that helps you see the truth.
>>>>>>>>>>>>
>>>>>>>>>>>> Howard Skipper
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> A model is a lie that helps you see the truth.
>>>>>>>>>>
>>>>>>>>>> Howard Skipper
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best wishes,
>>>>>>>>
>>>>>>>> Jinyan HUANG
>>>>>>
>>>>>> --
>>>>>> Best wishes,
>>>>>>
>>>>>> Jinyan HUANG
>>>
>>>
>>> --
>>> Best wishes,
>>>
>>> Jinyan HUANG
>>
>>
>>
>> --
>> A model is a lie that helps you see the truth.
>>
>> Howard Skipper
>
>