[BioC] GO's to gene's
Loren Engrav
engrav at u.washington.edu
Mon Mar 1 05:33:05 CET 2010
Oops, AmiGO says there are 20 such terms, not 68 as I said before, because I
retrieved only BP.
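
To compare counts with AmiGO, it may help to tabulate the matching terms per ontology rather than fixing one ontology up front. A minimal base R sketch on toy stand-ins for `Term(GOTERM)` and `Ontology(GOTERM)`; the `GO:90000xx` ids and their term names are made up for illustration, and only the two MF entries come from this thread:

```r
# Toy stand-ins for Term(GOTERM) and Ontology(GOTERM): named character
# vectors keyed by GO id. The GO:90000xx ids and terms are hypothetical.
terms <- c(
  "GO:0005518" = "collagen binding",
  "GO:0070052" = "collagen V binding",
  "GO:9000001" = "hypothetical collagen process",
  "GO:9000002" = "hypothetical unrelated process",
  "GO:9000003" = "hypothetical collagen component"
)
ontologies <- c(
  "GO:0005518" = "MF",
  "GO:0070052" = "MF",
  "GO:9000001" = "BP",
  "GO:9000002" = "BP",
  "GO:9000003" = "CC"
)

# Both vectors must be keyed in the same order before combining them.
stopifnot(identical(names(terms), names(ontologies)))

# Count matching terms per ontology instead of selecting one ontology.
hits <- grepl("collagen", terms, fixed = TRUE)
counts <- table(ontologies[hits])
print(counts)
```

With the real GO.db vectors the same two lines (`grepl` plus `table`) should give per-ontology totals that can be checked against what AmiGO reports for BP, MF and CC separately.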
> From: Loren Engrav <engrav at u.washington.edu>
> Date: Sun, 28 Feb 2010 20:28:17 -0800
> To: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
> Conversation: [BioC] GO's to gene's
> Subject: Re: [BioC] GO's to gene's
>
> Ok thank you
> I now show
>> sessionInfo()
> R version 2.10.1 (2009-12-14)
> i386-apple-darwin9.8.0
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] org.Hs.eg.db_2.3.6  GO.db_2.3.5  RSQLite_0.8-3  AnnotationDbi_1.8.1  DBI_0.2-5
> [6] Biobase_2.6.1
>
> loaded via a namespace (and not attached):
> [1] tools_2.10.1
>
> And all commands pass with no errors, however I see
>
>> egids
> $`GO:0010711`
>    IEP
> "1471"
>
> $`GO:0030199`
>     IEA     IEA     ISS     IEA     IMP     IMP     IMP
>   "302"   "304"   "538"   "871"  "1277"  "1278"  "1280"
>     IMP     NAS     IMP     NAS     IMP     ISS     NAS
>  "1281"  "1281"  "1289"  "1289"  "1290"  "1290"  "1301"
>     IDA     NAS     IEA     IEA     IEA     IEA     IEA
>  "1302"  "1303"  "1805"  "2296"  "2303"  "4010"  "4015"
>     NAS     ISS     IDA     ISS     NAS     NAS     NAS
>  "4060"  "4763"  "7042"  "7046"  "7373"  "9508" "50509"
>
> $`GO:0030574`
>      IEA      IEA      IEA      IEA      IEA      IEA      IEA
>   "4312"   "4313"   "4314"   "4316"   "4317"   "4318"   "4319"
>      IEA      IEA      IEA      IEA      IEA      IDA      IMP
>   "4320"   "4322"   "4325"   "4327"   "5184"   "5645"   "5645"
>      NAS      IEA      NAS      IEA      IEA      IEA      IEA
>   "5653"   "5657"   "9508"   "9509"  "56547"  "64066" "140766"
>
> $`GO:0032963`
>    IEA    IMP
> "3091" "7148"
>
> $`GO:0032964`
>    IEA    IMP    IMP    TAS    IMP
>  "871" "1277" "1281" "1281" "1289"
>
> $`GO:0032966`
>    IDA     IC
> "3569" "4261"
>
> $`GO:0032967`
>    ISS    IDA    IDA     IC    IMP    TAS    IMP
>  "265" "2147" "2149" "3066" "7040" "7040" "7043"
>
> $`GO:0033342`
>     IMP
> "23560"
>
> However, many GO terms containing the word "collagen" are not listed, such as
> GO:0004656
> GO:0005518
> etc.
> AmiGO reports 68 such terms, but the list above has only 8.
> What did I do wrong?
> Also, I would like to omit the annotations with evidence code IEA.
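
On omitting IEA: in the output above the evidence codes are the names of each vector, so one option is to filter on `names()`. A minimal base R sketch on a toy list copied (and trimmed) from the output above; `no_iea` and `genes` are just illustrative variable names:

```r
# Toy list shaped like the egids output above: each element is a character
# vector of Entrez gene ids whose names are the GO evidence codes.
# GO:0030574 is trimmed to its first two entries to keep the example short.
egids <- list(
  "GO:0032963" = c(IEA = "3091", IMP = "7148"),
  "GO:0032964" = c(IEA = "871", IMP = "1277", IMP = "1281",
                   TAS = "1281", IMP = "1289"),
  "GO:0030574" = c(IEA = "4312", IEA = "4313")
)

# Drop annotations whose evidence code is IEA.
no_iea <- lapply(egids, function(ids) ids[names(ids) != "IEA"])

# Remove GO terms left with no genes, then collapse duplicate gene ids
# (a gene can be annotated to one term under several evidence codes).
no_iea <- no_iea[sapply(no_iea, length) > 0]
genes <- lapply(no_iea, function(ids) unique(unname(ids)))
print(genes)
```

The same `lapply` call should work unchanged on the real `egids` list, since it relies only on the evidence codes being stored as element names.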
>
> Thank you
>
>> From: Martin Morgan <mtmorgan at fhcrc.org>
>> Date: Sun, 28 Feb 2010 19:30:34 -0800
>> To: Loren Engrav <engrav at u.washington.edu>
>> Cc: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
>> Subject: Re: [BioC] GO's to gene's
>>
>> On 02/28/2010 07:17 PM, Loren Engrav wrote:
>>> Thank you both
>>> Given my skills, it might be easier/quicker to do it "manually" with Amigo
>>> But I am trying both methods
>>>
>>> For the second method I get
>>>
>>>> library(GO.db)
>>> Loading required package: AnnotationDbi
>>> Loading required package: Biobase
>>>
>>> Welcome to Bioconductor
>>>
>>> Vignettes contain introductory material. To view, type
>>> 'openVignette()'. To cite Bioconductor, see
>>> 'citation("Biobase")' and for packages 'citation(pkgname)'.
>>>
>>> Loading required package: DBI
>>>> terms <- Term(GOTERM)
>>> Error in function (classes, fdef, mtable) :
>>> unable to find an inherited method for function "Term", for signature
>>> "GOTermsAnnDbBimap"
>>>
>>>> sessionInfo()
>>> R version 2.9.2 Patched (2009-09-05 r49613)
>>> i386-apple-darwin9.8.0
>>>
>>> locale:
>>> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>
>> Update to R version 2.10 and associated Bioc packages, or for a (much)
>> slower solution (you'll want to check that Term and Ontology return ids
>> in identical order)
>>
>> terms = eapply(GOTERM, Term)
>>
>> etc. I have
>>
>>> sessionInfo()
>> R version 2.10.1 Patched (2010-02-23 r51168)
>> x86_64-unknown-linux-gnu
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] GO.db_2.3.5 RSQLite_0.7-3 DBI_0.2-4
>> [4] AnnotationDbi_1.8.1 Biobase_2.6.1
>>
>> loaded via a namespace (and not attached):
>> [1] tools_2.10.1
>>
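
The order check mentioned above (that the term and ontology results are keyed by GO id in the same order) can be sketched in base R. This is only an illustration on toy lists standing in for the results of `eapply(GOTERM, Term)` and `eapply(GOTERM, Ontology)`; the ids and terms are taken from elsewhere in this thread, and the scrambled order is deliberate:

```r
# Toy stand-ins for eapply(GOTERM, Term) and eapply(GOTERM, Ontology):
# lists keyed by GO id, deliberately returned in different orders.
term_list <- list(
  "GO:0005518" = "collagen binding",
  "GO:0070052" = "collagen V binding",
  "GO:0004656" = "procollagen-proline 4-dioxygenase activity"
)
ont_list <- list(
  "GO:0004656" = "MF",
  "GO:0005518" = "MF",
  "GO:0070052" = "MF"
)

# Flatten to named character vectors (one value per GO id).
terms <- unlist(term_list)
ontologies <- unlist(ont_list)

# Detect an order mismatch, then realign by name rather than by position.
same_order <- identical(names(terms), names(ontologies))
ontologies <- ontologies[names(terms)]
stopifnot(identical(names(terms), names(ontologies)))
print(same_order)
```

Indexing by `names(terms)` makes the later `terms[grepl(...) & ("MF" == ontologies)]` step safe even if the two calls return ids in different orders.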
>>
>> Martin
>>
>>>
>>>> From: Martin Morgan <mtmorgan at fhcrc.org>
>>>> Date: Sun, 28 Feb 2010 18:42:33 -0800
>>>> To: Vincent Carey <stvjc at channing.harvard.edu>
>>>> Cc: Loren Engrav <engrav at u.washington.edu>,
>>>> "bioconductor at stat.math.ethz.ch"
>>>> <bioconductor at stat.math.ethz.ch>
>>>> Subject: Re: [BioC] GO's to gene's
>>>>
>>>> On 02/28/2010 06:14 PM, Vincent Carey wrote:
>>>>> Perhaps there is a package with such functionality. However, with the
>>>>> GO.db package in place, you need to do a little
>>>>> programming, perhaps along the lines of
>>>>>
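>>>>> querGO = function(str, attr = "definition", ont = "MF") {
>>>>>     # find GO terms whose `attr` column contains `str`, within ontology `ont`
>>>>>     require(GO.db, quietly = TRUE)
>>>>>     gc = GO_dbconn()    # connection to the GO.db SQLite database
>>>>>     quer.1 = paste("select go_id, term from go_term where",
>>>>>                    attr, "like('%")
>>>>>     quer.2 = "%') and ontology = '"
>>>>>     quer.3 = "'"
>>>>>     # glue the pieces together without separators to form the SQL text
>>>>>     quer = paste(quer.1, str, quer.2, ont, quer.3, collapse = "",
>>>>>                  sep = "")
>>>>>     dbGetQuery(gc, quer)
>>>>> }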
>>>>>
>>>>> whereby
>>>>>
>>>>>> querGO("collagen", "term")
>>>>>   go_id      term
>>>>> 1 GO:0004656 procollagen-proline 4-dioxygenase activity
>>>>> 2 GO:0005518 collagen binding
>>>>> 3 GO:0008475 procollagen-lysine 5-dioxygenase activity
>>>>> 4 GO:0019797 procollagen-proline 3-dioxygenase activity
>>>>> 5 GO:0019798 procollagen-proline dioxygenase activity
>>>>> 6 GO:0033823 procollagen glucosyltransferase activity
>>>>> 7 GO:0042329 structural constituent of collagen and cuticulin-based cuticle
>>>>> 8 GO:0050211 procollagen galactosyltransferase activity
>>>>> 9 GO:0070052 collagen V binding
>>>>>>
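
To see what SQL text the function above sends to the database, the string assembly can be reproduced on its own, without GO.db. A minimal sketch; `build_query` is a hypothetical helper name, not part of GO.db, and it uses `sprintf()` in place of the nested `paste()` calls:

```r
# Build the same SQL text that querGO() assembles, using sprintf().
# build_query is a hypothetical helper for illustration only.
build_query <- function(str, attr = "definition", ont = "MF") {
  # "%%" is a literal percent sign, "%s" is replaced by the arguments in order
  sprintf(
    "select go_id, term from go_term where %s like('%%%s%%') and ontology = '%s'",
    attr, str, ont
  )
}

quer <- build_query("collagen", "term")
print(quer)
```

Because the search word is pasted straight into the SQL, a word containing a single quote would break the query; that is fine for interactive use but worth knowing.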
>>>>
>>>> Also
>>>>
>>>> library(GO.db)
>>>> terms <- Term(GOTERM) # or maybe Definition(GOTERM) ?
>>>> ontologies <- Ontology(GOTERM)
>>>> collagen <- terms[grepl("collagen", terms) & ("MF" == ontologies)]
>>>>
>>>> and the next step,
>>>>
>>>> library(org.Hs.eg.db)
>>>> egids <- mget(names(collagen), org.Hs.egGO2EG, ifnotfound=NA)
>>>> egids <- egids[!is.na(egids)]
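
Note that the last line above drops every GO id for which the map has no entry, so the final list can be shorter than the list of matching terms. A minimal base R sketch of that behaviour, using a plain environment as a toy stand-in for `org.Hs.egGO2EG`; the id `GO:9000001` is hypothetical, and whether this explains any particular count difference is an assumption to check against the real data:

```r
# Toy stand-in for org.Hs.egGO2EG: an environment mapping GO ids to named
# vectors of Entrez gene ids. Only two of the three ids below have an entry.
go2eg <- new.env()
assign("GO:0032963", c(IEA = "3091", IMP = "7148"), envir = go2eg)
assign("GO:0033342", c(IMP = "23560"), envir = go2eg)

# Hypothetical search result: three matching GO ids, one without genes.
ids <- c("GO:0032963", "GO:0033342", "GO:9000001")

# ifnotfound = NA returns NA instead of raising an error for unknown ids.
egids <- mget(ids, envir = go2eg, ifnotfound = NA)
missing <- sapply(egids, function(x) length(x) == 1 && is.na(x))

# Terms that matched the search but have no annotated gene in the map.
print(names(egids)[missing])

# Same filtering idea as in the code above: keep only ids with genes.
egids <- egids[!missing]
print(names(egids))
```

Printing the dropped ids before filtering makes it easy to tell terms that were never matched apart from terms that matched but have no human gene annotated.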
>>>>
>>>>
>>>>>
>>>>> On Sun, Feb 28, 2010 at 8:56 PM, Loren Engrav <engrav at u.washington.edu>
>>>>> wrote:
>>>>>> Is there a BioC package that will find all the GO terms containing some
>>>>>> word, such as "collagen",
>>>>>> and then find all the genes annotated to those terms?
>>>>>>
>>>>>> I scanned
>>>>>> goProfiles
>>>>>> GOSemSim
>>>>>> GOstats
>>>>>> goTools and
>>>>>> topGO
>>>>>>
>>>>>> And could not determine that any would do that.
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at stat.math.ethz.ch
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Martin Morgan
>>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N.
>>>> PO Box 19024 Seattle, WA 98109
>>>>
>>>> Location: Arnold Building M1 B861
>>>> Phone: (206) 667-2793
>>>
>>
>>