[BioC] GO's to gene's
Loren Engrav
engrav at u.washington.edu
Mon Mar 1 06:01:11 CET 2010
So I checked
> collagen
and this list matches AmiGO.
So it would appear the issue lies in
> egids <- mget(names(collagen), org.Hs.egGO2EG, ifnotfound=NA)
Some of the names find no associated genes in org.Hs.egGO2EG and so
appear as NA.
True? Possible?
My version of org.Hs.eg.db (which provides org.Hs.egGO2EG) is 2.3.6.
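For illustration, here is a minimal base-R sketch of how `ifnotfound = NA` behaves. It uses base `mget()` on an ordinary environment, which is assumed here to mirror the AnnotationDbi `mget()` method used on `org.Hs.egGO2EG`. The toy map and the ID `GO:9999999` are hypothetical. Requested keys without an entry come back as `NA`, so listing them shows which terms had no mapped genes. In the real package, `org.Hs.egGO2EG` holds only direct annotations, so a term with no directly annotated human gene would be expected to come back the same way. `org.Hs.egGO2ALLEGS` additionally includes genes annotated to descendant terms.

```r
# Toy stand-in for org.Hs.egGO2EG (hypothetical contents): each GO ID maps to
# a character vector of Entrez Gene IDs named by evidence code.
go2eg <- new.env()
assign("GO:0032963", c(IEA = "3091", IMP = "7148"), envir = go2eg)
assign("GO:0033342", c(IMP = "23560"), envir = go2eg)

# "GO:9999999" is a made-up ID that has no entry in the toy map
query <- c("GO:0032963", "GO:0033342", "GO:9999999")

# Unmatched keys are returned as NA instead of raising an error
egids <- mget(query, envir = go2eg, ifnotfound = NA)

# is.na() on a list is TRUE only for elements that are a single NA
unmatched <- names(egids)[is.na(egids)]
print(unmatched)

# Keep only the terms that found genes
egids <- egids[!is.na(egids)]
print(names(egids))
```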
> From: Loren Engrav <engrav at u.washington.edu>
> Date: Sun, 28 Feb 2010 20:33:05 -0800
> To: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
> Conversation: [BioC] GO's to gene's
> Subject: Re: [BioC] GO's to gene's
>
> Oops, AmiGO says there are 20 such terms, not 68 as I said before, since
> I retrieved only BP terms.
>
>
>> From: Loren Engrav <engrav at u.washington.edu>
>> Date: Sun, 28 Feb 2010 20:28:17 -0800
>> To: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
>> Conversation: [BioC] GO's to gene's
>> Subject: Re: [BioC] GO's to gene's
>>
>> OK, thank you.
>> I now show
>>> sessionInfo()
>> R version 2.10.1 (2009-12-14)
>> i386-apple-darwin9.8.0
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] org.Hs.eg.db_2.3.6 GO.db_2.3.5 RSQLite_0.8-3 AnnotationDbi_1.8.1 DBI_0.2-5
>> [6] Biobase_2.6.1
>>
>> loaded via a namespace (and not attached):
>> [1] tools_2.10.1
>>
>> All commands run without errors; however, I see
>>
>>> egids
>> $`GO:0010711`
>> IEP
>> "1471"
>>
>> $`GO:0030199`
>>    IEA    IEA    ISS    IEA    IMP    IMP    IMP    IMP    NAS    IMP    NAS    IMP    ISS
>>  "302"  "304"  "538"  "871" "1277" "1278" "1280" "1281" "1281" "1289" "1289" "1290" "1290"
>>    NAS    IDA    NAS    IEA    IEA    IEA    IEA    IEA    NAS    ISS    IDA    ISS    NAS
>> "1301" "1302" "1303" "1805" "2296" "2303" "4010" "4015" "4060" "4763" "7042" "7046" "7373"
>>    NAS     NAS
>> "9508" "50509"
>>
>> $`GO:0030574`
>>    IEA    IEA    IEA    IEA    IEA    IEA    IEA    IEA    IEA    IEA    IEA
>> "4312" "4313" "4314" "4316" "4317" "4318" "4319" "4320" "4322" "4325" "4327"
>>      IEA      IDA      IMP      NAS      IEA      NAS      IEA      IEA      IEA      IEA
>>   "5184"   "5645"   "5645"   "5653"   "5657"   "9508"   "9509"  "56547"  "64066" "140766"
>>
>> $`GO:0032963`
>> IEA IMP
>> "3091" "7148"
>>
>> $`GO:0032964`
>> IEA IMP IMP TAS IMP
>> "871" "1277" "1281" "1281" "1289"
>>
>> $`GO:0032966`
>> IDA IC
>> "3569" "4261"
>>
>> $`GO:0032967`
>> ISS IDA IDA IC IMP TAS IMP
>> "265" "2147" "2149" "3066" "7040" "7040" "7043"
>>
>> $`GO:0033342`
>> IMP
>> "23560"
>>
>> Many GO terms containing the word "collagen" are not listed, such as
>> GO:0004656, GO:0005518, etc.
>> AmiGO claims there are 68 such terms, but the list above has only 8.
>> What did I do wrong?
>> Also, I would like to omit the annotations with evidence code IEA.
>>
>> Thank you
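Because each element of `egids` stores the evidence codes as vector names, one way to omit the IEA group is to subset by those names. This is a minimal base-R sketch on a hypothetical excerpt of the output above. The third term is trimmed to two of its IEA entries so that it ends up empty. `drop_evidence()` is a hypothetical helper, not an AnnotationDbi function.

```r
# Hypothetical excerpt of the egids list above: Entrez Gene IDs named by
# GO evidence code.
egids <- list(
  `GO:0032964` = c(IEA = "871", IMP = "1277", IMP = "1281",
                   TAS = "1281", IMP = "1289"),
  `GO:0032963` = c(IEA = "3091", IMP = "7148"),
  `GO:0030574` = c(IEA = "4312", IEA = "4313")
)

# Hypothetical helper: remove entries whose evidence code is in `codes`,
# then drop terms that are left with no genes at all.
drop_evidence <- function(egids, codes = "IEA") {
  kept <- lapply(egids, function(ids) ids[!names(ids) %in% codes])
  kept[vapply(kept, length, integer(1)) > 0]
}

noIEA <- drop_evidence(egids)
print(noIEA)

# Distinct genes remaining after the filter
print(unique(unlist(noIEA, use.names = FALSE)))
```

Passing a longer `codes` vector, for example `c("IEA", "NAS")`, would drop several evidence classes at once.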
>>
>>> From: Martin Morgan <mtmorgan at fhcrc.org>
>>> Date: Sun, 28 Feb 2010 19:30:34 -0800
>>> To: Loren Engrav <engrav at u.washington.edu>
>>> Cc: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
>>> Subject: Re: [BioC] GO's to gene's
>>>
>>> On 02/28/2010 07:17 PM, Loren Engrav wrote:
>>>> Thank you both.
>>>> Given my skills, it might be easier/quicker to do it "manually" with AmiGO,
>>>> but I am trying both methods.
>>>>
>>>> For the second method I get
>>>>
>>>>> library(GO.db)
>>>> Loading required package: AnnotationDbi
>>>> Loading required package: Biobase
>>>>
>>>> Welcome to Bioconductor
>>>>
>>>> Vignettes contain introductory material. To view, type
>>>> 'openVignette()'. To cite Bioconductor, see
>>>> 'citation("Biobase")' and for packages 'citation(pkgname)'.
>>>>
>>>> Loading required package: DBI
>>>>> terms <- Term(GOTERM)
>>>> Error in function (classes, fdef, mtable) :
>>>> unable to find an inherited method for function "Term", for signature
>>>> "GOTermsAnnDbBimap"
>>>>
>>>>> sessionInfo()
>>>> R version 2.9.2 Patched (2009-09-05 r49613)
>>>> i386-apple-darwin9.8.0
>>>>
>>>> locale:
>>>> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> Update to R version 2.10 and the associated Bioc packages. Alternatively,
>>> for a (much) slower solution (you'll want to check that Term and Ontology
>>> return IDs in identical order):
>>>
>>> terms = eapply(GOTERM, Term)
>>>
>>> and so on. I have
>>>
>>>> sessionInfo()
>>> R version 2.10.1 Patched (2010-02-23 r51168)
>>> x86_64-unknown-linux-gnu
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] GO.db_2.3.5 RSQLite_0.7-3 DBI_0.2-4
>>> [4] AnnotationDbi_1.8.1 Biobase_2.6.1
>>>
>>> loaded via a namespace (and not attached):
>>> [1] tools_2.10.1
>>>
>>>
>>> Martin
>>>
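The ordering caveat about `eapply()` matters because base R documents the order of its result as arbitrary for hashed environments. It is therefore safer to align the Term and Ontology results by name than by position. This is a minimal base-R sketch. The environment is a hypothetical stand-in for `GOTERM` whose entries are plain lists, not GOTerms objects, so `term_of()` and `ontology_of()` are hypothetical accessors playing the role of `Term` and `Ontology`.

```r
# Hypothetical stand-in for GOTERM: each GO ID maps to a small list.
goterm <- new.env()
assign("GO:0005518", list(term = "collagen binding", ontology = "MF"), envir = goterm)
assign("GO:0030199", list(term = "collagen fibril organization", ontology = "BP"), envir = goterm)
assign("GO:0008150", list(term = "biological_process", ontology = "BP"), envir = goterm)

# Hypothetical accessors standing in for Term() and Ontology()
term_of <- function(x) x$term
ontology_of <- function(x) x$ontology

# eapply() returns a named list; unlist() turns it into a named character vector
terms <- unlist(eapply(goterm, term_of))
ontologies <- unlist(eapply(goterm, ontology_of))

# Reorder one result by the names of the other instead of trusting position
ontologies <- ontologies[names(terms)]
stopifnot(identical(names(terms), names(ontologies)))

# Same filter as in the thread, here restricted to BP terms
collagen <- terms[grepl("collagen", terms) & ontologies == "BP"]
print(collagen)
```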
>>>>
>>>>> From: Martin Morgan <mtmorgan at fhcrc.org>
>>>>> Date: Sun, 28 Feb 2010 18:42:33 -0800
>>>>> To: Vincent Carey <stvjc at channing.harvard.edu>
>>>>> Cc: Loren Engrav <engrav at u.washington.edu>,
>>>>> "bioconductor at stat.math.ethz.ch"
>>>>> <bioconductor at stat.math.ethz.ch>
>>>>> Subject: Re: [BioC] GO's to gene's
>>>>>
>>>>> On 02/28/2010 06:14 PM, Vincent Carey wrote:
>>>>>> Perhaps there is a package with such functionality. However, with the
>>>>>> GO.db package in place, you need to do a little
>>>>>> programming, perhaps along the lines of
>>>>>>
>>>>>> querGO = function(str, attr = "definition", ont = "MF") {
>>>>>>     require(GO.db, quietly = TRUE)
>>>>>>     gc = GO_dbconn()   # DBI connection to the GO.db SQLite database
>>>>>>     # Assemble: select go_id, term from go_term
>>>>>>     #           where <attr> like('%<str>%') and ontology = '<ont>'
>>>>>>     quer.1 = paste("select go_id, term from go_term where",
>>>>>>                    attr, "like('%")
>>>>>>     quer.2 = "%') and ontology = '"
>>>>>>     quer.3 = "'"
>>>>>>     quer = paste(quer.1, str, quer.2, ont, quer.3, collapse = "",
>>>>>>                  sep = "")
>>>>>>     dbGetQuery(gc, quer)
>>>>>> }
>>>>>>
>>>>>> whereby
>>>>>>
>>>>>>> querGO("collagen", "term")
>>>>>>   go_id      term
>>>>>> 1 GO:0004656 procollagen-proline 4-dioxygenase activity
>>>>>> 2 GO:0005518 collagen binding
>>>>>> 3 GO:0008475 procollagen-lysine 5-dioxygenase activity
>>>>>> 4 GO:0019797 procollagen-proline 3-dioxygenase activity
>>>>>> 5 GO:0019798 procollagen-proline dioxygenase activity
>>>>>> 6 GO:0033823 procollagen glucosyltransferase activity
>>>>>> 7 GO:0042329 structural constituent of collagen and cuticulin-based cuticle
>>>>>> 8 GO:0050211 procollagen galactosyltransferase activity
>>>>>> 9 GO:0070052 collagen V binding
>>>>>>>
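The SQL text that `querGO()` assembles can be checked without a database connection. This is a sketch that assumes the same `go_term` columns `querGO()` already uses. `build_go_query()` is a hypothetical helper that uses `sprintf()` (where `%%` produces a literal percent sign) in place of the three `paste()` pieces. Passing `ont = "BP"` selects biological-process terms, which matters when comparing counts against an AmiGO search restricted to BP.

```r
# Hypothetical helper: builds the same SQL string querGO() sends,
# without opening a connection.
build_go_query <- function(str, attr = "definition", ont = "MF") {
  sprintf("select go_id, term from go_term where %s like('%%%s%%') and ontology = '%s'",
          attr, str, ont)
}

q <- build_go_query("collagen", "term")
print(q)

q_bp <- build_go_query("collagen", "term", "BP")
print(q_bp)

# To run it (assumes GO.db is installed):
# library(GO.db); DBI::dbGetQuery(GO_dbconn(), q_bp)
```

Because `str` is pasted straight into the SQL, a search word containing a single quote would break the query.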
>>>>>
>>>>> Also
>>>>>
>>>>> library(GO.db)
>>>>> terms <- Term(GOTERM) # or maybe Definition(GOTERM) ?
>>>>> ontologies <- Ontology(GOTERM)
>>>>> collagen <- terms[grepl("collagen", terms) & ("MF" == ontologies)]
>>>>>
>>>>> and the next step,
>>>>>
>>>>> library(org.Hs.eg.db)
>>>>> egids <- mget(names(collagen), org.Hs.egGO2EG, ifnotfound=NA)
>>>>> egids <- egids[!is.na(egids)]
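After the `NA` entries are dropped, it can help to flatten the list into one row per annotation so that the GO ID, gene, and evidence code sit side by side. This is a minimal base-R sketch on a hypothetical excerpt of the output shown in this thread. The column names `go_id`, `entrez`, and `evidence` are illustrative choices, not an AnnotationDbi format.

```r
# Hypothetical excerpt of the list produced by the two lines above
egids <- list(
  `GO:0032963` = c(IEA = "3091", IMP = "7148"),
  `GO:0032966` = c(IDA = "3569", IC = "4261"),
  `GO:0033342` = c(IMP = "23560")
)

# One row per (GO ID, gene, evidence code) annotation
n <- vapply(egids, length, integer(1))
annot <- data.frame(
  go_id    = rep(names(egids), n),
  entrez   = unlist(egids, use.names = FALSE),
  evidence = unlist(lapply(egids, names), use.names = FALSE),
  stringsAsFactors = FALSE
)
print(annot)

# All distinct genes across the selected terms
genes <- unique(annot$entrez)
print(genes)
```

A long table like this is easy to filter by evidence code, for example `annot[annot$evidence != "IEA", ]`, before collecting the genes.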
>>>>>
>>>>>
>>>>>>
>>>>>> On Sun, Feb 28, 2010 at 8:56 PM, Loren Engrav <engrav at u.washington.edu>
>>>>>> wrote:
>>>>>>> Is there a BioC package that will find all the GO terms containing some
>>>>>>> word, like perhaps "collagen",
>>>>>>> and then find all the genes associated with those terms?
>>>>>>>
>>>>>>> I scanned
>>>>>>> goProfiles
>>>>>>> GOSemSim
>>>>>>> GOstats
>>>>>>> GOTools and
>>>>>>> topGO
>>>>>>>
>>>>>>> and could not determine whether any of them would do that.
>>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioconductor mailing list
>>>>>>> Bioconductor at stat.math.ethz.ch
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>> Search the archives:
>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Martin Morgan
>>>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>>>> 1100 Fairview Ave. N.
>>>>> PO Box 19024 Seattle, WA 98109
>>>>>
>>>>> Location: Arnold Building M1 B861
>>>>> Phone: (206) 667-2793
>>>>