[BioC] GO's to gene's
Loren Engrav
engrav at u.washington.edu
Mon Mar 1 05:28:17 CET 2010
Ok thank you
I now show
> sessionInfo()
R version 2.10.1 (2009-12-14)
i386-apple-darwin9.8.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] org.Hs.eg.db_2.3.6  GO.db_2.3.5         RSQLite_0.8-3       AnnotationDbi_1.8.1 DBI_0.2-5
[6] Biobase_2.6.1
loaded via a namespace (and not attached):
[1] tools_2.10.1
All commands now run without errors; however, I see
> egids
$`GO:0010711`
IEP
"1471"
$`GO:0030199`
    IEA     IEA     ISS     IEA     IMP     IMP     IMP     IMP     NAS
  "302"   "304"   "538"   "871"  "1277"  "1278"  "1280"  "1281"  "1281"
    IMP     NAS     IMP     ISS     NAS     IDA     NAS     IEA     IEA
 "1289"  "1289"  "1290"  "1290"  "1301"  "1302"  "1303"  "1805"  "2296"
    IEA     IEA     IEA     NAS     ISS     IDA     ISS     NAS     NAS
 "2303"  "4010"  "4015"  "4060"  "4763"  "7042"  "7046"  "7373"  "9508"
    NAS
"50509"
$`GO:0030574`
     IEA      IEA      IEA      IEA      IEA      IEA      IEA      IEA
  "4312"   "4313"   "4314"   "4316"   "4317"   "4318"   "4319"   "4320"
     IEA      IEA      IEA      IEA      IDA      IMP      NAS      IEA
  "4322"   "4325"   "4327"   "5184"   "5645"   "5645"   "5653"   "5657"
     NAS      IEA      IEA      IEA      IEA
  "9508"   "9509"  "56547"  "64066" "140766"
$`GO:0032963`
IEA IMP
"3091" "7148"
$`GO:0032964`
IEA IMP IMP TAS IMP
"871" "1277" "1281" "1281" "1289"
$`GO:0032966`
IDA IC
"3569" "4261"
$`GO:0032967`
ISS IDA IDA IC IMP TAS IMP
"265" "2147" "2149" "3066" "7040" "7040" "7043"
$`GO:0033342`
IMP
"23560"
So many GO terms containing the word "collagen" are not listed, for example
GO:0004656
GO:0005518
etc.
AmiGO reports 68 such terms, but the list above has only 8.
What did I do wrong?
Also, I would like to omit the IEA (inferred from electronic annotation) entries.
Thank you
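
One possible way to omit the IEA entries, given as a minimal sketch in base R. It assumes `egids` is a list of character vectors of Entrez Gene IDs whose names are the evidence codes, as in the printout above. The small list here is a stand-in copied from part of that printout, and the names `no_iea` and `genes` are only illustrative.

```r
# Stand-in for two elements of the `egids` list printed above
# (GO:0030574 is shortened to four of its entries).
egids <- list(
  "GO:0032963" = c(IEA = "3091", IMP = "7148"),
  "GO:0030574" = c(IEA = "4312", IDA = "5645", IMP = "5645", NAS = "5653")
)

# Keep only the entries whose evidence code (the element name) is not IEA.
no_iea <- lapply(egids, function(ids) ids[names(ids) != "IEA"])

# Drop GO terms that are left with no genes at all.
no_iea <- no_iea[sapply(no_iea, length) > 0]

# Collapse to one vector of unique Entrez Gene IDs.
genes <- unique(unlist(no_iea, use.names = FALSE))
print(genes)
```

A gene annotated to the same term with several evidence codes (5645 here) survives as long as at least one of its codes is not IEA.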
> From: Martin Morgan <mtmorgan at fhcrc.org>
> Date: Sun, 28 Feb 2010 19:30:34 -0800
> To: Loren Engrav <engrav at u.washington.edu>
> Cc: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
> Subject: Re: [BioC] GO's to gene's
>
> On 02/28/2010 07:17 PM, Loren Engrav wrote:
>> Thank you both
>> Given my skills, it might be easier/quicker to do it "manually" with Amigo
>> But I am trying both methods
>>
>> For the second method I get
>>
>>> library(GO.db)
>> Loading required package: AnnotationDbi
>> Loading required package: Biobase
>>
>> Welcome to Bioconductor
>>
>> Vignettes contain introductory material. To view, type
>> 'openVignette()'. To cite Bioconductor, see
>> 'citation("Biobase")' and for packages 'citation(pkgname)'.
>>
>> Loading required package: DBI
>>> terms <- Term(GOTERM)
>> Error in function (classes, fdef, mtable) :
>> unable to find an inherited method for function "Term", for signature
>> "GOTermsAnnDbBimap"
>>
>>> sessionInfo()
>> R version 2.9.2 Patched (2009-09-05 r49613)
>> i386-apple-darwin9.8.0
>>
>> locale:
>> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>
> Update to R version 2.10 and associated Bioc packages, or for a (much)
> slower solution (you'll want to check that Term and Ontology return ids
> in identical order)
>
> terms = eapply(GOTERM, Term)
>
> etc. I have
>
>> sessionInfo()
> R version 2.10.1 Patched (2010-02-23 r51168)
> x86_64-unknown-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] GO.db_2.3.5 RSQLite_0.7-3 DBI_0.2-4
> [4] AnnotationDbi_1.8.1 Biobase_2.6.1
>
> loaded via a namespace (and not attached):
> [1] tools_2.10.1
>
>
> Martin
>
>>
>>> From: Martin Morgan <mtmorgan at fhcrc.org>
>>> Date: Sun, 28 Feb 2010 18:42:33 -0800
>>> To: Vincent Carey <stvjc at channing.harvard.edu>
>>> Cc: Loren Engrav <engrav at u.washington.edu>, "bioconductor at stat.math.ethz.ch"
>>> <bioconductor at stat.math.ethz.ch>
>>> Subject: Re: [BioC] GO's to gene's
>>>
>>> On 02/28/2010 06:14 PM, Vincent Carey wrote:
>>>> Perhaps there is a package with such functionality. However, with the
>>>> GO.db package in place, you need to do a little
>>>> programming, perhaps along the lines of
>>>>
>>>> querGO = function(str, attr = "definition", ont = "MF") {
>>>>     require(GO.db, quietly = TRUE)
>>>>     gc = GO_dbconn()
>>>>     quer.1 = paste("select go_id, term from go_term where",
>>>>                    attr, "like('%")
>>>>     quer.2 = "%') and ontology = '"
>>>>     quer.3 = "'"
>>>>     quer = paste(quer.1, str, quer.2, ont, quer.3, collapse = "",
>>>>                  sep = "")
>>>>     dbGetQuery(gc, quer)
>>>> }
>>>>
>>>> whereby
>>>>
>>>>> querGO("collagen", "term")
>>>>   go_id      term
>>>> 1 GO:0004656 procollagen-proline 4-dioxygenase activity
>>>> 2 GO:0005518 collagen binding
>>>> 3 GO:0008475 procollagen-lysine 5-dioxygenase activity
>>>> 4 GO:0019797 procollagen-proline 3-dioxygenase activity
>>>> 5 GO:0019798 procollagen-proline dioxygenase activity
>>>> 6 GO:0033823 procollagen glucosyltransferase activity
>>>> 7 GO:0042329 structural constituent of collagen and cuticulin-based cuticle
>>>> 8 GO:0050211 procollagen galactosyltransferase activity
>>>> 9 GO:0070052 collagen V binding
>>>>>
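
To see what querGO() sends to the database, the sketch below rebuilds only the SQL string in base R, with no GO.db and no database connection. The helper name `build_query` is hypothetical and belongs to no package; it is meant to mirror the paste() calls in querGO() above.

```r
# Hypothetical helper: builds the same SQL text as querGO() above,
# but returns the string instead of running it.
build_query <- function(str, attr = "definition", ont = "MF") {
  # "%%" is a literal "%" in sprintf(), so "%%%s%%" becomes "%<str>%".
  sprintf(
    "select go_id, term from go_term where %s like('%%%s%%') and ontology = '%s'",
    attr, str, ont
  )
}

cat(build_query("collagen", "term"), "\n")
cat(build_query("collagen", "term", ont = "BP"), "\n")
```

Note the default `ont = "MF"`: the query matches molecular function terms only, so the other ontologies need an explicit `ont = "BP"` or `ont = "CC"`. Pasting `str` straight into the SQL is fine for interactive use but should not be done with untrusted input.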
>>>
>>> Also
>>>
>>> library(GO.db)
>>> terms <- Term(GOTERM) # or maybe Definition(GOTERM) ?
>>> ontologies <- Ontology(GOTERM)
>>> collagen <- terms[grepl("collagen", terms) & ("MF" == ontologies)]
>>>
>>> and the next step,
>>>
>>> library(org.Hs.eg.db)
>>> egids <- mget(names(collagen), org.Hs.egGO2EG, ifnotfound=NA)
>>> egids <- egids[!is.na(egids)]
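
The two steps above can be traced on toy data in base R, as a rough sketch of how each filter narrows the result. The vectors `terms` and `ontologies` stand in for Term(GOTERM) and Ontology(GOTERM), and the list `lookup` stands in for org.Hs.egGO2EG. All three are tiny hand-made samples, and leaving GO:0005518 out of `lookup` is an assumption made only to exercise the NA branch.

```r
# Toy stand-ins for Term(GOTERM) and Ontology(GOTERM), keyed by GO ID.
terms <- c("GO:0005518" = "collagen binding",
           "GO:0030199" = "collagen fibril organization",
           "GO:0030574" = "collagen catabolic process",
           "GO:0003674" = "molecular_function")
ontologies <- c("GO:0005518" = "MF", "GO:0030199" = "BP",
                "GO:0030574" = "BP", "GO:0003674" = "MF")

# The logical indexing below is only valid if both vectors share one order.
stopifnot(identical(names(terms), names(ontologies)))

has_word <- grepl("collagen", terms)
mf_only  <- terms[has_word & ontologies == "MF"]  # restricted to one ontology
any_ont  <- terms[has_word]                       # all three ontologies

# Toy stand-in for the GO -> Entrez Gene map; GO:0005518 is left out
# on purpose so that the lookup returns NA for it.
lookup <- list("GO:0030199" = c(IEA = "302", IEA = "304"),
               "GO:0030574" = c(IEA = "4312"))

# Mimic mget(..., ifnotfound = NA), then drop the terms that had no genes.
egids <- lapply(names(any_ont), function(id) {
  if (is.null(lookup[[id]])) NA else lookup[[id]]
})
names(egids) <- names(any_ont)
egids <- egids[!is.na(egids)]

print(names(mf_only))
print(names(egids))
```

Both the ontology test and the NA filter shrink the list, so the number of terms that end up in `egids` can be much smaller than the number of terms matching the word.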
>>>
>>>
>>>>
>>>> On Sun, Feb 28, 2010 at 8:56 PM, Loren Engrav <engrav at u.washington.edu>
>>>> wrote:
>>>>> Is there a BioC package that will find all the GO terms containing some
>>>>> word, like perhaps "collagen"
>>>>> And then find all the genes contained within those found terms
>>>>>
>>>>> I scanned
>>>>> goProfiles
>>>>> GOSemSim
>>>>> GOstats
>>>>> goTools and
>>>>> topGO
>>>>>
>>>>> And could not determine that any would do that.
>>>>>
>>>>> Thank you.
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at stat.math.ethz.ch
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>>
>>>
>>>
>>> --
>>> Martin Morgan
>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N.
>>> PO Box 19024 Seattle, WA 98109
>>>
>>> Location: Arnold Building M1 B861
>>> Phone: (206) 667-2793
>>
>
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793