[BioC] advice in building GOALLENTREZID {GO}
Vladimir Morozov
vmorozov at als.net
Wed Apr 9 20:32:10 CEST 2008
> Yes, but if you do, and you want to use any other annotation, you
> must also build that. The annotation packages are inter-connected in
> non-trivial ways (many depending on GO), so mixing and matching is not a
> simple process, which is pretty much why we don't do it more often.

Essentially I need an updated GOALLENTREZID for GO enrichment analysis on
Entrez lists. I don't think it depends on other annotation slots. Given the
'GO' 'OFFSPRING' lists, I can probably get GOALLENTREZID fairly easily:
# parse the Entrez GO "direct" mapping from NCBI
# (gene2go columns used: V2 = GeneID, V3 = GO ID, V4 = evidence code)
gene2go <- read.delim('/home/data/public/GO/gene2go', header = FALSE,
                      comment.char = "#")
go2g <- as.character(gene2go$V2)
names(go2g) <- gene2go$V4
go2g <- split(go2g, gene2go$V3)

# get the Entrez GO "transitive" mapping using the 'GO' 'OFFSPRING' lists
go2allg <- lapply(c('CC', 'BP', 'MF'), function(goType) {
  xx <- as.list(get(paste('GO', goType, 'OFFSPRING', sep = '')))
  leaf <- is.na(xx)  # terms without offspring are stored as NA
  xx2 <- c(
    mapply(function(x) x, names(xx[leaf]), SIMPLIFY = FALSE),
    mapply(function(x, y) c(x, y), names(xx[!leaf]), xx[!leaf],
           SIMPLIFY = FALSE)
  )
  lapply(xx2, function(x) unlist(unique(go2g[x])))
})

# collapse into a one-level list
go2allg <- unlist(go2allg, recursive = FALSE, use.names = TRUE)
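
For anyone who wants to try the propagation step without GO.db or the gene2go file, here is a minimal sketch on toy data. The term IDs (T1..T4), the gene IDs and the helper name propagate() are all made up for illustration. Dropping repeated (evidence code, gene) pairs is my own assumption about what is wanted, not something GOALLENTREZID is known to do.

```r
# Toy direct annotations (hypothetical): term -> gene IDs named by evidence code
go2g <- list(
  T2 = c(IDA = "101", TAS = "102"),
  T4 = c(ISS = "103")
)

# Toy offspring lists (hypothetical); NA marks a term without offspring,
# which is how the 'OFFSPRING' lists above are handled
offspring <- list(T1 = c("T2", "T3", "T4"), T2 = "T4", T3 = "T4", T4 = NA)

# Collect the genes annotated to a term or to any of its offspring
propagate <- function(term, offspring, go2g) {
  kids <- offspring[[term]]
  terms <- if (all(is.na(kids))) term else c(term, kids)
  hits <- go2g[intersect(terms, names(go2g))]
  genes <- unlist(unname(hits))  # keep only the evidence codes as names
  # assumption: drop repeated (evidence code, gene) pairs
  genes[!duplicated(paste(names(genes), genes))]
}

go2allg <- sapply(names(offspring), propagate,
                  offspring = offspring, go2g = go2g, simplify = FALSE)
```

In this toy case T1 has no direct annotations but still picks up all three genes through T2 and T4.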
#seems to be updated 'GOALLENTREZID' excluding 'all'
> length(go2allg)
[1] 23678
> xx <- as.list(GOALLENTREZID)
> length(xx)
[1] 23679
> names(xx)[!(names(xx) %in% names(go2allg))]
[1] "all"
>
> xx$`GO:0000328`
IDA IDA IDA IDA TAS TAS TAS
"850875" "851514" "853290" "853912" "855343" "855949" "856649"
> go2allg[1]
$`GO:0000328`
IDA IDA IDA IDA TAS TAS TAS
"850875" "851514" "853290" "853912" "855343" "855949" "856649"
ISS
"2543332"
So suggestions for parsing the GeneOntology files into the 'GO' 'OFFSPRING'
environments would be appreciated.
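
One possible starting point, as a rough sketch only: once the ontology file has been reduced to a table of child -> parent links, the offspring lists are the transitive closure of that table. Reading the file itself is not shown. The edge table, the term IDs (T1..T4) and the helper name offspring_of() are hypothetical, and which relation types count as parent links is left as an assumption.

```r
# Toy child -> parent edges (hypothetical IDs, not real GO terms)
edges <- data.frame(
  child  = c("T2", "T3", "T4", "T4"),
  parent = c("T1", "T1", "T2", "T3"),
  stringsAsFactors = FALSE
)

# Direct children of each parent term
children <- split(edges$child, edges$parent)

# All descendants of one term, found by walking the graph level by level
offspring_of <- function(term, children) {
  seen <- character(0)
  queue <- children[[term]]
  while (length(queue) > 0) {
    seen <- union(seen, queue)
    queue <- setdiff(unlist(children[queue], use.names = FALSE), seen)
  }
  # use NA for terms without offspring, as in the lists handled above
  if (length(seen) == 0) NA_character_ else sort(seen)
}

terms <- sort(union(edges$child, edges$parent))
offspring <- sapply(terms, offspring_of, children = children, simplify = FALSE)
```

The result is a named list with one entry per term, so it can stand in for as.list() of an 'OFFSPRING' map in code like the above. It should be checked against the shipped maps before being relied on.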
Best,
Vladimir
-----Original Message-----
From: Robert Gentleman [mailto:rgentlem at fhcrc.org]
Sent: Wednesday, April 09, 2008 1:03 PM
To: Vladimir Morozov
Cc: Marc Carlson; bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] advice in building GOALLENTREZID {GO}
Vladimir Morozov wrote:
> Now I got it! Thanks
>
> Can you provide the code to build GO{BP|MF|CC}OFFSPRING? I probably
> want to update it more often than biannually.
Yes, but if you do, and you want to use any other annotation, you
must also build that. The annotation packages are inter-connected in
non-trivial ways (many depending on GO), so mixing and matching is not a
simple process, which is pretty much why we don't do it more often.
best wishes
Robert
>
> Thanks
> Vlad
>
>
>
> -----Original Message-----
> From: Marc Carlson [mailto:mcarlson at fhcrc.org]
> Sent: Wednesday, April 09, 2008 11:57 AM
> To: Vladimir Morozov
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] advice in building GOALLENTREZID {GO}
>
> Vladimir Morozov wrote:
>> Marc,
>>
>> I need the "transitive" (direct and "child" term) GO->Entrez mapping.
>> Where is the Entrez mapping in GO.db?
>>
>>
>
> Yes those maps were deliberately not put into the newer GO.db package.
> Instead you can now find this information in the organism based
> packages as I described in my previous post.
>
> In short, you need to look at the org.Xx.eg.db package for your
> species, where Xx is the genus and species 1st letter (Homo sapiens
> becomes Hs, Mus musculus becomes Mm etc.).
>
> Then you need to look at the org.Hs.egGO2ALLEGS and the
> org.Hs.egGO2EG mappings that the package contains (continuing the
> human example).
>
> The problem with having that data in GO was that it munges together GO
> to entrez gene ID associations from several different organisms at the
> same time. Entrez gene IDs are unique, so what we had before with
> these maps inside of GO is not really wrong, but we fear that someone
> could potentially become confused by this, and we want to help steer
> you guys towards getting the correct answers whenever possible. Plus,
> this map was already really huge and needed to be split up in order to
> prevent future versions of the GO package from swelling up into a
> "GOjira" package. ;)
>
> Hope this helps,
>
>
> Marc
>
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org