[BioC] Quick start to linking GO terms and microarray data
Sean Davis
sdavis2 at mail.nih.gov
Wed Mar 1 14:28:37 CET 2006
>
> michael watson (IAH-C) wrote:
>
>> Hi Steffen, Wolfgang
>>
>> Thanks a lot, the biomaRt package looks wonderful for the species that
>> are in Ensembl... Are there any functions within it to annotate other
>> species? (e.g. bacteria, plants, etc.)
Mick,
This is a quick-and-dirty solution that will get you whatever Gene Ontology
annotation NCBI has available, including, for example, Arabidopsis. Hope this
gets you a few more species. The NCBI taxonomy IDs included are:
> unique(gene2go$taxID)
[1] 3702 4932 6239 7227 7955 9031 9606 10090 10116 36329
[11] 39947 83333 185431 195099 198094 211586 214684 223283 243164 243231
[21] 243233 246200 265669 284812
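To see how many annotation rows each of these species contributes, `table()` on the `taxID` column is enough. The sketch below runs on a tiny hand-built stand-in for `gene2go` (rows copied from the output further down) instead of the real file. The `tax_names` lookup is a hypothetical, partial helper that covers only a few well-known NCBI taxonomy IDs.

```r
# Tiny stand-in for gene2go, using rows that appear in the transcript below
gene2go_toy <- data.frame(
  taxID  = c(9606, 9606, 9606, 9606, 3702),
  geneID = c(1, 2, 9, 10, 819280),
  goID   = c("GO:0000004", "GO:0004867", "GO:0004060", "GO:0004060",
             "GO:0003700"),
  stringsAsFactors = FALSE
)

# Hypothetical, partial lookup from NCBI taxonomy ID to species name
tax_names <- c(
  "3702"  = "Arabidopsis thaliana",
  "4932"  = "Saccharomyces cerevisiae",
  "6239"  = "Caenorhabditis elegans",
  "7227"  = "Drosophila melanogaster",
  "7955"  = "Danio rerio",
  "9031"  = "Gallus gallus",
  "9606"  = "Homo sapiens",
  "10090" = "Mus musculus",
  "10116" = "Rattus norvegicus",
  "83333" = "Escherichia coli K-12"
)

# Number of annotation rows per taxonomy ID
counts <- table(gene2go_toy$taxID)

# Attach species names where the lookup knows them (NA otherwise)
per_species <- data.frame(
  taxID   = names(counts),
  species = unname(tax_names[names(counts)]),
  n_rows  = as.vector(counts),
  stringsAsFactors = FALSE
)
print(per_species)
```

On the real table, the same two lines (`table()` plus the lookup) give the per-species row counts for all 24 IDs listed above.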
Hope this helps.
Sean
> download.file('ftp://ftp.ncbi.nih.gov/gene/DATA/gene2go.gz', destfile='gene2go.gz')
trying URL 'ftp://ftp.ncbi.nih.gov/gene/DATA/gene2go.gz'
ftp data connection made, file length 5541317 bytes
opened URL
==================================================
downloaded 5411Kb
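When rerunning an analysis it helps not to fetch the file again every time. A minimal sketch, using a hypothetical helper name `fetch_gene2go`: it downloads only when the file is missing, and passes `mode = "wb"` so the gzip file is written in binary mode (this matters on Windows).

```r
# Hypothetical helper: download gene2go.gz only if it is not already on disk
fetch_gene2go <- function(url = "ftp://ftp.ncbi.nih.gov/gene/DATA/gene2go.gz",
                          destfile = "gene2go.gz") {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the compressed file intact on Windows
    status <- download.file(url, destfile = destfile, mode = "wb")
    # download.file() returns 0 on success
    if (status != 0) stop("download of ", url, " failed")
  }
  destfile
}

# Usage: the first call downloads, later calls reuse the local copy
# path <- fetch_gene2go()
```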
> gene2go <- read.table(gzfile('gene2go.gz'),sep="\t",header=FALSE,quote="")
> colnames(gene2go) <- c('taxID', 'geneID', 'goID', 'evidence', 'qualifier',
+                        'goTerm', 'pubmedlist')
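The `read.table()` call can be tried without downloading the full file. The sketch below writes a tiny gene2go-style file to a temporary location and reads it back with the same arguments. The header line and rows are illustrative stand-ins, and naming only the first seven columns is an assumption meant to cope with later versions of the file that may add columns.

```r
# Write a small gene2go-style file: gzipped, tab-separated, with a "#" header
# line. The header text and the two rows here are illustrative only.
tmp <- tempfile(fileext = ".gz")
con <- gzfile(tmp, "w")
writeLines(c(
  "#tax_id\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMed",
  "9606\t1\tGO:0000004\tND\t\tbiological process unknown\t-",
  "9606\t9\tGO:0004060\tTAS\t\tarylamine N-acetyltransferase activity\t10908296"
), con)
close(con)

# Same arguments as in the transcript. read.table's default comment.char = "#"
# is what drops the header line; quote = "" stops apostrophes in GO term names
# (e.g. "3'-5' exonuclease activity") from being treated as quotes.
g2g <- read.table(gzfile(tmp), sep = "\t", header = FALSE, quote = "",
                  stringsAsFactors = FALSE)

# Name only the first seven columns, in case a newer file has more
colnames(g2g)[1:7] <- c("taxID", "geneID", "goID", "evidence",
                        "qualifier", "goTerm", "pubmedlist")
str(g2g)
```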
> gene2go[match(1:10,gene2go$geneID),]
       taxID geneID       goID evidence qualifier
272227  9606      1 GO:0000004       ND
272230  9606      2 GO:0004867      IEA
NA        NA     NA       <NA>     <NA>      <NA>
NA.1      NA     NA       <NA>     <NA>      <NA>
NA.2      NA     NA       <NA>     <NA>      <NA>
NA.3      NA     NA       <NA>     <NA>      <NA>
NA.4      NA     NA       <NA>     <NA>      <NA>
NA.5      NA     NA       <NA>     <NA>      <NA>
272240  9606      9 GO:0004060      TAS
272244  9606     10 GO:0004060      TAS
                                             goTerm pubmedlist
272227                   biological process unknown          -
272230 serine-type endopeptidase inhibitor activity          -
NA                                             <NA>       <NA>
NA.1                                           <NA>       <NA>
NA.2                                           <NA>       <NA>
NA.3                                           <NA>       <NA>
NA.4                                           <NA>       <NA>
NA.5                                           <NA>       <NA>
272240       arylamine N-acetyltransferase activity   10908296
272244       arylamine N-acetyltransferase activity    2340091
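Note that `match()` returns exactly one row per queried ID: the first matching row, or an all-`NA` row when the gene has no annotation at all (IDs 3 to 8 above). A gene can have several GO annotations, so to pull all of them, subset with `%in%` instead, and `split()` the result if a per-gene list is more convenient. A sketch on a small stand-in for `gene2go` built from the rows shown above:

```r
# Small stand-in for gene2go, built from the rows printed above
g2g <- data.frame(
  taxID  = c(9606, 9606, 9606, 9606),
  geneID = c(1, 2, 9, 10),
  goID   = c("GO:0000004", "GO:0004867", "GO:0004060", "GO:0004060"),
  stringsAsFactors = FALSE
)
ids <- 1:10

# match(): one row per query ID, first hit only, NA rows for misses
first_hit <- g2g[match(ids, g2g$geneID), ]

# %in%: every annotation row for any of the query IDs, no NA padding
all_hits <- g2g[g2g$geneID %in% ids, ]

# One character vector of GO IDs per gene, named by gene ID
go_by_gene <- split(all_hits$goID, all_hits$geneID)
go_by_gene
```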
# ...and an example from A. thaliana (taxID 3702);
# NCBI's GO annotations for A. thaliana come from TAIR
> gene2go[match(819280,gene2go$geneID),]
      taxID geneID       goID evidence qualifier
12430  3702 819280 GO:0003700      ISS
                             goTerm
12430 transcription factor activity
      pubmedlist
12430    7948864
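To get from this table to something that can be joined to microarray data, one option is a named list mapping each gene to its GO IDs for a single species. A minimal sketch with a hypothetical helper `go_map_for_species`, again on a stand-in built from rows in this transcript. It assumes the array's probes have already been mapped to Entrez Gene IDs (the `geneID` column); a list of this shape is a common input for GO enrichment code, but each package documents its own expected format.

```r
# Stand-in for gene2go, using the human and A. thaliana rows shown above
g2g <- data.frame(
  taxID  = c(9606, 9606, 9606, 9606, 3702),
  geneID = c(1, 2, 9, 10, 819280),
  goID   = c("GO:0000004", "GO:0004867", "GO:0004060", "GO:0004060",
             "GO:0003700"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for one taxonomy ID, return a named list mapping each
# Entrez Gene ID (as a character name) to its unique GO IDs
go_map_for_species <- function(df, tax_id) {
  sub <- df[df$taxID == tax_id, ]
  lapply(split(as.character(sub$goID), sub$geneID), unique)
}

ath <- go_map_for_species(g2g, 3702)  # A. thaliana
hsa <- go_map_for_species(g2g, 9606)  # human

# Look up GO IDs for the Entrez Gene IDs behind a set of array probes
hsa[as.character(c(1, 9))]
```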