[BioC] Quick start to linking GO terms and microarray data

Wed Mar 1 13:33:46 CET 2006

Hi Steffen, Wolfgang

Thanks a lot, the biomaRt package looks wonderful for the species that
are in Ensembl. Are there any functions within it to annotate other
species (e.g. bacteria, plants, etc.)?

Many thanks
Mick

-----Original Message-----
From: Steffen Durinck [mailto:sdurinck at ebi.ac.uk] 
Sent: 01 March 2006 13:24
To: michael watson (IAH-C)
Cc: Sean Davis; Bioconductor
Subject: Re: [BioC] Quick start to linking GO terms and microarray data

Hi Mike,

As Wolfgang already suggested, you can do this with the biomaRt package.
Here is how you could do this:

 > library(biomaRt)
Loading required package: XML
Loading required package: RCurl
 > mart = useMart("ensembl",dataset="hsapiens_gene_ensembl")
Checking attributes and filters ... ok
 > getGO(id=c(100,620),type="entrezgene",mart=mart)

        go_id                                    go_description evidence_code
1  GO:0004000                      adenosine deaminase activity           TAS
2  GO:0016787                                hydrolase activity           IEA
3  GO:0009117                             nucleotide metabolism           IEA
4  GO:0009168  purine ribonucleoside monophosphate biosynthesis           IEA
5  GO:0019735 antimicrobial humoral response (sensu Vertebrata)           TAS
6  GO:0006955                                   immune response           IMP
7  GO:0006955                                   immune response           IEA
8  GO:0006163                      purine nucleotide metabolism           IMP
9  GO:0006163                      purine nucleotide metabolism           IEA
10 GO:0005737                                         cytoplasm           IDA
11 GO:0005737                                         cytoplasm           IEA
   ensembl_gene_id ensembl_transcript_id
1  ENSG00000196839       ENST00000359372
2  ENSG00000196839       ENST00000359372
3  ENSG00000196839       ENST00000359372
4  ENSG00000196839       ENST00000359372
5  ENSG00000196839       ENST00000359372
6  ENSG00000196839       ENST00000359372
7  ENSG00000196839       ENST00000359372
8  ENSG00000196839       ENST00000359372
9  ENSG00000196839       ENST00000359372
10 ENSG00000196839       ENST00000359372
11 ENSG00000196839       ENST00000359372
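Note that the result has one row per GO term and evidence code, so GO:0006955, GO:0006163 and GO:0005737 each appear twice (e.g. immune response with both IMP and IEA evidence). If all you want is the list of GO identifiers and terms, a minimal base-R sketch of collapsing the result follows. It assumes the result is held in a data frame called `go` with the column names printed above; here `go` is typed in by hand from that output rather than fetched from Ensembl.

```r
# Stand-in for the data frame returned by getGO() above, typed in by hand
# from the printed output (the name `go` is an assumption).
go <- data.frame(
  go_id = c("GO:0004000", "GO:0016787", "GO:0009117", "GO:0009168",
            "GO:0019735", "GO:0006955", "GO:0006955", "GO:0006163",
            "GO:0006163", "GO:0005737", "GO:0005737"),
  go_description = c("adenosine deaminase activity", "hydrolase activity",
                     "nucleotide metabolism",
                     "purine ribonucleoside monophosphate biosynthesis",
                     "antimicrobial humoral response (sensu Vertebrata)",
                     "immune response", "immune response",
                     "purine nucleotide metabolism",
                     "purine nucleotide metabolism",
                     "cytoplasm", "cytoplasm"),
  evidence_code = c("TAS", "IEA", "IEA", "IEA", "TAS", "IMP", "IEA",
                    "IMP", "IEA", "IDA", "IEA"),
  stringsAsFactors = FALSE
)

# Drop the evidence codes and keep one row per GO identifier/term pair.
terms <- unique(go[, c("go_id", "go_description")])
rownames(terms) <- NULL

# Optionally keep the evidence: paste all codes for a term into one string.
evidence <- aggregate(evidence_code ~ go_id, data = go,
                      FUN = function(x) paste(sort(unique(x)), collapse = ","))
terms <- merge(terms, evidence, by = "go_id")
print(terms)
```

Whether to keep IEA (electronically inferred) annotations is a judgement call; filtering `go` on `evidence_code` before the unique() step is one way to drop them.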

best,
Steffen

michael watson (IAH-C) wrote:

>Thanks Sean, but I really wanted to demonstrate this in Bioconductor :-S
>
>I tried running the vignettes in goTools, the first time it froze up my
>PC for about 30 minutes and then gave out a cryptic message about
>coercing x to a list, the second time it froze up my PC and then R
>crashed with no warning :-S
>
>As far as I can tell, GOstats doesn't have any clear examples of simple
>mapping of microarray data to GO terms.
>
>Given that one of the major, fundamental tasks biologists want to do is
>find out functional information for significantly differentially
>expressed genes, shouldn't this be a little easier, and a little more
>transparent, in Bioconductor?
>
>Again, I ask, does anyone have any simple examples of going from a list
>of LocusLink IDs to a list of GO Terms?  (i.e. GO identifiers and the
>biological function/term associated with those identifiers)
>
>Many thanks
>Mick
>
>-----Original Message-----
>From: Sean Davis [mailto:sdavis2 at mail.nih.gov] 
>Sent: 01 March 2006 11:44
>To: michael watson (IAH-C); Bioconductor
>Subject: Re: [BioC] Quick start to linking GO terms and microarray data
>
>On 3/1/06 6:20 AM, "michael watson (IAH-C)" <michael.watson at bbsrc.ac.uk>
>wrote:
>
>>Hi
>>
>>I want to investigate the GO terms associated with my microarray data
>>(normally, a list of genes from topTable() in limma)
>>
>>I have read the vignettes for goTools and GOstats, and to be honest, I
>>am still a little unclear what the overall process is, particularly if I
>>am working with a custom array and not with Affymetrix or Operon arrays.
>>
>>Let's say, for example, I have my array data in a data.frame containing
>>gene names.  In a separate data frame I have a link between my gene
>>names and LocusLink IDs.  How do I:
>>
>>1) Find the GO terms associated with subsets of my genes? (I realise I
>>can use merge() to link my array data to the LocusLink IDs, but what do
>>I do then?)
>>
>>2) Find out if a particular GO term is statistically over-represented in
>>a particular group?
>>
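Mick's step 1 (array data to LocusLink IDs with merge(), then LocusLink IDs to GO terms) can be sketched in base R as two merge() calls. Everything here is a toy: the data frames `arr`, `ll` and `gomap`, their column names and all IDs are hypothetical, and `gomap` stands in for a LocusLink-to-GO table obtained elsewhere (for example with biomaRt, as in Steffen's reply).

```r
# Hypothetical array results keyed by gene name
# (e.g. a few rows taken from limma's topTable()).
arr <- data.frame(gene  = c("geneA", "geneB", "geneC"),
                  logFC = c(2.1, -1.4, 0.3),
                  stringsAsFactors = FALSE)

# Hypothetical lookup table: gene name -> LocusLink (Entrez Gene) ID.
ll <- data.frame(gene      = c("geneA", "geneB", "geneC"),
                 locuslink = c("1001", "1002", "1003"),
                 stringsAsFactors = FALSE)

# Hypothetical LocusLink -> GO table, one row per (ID, GO term) pair.
gomap <- data.frame(locuslink = c("1001", "1001", "1002"),
                    go_id     = c("GO_A", "GO_B", "GO_A"),
                    stringsAsFactors = FALSE)

# Step 1: attach LocusLink IDs to the array results.
arr_ll <- merge(arr, ll, by = "gene")

# Step 2: attach GO terms; all.x = TRUE keeps genes without any GO
# annotation (their go_id is NA) instead of silently dropping them.
arr_go <- merge(arr_ll, gomap, by = "locuslink", all.x = TRUE)
print(arr_go)

# GO terms for a subset of genes, here those with logFC > 1
# (more than 2-fold up, since limma's logFC is on the log2 scale).
up_terms <- unique(na.omit(arr_go$go_id[arr_go$logFC > 1]))
print(up_terms)
```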
>
>Hi, Mick.
>
>I would take the LocusLink IDs for your genes and dump out two lists to
>a text file:
>
>1)  All LocusIDs on your array.
>2)  All LocusIDs in your gene list.
>
>Then use an external program or web tool such as DAVID/EASE to do the
>analysis.
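Sean's two lists can be written out with base R's writeLines(), one ID per line. A minimal sketch; the ID vectors and the file names are hypothetical placeholders, and it is worth checking which upload format the chosen web tool expects.

```r
# Hypothetical inputs: every LocusLink ID spotted on the array (with a
# duplicate, as happens with replicate spots) and the IDs in the gene list.
array_ids    <- c("1001", "1002", "1003", "1004", "1002")
genelist_ids <- c("1001", "1003")

# The gene list should be a subset of the background list.
stopifnot(all(genelist_ids %in% array_ids))

# Background ("population") list: each distinct ID on the array, one per line.
writeLines(unique(array_ids), "array_locusids.txt")

# Gene list of interest, one ID per line.
writeLines(unique(genelist_ids), "genelist_locusids.txt")
```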
>
>That said, there was some discussion on using LocusIDs directly (rather
>than requiring a metadata package) in GOHyperG. I don't know where that
>conversion stands.
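For question 2, the standard check of whether one GO term is over-represented in a gene list relative to the whole array is a one-sided hypergeometric test (equivalently, a one-sided Fisher's exact test), the test GOHyperG is named after. A minimal sketch for a single term using base R's phyper(); all four counts are hypothetical, and in practice you would repeat this for every GO term and correct for multiple testing.

```r
# Hypothetical counts for one GO term.
N_universe <- 5000  # distinct LocusLink IDs on the array
n_term     <- 100   # of those, annotated with the GO term
n_selected <- 200   # IDs in the gene list
k_overlap  <- 12    # gene-list IDs annotated with the term

# X ~ Hypergeometric: draw n_selected IDs from N_universe, of which n_term
# carry the term. phyper(q, m, n, k) returns P(X <= q), so the upper tail
# P(X >= k_overlap) is phyper(k_overlap - 1, ..., lower.tail = FALSE).
p_value <- phyper(k_overlap - 1, n_term, N_universe - n_term, n_selected,
                  lower.tail = FALSE)

# Number of annotated genes expected in the list if there is no enrichment.
expected <- n_selected * n_term / N_universe

cat(sprintf("observed %d, expected %.1f, p = %.3g\n",
            k_overlap, expected, p_value))
```

The same p-value comes out of `fisher.test(..., alternative = "greater")` on the corresponding 2x2 table of in-list/not-in-list by annotated/not-annotated counts.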
>
>As to your question about linking genes to GO, that is actually done at
>the transcript/protein level. Merging to Entrez Gene (LocusLink) happens
>after the fact. Using various data sources, you can link by RefSeq,
>LocusLink and Ensembl IDs, UCSC knownGenes, H-Invitational IDs (human),
>and probably several others in species other than human.
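Sean's point that GO annotation is attached at the transcript/protein level and only then merged to Entrez Gene can be illustrated with a toy example: join protein-level annotations to a protein-to-gene map, then keep the distinct (gene, GO term) pairs. All IDs below are hypothetical; real mappings would come from one of the data sources Sean lists.

```r
# Hypothetical protein-level GO annotation (protein ID -> GO term).
prot_go <- data.frame(protein = c("P1", "P1", "P2", "P3"),
                      go_id   = c("GO_A", "GO_B", "GO_A", "GO_C"),
                      stringsAsFactors = FALSE)

# Hypothetical protein -> LocusLink (Entrez Gene) map; P1 and P2 are two
# products of the same gene.
prot_gene <- data.frame(protein   = c("P1", "P2", "P3"),
                        locuslink = c("1001", "1001", "1002"),
                        stringsAsFactors = FALSE)

# Join on protein, then collapse to distinct gene/GO pairs, so a term
# annotated to several products of one gene is counted once for that gene.
joined  <- merge(prot_go, prot_gene, by = "protein")
gene_go <- unique(joined[, c("locuslink", "go_id")])
rownames(gene_go) <- NULL
print(gene_go)
```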
>
>Sean
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor