[BioC] Quick start to linking GO terms and microarray data

Wed Mar 1 14:42:58 CET 2006

Hi,

Besides Ensembl, biomaRt currently includes WormBase, VEGA, UniProt and MSD.
Soon I expect plants to be represented as well, via the Gramene database
(http://www.gramene.org).

Best,
Steffen

michael watson (IAH-C) wrote:

>Hi Steffen, Wolfgang
>
>Thanks a lot, the biomaRt package looks wonderful for the species that
>are in Ensembl... Are there any functions within it to annotate other
>species (e.g. bacteria, plants, etc.)?
>
>Many thanks
>Mick
>
>-----Original Message-----
>From: Steffen Durinck [mailto:sdurinck at ebi.ac.uk] 
>Sent: 01 March 2006 13:24
>To: michael watson (IAH-C)
>Cc: Sean Davis; Bioconductor
>Subject: Re: [BioC] Quick start to linking GO terms and microarray data
>
>Hi Mike,
>
>As Wolfgang already suggested, you can do this with the biomaRt package.
>Here is how you should do this:
>
> > library(biomaRt)
>Loading required package: XML
>Loading required package: RCurl
> > mart = useMart("ensembl",dataset="hsapiens_gene_ensembl")
>Checking attributes and filters ... ok
> > getGO(id=c(100,620),type="entrezgene",mart=mart)
>
>        go_id                                    go_description evidence_code
>1  GO:0004000                      adenosine deaminase activity           TAS
>2  GO:0016787                                hydrolase activity           IEA
>3  GO:0009117                             nucleotide metabolism           IEA
>4  GO:0009168  purine ribonucleoside monophosphate biosynthesis           IEA
>5  GO:0019735 antimicrobial humoral response (sensu Vertebrata)           TAS
>6  GO:0006955                                   immune response           IMP
>7  GO:0006955                                   immune response           IEA
>8  GO:0006163                      purine nucleotide metabolism           IMP
>9  GO:0006163                      purine nucleotide metabolism           IEA
>10 GO:0005737                                         cytoplasm           IDA
>11 GO:0005737                                         cytoplasm           IEA
>   ensembl_gene_id ensembl_transcript_id
>1  ENSG00000196839       ENST00000359372
>2  ENSG00000196839       ENST00000359372
>3  ENSG00000196839       ENST00000359372
>4  ENSG00000196839       ENST00000359372
>5  ENSG00000196839       ENST00000359372
>6  ENSG00000196839       ENST00000359372
>7  ENSG00000196839       ENST00000359372
>8  ENSG00000196839       ENST00000359372
>9  ENSG00000196839       ENST00000359372
>10 ENSG00000196839       ENST00000359372
>11 ENSG00000196839       ENST00000359372
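
For readers on a biomaRt release where getGO() is not available, the same lookup can probably be written with the generic getBM() query function. This is only a minimal sketch: go_for_entrez is a hypothetical helper name, and the filter name "entrezgene_id" and the attribute names "go_id", "name_1006" and "go_linkage_type" are assumptions that differ between Ensembl releases, so check them against listAttributes(mart) and listFilters(mart) first.

```r
# Hypothetical helper: GO annotation for Entrez Gene IDs via biomaRt::getBM().
# The filter and attribute names are assumptions; they differ between
# Ensembl releases, so verify them with listFilters(mart) and
# listAttributes(mart) before relying on them.
go_for_entrez <- function(ids, mart,
                          id_filter = "entrezgene_id",
                          go_attributes = c("go_id", "name_1006",
                                            "go_linkage_type")) {
  biomaRt::getBM(attributes = c(id_filter, go_attributes),
                 filters    = id_filter,
                 values     = ids,
                 mart       = mart)
}

# Usage (needs the biomaRt package and a network connection):
# mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# go_for_entrez(c(100, 620), mart)
```

Keeping the filter and attribute names as arguments means they can be adjusted without touching the query itself.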
>
>
>best,
>Steffen
>
>michael watson (IAH-C) wrote:
>
>>Thanks Sean, but I really wanted to demonstrate this in Bioconductor
>>:-S
>>
>>I tried running the vignettes in goTools, the first time it froze up my
>>PC for about 30 minutes and then gave out a cryptic message about
>>coercing x to a list, the second time it froze up my PC and then R
>>crashed with no warning :-S
>>
>>As far as I can tell, GOstats doesn't have any clear examples of simple
>>mapping of microarray data to GO terms.
>>
>>Given that one of the major, fundamental tasks biologists want to do is
>>find out functional information for significantly differentially
>>expressed genes, shouldn't this be a little easier, and a little more
>>transparent, in Bioconductor?
>>
>>Again, I ask, does anyone have any simple examples of going from a list
>>of LocusLink IDs to a list of GO Terms?  (i.e. GO identifiers and the
>>biological function/term associated with those identifiers)
>>
>>Many thanks
>>Mick
>>
>>-----Original Message-----
>>From: Sean Davis [mailto:sdavis2 at mail.nih.gov] 
>>Sent: 01 March 2006 11:44
>>To: michael watson (IAH-C); Bioconductor
>>Subject: Re: [BioC] Quick start to linking GO terms and microarray data
>>
>>On 3/1/06 6:20 AM, "michael watson (IAH-C)"
>><michael.watson at bbsrc.ac.uk> wrote:
>>
>>>Hi
>>>
>>>I want to investigate the GO terms associated with my microarray data
>>>(normally, a list of genes from topTable() in limma)
>>>
>>>I have read the vignettes for goTools and GOstats, and to be honest, I
>>>am still a little unclear what the overall process is, particularly if
>>>I am working with a custom array and not with affy or operon.
>>>
>>>Let's say, for example, I have my array data in a data.frame containing
>>>gene names.  In a separate data frame I have a link between my gene
>>>names and LocusLink IDs.  How do I:
>>>
>>>1) Find the GO terms associated with subsets of my genes? (I realise I
>>>can use merge() to link my array data to the LocusLink IDs, but what
>>>do I do then?)
>>>
>>>2) Find out if a particular GO term is statistically over-represented
>>>in a particular group?
>>>
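
The merge() step from question 1 can be carried one table further: once a LocusLink-to-GO table is available (from an annotation query or a downloaded mapping), a second merge() attaches the GO terms. This is only a minimal base-R sketch. All three data frames are toy data, and the gene names, LocusLink IDs and gene-to-GO pairings are invented for illustration.

```r
# Toy array results; gene names and values are made up.
array_data <- data.frame(
  gene  = c("geneA", "geneB", "geneC"),
  logFC = c(2.1, -1.7, 0.3),
  stringsAsFactors = FALSE
)

# Toy link between gene names and LocusLink IDs (IDs are made up).
gene2ll <- data.frame(
  gene      = c("geneA", "geneB", "geneC"),
  locuslink = c(1001, 1002, 1003),
  stringsAsFactors = FALSE
)

# Toy LocusLink-to-GO table, one row per (ID, GO term) pair.
# The pairings are invented; a real table would come from an annotation source.
ll2go <- data.frame(
  locuslink = c(1001, 1001, 1002),
  go_id     = c("GO:0004000", "GO:0016787", "GO:0005737"),
  term      = c("adenosine deaminase activity", "hydrolase activity",
                "cytoplasm"),
  stringsAsFactors = FALSE
)

# Step 1: attach LocusLink IDs to the array data.
annotated <- merge(array_data, gene2ll, by = "gene")

# Step 2: attach GO terms; all.x = TRUE keeps genes without any GO term.
annotated <- merge(annotated, ll2go, by = "locuslink", all.x = TRUE)

# GO terms for a subset of genes, here those with |logFC| > 1.
keep      <- abs(annotated$logFC) > 1 & !is.na(annotated$go_id)
subset_go <- unique(annotated[keep, c("go_id", "term")])
print(subset_go)
```

Because one gene can carry several GO terms, the merged table has one row per gene/term pair rather than one row per gene.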
>>Hi, Mick.
>>
>>I would take your LocusLink IDs for your genes and dump out two lists
>>to a text file:
>>
>>1)  All LocusIDs on your array.
>>2)  All LocusIDs in your genelist.
>>
>>Then use an external program or web tool such as DAVID/EASE to do the
>>analysis.
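
The same two lists are also the input for a simple over-representation test inside R. One common choice is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test); it is shown here as an illustration, not as the method DAVID/EASE or GOHyperG actually implement. go_overrep_p is a hypothetical helper name and all IDs are toy values.

```r
# Toy universe: all LocusLink IDs on the array (made-up values).
universe  <- 1:20
# Toy gene list: the IDs called significant (made-up values).
gene_list <- c(1, 2, 3, 4, 5)
# Toy annotation: IDs that carry the GO term of interest (made-up values).
with_term <- c(1, 2, 3, 10)

# Hypothetical helper: probability of seeing at least the observed number of
# annotated genes in the gene list, under the hypergeometric model.
go_overrep_p <- function(gene_list, universe, with_term) {
  gene_list <- intersect(gene_list, universe)   # ignore IDs not on the array
  with_term <- intersect(with_term, universe)
  k <- length(intersect(gene_list, with_term))  # annotated genes in the list
  m <- length(with_term)                        # annotated genes on the array
  n <- length(universe) - m                     # unannotated genes on the array
  # P(X >= k) is P(X > k - 1), hence k - 1 with lower.tail = FALSE.
  phyper(k - 1, m, n, length(gene_list), lower.tail = FALSE)
}

p <- go_overrep_p(gene_list, universe, with_term)
print(p)
```

When many GO terms are tested this way, the resulting p-values need a multiple-testing correction, for example with p.adjust().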
>>
>>That said, there was some discussion on using straight locusIDs (rather
>>than requiring a metadata package) in GOHyperG.  I don't know where that
>>conversion stands.
>>
>>As to your question about linking genes to GO, that is actually done at
>>the transcript/protein level.  Merging to entrez gene (locuslink) happens
>>after the fact.  Using various data sources, you can link by refseq,
>>locuslink, ensembl ids, ucsc knowngenes, human invitational ids (human),
>>and probably several others in species other than human.
>>
>>Sean
>>
>>_______________________________________________
>>Bioconductor mailing list
>>Bioconductor at stat.math.ethz.ch
>>https://stat.ethz.ch/mailman/listinfo/bioconductor
>>