[BioC] Quick start to linking GO terms and microarray data
michael watson (IAH-C)
michael.watson at bbsrc.ac.uk
Wed Mar 1 13:33:46 CET 2006
Hi Steffen, Wolfgang
Thanks a lot, the biomaRt package looks wonderful for the species that
are in Ensembl. Are there any functions within it to annotate other
species (e.g. bacteria, plants, etc.)?
Many thanks
Mick
-----Original Message-----
From: Steffen Durinck [mailto:sdurinck at ebi.ac.uk]
Sent: 01 March 2006 13:24
To: michael watson (IAH-C)
Cc: Sean Davis; Bioconductor
Subject: Re: [BioC] Quick start to linking GO terms and microarray data
Hi Mike,
As Wolfgang already suggested you can do this with the biomaRt package.
Here is how you should do this:
> library(biomaRt)
Loading required package: XML
Loading required package: RCurl
> mart = useMart("ensembl",dataset="hsapiens_gene_ensembl")
Checking attributes and filters ... ok
> getGO(id=c(100,620),type="entrezgene",mart=mart)
        go_id                                    go_description evidence_code
1  GO:0004000                      adenosine deaminase activity           TAS
2  GO:0016787                                hydrolase activity           IEA
3  GO:0009117                             nucleotide metabolism           IEA
4  GO:0009168  purine ribonucleoside monophosphate biosynthesis           IEA
5  GO:0019735 antimicrobial humoral response (sensu Vertebrata)           TAS
6  GO:0006955                                   immune response           IMP
7  GO:0006955                                   immune response           IEA
8  GO:0006163                      purine nucleotide metabolism           IMP
9  GO:0006163                      purine nucleotide metabolism           IEA
10 GO:0005737                                         cytoplasm           IDA
11 GO:0005737                                         cytoplasm           IEA
   ensembl_gene_id ensembl_transcript_id
1  ENSG00000196839       ENST00000359372
2  ENSG00000196839       ENST00000359372
3  ENSG00000196839       ENST00000359372
4  ENSG00000196839       ENST00000359372
5  ENSG00000196839       ENST00000359372
6  ENSG00000196839       ENST00000359372
7  ENSG00000196839       ENST00000359372
8  ENSG00000196839       ENST00000359372
9  ENSG00000196839       ENST00000359372
10 ENSG00000196839       ENST00000359372
11 ENSG00000196839       ENST00000359372
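
Once a GO table like this is in hand, joining it back to the array data is a matter of two merge() calls. What follows is only a minimal sketch in base R, not part of biomaRt: the data frames, gene names, logFC values and the locuslink column are all hypothetical, and attributing the three GO rows to ID 100 is an assumption made for illustration.

```r
# Hypothetical array results: gene names with a made-up statistic
array_data <- data.frame(gene = c("geneA", "geneB"),
                         logFC = c(1.8, -2.1),
                         stringsAsFactors = FALSE)

# Hypothetical link table from gene names to LocusLink (Entrez Gene) IDs
id_map <- data.frame(gene = c("geneA", "geneB"),
                     locuslink = c(100, 620),
                     stringsAsFactors = FALSE)

# Hypothetical GO annotation keyed by LocusLink ID; the GO rows are taken
# from the output above, the locuslink column is an assumption added here
go_annot <- data.frame(locuslink = c(100, 100, 100),
                       go_id = c("GO:0004000", "GO:0016787", "GO:0005737"),
                       go_description = c("adenosine deaminase activity",
                                          "hydrolase activity",
                                          "cytoplasm"),
                       stringsAsFactors = FALSE)

# Step 1: attach LocusLink IDs to the array results
linked <- merge(array_data, id_map, by = "gene")

# Step 2: attach GO terms; all.x = TRUE keeps genes without any GO annotation
annotated <- merge(linked, go_annot, by = "locuslink", all.x = TRUE)
print(annotated)
```

Genes with several GO terms appear once per term, and genes with none are kept with NA in the GO columns, so nothing silently drops out of the list.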
best,
Steffen
michael watson (IAH-C) wrote:
>Thanks Sean, but I really wanted to demonstrate this in Bioconductor :-S
>
>I tried running the vignettes in goTools, the first time it froze up my
>PC for about 30 minutes and then gave out a cryptic message about
>coercing x to a list, the second time it froze up my PC and then R
>crashed with no warning :-S
>
>As far as I can tell, GOstats doesn't have any clear examples of simple
>mapping of microarray data to GO terms.
>
>Given that one of the major, fundamental tasks biologists want to do is
>find out functional information for significantly differentially
>expressed genes, shouldn't this be a little easier, and a little more
>transparent, in Bioconductor?
>
>Again, I ask, does anyone have any simple examples of going from a list
>of LocusLink IDs to a list of GO Terms? (i.e. GO identifiers and the
>biological function/term associated with those identifiers)
>
>Many thanks
>Mick
>
>-----Original Message-----
>From: Sean Davis [mailto:sdavis2 at mail.nih.gov]
>Sent: 01 March 2006 11:44
>To: michael watson (IAH-C); Bioconductor
>Subject: Re: [BioC] Quick start to linking GO terms and microarray data
>
>On 3/1/06 6:20 AM, "michael watson (IAH-C)" <michael.watson at bbsrc.ac.uk> wrote:
>
>>Hi
>>
>>I want to investigate the GO terms associated with my microarray data
>>(normally, a list of genes from topTable() in limma)
>>
>>I have read the vignettes for goTools and GOstats, and to be honest, I
>>am still a little unclear what the overall process is, particularly if I
>>am working with a custom array and not with affy or operon.
>>
>>Let's say, for example, I have my array data in a data.frame containing
>>gene names. In a separate data frame I have a link between my gene
>>names and LocusLink IDs. How do I:
>>
>>1) Find the GO terms associated with subsets of my genes? (I realise I
>>can use merge() to link my array data to the LocusLink IDs, but what do
>>I do then?)
>>
>>2) Find out if a particular GO term is statistically over-represented in
>>a particular group?
>>
>
>Hi, Mick.
>
>I would take your LocusLink IDs for your genes and dump out two lists to a
>text file:
>
>1) All LocusIDs on your array.
>2) All LocusIDs in your gene list.
>
>Then use an external program or web tool such as DAVID/EASE to do the
>analysis.
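
Dumping the two lists can be done with writeLines(). This is a minimal sketch, assuming plain numeric LocusLink IDs: the ID vectors and file names are hypothetical, and the one-ID-per-line layout is an assumption, so check what input format the external tool actually expects.

```r
# Hypothetical LocusLink IDs: everything on the array, and the gene list of interest
all_ids  <- c(100, 620, 1017, 7157, 672)
gene_ids <- c(100, 620)

# Write one ID per line; tempdir() is used so the sketch leaves no stray files
bg_file <- file.path(tempdir(), "array_locuslink_ids.txt")
gl_file <- file.path(tempdir(), "genelist_locuslink_ids.txt")
writeLines(as.character(unique(all_ids)), bg_file)
writeLines(as.character(unique(gene_ids)), gl_file)

# Sanity check: every gene-list ID should also be on the array
stopifnot(all(gene_ids %in% all_ids))
```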
>
>That said, there was some discussion on using straight locusIDs (rather than
>requiring a metadata package) in GOHyperG. I don't know where that
>conversion stands.
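
For a single GO term, the over-representation question comes down to a hypergeometric test, which base R can do directly with phyper(). The sketch below is not GOHyperG and does not reproduce its interface; all four counts are made up for illustration, and the variable names are hypothetical.

```r
# Hypothetical counts, made up for illustration
N <- 5000   # LocusLink IDs on the array (the universe)
K <- 200    # of those, IDs annotated to the GO term of interest
n <- 100    # IDs in the gene list
k <- 12     # gene-list IDs annotated to the GO term

# P(X >= k) when n IDs are drawn without replacement from the universe
p_over <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# The same test as a one-sided Fisher exact test on the 2x2 table
# (rows: in term / not in term; columns: in gene list / not in gene list)
tab <- matrix(c(k, n - k, K - k, N - K - n + k), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(hypergeometric = p_over, fisher = p_fisher))
```

When many GO terms are tested this way, the resulting p-values need a multiple-testing correction, for example with p.adjust().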
>
>As to your question about linking genes to GO, that is actually done at the
>transcript/protein level. Merging to entrez gene (locuslink) happens after
>the fact. Using various data sources, you can link by refseq, locuslink,
>ensembl ids, ucsc knowngenes, human invitational ids (human), and probably
>several others in species other than human.
>
>Sean
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>
>
>