[BioC] Quick start to linking GO terms and microarray data
Steffen Durinck
sdurinck at ebi.ac.uk
Wed Mar 1 15:02:26 CET 2006
Hi Mick,
The biomaRt package retrieves data from BioMart data management systems
(see: http://www.biomart.org), so any database that provides a BioMart
implementation can be queried. Ensembl and Wormbase, for example,
provide one and are queried in real time through biomaRt.
For species that are not in these systems, biomaRt cannot help unless a
local BioMart is set up for that species. Alternatively, you can try to
convince the maintainers of the database of interest to offer a BioMart
interface. I expect plants and fly to be included soon, but I have no
information on other species.
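
To check whether a species is covered before trying to annotate it,
something along these lines might help. It is only a sketch:
list_species_datasets and has_dataset are made-up helper names, the
first one needs the biomaRt package and a network connection when it is
called, and the small table at the end is hand-written to mimic the
"dataset" column that listDatasets() returns.

```r
# Sketch only. Nothing is contacted until list_species_datasets() is called.
# biomaRt::listMarts() lists the BioMart databases that can be reached.

# List the datasets (roughly one per species) offered by one BioMart database.
list_species_datasets <- function(biomart = "ensembl") {
  mart <- biomaRt::useMart(biomart)
  biomaRt::listDatasets(mart)
}

# Check whether a dataset name such as "hsapiens_gene_ensembl" is offered.
has_dataset <- function(dataset, datasets) {
  dataset %in% as.character(datasets$dataset)
}

# Offline illustration with a hand-made table (not real server output)
example <- data.frame(
  dataset = c("hsapiens_gene_ensembl", "mmusculus_gene_ensembl")
)
has_dataset("hsapiens_gene_ensembl", example)  # TRUE
has_dataset("ecoli_gene_ensembl", example)     # FALSE
```

With a live connection, has_dataset("hsapiens_gene_ensembl",
list_species_datasets()) would do the same check against the server.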
Best,
Steffen
michael watson (IAH-C) wrote:
>Hi Steffen
>
>Sorry if I am confused, but getGO() seems to require a connection to an
>ensembl database. If I have identifiers for a species that is not in
>ensembl, can I still use biomaRt to retrieve GO (and other) annotations?
>
>If so, it is a little unclear how to do this from the vignettes :-S
>
>Thank you for the help
>
>Mick
>
>-----Original Message-----
>From: Steffen Durinck [mailto:sdurinck at ebi.ac.uk]
>Sent: 01 March 2006 13:43
>To: michael watson (IAH-C)
>Cc: Bioconductor
>Subject: Re: [BioC] Quick start to linking GO terms and microarray data
>
>Hi,
>
>Next to Ensembl, biomaRt currently includes Wormbase, VEGA, Uniprot and
>msd. Soon I expect plants to be represented as well via the Gramene
>database (http://www.gramene.org).
>
>Best,
>Steffen
>
>michael watson (IAH-C) wrote:
>
>>Hi Steffen, Wolfgang
>>
>>Thanks a lot, the biomaRt package looks wonderful for the species that
>>are in ensembl... Are there any functions within it to annotate other
>>species? (Eg bacteria, plants etc)
>>
>>Many thanks
>>Mick
>>
>>-----Original Message-----
>>From: Steffen Durinck [mailto:sdurinck at ebi.ac.uk]
>>Sent: 01 March 2006 13:24
>>To: michael watson (IAH-C)
>>Cc: Sean Davis; Bioconductor
>>Subject: Re: [BioC] Quick start to linking GO terms and microarray data
>>
>>Hi Mike,
>>
>>As Wolfgang already suggested, you can do this with the biomaRt package.
>>Here is how you should do this:
>>
>> > library(biomaRt)
>>Loading required package: XML
>>Loading required package: RCurl
>> > mart = useMart("ensembl",dataset="hsapiens_gene_ensembl")
>>Checking attributes and filters ... ok
>> > getGO(id=c(100,620),type="entrezgene",mart=mart)
>>   go_id      go_description                                    evidence_code
>>1  GO:0004000 adenosine deaminase activity                      TAS
>>2  GO:0016787 hydrolase activity                                IEA
>>3  GO:0009117 nucleotide metabolism                             IEA
>>4  GO:0009168 purine ribonucleoside monophosphate biosynthesis  IEA
>>5  GO:0019735 antimicrobial humoral response (sensu Vertebrata) TAS
>>6  GO:0006955 immune response                                   IMP
>>7  GO:0006955 immune response                                   IEA
>>8  GO:0006163 purine nucleotide metabolism                      IMP
>>9  GO:0006163 purine nucleotide metabolism                      IEA
>>10 GO:0005737 cytoplasm                                         IDA
>>11 GO:0005737 cytoplasm                                         IEA
>>   ensembl_gene_id ensembl_transcript_id
>>1  ENSG00000196839 ENST00000359372
>>2  ENSG00000196839 ENST00000359372
>>3  ENSG00000196839 ENST00000359372
>>4  ENSG00000196839 ENST00000359372
>>5  ENSG00000196839 ENST00000359372
>>6  ENSG00000196839 ENST00000359372
>>7  ENSG00000196839 ENST00000359372
>>8  ENSG00000196839 ENST00000359372
>>9  ENSG00000196839 ENST00000359372
>>10 ENSG00000196839 ENST00000359372
>>11 ENSG00000196839 ENST00000359372
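As far as I know, later biomaRt releases dropped convenience functions
such as getGO() in favour of the general getBM() query function. A rough
equivalent of the call above might look like the sketch below. The
attribute and filter names used here (entrezgene_id, go_id, name_1006,
go_linkage_type) are assumptions that depend on the Ensembl release;
listAttributes(mart) and listFilters(mart) show what a given mart
actually offers. go_for_entrez is a hypothetical helper name.

```r
# Sketch only: needs the biomaRt package and a network connection when called.
go_for_entrez <- function(ids, dataset = "hsapiens_gene_ensembl") {
  mart <- biomaRt::useMart("ensembl", dataset = dataset)
  biomaRt::getBM(
    # Assumed attribute names; verify with biomaRt::listAttributes(mart)
    attributes = c("entrezgene_id", "go_id", "name_1006", "go_linkage_type"),
    # Assumed filter name; verify with biomaRt::listFilters(mart)
    filters    = "entrezgene_id",
    values     = ids,
    mart       = mart
  )
}

# Usage (not run here because it contacts the Ensembl BioMart server):
# go_for_entrez(c(100, 620))
```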
>>
>>
>>best,
>>Steffen
>>
>>michael watson (IAH-C) wrote:
>>
>>>Thanks Sean, but I really wanted to demonstrate this in Bioconductor
>>>:-S
>>>
>>>I tried running the vignettes in goTools, the first time it froze up my
>>>PC for about 30 minutes and then gave out a cryptic message about
>>>coercing x to a list, the second time it froze up my PC and then R
>>>crashed with no warning :-S
>>>
>>>As far as I can tell, GOStats doesn't have any clear examples of simple
>>>mapping of microarray data to GO terms.
>>>
>>>Given that one of the major, fundamental tasks biologists want to do is
>>>find out functional information for significantly differentially
>>>expressed genes, shouldn't this be a little easier, and a little more
>>>transparent, in bioconductor?
>>>
>>>Again, I ask, does anyone have any simple examples of going from a list
>>>of LocusLink IDs to a list of GO Terms? (i.e. GO identifiers and the
>>>biological function/term associated with those identifiers)
>>>
>>>Many thanks
>>>Mick
>>>
>>>-----Original Message-----
>>>From: Sean Davis [mailto:sdavis2 at mail.nih.gov]
>>>Sent: 01 March 2006 11:44
>>>To: michael watson (IAH-C); Bioconductor
>>>Subject: Re: [BioC] Quick start to linking GO terms and microarray data
>>>
>>>On 3/1/06 6:20 AM, "michael watson (IAH-C)" <michael.watson at bbsrc.ac.uk>
>>>wrote:
>>>
>>>>Hi
>>>>
>>>>I want to investigate the GO terms associated with my microarray data
>>>>(normally, a list of genes from topTable() in limma)
>>>>
>>>>I have read the vignettes for goTools and GOStats, and to be honest, I
>>>>am still a little unclear what the overall process is, particularly if I
>>>>am working with a custom array and not with affy or operon.
>>>>
>>>>Let's say, for example, I have my array data in a data.frame containing
>>>>gene names. In a separate data frame I have a link between my gene
>>>>names and LocusLink IDs. How do I:
>>>>
>>>>1) Find the GO terms associated with subsets of my genes? (I realise I
>>>>can use merge() to link my array data to the LocusLink ids, but what do
>>>>I do then?)
>>>>
>>>>2) Find out if a particular GO term is statistically over-represented in
>>>>a particular group
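For question 1, the merge() step and the follow-on lookup can be
sketched in base R as below. The gene names, fold changes and the
gene-to-LocusLink pairing are invented for illustration, and the go
table is typed in by hand from two rows of the getGO() output quoted in
this thread (assumed here to belong to LocusLink ID 100) rather than
queried live.

```r
# Array results keyed by gene name (names and values are made up)
array_data <- data.frame(
  gene  = c("geneA", "geneB", "geneC"),
  logFC = c(2.1, -1.4, 0.3)
)

# Link between gene names and LocusLink IDs (pairing invented)
links <- data.frame(
  gene      = c("geneA", "geneB", "geneC"),
  locuslink = c(100, 620, 999)
)

# GO annotation per LocusLink ID, hand-typed for illustration
go <- data.frame(
  locuslink      = c(100, 100),
  go_id          = c("GO:0004000", "GO:0006955"),
  go_description = c("adenosine deaminase activity", "immune response")
)

# Step 1: attach LocusLink IDs to the array data
annotated <- merge(array_data, links, by = "gene")

# Step 2: attach GO terms; all.x = TRUE keeps genes without any annotation
result <- merge(annotated, go, by = "locuslink", all.x = TRUE)
print(result)
```

A gene with several GO terms appears once per term, so the result can
have more rows than the array data.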
>>>
>>>Hi, Mick.
>>>
>>>I would take your locuslink IDs for your genes and dump out two lists to
>>>a text file:
>>>
>>>1) All LocusIDs on your array.
>>>2) All LocusIDs in your genelist.
>>>
>>>Then use an external program or web tool such as DAVID/EASE to do the
>>>analysis.
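One common way to test over-representation from those two lists is a
hypergeometric test (the "Hyper" in GOHyperG). A minimal sketch using
only base R follows; every count is an invented placeholder, not a
result from any real array, and the variable names are made up.

```r
# Invented counts for one GO term
n_array     <- 5000  # all LocusIDs on the array (list 1)
n_list      <- 200   # LocusIDs in the gene list (list 2)
n_term      <- 100   # LocusIDs on the array annotated to the GO term
n_list_term <- 12    # LocusIDs in the gene list annotated to the GO term

# P(X >= n_list_term) when n_list genes are drawn at random from the array
p_value <- phyper(n_list_term - 1, n_term, n_array - n_term, n_list,
                  lower.tail = FALSE)

# The same test written as a one-sided Fisher exact test on the 2x2 table
tab <- matrix(c(n_list_term,
                n_list - n_list_term,
                n_term - n_list_term,
                n_array - n_term - (n_list - n_list_term)),
              nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(hypergeometric = p_value, fisher = p_fisher))
```

When many GO terms are tested this way, the p-values need a
multiple-testing correction, for example with p.adjust().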
>>>
>>>That said, there was some discussion on using straight locusIDs (rather
>>>than requiring a metadata package) in GOHyperG. I don't know where that
>>>conversion stands.
>>>
>>>As to your question about linking genes to GO, that is actually done at
>>>the transcript/protein level. Merging to entrez gene (locuslink) happens
>>>after the fact. Using various data sources, you can link by refseq,
>>>locuslink, ensembl ids, ucsc knowngenes, human invitational ids (human),
>>>and probably several others in species other than human.
>>>
>>>Sean
>>>
>>>_______________________________________________
>>>Bioconductor mailing list
>>>Bioconductor at stat.math.ethz.ch
>>>https://stat.ethz.ch/mailman/listinfo/bioconductor