[BioC] Quick start to linking GO terms and microarray data
Steffen Durinck
sdurinck at ebi.ac.uk
Wed Mar 1 14:42:58 CET 2006
Hi,
Besides Ensembl, biomaRt currently includes WormBase, VEGA, UniProt and MSD.
Soon I expect plants to be represented as well via the Gramene database
(http://www.gramene.org).
Best,
Steffen
michael watson (IAH-C) wrote:
>Hi Steffen, Wolfgang
>
>Thanks a lot, the biomaRt package looks wonderful for the species that
>are in Ensembl... Are there any functions within it to annotate other
>species (e.g. bacteria, plants, etc.)?
>
>Many thanks
>Mick
>
>-----Original Message-----
>From: Steffen Durinck [mailto:sdurinck at ebi.ac.uk]
>Sent: 01 March 2006 13:24
>To: michael watson (IAH-C)
>Cc: Sean Davis; Bioconductor
>Subject: Re: [BioC] Quick start to linking GO terms and microarray data
>
>Hi Mike,
>
>As Wolfgang already suggested, you can do this with the biomaRt package.
>Here is how you could do this:
>
> > library(biomaRt)
>Loading required package: XML
>Loading required package: RCurl
> > mart = useMart("ensembl",dataset="hsapiens_gene_ensembl")
>Checking attributes and filters ... ok
> > getGO(id=c(100,620),type="entrezgene",mart=mart)
>
>   go_id      go_description                                     evidence_code
>1  GO:0004000 adenosine deaminase activity                                 TAS
>2  GO:0016787 hydrolase activity                                           IEA
>3  GO:0009117 nucleotide metabolism                                        IEA
>4  GO:0009168 purine ribonucleoside monophosphate biosynthesis             IEA
>5  GO:0019735 antimicrobial humoral response (sensu Vertebrata)            TAS
>6  GO:0006955 immune response                                              IMP
>7  GO:0006955 immune response                                              IEA
>8  GO:0006163 purine nucleotide metabolism                                 IMP
>9  GO:0006163 purine nucleotide metabolism                                 IEA
>10 GO:0005737 cytoplasm                                                    IDA
>11 GO:0005737 cytoplasm                                                    IEA
> ensembl_gene_id ensembl_transcript_id
>1 ENSG00000196839 ENST00000359372
>2 ENSG00000196839 ENST00000359372
>3 ENSG00000196839 ENST00000359372
>4 ENSG00000196839 ENST00000359372
>5 ENSG00000196839 ENST00000359372
>6 ENSG00000196839 ENST00000359372
>7 ENSG00000196839 ENST00000359372
>8 ENSG00000196839 ENST00000359372
>9 ENSG00000196839 ENST00000359372
>10 ENSG00000196839 ENST00000359372
>11 ENSG00000196839 ENST00000359372
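
Once an annotation table like the one above is in hand, linking it back to a custom array is a plain merge() on the shared ID column. The following is only a minimal sketch with toy values: the gene names, the `genes` and `go` data frames, and the presence of an `entrezgene` column in the annotation table are assumptions made for illustration, not output from biomaRt.

```r
# Hypothetical link table between custom array gene names and Entrez Gene
# (LocusLink) IDs; the gene names are invented for this sketch.
genes <- data.frame(gene_name  = c("geneA", "geneB"),
                    entrezgene = c(100, 620),
                    stringsAsFactors = FALSE)

# Hypothetical annotation table: one row per (gene, GO term) pair.
# It is assumed here that each row records which query ID it belongs to.
go <- data.frame(entrezgene     = c(100, 100, 100),
                 go_id          = c("GO:0004000", "GO:0016787", "GO:0005737"),
                 go_description = c("adenosine deaminase activity",
                                    "hydrolase activity",
                                    "cytoplasm"),
                 stringsAsFactors = FALSE)

# Left join: keep every gene on the array, even those without a GO term
# (their go_id and go_description become NA).
annotated <- merge(genes, go, by = "entrezgene", all.x = TRUE)

# Collect the GO identifiers per gene name.
terms_by_gene <- split(annotated$go_id, annotated$gene_name)
```

Using all.x = TRUE keeps unannotated genes visible as NA rows rather than silently dropping them, which matters later when counting genes for an over-representation test.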
>
>
>best,
>Steffen
>
>michael watson (IAH-C) wrote:
>
>>Thanks Sean, but I really wanted to demonstrate this in Bioconductor :-S
>>
>>I tried running the vignettes in goTools, the first time it froze up my
>>PC for about 30 minutes and then gave out a cryptic message about
>>coercing x to a list, the second time it froze up my PC and then R
>>crashed with no warning :-S
>>
>>As far as I can tell, GOstats doesn't have any clear examples of simple
>>mapping of microarray data to GO terms.
>>
>>Given that one of the major, fundamental tasks biologists want to do is
>>find out functional information for significantly differentially
>>expressed genes, shouldn't this be a little easier, and a little more
>>transparent, in Bioconductor?
>>
>>Again, I ask, does anyone have any simple examples of going from a list
>>of LocusLink IDs to a list of GO Terms? (i.e. GO identifiers and the
>>biological function/term associated with those identifiers)
>>
>>Many thanks
>>Mick
>>
>>-----Original Message-----
>>From: Sean Davis [mailto:sdavis2 at mail.nih.gov]
>>Sent: 01 March 2006 11:44
>>To: michael watson (IAH-C); Bioconductor
>>Subject: Re: [BioC] Quick start to linking GO terms and microarray data
>>
>>On 3/1/06 6:20 AM, "michael watson (IAH-C)" <michael.watson at bbsrc.ac.uk>
>>wrote:
>>
>>>Hi
>>>
>>>I want to investigate the GO terms associated with my microarray data
>>>(normally, a list of genes from topTable() in limma)
>>>
>>>I have read the vignettes for goTools and GOstats, and to be honest, I
>>>am still a little unclear what the overall process is, particularly if I
>>>am working with a custom array and not with affy or operon.
>>>
>>>Let's say, for example, I have my array data in a data.frame containing
>>>gene names. In a separate data frame I have a link between my gene
>>>names and LocusLink IDs. How do I:
>>>
>>>1) Find the GO terms associated with subsets of my genes? (I realise I
>>>can use merge() to link my array data to the LocusLink ids, but what do
>>>I do then?)
>>>
>>>2) Find out if a particular GO term is statistically over-represented in
>>>a particular group?
>>>
>>Hi, Mick.
>>
>>I would take your locuslink IDs for your genes and dump out two lists to
>>a text file:
>>
>>1) All LocusIDs on your array.
>>2) All LocusIDs in your genelist.
>>
>>Then use an external program or web tool such as DAVID/EASE to do the
>>analysis.
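
The export step described above, plus a simple in-R check for over-representation of a single GO term, could look roughly like the sketch below. Everything in it is a toy assumption: the ID vectors, the file names, and the set of IDs annotated to the term are invented, and the one-sided hypergeometric test via phyper() from base R is swapped in as a stand-in, not what DAVID/EASE or GOHyperG actually compute.

```r
# Hypothetical LocusLink/Entrez IDs; replace with your own.
array_ids <- as.character(1:1000)   # 1) all IDs on the array (the universe)
list_ids  <- as.character(1:50)     # 2) IDs in the gene list of interest

# Dump the two lists to text files, one ID per line, for an external tool.
out_dir <- tempdir()
writeLines(array_ids, file.path(out_dir, "array_ids.txt"))
writeLines(list_ids,  file.path(out_dir, "genelist_ids.txt"))

# Hypothetical set of IDs annotated with one GO term of interest.
term_ids <- as.character(c(1:10, 501:520))

N <- length(array_ids)              # genes on the array
K <- sum(term_ids %in% array_ids)   # genes on the array carrying the term
n <- length(list_ids)               # genes in the list
k <- sum(list_ids %in% term_ids)    # genes in the list carrying the term

# One-sided hypergeometric test: probability of seeing k or more annotated
# genes when drawing n genes at random from the N on the array.
p_over <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)
```

Note the `k - 1`: with lower.tail = FALSE, phyper() returns P(X > q), so subtracting one gives P(X >= k). When many GO terms are tested this way, the p-values would still need a multiple-testing correction, for example with p.adjust().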
>>
>>That said, there was some discussion on using straight locusIDs (rather
>>than requiring a metadata package) in GOHyperG. I don't know where that
>>conversion stands.
>>
>>As to your question about linking genes to GO, that is actually done at
>>the transcript/protein level. Merging to entrez gene (locuslink) happens
>>after the fact. Using various data sources, you can link by refseq,
>>locuslink, ensembl ids, ucsc knowngenes, human invitational ids (human),
>>and probably several others in species other than human.
>>
>>Sean
>>
>>_______________________________________________
>>Bioconductor mailing list
>>Bioconductor at stat.math.ethz.ch
>>https://stat.ethz.ch/mailman/listinfo/bioconductor