[BioC] Gostats with Yeast annotation

Marc Carlson mcarlson at fhcrc.org
Tue Feb 10 21:05:24 CET 2009


Hi Yolande,

Unlike the "eg" packages, the annotation package org.Sc.sgd.db is based
on SGD rather than NCBI. That means the central IDs are the systematic
yeast ORF identifiers that you can see in the examples below (YAL002W,
for example). So these IDs, and not Entrez Gene IDs, are the currency
for GOstats when working with yeast. If you need an Entrez Gene ID for
some other reason, the version of this package in the devel branch will
let you get one.
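
Since GOstats expects systematic ORF names here, it can help to check a
gene list before passing it as geneIds. What follows is only a rough
sketch in base R: the helper name is_orf_id and the regular expression
are illustrative assumptions, not part of GOstats or org.Sc.sgd.db, and
the pattern is meant to cover nuclear ORF names only (names that follow
a different scheme, such as mitochondrial ones, would be flagged).

```r
# Hypothetical helper (not part of GOstats or org.Sc.sgd.db): flags strings
# that look like systematic nuclear ORF names such as "YAL002W".
# Assumed pattern: "Y", a chromosome letter A-P, arm L or R, three digits,
# strand W or C, and an optional suffix such as "-A".
is_orf_id <- function(ids) {
  grepl("^Y[A-P][LR][0-9]{3}[WC](-[A-Z])?$", ids)
}

# Made-up input mixing ORF names with a numeric ID and a common gene name.
ids <- c("YAL002W", "YHR143W-A", "851262", "CDC19")
ok <- is_orf_id(ids)

# Entries that do not look like ORF names and would need converting
# before being used as geneIds or universeGeneIds.
ids[!ok]
```

Anything returned by the last line is not in ORF form and would have to
be mapped to a systematic name first.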

Marc



Yolande Tra wrote:
> Hi Alex,
>
> I need some help. For yeast with two-color microarray data, do you know which identifier to use? There is no org.Sc.sgdENTREZID in the listing below:
>
> ls("package:org.Sc.sgd.db")
> [1] "org.Sc.sgd"        "org.Sc.sgd_dbconn"     "org.Sc.sgd_dbfile"     "org.Sc.sgd_dbInfo"    
>  [5] "org.Sc.sgd_dbschema" "org.Sc.sgdALIAS"    "org.Sc.sgdCHR"         "org.Sc.sgdCHRLENGTHS" 
>  [9] "org.Sc.sgdCHRLOC" "org.Sc.sgdCOMMON2ORF"  "org.Sc.sgdDESCRIPTION" "org.Sc.sgdENZYME"     
> [13] "org.Sc.sgdENZYME2ORF" "org.Sc.sgdGENENAME" "org.Sc.sgdGO"        "org.Sc.sgdGO2ALLORFS" 
> [17] "org.Sc.sgdGO2ORF"  "org.Sc.sgdINTERPRO"    "org.Sc.sgdMAPCOUNTS"  "org.Sc.sgdORGANISM"   
> [21] "org.Sc.sgdPATH"    "org.Sc.sgdPATH2ORF"    "org.Sc.sgdPFAM"       "org.Sc.sgdPMID"       
> [25] "org.Sc.sgdPMID2ORF"    "org.Sc.sgdREJECTORF"   "org.Sc.sgdSMART"
>
> Yolande
>
>
>
>
>
> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch on behalf of Alex Gutteridge
> Sent: Fri 7/25/2008 4:35 AM
> To: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Gostats with Yeast annotation
>  
> Hi,
>
> Just to confirm the org.Sc.sgd.db package and GOstats seem to work  
> fine together for me in Bioc-devel (Sample session pasted below).
>
> R version 2.8.0 Under development (unstable) (2008-07-22 r46103)
> Copyright (C) 2008 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
>   Natural language support but running in an English locale
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
> [Previously saved workspace restored]
>  > library(Category)
> Loading required package: Biobase
> Loading required package: tools
> Welcome to Bioconductor
>   Vignettes contain introductory material. To view, type
>   'openVignette()'. To cite Bioconductor, see
>   'citation("Biobase")' and for packages 'citation(pkgname)'.
> Loading required package: graph
> Loading required package: annotate
> Loading required package: AnnotationDbi
> Loading required package: DBI
> Loading required package: RSQLite
> Loading required package: xtable
> Loading required package: genefilter
> Loading required package: survival
> Loading required package: splines
>  > library(GOstats)
> Loading required package: GO.db
> Loading required package: RBGL
>  > sel = readLines("Turbidostat.genes")
>  > uni = readLines("all.genes")
>  > params = new("GOHyperGParams", geneIds=sel, universeGeneIds=uni,
>  +   annotation="org.Sc.sgd.db", ontology="BP", pvalueCutoff=0.1,
>  +   conditional=FALSE, testDirection="over")
>  > over = hyperGTest(params)
>  > summary(over)
>                GOBPID       Pvalue OddsRatio    ExpCount Count Size
> GO:0006412 GO:0006412 1.109223e-16  2.755286  62.1352567   125  383
> GO:0010467 GO:0010467 4.482367e-14  1.824627 225.8284001   317 1392
> GO:0009059 GO:0009059 8.256804e-14  2.232463  89.0659423   154  549
> GO:0043170 GO:0043170 8.691113e-13  1.698656 414.6676658   510 2556
> GO:0044267 GO:0044267 5.484817e-12  1.746362 214.3098539   296 1321
> GO:0019538 GO:0019538 1.268805e-11  1.718287 224.6927688   306 1385
> [..snip..]
>  > q()
> Save workspace image? [y/n/c]: n
> ag357@ag357-pc2102:~/Desktop/study> head Turbidostat.genes
> YAL001C
> YAL002W
> YAL003W
> YAL005C
> YAL008W
> YAL009W
> YAL010C
> YAL011W
> YAL019W
> ag357@ag357-pc2102:~/Desktop/study> head all.genes
> YHR047C
> YHR051W
> YHR066W
> YHR068W
> YHR075C
> YHR076W
> YHR080C
> YHR083W
> YHR143W-A
> YKL137W
>
> AlexG
>
> On 22 Jul 2008, at 18:07, Robert Gentleman wrote:
>
>   
>> Hi Alex,
>>  If you are willing to use R-devel and Bioc-devel, the issue should  
>> be fixed there.  I would be interested in hearing of any problems  
>> you might have (or successes) using that version.  I am waiting for  
>> some reports of success before I port this to release,
>>
>> best wishes
>>  Robert
>>
>>
>> Alex Gutteridge wrote:
>>     
>>> Hi,
>>> I've been trying to use the hyperGTest method from the GOstats  
>>> package with some yeast ORF data. I notice in this thread from a  
>>> month or so ago that there are problems at the moment with using  
>>> any of the yeast annotation sets apart from 'YEAST' (which is  
>>> deprecated) due to missing ID2EntrezID methods:
>>> https://stat.ethz.ch/pipermail/bioconductor/2008-June/022697.html
>>> I just wanted to make sure that this was still the case and I guess  
>>> fish around for an estimated ETA for when the org.Sc.sgd.db  
>>> annotations (which are replacing YEAST as I understand it) will be  
>>> compatible with hyperGTest?
>>> Also, is the exact source of GO annotations used in these packages  
>>> documented anywhere? Looking in the DESCRIPTION file I see  
>>> 'primarily based on mapping using ORF identifiers from SGD' for  
>>> org.Sc.sgd.db and 'assembled using data from public data  
>>> repositories' for YEAST. Should I just take it these are based on  
>>> the SGD GO annotation file from the date given in the Packaged  
>>> field of the DESCRIPTION file? For YEAST there is
>>>       
>> the man page is pretty explicit, (?org.Sc.sgdGO)
>>
>>     Mappings were based on data provided by: Yeast Genome (
>>     ftp://genome-ftp.stanford.edu/pub/yeast/data_download ) on
>>     2008-Mar29
>>
>> I am not sure what more we could put there.
>>
>> best wishes
>>  Robert
>>
>>     
>>> also a Created field which is approx. 1 month prior to the Packaged  
>>> date so I'm guessing the real age of the data is that one? The  
>>> yeast annotations change so quickly it's useful to be able to pin  
>>> this down as accurately as possible.
>>> Thanks in advance for any help with these questions.
>>> Alex Gutteridge
>>> Department of Biochemistry
>>> University of Cambridge
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>       
>> -- 
>> Robert Gentleman, PhD
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M2-B876
>> PO Box 19024
>> Seattle, Washington 98109-1024
>> 206-667-7700
>> rgentlem at fhcrc.org
>>
>>     
>
> Alex Gutteridge
>
> Department of Biochemistry
> University of Cambridge
>


