[BioC] Question RE: getEnrichedGo in ChIPpeakAnno package

Thu Jun 10 23:29:15 CEST 2010

Hi Noah,

Yeast is difficult because that community has a strong preference for
their classic IDs.  The same is true in Arabidopsis.  This is why those
two organism packages have "sgd" and "tair" in their respective package
names.  I have managed to keep the rest of the org packages Entrez Gene
centric, however.  Another thing you can do if you find yourself with a
similar problem is to use the org.Sc.sgdENTREZID mapping provided by the
org.Sc.sgd.db package.  The package may be ORF centric, but it still
lets you map each ORF to an Entrez Gene ID.
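
A rough sketch of that lookup, untested against your data.  The helper
name orfsToEntrez is made up for illustration; the only package pieces it
relies on are mget() and the org.Sc.sgdENTREZID map.  Keeping just the
first ID when an ORF has several is an arbitrary choice on my part.

```r
# Hypothetical helper (not part of any package): look up each ORF in a
# Bimap-like object and return a named character vector of Entrez Gene IDs.
orfsToEntrez <- function(orfs, map) {
  orfs <- unique(as.character(orfs))
  # ifnotfound = NA returns NA for unknown ORFs instead of raising an error
  hits <- mget(orfs, map, ifnotfound = NA)
  # Keep the first ID when an ORF maps to more than one (arbitrary choice)
  ids <- vapply(hits, function(x) as.character(x[1]), character(1))
  # Drop ORFs that have no Entrez Gene ID
  ids[!is.na(ids)]
}

# Only runs when the annotation package is installed
if (requireNamespace("org.Sc.sgd.db", quietly = TRUE)) {
  library(org.Sc.sgd.db)
  orfs <- c("YAL069W", "YAL067W-A", "YAL062W")  # ORF names from this thread
  print(orfsToEntrez(orfs, org.Sc.sgdENTREZID))
}
```

The names of the result are the ORFs, so you can carry both IDs along
with your peaks.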

  Marc

On 06/10/2010 01:53 PM, Zhu, Julie wrote:
> Hi Noah,
>
> Yes, you are right that this is due to the differences between org.Hs.eg.db and org.Sc.sgd.db. org.Hs.eg.db is Entrez ID centric while org.Sc.sgd.db is ORF centric.  It would be nice if all the org.*.*.db packages had a similar data structure and mapping. For now, I would suggest calling the getEnrichedGO function with a list of ORFs using the following syntax. You first need to convert the list of Ensembl IDs to ORFs.
>
> enrichedGO.Cse4 <- getEnrichedGO(orfs, feature_id_type = "entrez_id", orgAnn = "org.Sc.sgd.db", maxP = 0.05, multiAdj = TRUE, minGOterm = 5, multiAdjMethod = "BH")
>
> Best regards,
>
> Julie
>
>
>
> On 6/10/10 4:15 PM, "Noah Dowell" <noahd at ucla.edu> wrote:
>
> Hello All,
>
> I couldn't find a solution to my question in the archives and my attempts have been unsuccessful so hopefully someone has some advice.
>
> I have analyzed my yeast ChIP-chip tiling array data using Starr and converted my list of chip-enriched regions to RangedData to make use of the peakOverlap and GOenrichment functions in ChIPpeakAnno.  The annotatePeakInBatch function has worked nicely but I am stuck with the getEnrichedGO function.  I think the problem may be due to differences between the org.Hs.eg.db and org.Sc.sgd.db.  The org.Hs.eg.db has a mapping of ENSEMBL gene accession numbers to Entrez Gene identifiers, but the org.Sc.sgd.db completely lacks this and uses a mapping to SGD Gene Identifiers.  As far as I can tell the getEnrichedGO function calls for a mapping to Entrez Gene ids thus the error I am showing below.
>
> Does anyone know of a workaround for this?
>
> Thank you for your help.
>
> Noah
>
>
> > library(org.Sc.sgd.db)
> > goTest <- getEnrichedGO(annoPeakChr1data, orgAnn = "org.Sc.sgd.db", maxP = 0.01, multiAdj = TRUE, minGOterm = 10, multiAdjMethod = "BH")
> Error in get(paste(GOgenome, "ENSEMBL2EG", sep = "")) :
>   object 'org.Sc.sgdENSEMBL2EG' not found
>
> ## also tried:
>
> > goTest <- getEnrichedGO(annoPeakChr1data, orgAnn = "org.Sc.sgd.db", feature_id_type = "ensembl_gene_id", maxP = 0.01, multiAdj = TRUE, minGOterm = 10, multiAdjMethod = "BH")
> Error in get(paste(GOgenome, "ENSEMBL2EG", sep = "")) :
>   object 'org.Sc.sgdENSEMBL2EG' not found
>
> #### here is what my annotatePeak Object looks like:
>
> > head(annoPeakChr1data)
> RangedData with 6 rows and 9 value columns across 1 space
>                    space         ranges |        peak      strand     feature start_position end_position
>              <character>      <IRanges> | <character> <character> <character>      <numeric>    <numeric>
> 01 YAL069W             I [   16,   254] |          01           1     YAL069W            335          649
> 02 YAL067W-A           I [ 2731,  2924] |          02           1   YAL067W-A           2480         2707
> 06 YAL062W             I [29935, 29959] |          06           1     YAL062W          31568        32941
> 07 YAL062W             I [30011, 30039] |          07           1     YAL062W          31568        32941
> 08 YAL062W             I [31661, 31678] |          08           1     YAL062W          31568        32941
> 09 YAL062W             I [31702, 31710] |          09           1     YAL062W          31568        32941
>
> > sessionInfo()
> R version 2.11.0 (2010-04-22)
> i386-apple-darwin9.8.0
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] grid      stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>  [1] org.Sc.sgd.db_2.4.1                 rtracklayer_1.8.1                   RCurl_1.3-1                         bitops_1.0-4.1
>  [5] Starr_1.4.0                         affxparser_1.20.0                   affy_1.26.0                         Ringo_1.12.0
>  [9] Matrix_0.999375-38                  lattice_0.18-5                      RColorBrewer_1.0-2                  ChIPpeakAnno_1.4.0
> [13] limma_3.4.0                         org.Hs.eg.db_2.4.1                  GO.db_2.4.1                         RSQLite_0.8-4
> [17] DBI_0.2-5                           AnnotationDbi_1.10.0                BSgenome.Ecoli.NCBI.20080805_1.3.16 BSgenome_1.16.0
> [21] GenomicRanges_1.0.1                 Biostrings_2.16.0                   IRanges_1.6.0                       multtest_2.4.0
> [25] Biobase_2.8.0                       biomaRt_2.4.0
>
> loaded via a namespace (and not attached):
>  [1] affyio_1.16.0         annotate_1.26.0       genefilter_1.30.0     MASS_7.3-5            preprocessCore_1.10.0 pspline_1.0-14
>  [7] splines_2.11.0        survival_2.35-8       tools_2.11.0          XML_2.8-1             xtable_1.5-6
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor