[BioC] ChIPpeakAnno::getEnrichedGo crashes but I don't know why

Wed Jan 5 22:15:13 CET 2011

Eric,

You could convert the exon IDs associated with the peaks to ensemble gene
IDs and input these ensemble gene IDs to the getEnrichedGO function.

Best regards,

Julie

On 1/5/11 4:09 PM, "Eric Cabot" <elcabot at gmail.com> wrote:

> Hi Julie,
> 
>    It may be a while before I get back to you on this, because I did my
> mapping and ChIP-Seq analysis with Hg19 (NCBI 37), not Hg18 (NCBI 36).
> I'm also a little concerned about using transcription start site
> annotations rather than exons, because the the binding domains are not
> thought to be restricted to only promoters.  Any suggestions?
> 
> Eric
> 
> 
> 
> Zhu, Lihua (Julie) wrote:
>> Eric,
>> 
>> The annotated dataset has exon ID instead of gene ID while the getEnrichedGO
>> is expecting feature_id_type="ensembl_gene_id". For a list of supported
>> feature_id_type, please type ?getEnrichedGO.
>> 
>> To use getEnrichedGO function, first get the TSS annotation.
>> 
>> TSS.human.NCBI36 = getAnnotation(ENSEMBLE_GENES_MART, featureType="TSS")
>> 
>> or use the build in TSS as
>> 
>> data(TSS.human.NCBI36)
>> 
>> Then annotate your peaks with TSS.human.NCBI36 followed by getEnrichedGO
>> call.
>> 
>> Please let me know if this works for you.
>> 
>> Best regards,
>> 
>> Julie
>> 
>> 
>> 
>> 
>> On 1/5/11 12:29 PM, "Eric Cabot" <elcabot at gmail.com> wrote:
>> 
>>> Hi Julie,
>>> 
>>>   Thank you for your response.
>>> 
>>> Here is the sessionInfo and traceback output and also a few lines of
>>> "my_annotated_regions".
>>> 
>>> Regards,
>>> 
>>> Eric Cabot
>>> 
>>>> sessionInfo()
>>> R version 2.12.1 (2010-12-16)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>> 
>>> locale:
>>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>   [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>> 
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>> 
>>> other attached packages:
>>>   [1] ChIPpeakAnno_1.6.0                  limma_3.6.9
>>>   [3] org.Hs.eg.db_2.4.6                  GO.db_2.4.5
>>>   [5] RSQLite_0.9-4                       DBI_0.2-5
>>>   [7] AnnotationDbi_1.12.0
>>> BSgenome.Ecoli.NCBI.20080805_1.3.16
>>>   [9] BSgenome_1.18.2                     GenomicRanges_1.2.2
>>> [11] Biostrings_2.18.2                   IRanges_1.8.8
>>> [13] multtest_2.6.0                      Biobase_2.10.0
>>> [15] biomaRt_2.6.0
>>> 
>>> loaded via a namespace (and not attached):
>>> [1] MASS_7.3-9      RCurl_1.5-0     splines_2.12.1  survival_2.36-2
>>> [5] tools_2.12.1    XML_3.2-0
>>> my_enrichedGO<-getEnrichedGO(my_annotated_regions,orgAnn="org.Hs.eg.db",maxP
>>> =0
>>> .01,multiAdj=FALSE,minGOterm=1,feature_id_type="ensembl_gene_id")
>>> Error in if (class(go.ids) != "matrix" | dim(go.ids)[2] < 4) { :
>>>    argument is of length zero
>>>> traceback()
>>> 2: addAncestors(this.GO[this.GO[, 3] == "BP", ], "bp")
>>> 1: getEnrichedGO(FC2_annotated_regions, orgAnn = "org.Hs.eg.db",
>>>         maxP = 0.01, multiAdj = FALSE, minGOterm = 1, feature_id_type =
>>> "ensembl_gene_id")
>>> 
>>> 
>>> 
>>>> as.data.frame(my_annotated_regions[1:15,])
>>>     space     start       end width                   names    peak strand
>>> 1      1 241997936 241998205   270 R-10060 ENSE00001749374 R-10060      +
>>> 2      1 237109743 237110002   260 R-10082 ENSE00001643382 R-10082      +
>>> 3      1 236080267 236080415   149 R-10086 ENSE00001807176 R-10086      +
>>> 4      1 233853245 233853514   270 R-10096 ENSE00001776382 R-10096      +
>>> 5      1 233727956 233728104   149 R-10097 ENSE00001442190 R-10097      +
>>> 6      1 230728554 230728823   270 R-10108 ENSE00001731401 R-10108      +
>>> 7      1 229687129 229687277   149 R-10113 ENSE00001439385 R-10113      +
>>> 8      1 228943263 228943412   150 R-10121 ENSE00001903546 R-10121      +
>>> 9      1 218358885 218359176   292 R-10157 ENSE00001439386 R-10157      +
>>> 10     1 212254259 212254408   150 R-10179 ENSE00001624346 R-10179      +
>>> 11     1 210086264 210086513   250 R-10184 ENSE00001903225 R-10184      +
>>> 12     1 209863549 209863698   150 R-10185 ENSE00001336255 R-10185      +
>>> 13     1 207437117 207437264   148 R-10190 ENSE00001742112 R-10190      +
>>> 14     1 190352400 190352548   149 R-10246 ENSE00001782518 R-10246      +
>>> 15     1 184432607 184432755   149 R-10260 ENSE00001283926 R-10260      +
>>>             feature start_position end_position insideFeature
>>> distancetoFeature
>>> 1  ENSE00001749374      241995237    241996089    downstream
>>> 2699
>>> 2  ENSE00001643382      237144639    237145008      upstream
>>> -34896
>>> 3  ENSE00001807176      236078715    236078821    downstream
>>> 1552
>>> 4  ENSE00001776382      233807017    233807237    downstream
>>> 46228
>>> 5  ENSE00001442190      233749750    233750272      upstream
>>> -21794
>>> 6  ENSE00001731401      230728406    230728586    overlapEnd
>>> 148
>>> 7  ENSE00001439385      229685652    229685769    downstream
>>> 1477
>>> 8  ENSE00001903546      228882063    228882416    downstream
>>> 61200
>>> 9  ENSE00001439386      218303137    218303294    downstream
>>> 55748
>>> 10 ENSE00001624346      212253973    212254092    downstream
>>> 286
>>> 11 ENSE00001903225      210111538    210111622      upstream
>>> -25274
>>> 12 ENSE00001336255      209859550    209859630    downstream
>>> 3999
>>> 13 ENSE00001742112      207438342    207438381      upstream
>>> -1225
>>> 14 ENSE00001782518      190331193    190331400    downstream
>>> 21207
>>> 15 ENSE00001283926      184446520    184446737      upstream
>>> -13913
>>>     shortestDistance fromOverlappingOrNearest
>>> 1              1847             NearestStart
>>> 2             34637             NearestStart
>>> 3              1446             NearestStart
>>> 4             46008             NearestStart
>>> 5             21646             NearestStart
>>> 6                32             NearestStart
>>> 7              1360             NearestStart
>>> 8             60847             NearestStart
>>> 9             55591             NearestStart
>>> 10              167             NearestStart
>>> 11            25025             NearestStart
>>> 12             3919             NearestStart
>>> 13             1078             NearestStart
>>> 14            21000             NearestStart
>>> 15            13765             NearestStart
>>> 
>>> 
>>> Zhu, Lihua (Julie) wrote:
>>>> Hi Eric,
>>>> 
>>>> Could you please post the session information with sessionInfo() command?
>>>> Could you please also send a few ensembl IDs in your annotated dataset?
>>>> Thanks!
>>>> 
>>>> Best regards,
>>>> 
>>>> Julie
>>>> 
>>>> 
>>>> On 1/4/11 6:51 PM, "Eric Cabot" <elcabot at gmail.com> wrote:
>>>> 
>>>>> I am a relatively new Bioconductor user and I am trying to analyze some
>>>>> ChIP-seq results that came from QuEST using the ChIPpeakAnno package.
>>>>> 
>>>>> After importing the regions of interest into RangedData objects and doing
>>>>> the following:
>>>>> 
>>>>> 
>>>> ENSEMBLE_GENES_MART<-useMart(biomart="ensembl",dataset="hsapiens_gene_ensem
>>>> bl
>>>> ">
>>>> )
>>>>> ENSEMBL_ExonPlus_Annotation<-getAnnotation(ENSEMBLE_GENES_MART,
>>>>> featureType="ExonPlusUtr")
>>>>> 
>>>>> 
>>>>> I had no problem annotating  and generating  a Venn diagram to show the
>>>>> overlaps between my three sets of peaks. To annotate, I used:
>>>>> 
>>>>> annotated_regions=annotatePeakInBatch(myranged,
>>>>> AnnotationData=ENSEMBL_ExonPlus_Annotation)
>>>>> 
>>>>> 
>>>>> But I cannot seem to get the getEnrichedGo method to work on this (or my
>>>>> other two annotated regions). Here is a typical command line:
>>>>> 
>>>>> 
>>>>> my_enrichedGO<-getEnrichedGO(annotated_regions,orgAnn="org.Hs.eg.db",maxP=
>>>>> 0.
>>>>> 01
>>>>> ,multiAdj=TRUE,minGOterm=1,
>>>>> multiAdjMethod="BH",feature_id_type="ensembl_gene_id")
>>>>> 
>>>>> and here is a typical error message:
>>>>> 
>>>>> enrichedGO<-getEnrichedGO(annotated_regions,orgAnn="org.Hs.eg.db",maxP=0.0
>>>>> 1,
>>>>> mu
>>>>> ltiAdj=TRUE,minGOterm=1,feature_id_type="ensembl_gene_id")
>>>>> Error in if (class(go.ids) != "matrix" | dim(go.ids)[2] < 4) { :
>>>>>    argument is of length zero
>>>>> 
>>>>> 
>>>>> Which leads me to ask:
>>>>> 
>>>>> 1) Is this error message supposed to be meaningful to me-i.e. a user-or is
>>>>> it something that I should be sending to the developer of the package?
>>>>> 
>>>>> 2) Is there anything obvious from this that suggests what corrective
>>>>> action I should be taking?
>>>>> 
>>>>> 
>>>>> Eric Cabot
>>>>> University of Wisconsin
>>>>> 
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>> 
>>>> 
>> 
>> 
>