[BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno

Zhu, Lihua (Julie) Julie.Zhu at umassmed.edu
Sun Mar 17 01:25:44 CET 2013


Stefan,

Could you please send me a simple dataset to see what went wrong? Thanks for
letting know!

Thanks for hashing out the chromosome region enrichment score calculation!
Looks like you are looking for something very similar to GO enrichment
analysis (getEnrichedGO function in ChIPpeakAnno).

For option a, as you mentioned, we will need the genome level estimation of
the distribution for each category, which could be tricky. Once we have
these estimation, then Hypergeometric test can be applied to find whether a
category is enriched/depleted.

Jaccard index (option b) is an interesting alternative. We could implement
Jaccard index at the peak level first and add options for computing Jaccard
index at the nucleotide level later.

Best regards,

Julie


On 3/16/13 7:41 PM, "Stefan Pinter" <pinter at molbio.mgh.harvard.edu> wrote:

> Julie, I compared my results from ChipPeakAnno with a simple overlap in
> bedtools: the % of peaks inside genes (exon + intron + 3utr + 5utr) reported
> by ChIPpeakAnno is over an order of magnitude lower than what I learned from
> bedtools. Browsing the data indicates that bedtools is right. Something is
> off, do you have a test.data set you can check against?
> Thanks,
> -S.
> 
> ----- Original Message -----
> From: "Stefan Pinter" <pinter at molbio.mgh.harvard.edu>
> To: "Lihua Zhu (Julie)" <Julie.Zhu at umassmed.edu>
> Cc: "bioconductor" <bioconductor at r-project.org>
> Sent: Saturday, March 16, 2013 1:30:55 PM
> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno
> 
> Hi Julie, 
> 
> well, I was more thinking of something along the lines of either:
> 
> a.) Over/Under-representation:
> Percentage of exons in genome = 2%
> Percentage of peaks overlapping exons = 10%
> Exons are 5-fold over-represented
> 
> But I realize now, that will only work with a full annotation, not a custom
> annotation.
> 
> b.) something like the jaccard index used in bedtools:
> 
> total length of intersection / total length of union
> 
> http://en.wikipedia.org/wiki/Jaccard_index
> 
> Just suggestions, there are probably many more possible ways to calculate a
> score.
> Thank you,
> - Stefan.
> 
> ----- Original Message -----
> From: "Lihua Zhu (Julie)" <Julie.Zhu at umassmed.edu>
> To: "Stefan Pinter" <pinter at molbio.mgh.harvard.edu>
> Cc: "<bioconductor at r-project.org>" <bioconductor at r-project.org>
> Sent: Friday, March 15, 2013 1:28:09 PM
> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno
> 
> Stefan,
> 
> Regarding the enrichment/depletion score, Does the following toy example
> illustrate how to calculate such score? If not, could you please give a toy
> example? Thanks!
> 
> For example, if the total number of peaks = 100, number of peaks assigned to
> promoter = 90, number of peaks assigned to enhancer = 10, then the
> enrichment score = 90% and 10% for promoter region and enhancer region
> respectively.
> 
> We will add closest to annotatePeakInBatch in the dev version. Thanks for
> the great suggestion!
> 
> Please cc bioconductor at r-project.org. Thanks!
> 
> Best regards,
> 
> Julie
> 
> 
> On 3/15/13 12:26 PM, "Stefan Pinter" <pinter at molbio.mgh.harvard.edu> wrote:
> 
>> Sure, thank you Julie.
>> 
>> Yes, that is what I meant. Since all the necessary information (peak
>> locations, feature locations, total count, total length) are already
>> available
>> in RangedData & Annotation, it would be very convenient to calculate
>> enrichment/depletion and possibly even significance scores (by permutation)
>> on
>> the fly. The significance score may be overkill, but if the function at least
>> reported enrichment/depletion scores, the user could always supply a number
>> of
>> shuffled ranges to build a random model of enrichment scores and calculate
>> significance after.
>> 
>> In addition, would it be possible to add another definition for
>> PeakLocForDistance in annotatePeakInBatch?
>> PeakLocForDistance = "start means using start of the peak to calculate the
>> distance to feature"
>> 
>> It would be helpful to have "closest", meaning distance to feature measured
>> from peak start or peak end, whichever is closer. That would help with broad
>> peaks, which using "middle" for isn't very helpful.
>> 
>> Thank you and Best wishes,
>> - Stefan.
>> 
>> ----- Original Message -----
>> From: "Lihua Zhu (Julie)" <Julie.Zhu at umassmed.edu>
>> To: "Stefan Pinter" <pinter at molbio.mgh.harvard.edu>
>> Cc: "<bioconductor at r-project.org>" <bioconductor at r-project.org>
>> Sent: Friday, March 15, 2013 8:06:03 AM
>> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno
>> 
>> Stefan,
>> 
>> You can download 2.6.1 to directly use assignChromosomeRegion without the
>> package prefix (I updated last night and it might take sometime to propagate
>> to the installation site). To find out what parameters supported by the
>> function, in R, please type help( ChIPpeakAnno:::assignChromosomeRegion).
>> You will see that indeed PeakLocForDistance is not supported. If needed, I
>> can add such parameter.
>> 
>> Also regarding your previous suggestion of adding enrichment status of
>> feature length, do you mean enrichment of peaks in certain category of
>> chromosome region? For example, a significant enrichment score with 90%
>> peaks in promoter region would be interesting.
>> 
>> Could you please keep the thread in the Bioc for others to
>> contribute/benefit? Thanks!
>> 
>> Best regards,
>> 
>> Julie
>> 
>> 
>> On 3/14/13 10:10 PM, "Stefan Pinter" <pinter at molbio.mgh.harvard.edu> wrote:
>> 
>>> PPS. I think PeakLocForDistance is not working in that function:
>>> 
>>>> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS,
>>>> Exon,
>>>> utr5, utr3, proximal.promoter.cutoff=100000,
>>>> immediate.downstream.cutoff=100000, PeakLocForDistance="middle")
>>> Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5,  :
>>>   unused argument(s) (PeakLocForDistance = "middle")
>>>> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS,
>>>> Exon,
>>>> utr5, utr3, proximal.promoter.cutoff=100000,
>>>> immediate.downstream.cutoff=100000, PeakLocForDistance= middle)
>>> Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5,  :
>>>   unused argument(s) (PeakLocForDistance = middle)
>>>> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS,
>>>> Exon,
>>>> utr5, utr3, proximal.promoter.cutoff=100000,
>>>> immediate.downstream.cutoff=100000, PeakLocForDistance = c("middle"))
>>> Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5,  :
>>>   unused argument(s) (PeakLocForDistance = c("middle"))
>>> 
>>> ----- Original Message -----
>>> From: "Stefan Pinter" <pinter at molbio.mgh.harvard.edu>
>>> To: "Lihua Zhu (Julie)" <Julie.Zhu at umassmed.edu>
>>> Sent: Thursday, March 14, 2013 9:39:10 PM
>>> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno
>>> 
>>> PS. if I might add a simple feature addition to that function - it would be
>>> a
>>> TRUE/FALSE trigger for whether to also report enrichment/depletion stats
>>> based
>>> on feature lengths, i.e. 50% peaks labeled as intergenic (or enhancers) is
>>> less interesting than 50% of peaks reported as exonic (much greater
>>> enrichment
>>> as total exonic feature length is smaller). Thanks, Best...S
>>> 
>>> ----- Original Message -----
>>> From: "Stefan Pinter" <pinter at molbio.mgh.harvard.edu>
>>> To: "Lihua Zhu (Julie)" <Julie.Zhu at umassmed.edu>
>>> Cc: "bioconductor" <bioconductor at r-project.org>
>>> Sent: Thursday, March 14, 2013 9:32:33 PM
>>> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno
>>> 
>>> Hi Julie,
>>> 
>>> yes, that worked! Thank you for the quick help, I very much appreciate it!
>>> Thank you and Best wishes,
>>> - Stefan.
>>> 
>>> ----- Original Message -----
>>> From: "Lihua Zhu (Julie)" <Julie.Zhu at umassmed.edu>
>>> To: "Lihua Zhu (Julie)" <Julie.Zhu at umassmed.edu>, "Stefan Pinter"
>>> <pinter at molbio.mgh.harvard.edu>
>>> Cc: "<bioconductor at r-project.org>" <bioconductor at r-project.org>
>>> Sent: Thursday, March 14, 2013 8:50:51 PM
>>> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno
>>> 
>>> Stefan,
>>> 
>>> Thanks for reporting this! Actually the function assignChromosomeRegion  is
>>> not exported.
>>> 
>>> Could you please try the following to see if you can use it? Thanks!
>>> 
>>> ChIPpeakAnno:::assignChromosomeRegion
>>> 
>>> Best regards,
>>> 
>>> Julie
>>> 
>>> 
>>> On 3/14/13 7:36 PM, "Lihua Julie Zhu" <julie.zhu at umassmed.edu> wrote:
>>> 
>>>> Stefan,
>>>> 
>>>> Could you please send me the sessionInfo? I just want to make sure you
>>>> installed version 2.6.0. Thanks!
>>>> 
>>>> I noticed that when I try to install ChIPpeakAnno with the following code,
>>>> I
>>>> get 2.4 version instead.
>>>> 
>>>> source("http://bioconductor.org/biocLite.R")
>>>> biocLite("ChIPpeakAnno")
>>>> 
>>>> Best regards,
>>>> 
>>>> Julie



More information about the Bioconductor mailing list