[BioC] ChIPpeakAnno
Zhu, Lihua (Julie)
Julie.Zhu at umassmed.edu
Thu Jun 20 14:37:50 CEST 2013
Ann,
Thanks for the feedback!
Your function call is correct. However, maxgap is not the same thing as
distancetoFeature (or shortestDistance). maxgap specifies the maximum gap
allowed between two ranges, not the distance between their ends. For
example, when two ranges overlap, the gap between them is 0 (no gap),
although distancetoFeature, which is calculated as the start of the peak
minus the start of the feature, might be greater than 0.
Here is a toy example:
peak: chr1:1000-1600
feature: chr1:300-2000
distancetoFeature = 1000 - 300 = 700
shortestDistance = min(abs(1000-300), abs(1000-2000), abs(1600-300),
abs(1600-2000)) = 400, where abs = absolute value
Gap = 0 because these two ranges overlap
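The numbers in the toy example can be checked with a few lines of base R. This is only a sketch of the arithmetic, not ChIPpeakAnno's implementation: the variable names are made up for illustration, and the gap is computed under the assumption that it counts the bases strictly between the two ranges (0 when they overlap or abut).

```r
# Toy coordinates from the example (both ranges on chr1).
peak    <- c(start = 1000, end = 1600)
feature <- c(start = 300,  end = 2000)

# Start of the peak minus start of the feature.
distance_to_feature <- peak[["start"]] - feature[["start"]]

# Smallest absolute difference between any peak end and any feature end.
shortest_distance <- min(abs(outer(peak, feature, "-")))

# Gap between the two ranges (assumed convention: bases strictly
# between them, clamped to 0 when the ranges overlap).
gap <- max(0, max(peak[["start"]], feature[["start"]]) -
              min(peak[["end"]], feature[["end"]]) - 1)

# A maxgap filter looks at the gap only, so this pair passes
# maxgap = 5000 regardless of the two distances above.
within_maxgap <- gap <= 5000

print(c(distancetoFeature = distance_to_feature,
        shortestDistance  = shortest_distance,
        gap               = gap))
```

A peak lying inside a long feature therefore always has a gap of 0, even when its distance to either end of the feature exceeds maxgap.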
Please let me know if this makes sense.
Please CC the Bioconductor list in subsequent communications so that
others can contribute and benefit. Thanks!
Best regards,
Julie
On 6/20/13 3:00 AM, "Ann Mongan" <amongan at quanticel.com> wrote:
> Dear Julie,
> Thank you for developing ChIPpeakAnno; I find it very useful.
> I'm using ChIPpeakAnno_2.2.0, and I found some peculiarity in how my
> peaks are assigned to features that are outside of maxgap (example below).
> Could you help me understand why I get these results? I suppose some
> arguments must not be set correctly.
> Thanks for your help.
> Ann
>
> t1 = findOverlappingPeaks(ASR, refseqRanges, maxgap=5000, multiple=TRUE,
> select='all',NameOfPeaks1='KDM5B',NameOfPeaks2='RefSeq')
>
>> head(t1$OverlappingPeaks[t1$OverlappingPeaks$shortestDistance >5000,])
> KDM5B chr RefSeq RefSeq_start RefSeq_end strand KDM5B_start KDM5B_end
> strand1 overlapFeature shortestDistance
> 62 00033 1 02323 860260 879955 + 870589 871263
> + inside 8692
> 63 00034 1 02323 860260 879955 + 871383 871883
> + inside 8072
> 64 00035 1 02323 860260 879955 + 873522 874033
> + inside 5922
> 120 00062 1 02363 955503 991496 + 964918 966100
> + inside 9415
> 121 00063 1 02363 955503 991496 + 975841 976296
> + inside 15200
> 138 00081 1 02398 1109264 1133315 + 1120693 1121410
> + inside 11429
>
>
>
> p = annotatePeakInBatch(head(ASR,100), AnnotationData=refseqRanges,
> output="both", maxgap=5000,
> PeakLocForDistance="middle", FeatureLocForDistance="TSS",select="all")
>
>> head(as.data.frame(p)[p$distancetoFeature>5000,])
> space start end width names peak strand
> feature start_position end_position insideFeature distancetoFeature
> shortestDistance
> 7 chr1 870589 871263 675 33 1244.NM_152486.SAMD11 33 +
> 1244.NM_152486.SAMD11 861120 879961 inside
> 9806 8698
> 8 chr1 871383 871883 501 34 1244.NM_152486.SAMD11 34 +
> 1244.NM_152486.SAMD11 861120 879961 inside
> 10513 8078
> 9 chr1 873522 874033 512 35 1244.NM_152486.SAMD11 35 +
> 1244.NM_152486.SAMD11 861120 879961 inside
> 12658 5928
> 10 chr1 874123 875130 1008 36 1244.NM_152486.SAMD11 36 +
> 1244.NM_152486.SAMD11 861120 879961 inside
> 13506 4831
> 11 chr1 875328 875693 366 37 1244.NM_152486.SAMD11 37 +
> 1244.NM_152486.SAMD11 861120 879961 inside
> 14390 4268
> 12 chr1 875720 879253 3534 38 1244.NM_152486.SAMD11 38 +
> 1244.NM_152486.SAMD11 861120 879961 inside
> 16366 708
> fromOverlappingOrNearest
> 7 NearestStart
> 8 NearestStart
> 9 NearestStart
> 10 NearestStart
> 11 NearestStart
> 12 NearestStart
>
>
>