[BioC] ChIPpeakAnno Strandedness and distance calculation
Dario Strbenac
D.Strbenac at garvan.org.au
Mon May 17 01:04:58 CEST 2010
Not quite. I mean that midpoint-to-midpoint distance would be good when the feature doesn't have a strand. If the feature does have strand information, then it would be nice to have the distance from the midpoint of the peak to the start of the feature (start for + strand, end for - strand).
Thanks,
Dario.
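
A minimal sketch of the behaviour described above, written in plain R rather than against the ChIPpeakAnno API. The helper names feature_position and peak_to_feature_distance are hypothetical, the tables are assumed to be data frames with start, end and (optionally) strand columns, and peaks are assumed to be already paired row by row with their features. The sign convention (negative means upstream of the feature) is also an assumption, not something the package is known to do.

```r
# Hypothetical helper: one reference position per feature.
# Stranded features use their 5' end (start for '+', end for '-');
# features without a usable strand fall back to the midpoint.
feature_position <- function(features) {
  midpoint <- round((features$start + features$end) / 2)
  if (!"strand" %in% colnames(features)) {
    return(midpoint)
  }
  ifelse(features$strand == "+", features$start,
         ifelse(features$strand == "-", features$end, midpoint))
}

# Hypothetical helper: distance from the peak midpoint to the feature
# position, for peaks and features that are already paired row by row.
# Assumed sign convention: negative means upstream of the feature.
peak_to_feature_distance <- function(peaks, features) {
  peak_mid <- round((peaks$start + peaks$end) / 2)
  d <- peak_mid - feature_position(features)
  if ("strand" %in% colnames(features)) {
    d <- ifelse(features$strand == "-", -d, d)
  }
  d
}

# Peak and gene taken from the example quoted below; the CpG island
# coordinates are made up for illustration.
peak   <- data.frame(chr = "chr1", start = 5000, end = 5500)
gene   <- data.frame(chr = "chr1", start = 1000, end = 5100, strand = "-")
island <- data.frame(chr = "chr1", start = 4000, end = 4400)

peak_to_feature_distance(peak, gene)    # stranded: peak midpoint vs. gene end
peak_to_feature_distance(peak, island)  # unstranded: midpoint vs. midpoint
```

The real start and end columns are left untouched here, so any overlap classification such as insideFeature could still be computed from the unmodified coordinates.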
---- Original message ----
>Date: Fri, 14 May 2010 07:43:38 -0400
>From: "Zhu, Julie" <Julie.Zhu at umassmed.edu>
>Subject: Re: ChIPpeakAnno Strandedness and distance calculation
>To: "D.Strbenac at garvan.org.au" <D.Strbenac at garvan.org.au>
>Cc: "bioconductor" <bioconductor at stat.math.ethz.ch>
>
> Hi Dario,
>
> Thanks for the clarification! So you still want the
> nearest feature and insideFeature column to be
> calculated using start (+ strand) and end (-
> strand). You only need an extra column with distance
> from midpoint to midpoint to be included?
>
> Best regards,
>
> Julie
>
> On 5/14/10 12:20 AM, "Dario Strbenac"
> <D.Strbenac at garvan.org.au> wrote:
>
> Oh, actually I thought of a problem with making
> start = midpoint. If you modify the start position
> to be the average, then you might get the wrong
> values in insideFeature column.
>
> e.g.
>
> Peak, real coordinates
> ----------------------
> chr  | start | end  |
> ----------------------
> chr1 | 5000  | 5500 |
>
> Peak, modified coordinates (start is midpoint)
> ----------------------
> chr  | start | end  |
> ----------------------
> chr1 | 5250  | 5500 |
>
> A gene
> -------------------------------
> chr  | start | end  | strand |
> -------------------------------
> chr1 | 1000  | 5100 | -      |
>
> So, instead of being called 'overlapStart', the peak
> is called 'upstream'.
>
> It would be good if the package worked out an
> extra column for the tables, called 'position' and
> used the position for the distances, and the real
> start and end positions for the overlapping.
>
> e.g. Something like this
>
> if("strand" %in% colnames(table))
> {
>   table$position = ifelse(table$strand == '+', table$start, table$end)
> } else {
>   table$position = round((table$start + table$end) / 2)
> }
>
> - Dario.
> ---- Original message ----
> >Date: Thu, 13 May 2010 22:40:46 -0400
> >From: "Zhu, Julie" <Julie.Zhu at umassmed.edu>
> >Subject: Re: ChIPpeakAnno Strandedness and
> distance calculation
> >To: "D.Strbenac at garvan.org.au"
> <D.Strbenac at garvan.org.au>
> >Cc: "bioconductor"
> <bioconductor at stat.math.ethz.ch>
> >
> >Hi Dario,
> >
> >You can create the annotation with strand =
> c("+"). For example,
> >
> >AnnotationRangedData = RangedData(
> >    IRanges(start = c(967659, 2010898, 2496700, 3075866, 3123260),
> >            end = c(967869, 2011108, 2496920, 3076166, 3123470),
> >            names = c("t1", "t2", "t3", "t4", "t5")),
> >    space = c("1", "2", "3", "1", "2"), strand = c("+"))
> >
> >Please take a look at the examples given in the paper just published
> >in BMC Bioinformatics,
> >http://www.biomedcentral.com/1471-2105/11/237. In case you cannot
> >open the link, I have also attached the PDF file.
> >
> >Regarding your other question about distance calculation, I suggest
> >creating your AnnotationRangedData and PeakRangedData with
> >start=midpoint to get the distance between midpoints. The distance is
> >calculated differently for features on the plus strand and the minus
> >strand. For example, the distance between a peak and the TSS is
> >calculated as the distance between the start of the binding site and
> >the TSS, which is the gene start for genes on the forward strand and
> >the gene end for genes on the reverse strand. Therefore, adding
> >another parameter would mean overriding how the distance is
> >calculated based on strandedness. If you try the approach suggested
> >above and still prefer having a new parameter, I will be happy to add
> >it to the next release.
> >
> >Best regards,
> >
> >Julie
> >
> >
> >*******************************************
> >Lihua Julie Zhu, Ph.D
> >Research Associate Professor
> >Program in Gene Function and Expression
> >Program in Molecular Medicine
> >University of Massachusetts Medical School
> >364 Plantation Street, Room 613
> >Worcester, MA 01605
> >508-856-5256
> >http://www.umassmed.edu/pgfe/faculty/zhu.cfm
> >
> >
> >
> >
> >
> >On 5/13/10 9:00 PM, "Dario Strbenac"
> <D.Strbenac at garvan.org.au> wrote:
> >
> >> Hello again,
> >>
> >> Just one more question. When we are looking at DNA methylation, we
> >> don't have the strand of the peak (because the reverse complement
> >> of CG is CG). It seems that it might not be possible to do this
> >> with ChIPpeakAnno?
> >>
> >> e.g.
> >>
> >> > head(peaksT)
> >>     chr     start       end
> >> 1 chr13  83351701  83352000
> >> 2 chr13  83351401  83351700
> >> 3 chr20  25011901  25012200
> >> 4 chr13  83352001  83352300
> >> 5  chr8 143402101 143402400
> >> 6  chr2 238246801 238247100
> >>
> >> > head(featTable)
> >>      name  chr strand  start    end
> >> 1 7896759 chr1      + 781253 783614
> >> 2 7896761 chr1      + 850983 869824
> >> 3 7896779 chr1      + 885829 890958
> >> 4 7896798 chr1      + 891739 900345
> >> 5 7896817 chr1      + 938709 939782
> >> 6 7896822 chr1      + 945365 981355
> >>
> >> Also, sometimes our feature table is a table of
> CpG islands, which don't have
> >> a strand associated with them.
> >>
> >> e.g.
> >>
> >> > head(featTable2)
> >>    chr  start    end CpG Island Name
> >> 1 chr1  18598  19673        CpG:_116
> >> 2 chr1 124987 125426         CpG:_30
> >> 3 chr1 317653 318092         CpG:_29
> >> 4 chr1 427014 428027         CpG:_84
> >> 5 chr1 439136 440407         CpG:_99
> >> 6 chr1 523082 523977         CpG:_94
> >>
> >> Is it possible to do this annotation with ChIPpeakAnno? Currently,
> >> the annotatePeakInBatch function gives me an error when I don't
> >> give it strand information when I create my RangedData object.
> >>
> >> Thanks,
> >> Dario.
> >>
> >> --------------------------------------
> >> Dario Strbenac
> >> Research Assistant
> >> Cancer Epigenetics
> >> Garvan Institute of Medical Research
> >> Darlinghurst NSW 2010
> >> Australia
> >>
> >>
> >
> >On 5/13/10 8:10 PM, "Dario Strbenac"
> <D.Strbenac at garvan.org.au> wrote:
> >
> >> Hello,
> >>
> >> Firstly, thank you for making this package. It
> seems so useful ! We were
> >> thinking of writing something like this
> ourselves, until I saw your package,
> >> because we do a lot of ChIP-Seq here.
> >>
> >> I just have a small feature request. In your
> distance calculation, you do
> >> start of peak - start of feature. Would it be
> possible to allow the user to
> >> choose if they want the distance calculation to
> use the start or the middle of
> >> the feature (and also for the peak) ? This is
> because we do a lot of
> >> methylation studies, and for CpG island
> features, we like to use the midpoint
> >> as the position of our feature. It would also
> be nice to be able to use the
> >> midpoint of the peak as the peak's position,
> since this is usually where the
> >> signal is strongest.
> >>
> >> Thanks,
> >> Dario.
> >>
> >> --------------------------------------
> >> Dario Strbenac
> >> Research Assistant
> >> Cancer Epigenetics
> >> Garvan Institute of Medical Research
> >> Darlinghurst NSW 2010
> >> Australia
> >
> >
>
> --------------------------------------
> Dario Strbenac
> Research Assistant
> Cancer Epigenetics
> Garvan Institute of Medical Research
> Darlinghurst NSW 2010
> Australia