[BioC] ChIPpeakAnno Strandedness and distance calculation
Zhu, Julie
Julie.Zhu at umassmed.edu
Fri May 14 04:40:46 CEST 2010
Hi Dario,
You can create the annotation with strand = c(³+²). For example,
AnnotationRangedData = RangedData(IRanges(start = c(967659, 2010898,
2496700, 3075866,
+ 3123260), end = c(967869, 2011108, 2496920, 3076166, 3123470), names =
c("t1",
+ "t2", "t3", "t4", "t5")), space = c("1", "2", "3", "1", "2"), strand
=c("+"))
Please take a look at the examples given on the paper just published on BMC
Bioinformatics
http://www.biomedcentral.com/1471-2105/11/237. In case you could not open
the link, I also attached the pdf file.
Regarding your other question about distance calculation, I suggest to
create your AnnotationRangedData and PeakRangedData with start=midpoint to
get the distance between midpoints. The distance is calculated differently
for features in plus strand and minus strand. For example, to calculate the
distance between peak and TSS, the distance is calculated as the distance
between the start of the binding site and the TSS, which is the gene start
for genes located on the forward strand and the gene end for genes located
on the reverse strand. Therefore, adding another parameter would mean to
overwrite the way how the distance is calculated based on strandedness.
After you tried the above suggested way and still prefer having a new
parameter, I will be happy to add it to the next release.
Best regards,
Julie
*******************************************
Lihua Julie Zhu, Ph.D
Research Associate Professor
Program in Gene Function and Expression
Program in Molecular Medicine
University of Massachusetts Medical School
364 Plantation Street, Room 613
Worcester, MA 01605
508-856-5256
http://www.umassmed.edu/pgfe/faculty/zhu.cfm
On 5/13/10 9:00 PM, "Dario Strbenac" <D.Strbenac at garvan.org.au> wrote:
> Hello again,
>
> Just one more question. When we are looking at DNA methtylation, we don't have
> the strand of the peak (because the reverse complement of CG is CG). It seems
> that it might not be possible to do this with ChipPeakAnno ?
>
> e.g.
>
>> > head(peaksT)
> chr start end
> 1 chr13 83351701 83352000
> 2 chr13 83351401 83351700
> 3 chr20 25011901 25012200
> 4 chr13 83352001 83352300
> 5 chr8 143402101 143402400
> 6 chr2 238246801 238247100
>
>> > head(featTable)
> name chr strand start end
> 1 7896759 chr1 + 781253 783614
> 2 7896761 chr1 + 850983 869824
> 3 7896779 chr1 + 885829 890958
> 4 7896798 chr1 + 891739 900345
> 5 7896817 chr1 + 938709 939782
> 6 7896822 chr1 + 945365 981355
>
> Also, sometimes our feature table is a table of CpG islands, which don't have
> a strand associated with them.
>
> e.g.
>
>> > head(featTable2)
> chr start end CpG Island Name
> 1 chr1 18598 19673 CpG:_116
> 2 chr1 124987 125426 CpG:_30
> 3 chr1 317653 318092 CpG:_29
> 4 chr1 427014 428027 CpG:_84
> 5 chr1 439136 440407 CpG:_99
> 6 chr1 523082 523977 CpG:_94
>
> Is it possible to do this annotation with ChipPeakAnno ? Currently, the
> annotatePeakInBatch function gives me an error when I don't give it strand
> information when I create my RangedData object.
>
> Thanks,
> Dario.
>
> --------------------------------------
> Dario Strbenac
> Research Assistant
> Cancer Epigenetics
> Garvan Institute of Medical Research
> Darlinghurst NSW 2010
> Australia
>
>
On 5/13/10 8:10 PM, "Dario Strbenac" <D.Strbenac at garvan.org.au> wrote:
> Hello,
>
> Firstly, thank you for making this package. It seems so useful ! We were
> thinking of writing something like this ourselves, until I saw your package,
> because we do a lot of ChIP-Seq here.
>
> I just have a small feature request. In your distance calculation, you do
> start of peak - start of feature. Would it be possible to allow the user to
> choose if they want the distance calculation to use the start or the middle of
> the feature (and also for the peak) ? This is because we do a lot of
> methylation studies, and for CpG island features, we like to use the midpoint
> as the position of our feature. It would also be nice to be able to use the
> midpoint of the peak as the peak's position, since this is usually where the
> signal is strongest.
>
> Thanks,
> Dario.
>
> --------------------------------------
> Dario Strbenac
> Research Assistant
> Cancer Epigenetics
> Garvan Institute of Medical Research
> Darlinghurst NSW 2010
> Australia
More information about the Bioconductor
mailing list