[Bioc-sig-seq] Comparing two chipseq position sets

Ivan Gregoretti ivangreg at gmail.com
Thu May 7 16:02:52 CEST 2009


Hello Steve, Nicolas and Michael,

I agree with all of you: it is not a trivial question.

I asked the bioc-sig-seq list because I thought: hey, this must be an
everyday question for the genome analyst.

Say you ran your chipseq under condition A and then you ran it under
condition B. Then you have to decide whether A and B made any
difference. It doesn't get any simpler than that!

I can't compare the two means or the two dispersions. I have to
compare pairs. The problem is that it is not trivial to unambiguously
determine which spot in B must be paired with each spot in A. To start
with, A and B may have different numbers of loci (e.g. 15000 versus
18000).
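
To make the pairing problem concrete, here is a minimal Python sketch that
pairs spots by simple overlap, using the first ten chr1 loci from each list
quoted below. The function name pair_by_overlap and the closed-interval
convention are my own assumptions for illustration. This is not the IRanges
or genomeIntervals API, and overlap is only one possible pairing rule.

```python
from collections import defaultdict

def pair_by_overlap(a, b):
    """Return (i, j) index pairs where a[i] and b[j] overlap on the same chromosome.

    Intervals are (chrom, start, end) tuples, treated as closed intervals.
    That convention is an assumption; the thread does not state one.
    """
    # Group the B intervals by chromosome so we only compare within a chromosome.
    by_chrom = defaultdict(list)
    for j, (chrom, start, end) in enumerate(b):
        by_chrom[chrom].append((start, end, j))

    pairs = []
    for i, (chrom, start, end) in enumerate(a):
        # Simple quadratic scan: fine for a sketch, slow for whole genomes.
        for b_start, b_end, j in by_chrom[chrom]:
            if start <= b_end and b_start <= end:
                pairs.append((i, j))
    return pairs

# First ten chr1 loci from my data (A) and from the published data (B).
set_a = [
    ("chr1", 3660781, 3662707), ("chr1", 4481742, 4482656),
    ("chr1", 4482813, 4484003), ("chr1", 4561320, 4562262),
    ("chr1", 4774887, 4776304), ("chr1", 4797291, 4798822),
    ("chr1", 4847807, 4848846), ("chr1", 5008093, 5009386),
    ("chr1", 5009514, 5010046), ("chr1", 5010095, 5010583),
]
set_b = [
    ("chr1", 3659579, 3662079), ("chr1", 4773791, 4776291),
    ("chr1", 4797473, 4799973), ("chr1", 4847394, 4849894),
    ("chr1", 5007460, 5009960), ("chr1", 5072753, 5075253),
    ("chr1", 6204242, 6206742), ("chr1", 7078730, 7081230),
    ("chr1", 9282452, 9284952), ("chr1", 9683423, 9685923),
]

pairs = pair_by_overlap(set_a, set_b)
unpaired_a = sorted(set(range(len(set_a))) - {i for i, _ in pairs})
unpaired_b = sorted(set(range(len(set_b))) - {j for _, j in pairs})
print(pairs)
print(unpaired_a, unpaired_b)
```

Even on these twenty loci the ambiguity shows up: one spot in B
(5007460-5009960) overlaps two separate spots in A, four spots in A have no
partner, and five spots in B have none either. For the full lists an indexed
or sorted approach would be needed instead of the nested scan.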

I'll take a look at genomeIntervals and IRanges.

By the way, Michael, would you let me know as soon as the new IRanges
documentation comes out? You guys were working on something, I
understand.

Thank you all,

Ivan

Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878



On Thu, May 7, 2009 at 9:24 AM, Michael Lawrence <mflawren at fhcrc.org> wrote:
>
>
> On Wed, May 6, 2009 at 12:40 PM, Ivan Gregoretti <ivangreg at gmail.com> wrote:
>>
>> Hello Bioc-sig-seq,
>>
>> Say you run your ChIP-seq and find binding positions like this
>>
>> chr1  3660781  3662707
>> chr1  4481742  4482656
>> chr1  4482813  4484003
>> chr1  4561320  4562262
>> chr1  4774887  4776304
>> chr1  4797291  4798822
>> chr1  4847807  4848846
>> chr1  5008093  5009386
>> chr1  5009514  5010046
>> chr1  5010095  5010583
>> ...[many more loci and chromosomes]...
>>
>> Then you want to compare it to published data like this
>>
>> chr1  3659579  3662079
>> chr1  4773791  4776291
>> chr1  4797473  4799973
>> chr1  4847394  4849894
>> chr1  5007460  5009960
>> chr1  5072753  5075253
>> chr1  6204242  6206742
>> chr1  7078730  7081230
>> chr1  9282452  9284952
>> chr1  9683423  9685923
>> ...[many more loci and chromosomes]...
>>
>> What method would you use to test whether these two lists are
>> significantly different?
>
> This is a tough statistical question that probably needs to be a bit more
> specific, but as far as technical tools, in addition to genomeIntervals
> there is the IRanges package and its efficient "overlap" function. IRanges
> is well integrated with the rest of sequence analysis infrastructure in
> Bioconductor.
>
>>
>> Any pointer would be appreciated.
>>
>> Ivan
>>
>> Ivan Gregoretti, PhD
>> National Institute of Diabetes and Digestive and Kidney Diseases
>> National Institutes of Health
>> 5 Memorial Dr, Building 5, Room 205.
>> Bethesda, MD 20892. USA.
>> Phone: 1-301-496-1592
>> Fax: 1-301-496-9878
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
>
