[BioC] Copy Number Analysis for Mapreduce

Sat Mar 17 19:33:29 CET 2012

Hi,

On Sat, Mar 17, 2012 at 12:54 PM, My Coyne <mcoyne at boninc.com> wrote:
>
> I'm looking for a copy number analysis implementation that fits the Hadoop/MapReduce paradigm. If anyone has experience with copy number analysis that can be
> used with Hadoop/MapReduce, I would appreciate pointers.
>
> Hadoop is a software framework on Linux that allows for large-scale distributed data analysis. Hadoop uses the MapReduce paradigm to implement fault-tolerant distributed computing
> over large datasets stored on a cluster's distributed file system. In the MapReduce paradigm, program execution is divided into separate Map and Reduce stages, and each stage runs in
> parallel. For this reason, I am looking for a copy number analysis algorithm that fits the MapReduce paradigm.

You might want to start looking at the GATK:
http://www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit

I'm not sure if it has exactly what you want, but it could be a good
place to start as a foundation/toolbox if you're looking to build such
a thing. From their website:

"""
The Genome Analysis Toolkit (GATK) is a structured programming
framework designed to enable rapid development of efficient and robust
analysis tools for next-generation DNA sequencers. The GATK solves the
data management challenge by separating data access patterns from
analysis algorithms, using the functional programming philosophy of
Map/Reduce.
"""
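
To make the Map/Reduce shape a bit more concrete, here is a rough Python
sketch of one step that read-depth copy number methods commonly need:
counting aligned reads per fixed-width genomic bin. This is not GATK or
Hadoop code. The names (BIN_SIZE, map_read, reduce_counts,
count_reads_per_bin), the bin width, and the (chrom, pos) read
representation are all made up for illustration.

```python
from collections import defaultdict
from functools import reduce

BIN_SIZE = 1000  # assumed bin width in bases (illustrative value only)


def map_read(read):
    """Map step: turn one aligned read into a ((chrom, bin_index), 1) pair."""
    chrom, pos = read
    return ((chrom, pos // BIN_SIZE), 1)


def reduce_counts(acc, pair):
    """Reduce step: add the emitted count to the running total for its bin."""
    key, n = pair
    acc[key] += n
    return acc


def count_reads_per_bin(reads):
    """Run the map step over every read, then fold the pairs into per-bin totals."""
    mapped = map(map_read, reads)
    return dict(reduce(reduce_counts, mapped, defaultdict(int)))


# Toy input: (chromosome, alignment start) pairs, hypothetical data.
reads = [("chr1", 150), ("chr1", 980), ("chr1", 1200), ("chr2", 50)]
counts = count_reads_per_bin(reads)
print(counts)
```

Each read is mapped independently and the per-bin sums are associative,
so the same two functions could in principle be spread across many
workers. Segmentation and calling on top of the counts is a separate
problem that this sketch does not touch.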

HTH,

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact