[BioC] Call for comments on analyzing aCGH data with huge number of probes on a single chromosome

pingzhao Hu phu at sickkids.ca
Fri Apr 4 21:48:14 CEST 2008


Sean,
Thanks, this is really helpful!
I just tested the chromosome with 3.5M probes in a single sample, and it
took less than 20 minutes to get the job done.

Dr. Shannon, thank you very much for your useful comments as well!
Have a great weekend.

Pingzhao


At 12:35 PM 4/4/2008, Sean Davis wrote:
>On Fri, Apr 4, 2008 at 12:09 PM, pingzhao Hu <phu at sickkids.ca> wrote:
> >
> >  Sean,
> >  Thanks!
> >  The goal is to identify copy number variation in normal human samples.
> >  I have tried CBS, cghFLasso
> >  (http://biostatistics.oxfordjournals.org/cgi/content/abstract/kxm013v1),
> >  our own method
> >  (http://biostatistics.oxfordjournals.org/cgi/content/abstract/kxl035v1),
> >  and other methods.
>
>You probably have a few options.  First, you could try "smoothing" the
>data by using a moving window average or some such thing to reduce
>noise and reduce the number of probes.  I think Nimblegen does this
>for data that they give back to customers when they do CGH for
>service.  With the reduced-dimensionality data, you could then apply
>your method of choice.  Obviously, you lose resolution doing this.
>Another alternative is an algorithm called "stepgram" developed by
>Doron Lipson.  It is used in the CGHAnalytics commercial package
>available from Agilent (where it is called ADM-1).  It is also
>available as a Windows executable from here:
>
>http://bioinfo.cs.technion.ac.il/stepgram/
>
>I have an R package that uses that algorithm that, unfortunately, I am
>not allowed to distribute.  That said, it is by far the fastest
>algorithm that I have tested for CGH analysis.  For comparison, for
>200k probes, Stepgram runs in 4 seconds, aCGH in about 50 seconds,
>DNAcopy (CBS) and GLAD in about 400 seconds.
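
A minimal sketch of the window-averaging idea mentioned above, assuming
the probes of one chromosome are already sorted by position and held in
two plain lists. The function name, the argument names, and the choice
of non-overlapping windows are illustrative assumptions; this is not how
NimbleGen or any particular package does it.

```python
from statistics import fmean


def window_average(positions, log_ratios, window=10):
    """Collapse each run of `window` consecutive probes into one point.

    Returns a list of (mean_position, mean_log_ratio) tuples. The last
    window may contain fewer than `window` probes.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    if len(positions) != len(log_ratios):
        raise ValueError("positions and log_ratios must have the same length")

    smoothed = []
    for start in range(0, len(log_ratios), window):
        pos_chunk = positions[start:start + window]
        ratio_chunk = log_ratios[start:start + window]
        # One averaged probe replaces the whole window.
        smoothed.append((fmean(pos_chunk), fmean(ratio_chunk)))
    return smoothed


# Hypothetical toy data: six probes reduced to two averaged points.
reduced = window_average([1, 2, 3, 4, 5, 6],
                         [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
                         window=3)
```

With a window of w probes the data shrink by roughly a factor of w, and
the smallest detectable segment grows by about the same factor, which is
the loss of resolution noted above.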
>
>Hope that helps,
>
>Sean
>
>
> >  Pingzhao
> >
> >
> >  At 11:45 AM 4/4/2008, Sean Davis wrote:
> >  >On Fri, Apr 4, 2008 at 11:38 AM, pingzhao Hu <phu at sickkids.ca> wrote:
> >  > >
> >  > >  Hi All,
> >  > >  I have a question about analyzing aCGH data with a huge number of
> >  > >  probes on a single chromosome.
> >  > >  We have a set of customized NimbleGen aCGH data from human samples.
> >  > >  Each sample has 40 million probes. Even a single chromosome has >3M probes.
> >  > >
> >  > >  I tried some R-based and Matlab-based aCGH analysis software to
> >  > >  analyze just a single chromosome in
> >  > >  a single sample using our supercomputer, but without success. Some software
> >  > >  just shows error messages (though it works fine for small
> >  > >  data sets), and some software cannot complete the analysis even after
> >  > >  1-2 days of CPU time.
> >  > >
> >  > >  I am wondering whether anyone on the list has experience in
> >  > >  analyzing aCGH data at such a scale.
> >  > >  If you have, can you share some of your experience with me?
> >  > >
> >  > >  Would it be a good idea to first divide the chromosome into small
> >  > >  pieces (say each piece has 10,000 probes) and then run the algorithm
> >  > >  on each piece of the chromosome?
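
The divide-and-run idea in the question above could be sketched roughly
as follows. Everything here is an assumption for illustration: `segment`
is a hypothetical placeholder for whatever per-piece method is used
(CBS or otherwise), and the optional `overlap` is just one possible way
to soften the artificial boundaries that cutting the chromosome
introduces.

```python
def split_into_pieces(n_probes, piece_size=10_000, overlap=0):
    """Yield (start, end) index pairs that cover 0..n_probes in order."""
    if piece_size < 1 or not 0 <= overlap < piece_size:
        raise ValueError("need piece_size >= 1 and 0 <= overlap < piece_size")
    step = piece_size - overlap
    start = 0
    while start < n_probes:
        end = min(start + piece_size, n_probes)
        yield start, end
        if end == n_probes:
            break
        start += step


def run_per_piece(log_ratios, segment, piece_size=10_000, overlap=0):
    """Apply `segment` (a hypothetical callable) to each piece in turn.

    Returns a list of (start, end, result) tuples so that per-piece
    results can be mapped back to probe indices afterwards.
    """
    results = []
    for start, end in split_into_pieces(len(log_ratios), piece_size, overlap):
        results.append((start, end, segment(log_ratios[start:end])))
    return results


# Toy usage: the "segmentation" here is just the mean of each piece.
toy = [0.0] * 10 + [1.0] * 10
per_piece = run_per_piece(toy, lambda xs: sum(xs) / len(xs), piece_size=10)
```

A real change point that falls near a cut may be missed or split in
two, so the per-piece results would still need to be reconciled at the
piece boundaries.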
> >  >
> >  >What are the goals of the analysis?  What types of samples (cancer,
> >  >comparative genomics, normal DNA)?  And what methods have you tried?
> >  >
> >  >Sean
> >
> >
> >
> >  ========================================
> >  Pingzhao Hu
> >  Statistical Analysis Facility
> >  The Centre for Applied Genomics (TCAG)
> >  The Hospital for Sick Children Research Institute
> >  MaRS Centre - East Tower
> >  101 College Street, Room 15-705
> >  Toronto, Ontario, M5G 1L7, Canada
> >  Tel.: (416) 813-7654 x6016
> >  Email: phu at sickkids.ca
> >  Web: http://www.tcag.ca/statisticalAnalysis.html
> >
> >  _______________________________________________
> >  Bioconductor mailing list
> >  Bioconductor at stat.math.ethz.ch
> >  https://stat.ethz.ch/mailman/listinfo/bioconductor
> >  Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> >



========================================
Pingzhao Hu
Statistical Analysis Facility
The Centre for Applied Genomics (TCAG)
The Hospital for Sick Children Research Institute
MaRS Centre - East Tower
101 College Street, Room 15-705
Toronto, Ontario, M5G 1L7, Canada
Tel.: (416) 813-7654 x6016
Email: phu at sickkids.ca
Web: http://www.tcag.ca/statisticalAnalysis.html


