[BioC] general CGH threshold question
Wolfgang Raffelsberger
wraff at igbmc.fr
Fri Aug 21 19:04:10 CEST 2009
Dear Bioconductors,
for segmenting CGH data there are several packages available and most of
them tend to give me similar overall results.
However, when it comes to summarizing a collection of cancer specimens
(= patients), I have to decide how to combine all the nicely segmented
individual profiles. At some point I am forced to make an arbitrary
decision on a threshold that determines whether a given
position/segment from a specimen (patient) should be counted as
aberrant or not.
Of course one could say that, in theory, a given segment should either
be present as a single copy, or doubled, tripled (etc.), or lost, and
that the expected ratios should follow this. However, in my view
reality is quite different. Surgeons tend to remove (a bit) more tissue
than the tumor itself, so there is reason to assume some normal tissue
is present, and tumors may also be heterogeneous. All of this
contributes to the fact that I see log-ratios smaller in magnitude than
+/- 1 (which would describe the ideal case), and I wonder how many of
them could still represent "true" alterations.
Now I've seen people make fairly arbitrary decisions about such
thresholds, like 0.5 (which corresponds to roughly 40% of the molecules
tested carrying doubled DNA while the rest are normal) or other values
in that range. Unfortunately the biologists/clinicians cannot tell me
which fraction of cells must be altered for a change to still be
considered.
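
To make the arithmetic behind such thresholds explicit, here is a
minimal Python sketch. It assumes a simple two-population mixture (a
fraction of cells carrying the aberration, the rest with two normal
copies) and ignores array compression and overall ploidy shifts. The
function names are illustrative and not taken from any package.

```python
import math

def expected_log2_ratio(fraction_altered, tumor_copies, normal_copies=2):
    """Expected log2 ratio for a mixture of altered and normal cells.

    Assumption: the measured signal is proportional to the average
    copy number across all cells in the specimen.
    """
    mean_copies = (fraction_altered * tumor_copies
                   + (1 - fraction_altered) * normal_copies)
    return math.log2(mean_copies / normal_copies)

def fraction_for_threshold(threshold, tumor_copies, normal_copies=2):
    """Fraction of altered cells needed to reach a given log2 ratio."""
    if tumor_copies == normal_copies:
        raise ValueError("tumor_copies must differ from normal_copies")
    ratio = 2 ** threshold
    return (ratio - 1) * normal_copies / (tumor_copies - normal_copies)

# Doubled DNA (4 copies instead of 2) at a threshold of 0.5:
print(round(fraction_for_threshold(0.5, tumor_copies=4), 3))
# Gain of a single copy (3 instead of 2) at the same threshold:
print(round(fraction_for_threshold(0.5, tumor_copies=3), 3))
# Loss of one copy (1 instead of 2) at a threshold of -0.5:
print(round(fraction_for_threshold(-0.5, tumor_copies=1), 3))
```

Under this assumption a threshold of 0.5 requires about 41% of cells
with doubled DNA, but about 83% of cells if only a single copy was
gained, so the same cutoff is much stricter for single-copy gains.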
Now another part of the story enters the scene. From some (preliminary)
comparisons I've seen that the Agilent software may give quite
different results for the frequency of lost/amplified zones of the
genome (while at least CBS, GLAD, aCGH and snapCGH were in broad
agreement on penetrance counts at a given threshold; I apologize for
not mentioning all the other BioC packages available). And
non-bioinformatics people keep asking me why this might be so. So I
wonder whether this might have something to do with the choice of the
threshold mentioned above. Of course, if you choose a threshold closer
to 0 (like 0.1 or 0.2) you'll find more aberrations above threshold.
But not just more: to my surprise, entire chromosome arms suddenly show
up as enriched for gains or losses, making the results look (a bit)
more like the Agilent results.
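
The effect of the threshold on such counts can be illustrated with a
small Python sketch. The profiles below are made-up toy values, not
real data, and the function name is hypothetical; it simply counts, per
position, the fraction of patients whose segmented log2 ratio lies
beyond +/- the threshold.

```python
def aberration_frequency(profiles, threshold):
    """Per-position fraction of patients called gained or lost.

    profiles: list of per-patient lists of segmented log2 ratios,
              all covering the same positions in the same order.
    threshold: positive cutoff; gain if value >= threshold,
               loss if value <= -threshold.
    """
    n_patients = len(profiles)
    n_positions = len(profiles[0])
    gains = [sum(1 for p in profiles if p[i] >= threshold) / n_patients
             for i in range(n_positions)]
    losses = [sum(1 for p in profiles if p[i] <= -threshold) / n_patients
              for i in range(n_positions)]
    return gains, losses

# Toy data (invented for illustration): 4 patients, 5 positions.
# The first three positions mimic one chromosome arm.
toy_profiles = [
    [0.20, 0.22, 0.21, 0.00, -0.60],
    [0.18, 0.19, 0.20, 0.02, -0.05],
    [0.60, 0.62, 0.61, -0.01, -0.55],
    [0.02, 0.01, 0.00, 0.03, 0.00],
]

for cutoff in (0.5, 0.15):
    gains, losses = aberration_frequency(toy_profiles, cutoff)
    print(cutoff, gains, losses)
```

In this toy example the "arm" is called gained in 1 of 4 patients at a
cutoff of 0.5 but in 3 of 4 patients at 0.15, because two patients
carry a consistent low-amplitude shift across the whole arm.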
So when looking at the distribution of all log2-ratios (say for some
100 patients) I see a rather bell-shaped (slightly asymmetric)
distribution. A qqplot has a slightly sigmoid character, and the 99.9%
(t-distribution) confidence interval with that many df is way too close
to 0.
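
One possible reason for this, assuming the interval in question is a
confidence interval for the mean, is that its half-width shrinks with
the square root of the number of values. The Python sketch below uses
a normal approximation (reasonable for a t-distribution with very many
df); the standard deviation and the number of values are hypothetical,
chosen only to show the order of magnitude.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_half_width(sd, n, level=0.999):
    """Half-width of a confidence interval for the MEAN (normal approx.)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * sd / sqrt(n)

def central_range_half_width(sd, level=0.999):
    """Half-width of the central range covering `level` of the VALUES,
    assuming they are normally distributed around 0."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * sd

# Hypothetical numbers: sd of 0.3, 100 patients x 40,000 probes.
sd, n = 0.3, 100 * 40_000
print(mean_ci_half_width(sd, n))      # shrinks as n grows
print(central_range_half_width(sd))   # does not depend on n
```

With these assumed numbers the interval for the mean is narrower than
0.001, whereas the range covering 99.9% of individual values is close
to 1, so the two quantities answer very different questions.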
So my question: what do you suggest as a procedure to define a
threshold for deciding whether a given position/segment may be
considered altered when piling up all the biopsies/patients in the
study?
Besides statistical ideas, I also wonder whether anybody has data from
comparisons with other experimental techniques that would help to
understand the "true" status and the discrepancy with the Agilent
software?
Thanks in advance,
Wolfgang Raffelsberger
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Wolfgang Raffelsberger, PhD
Laboratoire de BioInformatique et Génomique Intégratives
CNRS UMR7104, IGBMC,
1 rue Laurent Fries, 67404 Illkirch Strasbourg, France
Tel (+33) 388 65 3300 Fax (+33) 388 65 3276
wolfgang.raffelsberger (at) igbmc.fr