[Bioc-devel] MRD measurements in Leukemic patients using NGS data in r

Tim Triche, Jr. tim.triche at gmail.com
Thu Mar 5 22:36:31 CET 2020


a few thoughts:

1) look into Shearwater (
https://bioconductor.org/packages/release/bioc/html/deepSNV.html), then

2) talk to Todd Druley @ WashU, Elli Pappaemanuil @ MSKCC, Ruud & Bob @
Erasmus, the usual suspects

3) plan to validate w/ddPCR (at the absolute very least) and be aware that
most MRD in leukemia is done by a combination of BCR/TCR + breakpoint PCR
(lymphoid/fusion-driven) or different-from-normal (DfN) flow (myeloid +
normal cytogenetics)
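The workflow Rene describes below (per-position nucleotide counts from BAM files, then a modified Thompson Tau outlier test on those counts) could be sketched roughly as follows. This is a minimal illustration, not a validated MRD caller: the BAM path and coordinates are hypothetical placeholders, `bam2R()` is deepSNV's count function, and the tau helper is my own implementation of the standard modified Thompson Tau formula.

```r
## Sketch: per-position nucleotide counts + modified Thompson Tau outlier
## test. bam2R() comes from Bioconductor's deepSNV; the BAM path, chromosome,
## and coordinates below are hypothetical placeholders.

# library(deepSNV)
# counts <- bam2R("sample.bam", chr = "chr13", start = 28608000, stop = 28608300)
# `counts` is a matrix with per-position columns A, T, C, G (uppercase for the
# forward strand, lowercase for the reverse); a variant allele fraction per
# position would then be, e.g.:
# counts[, "A"] / rowSums(counts[, c("A", "T", "C", "G")])

## Modified Thompson Tau test: flag observations whose absolute deviation
## from the mean exceeds tau * sd, with tau derived from the t distribution.
thompson_tau_outliers <- function(x, alpha = 0.05) {
  n <- length(x)
  t_crit <- qt(1 - alpha / 2, df = n - 2)
  tau <- (t_crit * (n - 1)) / (sqrt(n) * sqrt(n - 2 + t_crit^2))
  abs(x - mean(x)) > tau * sd(x)
}

## Demo on synthetic variant read counts at one hotspot position across a
## reference cohort plus one suspect sample:
thompson_tau_outliers(c(1, 1, 2, 2, 3, 100))
# -> FALSE FALSE FALSE FALSE FALSE TRUE
```

In practice the reference distribution would come from the background counts of many normal BAMs at the same position, so that position-specific error rates (and not just a global 5% cutoff) set the detection floor.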

I'm not saying ML-based methods can't help, but if you've got a
30x-100x genome (or even 1000x FM1) and are trying to compete with existing
standard approaches that can detect molecules at 1e-6, it'll be rough.  An
alternative approach (that has been used repeatedly) is to throw caution to
the wind, generate primers for numerous subject-specific somatic variants,
and use the ensemble to try to model MRD (speaking of ML). On the one
hand, that could give the clinic a "customer for life"; on the other hand,
it's not conducive to large-scale automation & deployment. As far as I
know, it never got much traction, in leukemia or anywhere else.  (Consider
that most clinical flow labs can detect 1-in-10K to 1-in-a-million
cells by flow cytometry.)
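The coverage argument above can be made concrete with a quick binomial back-of-envelope (the numbers are illustrative, and sequencing error is ignored): at 100x depth, the chance of sampling even a single read from a clone at 1e-6 VAF is vanishing, while a 5% VAF clone is almost always seen.

```r
## Probability of observing at least one variant-supporting read when the
## true variant allele fraction is `vaf` and the position is covered by
## `depth` reads (simple binomial sampling, ignoring sequencing error).
p_detect <- function(vaf, depth) 1 - (1 - vaf)^depth

p_detect(1e-6, 100)   # ~1e-4: a 100x genome essentially never sees the clone
p_detect(0.05, 100)   # ~0.99: why 5% VAF is a comfortable floor
```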

Best of luck! (And if you're not already working with UMI-tagged reads,
please talk to the people in #2 above; the reason most people won't go
below 5% VAF is that you get thwacked by error rates at that level, and the
reason most NGS-based MRD is based on UMIs is that existing PCR-based
methods have six logs of sensitivity.)

--t


On Thu, Mar 5, 2020 at 4:08 PM Mulder, R <r.mulder01 at umcg.nl> wrote:

> Hi,
>
>
> I was wondering if anyone could help me with a script and support for the
> above mentioned goal.
>
> For this I have several BAM files for which I want to determine the
> nucleotide count per region of interest. The latter could be several
> hotspot mutation sites. I would like to get an overall overview of all the
> BAM files. Next I want to use these counts to determine, for any new BAM
> file, whether the count at a particular genomic position is higher than the
> allowable range and hence could indicate that a mutation is present. For
> this I would like to use the modified Thompson Tau test. I think machine
> learning could be used for this. So, why do I want to do all this? Well,
> normal NGS pipelines only deal with variants at a frequency of 5%.
> Mutations below this frequency are often missed. To know whether a mutation
> is present below this level, you should dive into the alignment and most
> often manually investigate the base calls. I know that this raises some
> questions regarding call qualities, but then again our conventional assays
> have actually confirmed some of these low-level mutations. In addition, NGS
> can be used to detect low-frequency mutations, which is of great importance
> for determining measurable residual disease after treatment.
>
>
> I am new to R and Bioconductor, so I would be very thankful if someone
> could help me in my mission to set up a script for this purpose.
>
>
> Thanks,
>
>
> Rene Mulder
>
> Laboratory Medicine
>
> University Medical Center Groningen
>
> The Netherlands
>
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

