[BioC] VariantTools callVariants - binomial likelihood ratio
heyi xiao
xiaoheyiyh at yahoo.com
Tue Nov 19 19:33:51 CET 2013
Hi Michael,
Thanks for your answers. Now I understand the lrtFreqCutoff step.
I think the the values of p0 (=perror = 0.001) and p1 (=plower = 0.2) should be important for snp calling.
Sequencing error, p0/perror, can be derived from the raw read data, or even better can be recalibrated like SOAPsnp or GATK. But that’s a bit too much of work. On the other hand, why you choose to use p1=plower = 0.2?
I appreciate the simplicity of your method much, i.e. map the binomial likelihood ratio to the cutoff frequency. Have you tried to use the Bayesian probability score as in other method? I know it is more complicated but you get the significance level or p value this way. Thank you!
--------------------------------------------
On Sun, 11/17/13, Michael Lawrence <lawrence.michael at gene.com> wrote:
Subject: Re: VariantTools callVariants - binomial likelihood ratio
Cc: "bioconductor at r-project.org" <bioconductor at r-project.org>, "Michael Lawrence" <lawrence.michael at gene.com>
Date: Sunday, November 17, 2013, 10:29 PM
On Sun, Nov 17, 2013
wrote:
Hi,
I try to use VariantTools package package and I am working
on the vignette examples. I have some questions on the
variant calling procedure. The vignette says: “The
callVariants function uses a binomial likelihood ratio test
for this purpose. The ratio is P(D|p = plower)=P(D|p =
perror).”
To understand this statement, I located the two internal
functions that carry out this step.
> VariantTools:::lrtFreqCutoff
function (p0, p1, n = if (C == 1L) 1L else n, C = 1L)
{
num <- (1/n) * log(C) + log(1 - p0) - log(1 - p1)
denom <- log(p1) - log(p0) + log(1 - p0) - log(1 -
p1)
num/denom
}
> VariantTools:::BinomialLRFilter
function (p.lower = 0.2, p.error = 1/1000)
{
function(x) {
freq.cutoff <- lrtFreqCutoff(p.error,
p.lower)
sample.freq <- altDepth(x)/totalDepth(x)
passed <- sample.freq >= freq.cutoff
passed[is.na(passed)] <- FALSE
passed
}
}
However, I get even more confused. Here are my questions:
The binomial likelihood ratio should be
p1^k*(1-p1)^(n-k)/(p0^k*(1-p0)^(n-k)), here n and k are
total trial and number of hits. I don’t understand why it
becomes num/denom in the lrtFreqCutoff function?
The lrtFreqCutoff is not calculating the
likelihood ratio: it's calculating the frequency (k/n)
at which the ratio becomes greater than C. It takes some
algebra to work this out. 'n' is the coverage, but
it cancels out when C = 1. That's what the filter always
uses. If it didn't we would need to consider the
coverage at each position. As it is now, it is only a
function of the two probabilities and so is constant (~ 4%
using the defaults) for every position.
What’s n and C in lrtFreqCutoff? These arguments are not
used when BinomialLRFilter calls lrtFreqCutoff, why?
In BinomialLRFilter, the criterion used is sample.freq >=
freq.cutoff. But sample.freq <- altDepth(x)/totalDepth(x)
is a frequency as defined, yet freq.cutoff seems to be the
binomial likelihood ratio. How a frequency can be compared
to likelihood ratio?
In BinomialLRFilter, p0=perror = 0.001 and p1=plower = 0.2,
the default p0 and p1 values are assumed, why not used the
actual sequencing error rate and lowest variant frequency?
The user can change the defaults. How do you
propose we figure out the correct values automatically? We
could try to estimate the error rate from e.g. overlapping
read ends, but that's just a heuristic.
[[elided Yahoo spam]]
More information about the Bioconductor
mailing list