[R] how to calculate this natural logarithm
Petr Savicky
savicky at praha1.ff.cuni.cz
Sat Jan 8 13:08:47 CET 2011
On Sat, Jan 08, 2011 at 12:20:59AM +0800, zhaoxing731 wrote:
> Hello
>
> I want to calculate natural logarithm of sum of combinations as follow: (R code)
>
> {
> com_sum=choose(2000000,482)*choose(1000000,118)+choose(2000000,483)*choose(1000000,117)+...+choose(2000000,i)*choose(1000000,600-i)+...+choose(2000000,600)*choose(1000000,0) #calculate the sum
> result=log(com_sum) #calculate the log of the sum
> }
Let me present a more detailed calculation. The natural logarithms
of the products above are
i <- 482:600
logx <- lchoose(2000000,i) + lchoose(1000000,600-i)
The logarithm of the largest of the products is
maxlog <- max(logx)
maxlog # [1] 5675.315
The ratios of the products divided by the largest of them may be computed as
exp(logx - maxlog)
# [1] 1.000000e+00 4.885522e-01 2.361712e-01 1.129583e-01 5.345075e-02
# ...
Only a few of the ratios are not negligible, so the sum of the
products differs from the largest product by a small factor. This
factor may be estimated as
sum(exp(logx - maxlog))
# [1] 1.937370
The required logarithm differs from maxlog by the logarithm of this
factor. So, the result is
options(digits=15)
maxlog + log(sum(exp(logx - maxlog)))
# [1] 5675.97681976982
Computing the sum of products exactly using long integer arithmetic
and computing its logarithm with large precision produced
5675.9768197698224041850302683544102271115821956713424401649...
which matches well the result obtained using double precision in R.
The above approach may be derived from the numerical part of the algorithm
suggested by Ted Harding for computing extremely small p-values in the Fisher Exact
test. There, we are computing a sum of small numbers, not large ones. However,
the problem is similar, since we cannot represent the summands themselves, but
we can represent their logarithms. In my opinion, P(a, b, c, d) is the largest
of the summands or close to the largest, so we calculate the ratios of the summands
to the largest of them or to a close approximation of the largest.
Petr Savicky.
More information about the R-help
mailing list