# [R] Chebyshev Inequality — MVUE

(Ted Harding) ted.harding at wlandres.net
Sun Jul 10 20:49:43 CEST 2011

```
On 10-Jul-11 16:27:04, Durant, James T. (ATSDR/DTEM/PRMSB) wrote:
> Hello,
> I was interested in trying to write an R script to calculate a
> UCL for a lognormal distribution using the Chebyshev Inequality
> -- MVUE Approach (based on EPA’s guidance found in
> http://www.epa.gov/oswer/riskassessment/pdf/ucl.pdf).
> This looks like it should be straightforward, but I need to
> calculate an MVUE for the population mean and an MVUE for the
> population variance, which requires a value (g_n) from a table A7,
> found in Aitchison and Brown (1969): The lognormal distribution.
> I have looked across the RSiteSearch and can not seem to find a
> function that will give me g_n or the MVUE for mean and variance
> of lognormal distribution.
>
> Is there an R function that will give me g_n or will calculate
> an MVUE for the population mean and variance for the lognormal
> distribution?
>
> VR
> Jim
> James T. Durant, MSPH CIH
> Emergency Response Coordinator
> US Agency for Toxic Substances and Disease Registry
> Atlanta, GA 30341
> 770-378-1695

Some quick comments. I will try to respond more fully later.

1. The Chebyshev inequality is usually very conservative.
As a simple example, consider X with a negative exponential
distribution with density exp(-x) (x > 0), so that the population
mean is 1 and the population variance is also 1.

Then, for a factor K, Chebyshev says that

Prob(|X-1| > K*1) < 1/(K^2).

This is only informative if K > 1. So (e.g.) take K = 2. Then the Chebyshev
result is that this Prob < 1/4. However, because X is positive, the event
can only occur as X > 1 + 2 = 3, so Prob = exp(-3) = 0.0498 < 1/20.

The reference you cite suggests ("Exhibit 5") applying the method to
log-transformed data, which for lognormal data would be normally
distributed. So apply Chebyshev to N(0,1) (mean=0, var=1). Then

Prob(|X-0| > K*1)  < 1/(K^2) as before.

Now take K = 2 again (i.e. outside +/- 2 SDs, where the exact Prob is 0.0455, approx 0.05).
But Chebyshev still says "Prob < 1/4 = 0.25".

So, as a first comment, I am seriously wondering about the wisdom
of basing an approach on Chebyshev's inequality. Note also the
comments at the end of that section of your reference, which are
essentially a warning on similar lines to the above.

2. The function in the reference you cite is not "g_n" but "psi_n",
and the Table cited from Aitchison and Brown is not A7 but A2.

On page 45 of Aitchison and Brown (1969), section 5.41 "The Method
of Maximum Likelihood", the function psi_n is defined (Eqn 5.38)
so as to be applicable to the sufficient statistics mean(log(X))
and var(log(X)) to yield unbiased estimators of the population
mean of X and the population variance of X (Eqns (5.40) and (5.42)).

psi_n is defined as an infinite series which, according to A&B
(page 46) "converges only slowly", and they give a finite-form
asymptotic approximation to it (Eqn (5.43)) whose error is of
order O(1/(n^3)). This fairly simple expression would be easy
to define as a function in R:

# Asymptotic approximation to psi_n(t), A&B Eqn (5.43)
psi <- function(t, n){
  exp(t)*(1 - t*(t+1)/n + (t^2)*(3*(t^2) + 22*t + 21)/(6*(n^2)))
}

Hoping this helps. As I say, I hope to find time later to look
at this in more detail.

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 10-Jul-11                                       Time: 19:49:39
------------------------------ XFMail ------------------------------

```
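A minimal R sketch of the comparison in point 1 of the reply: it sets Chebyshev's bound 1/K^2 beside the exact two-sided tail probabilities for the rate-1 exponential (mean 1, SD 1) and the standard normal, using only base R's `pexp()` and `pnorm()`. The K values are illustrative choices, not taken from the reply.

```r
# Chebyshev: P(|X - mu| >= K * sigma) <= 1 / K^2 for any distribution with
# finite variance. Compare that bound with exact tail probabilities.
K <- c(1.5, 2, 3)                      # illustrative multiples of the SD
chebyshev_bound <- 1 / K^2

# Exponential, rate 1 (density exp(-x), x > 0): mean 1, SD 1.
# For K >= 1 the lower tail X < 1 - K is impossible, so only X > 1 + K counts.
exp_exact <- pexp(1 + K, rate = 1, lower.tail = FALSE)

# Standard normal N(0, 1), e.g. standardised log-transformed lognormal data.
norm_exact <- 2 * pnorm(-K)

comparison <- data.frame(K, chebyshev_bound, exp_exact, norm_exact)
print(signif(comparison, 3))
```

For K = 2 the row reproduces the figures in the reply: a bound of 0.25 against exp(-3), about 0.0498, for the exponential and about 0.0455 for the normal.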
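A minimal R sketch for point 2 of the reply, under stated assumptions. The infinite-series form of psi_n in `psi_series()` is the form commonly quoted for this function in the lognormal-estimation literature, not copied from A&B, so check it against their Eqn (5.38); it does agree with the asymptotic formula (Eqn 5.43) through the 1/n^2 term. The mean and variance formulas in `lognormal_mvue()` are the standard unbiased forms, assumed to correspond to A&B Eqns (5.40) and (5.42). All function names are hypothetical.

```r
# psi_n(t) as an infinite series (assumed form; check against A&B Eqn 5.38):
#   psi_n(t) = 1 + (n-1) t / n
#            + sum_{k>=2} (n-1)^(2k-1) t^k / (n^k (n+1)(n+3)...(n+2k-3) k!)
# Each term is built from the previous one via their ratio, which avoids
# overflow in the powers and factorials.
psi_series <- function(t, n, tol = 1e-12, max_terms = 1000) {
  stopifnot(length(t) == 1, n >= 2)
  term <- (n - 1) * t / n                 # k = 1 term
  total <- 1 + term
  k <- 1
  while (k < max_terms && abs(term) > tol * abs(total)) {
    k <- k + 1
    # ratio term_k / term_(k-1) = (n-1)^2 t / (n (n + 2k - 3) k)
    term <- term * (n - 1)^2 * t / (n * (n + 2 * k - 3) * k)
    total <- total + term
  }
  total
}

# Asymptotic approximation from the reply (A&B Eqn 5.43), error O(1/n^3).
psi_asym <- function(t, n) {
  exp(t) * (1 - t * (t + 1) / n + t^2 * (3 * t^2 + 22 * t + 21) / (6 * n^2))
}

# Hypothetical helper: unbiased estimates of E[X] and Var[X] for lognormal X
# from ybar = mean(log(x)) and s2 = var(log(x)) (divisor n - 1), assuming
#   mean: exp(ybar) * psi_n(s2 / 2)
#   var:  exp(2 ybar) * (psi_n(2 s2) - psi_n((n - 2) s2 / (n - 1)))
lognormal_mvue <- function(x) {
  stopifnot(all(x > 0), length(x) >= 2)
  y <- log(x)
  n <- length(y)
  ybar <- mean(y)
  s2 <- var(y)
  c(mean = exp(ybar) * psi_series(s2 / 2, n),
    var  = exp(2 * ybar) *
      (psi_series(2 * s2, n) - psi_series((n - 2) * s2 / (n - 1), n)))
}

# Usage: simulated lognormal data, and series vs asymptotic gap for t = 0.5.
set.seed(1)
x <- rlnorm(20, meanlog = 1, sdlog = 0.8)
print(lognormal_mvue(x))
print(sapply(c(5, 10, 20, 50), function(n) psi_series(0.5, n) - psi_asym(0.5, n)))
```

The last line prints the gap between the series and the asymptotic formula for increasing n; it should shrink roughly like 1/n^3. Note that `var` here estimates Var(X) itself; a Chebyshev UCL would also need an estimate of the variance of the mean estimator, which this sketch does not compute.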