[R] What ECDF function?
Robert A LaBudde
ral at lcfltd.com
Sat Jun 9 22:26:09 CEST 2007
At 12:57 PM 6/9/2007, Marco wrote:
><snip>
>2.I found various version of P-P plot where instead of using the
>"ecdf" function use ((1:n)-0.5)/n
> After investigation I found there're different definition of ECDF
>(note "i" is the rank):
> * Kaplan-Meier: i/n
> * modified Kaplan-Meier: (i-0.5)/n
> * Median Rank: (i-0.3)/(n+0.4)
> * Herd Johnson i/(n+1)
> * ...
> Furthermore, similar expressions are used by "ppoints".
> So,
> 2.1 For P-P plot, what shall I use?
> 2.2 In general why should I prefer one kind of CDF over another one?
><snip>
This is an age-old debate in statistics. There are many different
formulas, some of which are optimal for particular distributions.
Using i/n (which I would call the Kolmogorov method), (i-1)/n or
i/(n+1) is to be discouraged for general ECDF modeling. In accuracy
these are comparable to the rectangle rule for numerical integration
over each bin, and they assume only that the underlying density
function is piecewise constant. There is no disadvantage to using
these methods, however, if the pdf has multiple discontinuities.
I tend to use (i-0.5)/n, which corresponds to integrating with the
"midpoint rule". This is a 1-point Gaussian quadrature, and it is
exact when the density is linear across the bin (that is, has a
continuous derivative there). It's simple, it's accurate, and it is
near optimal for a wide range of continuous alternatives.
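
As a rough illustration of the formulas discussed above, the following R sketch tabulates several plotting positions for a given n and builds the coordinates of a P-P plot using (i-0.5)/n. The helper names `plotting_positions` and `pp_coords`, and the choice of a normal model fitted by the sample mean and standard deviation, are assumptions made for this example only.

```r
# Plotting positions for a sample of size n, where i is the rank (1..n).
plotting_positions <- function(n) {
  i <- seq_len(n)
  data.frame(
    i            = i,
    kaplan_meier = i / n,
    midpoint     = (i - 0.5) / n,
    median_rank  = (i - 0.3) / (n + 0.4),
    herd_johnson = i / (n + 1),
    blom         = (i - 3/8) / (n + 1/4)
  )
}

# Coordinates for a P-P plot: empirical positions (i - 0.5)/n against
# the fitted CDF evaluated at the sorted data.
# Assumption: a normal model with parameters estimated from the sample.
pp_coords <- function(x) {
  n <- length(x)
  data.frame(
    empirical   = (seq_len(n) - 0.5) / n,
    theoretical = pnorm(sort(x), mean = mean(x), sd = sd(x))
  )
}

pos <- plotting_positions(20)

# ppoints(n, a) computes (i - a)/(n + 1 - 2a), so a = 1/2 reproduces
# the midpoint positions and a = 3/8 reproduces (i - 3/8)/(n + 1/4).
mid_matches  <- isTRUE(all.equal(pos$midpoint, ppoints(20, a = 0.5)))
blom_matches <- isTRUE(all.equal(pos$blom, ppoints(20, a = 3/8)))

set.seed(1)
x  <- rnorm(20, mean = 10, sd = 2)
pp <- pp_coords(x)
```

To draw the plot, something like `plot(pp$empirical, pp$theoretical); abline(0, 1)` should do; points close to the diagonal suggest the fitted model is reasonable.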
The formula (i-3/8)/(n+1/4) is optimal for the normal
distribution. However, it differs from (i-0.5)/n by at most about
1/(8n), with the gap largest in the tails and zero at the median, so
there is little real benefit to using it. Similarly, there is a
formula (i-0.44)/(n+0.12) for a Gumbel distribution. If you know the
form of the distribution for sure (and don't need to test it), you're
better off fitting that distribution function directly and not
worrying about the ECDF.
Also remember that ECDFs are not very accurate, so the differences
between these formulae are difficult to justify in practice.
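
To put the last point in numbers, here is a small R sketch comparing the largest gap between two sets of plotting positions with the pointwise standard error of an ECDF, sqrt(p(1-p)/n), evaluated at the median. The sample size n = 20 and the variable names are arbitrary choices for illustration.

```r
n <- 20
i <- seq_len(n)

midpoint <- (i - 0.5) / n
blom     <- (i - 3/8) / (n + 1/4)

# Largest absolute difference between the two sets of positions.
# Algebraically the difference is (n + 1 - 2i) / (8 n (n + 1/4)).
max_gap <- max(abs(blom - midpoint))

# Pointwise standard error of an ECDF at probability p is
# sqrt(p (1 - p) / n); it is largest at p = 0.5.
se_median <- sqrt(0.5 * 0.5 / n)

print(c(max_gap = max_gap, se_median = se_median))
```

For this n, the gap between the formulas comes out well over an order of magnitude smaller than the sampling error near the median, although the standard error itself shrinks toward the tails.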
================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd. URL: http://lcfltd.com/
824 Timberlake Drive Tel: 757-467-0954
Virginia Beach, VA 23464-3239 Fax: 757-467-2947
"Vere scire est per causas scire"