[R] Kolmogorov-Smirnov test for lognormal distribution with estimated parameters

Frank E Harrell Jr f.harrell at vanderbilt.edu
Wed Jan 12 18:19:17 CET 2005


Christoph Buser wrote:
> Hi Kwabena
> 
> I once ran a simulation, generating normally distributed values
> (500 values) and computing a KS test with estimated
> parameters. Repeating this test 10000 times, I got about
> 1 significant test (at level alpha=0.05 I would expect about 500
> significant tests by chance).
> So I think that if you estimate the parameters from the data, you fit
> too well, and the distribution used for the test statistic is not
> adequate, as indicated in the help page you cited. There
> (in the help page) is some literature, but it is not easy
> to read.
> Furthermore, I know of no implementation of a KS test that
> accounts for this estimation of the parameters.
> 
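
The simulation described in the quoted paragraph can be sketched roughly as follows. This is only an illustration in base R: the seed and the reduced number of repetitions (2000 rather than 10000) are my own assumptions, chosen to keep the run short.

```r
set.seed(1)     # arbitrary seed (assumption), for reproducibility only
n_rep <- 2000   # fewer repetitions than the 10000 quoted, to keep it quick
n_obs <- 500    # sample size as in the quoted description

p_values <- replicate(n_rep, {
  x <- rnorm(n_obs)
  # Plug the sample mean and SD into the null distribution, then test against it
  ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$p.value
})

# Fraction of tests rejected at alpha = 0.05; a calibrated test would give about 0.05
reject_rate <- mean(p_values < 0.05)
print(reject_rate)
```

A rejection rate far below 0.05 is what one would expect here, since the fitted parameters pull the hypothesized CDF toward the empirical one.
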
> I recommend a graphical tool instead of a test:
> 
> x <- rlnorm(100)
> qqnorm(log(x))
> 
> See also ?qqnorm and ?qqplot.
> 
> If you insist on testing against a theoretical distribution, be aware
> that a non-significant test does not mean that your data have the
> tested distribution (especially with few data, the test has
> little power to detect deviations from the theoretical
> distribution, and the conclusion that the data fit well is
> misleading).
> 
> If there are enough data I'd prefer a chi-square test to the KS
> test (but even there I use graphical tools instead).
> 
> See ?chisq.test
> 
> For this test you have to specify classes, and this choice is
> subjective (you can't avoid this).
> 
> You can reduce the DF of the reference chi-square distribution
> (under H_0) by the number of parameters estimated from the data,
> which gives better results.
> 
> DF = number of classes - 1 - estimated parameters
> 
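
A minimal sketch of the degrees-of-freedom adjustment quoted above, for the lognormal case. The simulated data, the seed, and the choice of 10 equal-probability classes are assumptions made for illustration. Note also that when the parameters are estimated from the ungrouped data, as here, the chi-square reference distribution with the reduced DF is only approximate.

```r
set.seed(2)                                  # arbitrary seed (assumption)
x  <- rlnorm(500, meanlog = 1, sdlog = 0.5)  # simulated stand-in for real data
lx <- log(x)                                 # lognormal x means normal log(x)

# Estimate the two parameters from the data
mu_hat <- mean(lx)
sd_hat <- sd(lx)

# Classes: k bins with equal probability under the fitted normal (subjective choice)
k        <- 10
breaks   <- qnorm(seq(0, 1, length.out = k + 1), mean = mu_hat, sd = sd_hat)
observed <- table(cut(lx, breaks = breaks))
expected <- rep(length(lx) / k, k)

# Pearson statistic, referred to chi-square with
# DF = number of classes - 1 - number of estimated parameters
stat    <- sum((observed - expected)^2 / expected)
df      <- k - 1 - 2
p_value <- pchisq(stat, df = df, lower.tail = FALSE)
print(c(statistic = stat, df = df, p.value = p_value))
```

The statistic is computed by hand rather than with chisq.test() because chisq.test() uses DF = number of classes - 1 and does not subtract the estimated parameters.
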
> I think this test is more powerful than the KS test,
> particularly if you must estimate the parameters from data.
> 
> Regards,
> 
> Christoph
> 
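
On the point that the usual KS reference distribution ignores parameter estimation: one possible workaround is to simulate the null distribution of the statistic while re-estimating the parameters in every replicate, in the spirit of the Lilliefors correction. The sketch below is only an illustration in base R. The helper ks_fitted(), the simulated data, the seed, and the number of replicates are all hypothetical choices of mine.

```r
set.seed(3)                                # arbitrary seed (assumption)
x  <- rlnorm(100, meanlog = 0, sdlog = 1)  # simulated stand-in for real data
lx <- log(x)                               # test normality on the log scale

# KS distance between a sample and the normal fitted to that same sample
# (hypothetical helper, not part of any package)
ks_fitted <- function(z) {
  unname(ks.test(z, "pnorm", mean = mean(z), sd = sd(z))$statistic)
}

d_obs <- ks_fitted(lx)

# Null distribution of the statistic, re-estimating the parameters each time.
# The statistic is location-scale invariant, so standard normal draws suffice.
n_sim  <- 2000
d_null <- replicate(n_sim, ks_fitted(rnorm(length(lx))))

# Monte Carlo p-value, with the usual +1 correction
p_mc <- (sum(d_null >= d_obs) + 1) / (n_sim + 1)
print(p_mc)
```

Only the test statistic from ks.test() is used here. Its reported p-value is discarded, because that p-value assumes the parameters were specified in advance.
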

It is also a good idea to ask why one compares against a known 
distribution form.  If you use the empirical CDF to select a parametric 
distribution, the final estimate of the distribution will inherit the 
variance of the ECDF.  The main reason statisticians think that 
parametric curve fits are far more efficient than nonparametric ones is 
that they don't account for model uncertainty in their final confidence 
intervals.

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University



