[R] Help please..

Wed Mar 1 12:08:21 CET 2000

Hello R-world,

I am facing a peculiar problem and hope someone out there
can comment on it.

In goodness-of-fit tests for evaluation of distributions,
there are three well-known methods:

	1. Chi-square
	2. Anderson-Darling
	3. Kolmogorov-Smirnov

I am trying to use the second test. Many researchers have
reported results using this test. I wrote programs in C and
now in R to do this, and I am running into serious problems.
I am enclosing the R program and the output I get. My
comments are embedded in the program text itself. Can anyone
help me out, please?

I am aware that this test tends to reject in real-life cases
when the sample size is large. But many authors have said
that the test succeeds if applied to subsamples: if the test
is tried 100 times on different subsamples of size 1000 (or
less), it is likely to pass in about 80 cases.

In my case, the test passes only when the data is taken
from the book "Simulation Modeling & Analysis" by Law and
Kelton. In all other cases it fails. Even subsamples of
the book data fail this test.

Am I doing something wrong? I am very surprised. I attach
the code and the results below.

Regards.

--ajit

--------------------------The R session--------------------

R : Copyright 2000, The R Development Core Team
Version 0.99.0  (February 7, 2000)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type	"?license" or "?licence" for distribution details.

R is a collaborative project with many contributors.
Type	"?contributors" for a list.

Type	"demo()" for some demos, "help()" for on-line help, or
    	"help.start()" for a HTML browser interface to help.
Type	"q()" to quit R.

> 
> ad.expo.test <- function(gen)
+ {
+ 	mean.gen <- mean(gen)
+ 	#z <- 1 - exp(-gen/0.399)
+ 	z <- 1 - exp(-gen/mean.gen)
+ 
+ 	#cat(z, '\n')
+ 	k <- 0
+ 
+ 	len.gen <- length(gen)
+ 	#cat(len.gen, '\n')
+ 	for (i in 1:len.gen)
+ 	{
+ 		#p <- log(z[i])
+ 		#cat(p, '\n')
+ 
+ 		j <- len.gen + 1 - i
+ 		#cat(i, j, '\n')
+ 
+ 		p <- log(z[i]) 
+ 		q <- log(1 - z[j])
+ 
+ 		r <-(2 * i - 1) * (p + q)
+ 		#cat(p+ q, r, '\n')
+ 		k <- k + r
+ 		#p <- p * (2 * i - 1)
+ 		#cat(p, '\n')
+ 	}
+ 
+ 	k <- - (k / len.gen)
+ 	k <- k - len.gen
+ 
+ 	k <- (1 + 0.6/len.gen) * k
+ 
+ 	return(list(lamda=mean.gen, ad=k))
+ 	#cat(k, '\n')
+ }
> 
> # The Exponential data from Law & Kelton "Simulation Modeling & Analysis"
> # second edition, table 6.7, page 367 (with lamda 0.399)
> 
> kelton <- c(rep(0.01, 8), rep(0.02, 2), rep(0.03, 3), rep(0.04, 6),
+ 	rep(0.05, 10),
+ 	rep(0.06, 4), rep(0.07, 10), rep(0.08, 4), rep(0.09, 2), rep(0.1, 9),
+ 	rep(0.11, 5), rep(0.12, 4), rep(0.13, 2), rep(0.14, 4), rep(0.15,6),
+ 	0.17, 0.18, rep(0.19, 3), 0.20, rep(0.21, 5), rep(0.22, 3),
+ 	rep(0.23, 5),
+ 	0.24, rep(0.25, 5), rep(0.26, 5), 0.27, rep(0.28, 2), rep(0.29, 2),
+ 	0.30, rep(0.31, 2), 0.32, rep(0.35, 3), rep(0.36, 3), rep(0.37, 2),
+ 	rep(0.38, 5), 0.39, rep(0.40, 2), rep(0.41, 2), rep(0.43, 3),
+ 	0.44, rep(0.45, 2), 0.46, rep(0.47, 3), 0.48, rep(0.49, 4),
+ 	rep(0.50, 3),
+ 	rep(0.51, 3), rep(0.52, 2), rep(0.53, 3), rep(0.54, 2),
+ 	rep(0.55, 2), 0.56, rep(0.57, 2), 0.60, rep(0.61, 2), rep(0.63, 2),
+ 	0.64, rep(0.65, 3), rep(0.69, 2), 0.70, rep(0.72, 3),
+ 	0.74, 0.75, 0.76, 0.77, 0.79, 0.84, 0.86, 0.87, 
+ 	rep(0.88, 2), 
+ 	0.90, 0.93, 0.93, 0.95, 0.97, 1.03, 1.05, 1.05, 1.06, 1.09, 1.10,
+ 	1.11, 1.12, 1.17, 1.18,
+ 	rep(1.24,2), 1.28, 1.33, 1.38, 1.44, 1.51, 1.72, 1.83, 1.96
+ 	)
> ad.expo.test(kelton)
$lamda
[1] 0.3988128

$ad
[1] 0.5594367

----------- The above result shows that the sample passes
----------- the Anderson-Darling criterion (the null
----------- hypothesis is not rejected at level 0.10)

> 
> # Now subsample 100 observations out of the kelton data and
> # apply the test again.
> 
> kelton.s <- sample(kelton, 100, replace=T)
> ad.expo.test(kelton.s)
$lamda
[1] 0.3467

$ad
[1] 88.76308

---------- The above result shows that the subsample
---------- fails the Anderson-Darling criterion (the null
---------- hypothesis is rejected at any level)

> 
> # Now generate 200 values artificially and apply the test
> 
> lamda <- 0.4
> 
> rexp(200, 1/lamda) -> gen
> ad.expo.test(gen)
$lamda
[1] 0.3866837

$ad
[1] 193.9315

>

---------- The above result shows that the artificially
---------- generated data fails the Anderson-Darling criterion
---------- (the null hypothesis is rejected at any level)

----------------End of R session-----------------------------
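
A minimal sketch, not part of the original session, of one thing
worth checking. The usual computational formula for the
Anderson-Darling statistic is written in terms of the order
statistics, that is, the fitted CDF values sorted in ascending
order. The kelton vector above is typed in ascending order, while
the output of sample() and rexp() is not, and the function in the
session never sorts its input. The name ad.expo.sorted and the
seed below are assumptions made for illustration only.

```r
# Sketch: Anderson-Darling statistic for an exponential fit with the
# mean estimated from the data. ad.expo.sorted is a hypothetical name.
ad.expo.sorted <- function(gen) {
  n <- length(gen)
  x <- sort(gen)                  # order statistics x_(1) <= ... <= x_(n)
  z <- 1 - exp(-x / mean(x))      # fitted exponential CDF at each x_(i)
  i <- seq_len(n)
  # A^2 = -n - (1/n) * sum((2i - 1) * (log z_(i) + log(1 - z_(n+1-i))))
  a2 <- -n - sum((2 * i - 1) * (log(z) + log(1 - rev(z)))) / n
  (1 + 0.6 / n) * a2              # same small-sample factor as in the session
}

set.seed(1)                       # arbitrary seed, for reproducibility only
gen <- rexp(200, rate = 1 / 0.4)  # same setup as the artificial data above
ad.expo.sorted(gen)
ad.expo.sorted(sample(gen))       # shuffling the input gives the same value
```

Because the data are sorted inside the function, the result no
longer depends on the order in which the observations arrive. The
value still has to be compared against the tabulated critical
value for the chosen level.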

--------------------------------------------------------------------------
Ajit K. Jena					  Phone  :  +46-455-385655
						  Fax    :  +46-455-385667
Dept. of Telecom and Signal Processing		  Mobile :  +46-736-547086
University of Karlskrona/Ronneby		  Email  : akj at its.hk-r.se
S-371 79 Karlskrona, Sweden
--------------------------------------------------------------------------
