The package *Rgof* brings together a number of routines for
the goodness-of-fit problem for univariate data. We have a data set x,
and we want to test whether it was generated by the probability
distribution F.

The highlights of this package are:

- it runs a large number of different tests simultaneously.

- it works for continuous and for discrete (aka histogram) data.

- it allows for parameter estimation.

- it uses Rcpp and parallel programming to be very fast.

- it includes routines that make it very easy to estimate the power of
the different tests for specific situations.

- it allows for the possibility that the sample size was a draw from a
Poisson random variable.

- it allows the user to add more tests.

- it includes a routine to draw the usual power graph.

**Note** all runs of the test routine are done with
*B=1000* and all runs of the power routines with arguments
*B=c(500, 500), maxProcessor = 2* in order to pass
*devtools::check()*.

**Continuous Data**

- Kolmogorov Smirnov (KS) (Massey 1951), (Kolmogorov 1933), (Smirnov 1948)
- Kuiper (K) (Kuiper 1960)
- Anderson-Darling (AD) (Anderson and Darling 1952), (Anderson and Darling 1954)
- Cramer-vonMises (CvM) (Anderson
1962)

- Wilson (W)

- Zhang’s methods (ZA, ZK, ZC) (Zhang
2002)

- Wasserstein p=1 (Wassp1) (E del Barrio 1999)

For all of these tests the distribution of the test statistic under the null hypothesis is found via simulation.

- Eight variations of chi square tests, using the formulas by Pearson or based on the likelihood ratio test, bins of equal size or equal probability and both a large number and a small number of bins, 100 and 10 by default. The p values are found using the usual chi square approximation to the null distribution. If parameters are estimated this is done via the method of minimum chi square, see (Berkson 1980). In all cases bins are combined until all of them have an expected count of at least 2.

There is a very large literature on chi square tests, the oldest of the goodness of fit tests. For a survey see (Rolke and Gutierrez-Gongora 2020).

**Discrete (Histogram) Data**

All the methods above are also implemented for discrete data, taking advantage of the fact that the number of bins is fixed and does not grow with the sample size. Therefore the discrete versions of the tests run generally much faster than the continuous one.

It is worth noting that these discrete versions are based on the theoretical ideas of the tests and not on the actual formula of calculation for the continuous case. The test statistics can therefore be different even when applied to the same data. For example, the Anderson-Darling test is based on the distance measure

\[A^2=n\int_{-\infty}^{\infty} \frac{(\hat{F}(x)-F(x))^2}{F(x)(1-F(x))}dF(x) \] where \(F\) is the theoretical distribution function under the null hypothesis and \(\hat{F}\) is the empirical distribution function. In the case of continuous data it can be shown that

\[A^2=-n-\frac1n\sum_{i=1}^n (2i-1)\left(\log F(x_i) +\log[1-F(x_{n+1-i})\right)\] However, for discrete data we have

\[A^2=n\sum_{i=1}^k \frac{(\hat{F}(x_i)-F(x_i))^2}{F(x_i)(1-F(x_i))}\left(F(x_i)-F(x_{i-1}\right)\]

with \(F(x_0)=0\).

In the continuous case \(\hat{F}\) is a step function but \(F\) is continuous, and therefore \(A^2>0\). In the discrete case however\(A^2=0\) is possible. This shows that the two cases are fundamentally different.

As for continuous data null distributions are found using simulation. In fact in the case of discrete data none of the tests has a known distribution for the test statistic under the null hypothesis.

- Four variations of chi square tests, using the formulas by Pearson or based on the likelihood ratio test and both a large number and a small number of bins. Again the routine combines bins until all have expected counts greater than 2, and the chi square approximation is used to find p values.

These methods can be used for both discrete and histogram data. The
main difference between these two is that discrete data has (a
countable) number of possible values whereas histogram data has possible
ranges of values (the bins). The only method directly affected by this
difference is Wassp1, which requires actual values. All other methods
ignore the *vals* argument.

We generate a data set of size 1000 from a Binomial distribution with n=20 and success probability 0.5, and then test \(H_0:F=Bin(20, 0.5)\).

```
vals=0:20 #possible values of random variable
pnull=function() pbinom(0:20, 20, 0.5) # cumulative distribution function (cdf)
rnull = function() table(c(0:20, rbinom(1000, 20, 0.5)))-1
# generate data under the null hypothesis, make sure that vector of counts has same length as vals, possibly 0.
```

**Null Hypothesis is true**

```
x = rnull()
gof_test_disc(x, pnull, rnull, vals, B=1000, doMethod = "all")
#> $Statistics
#> KS K AD CvM
#> 0.0089 0.0126 0.1329 0.0249
#> W ZA ZC Wassp1
#> 2.0453 3.3967 217.2362 0.0346
#> chi large Pearson chi small Pearson chi large LR-m chi small LR-m
#> 1.9080 1.2390 1.9052 1.2310
#>
#> $p.value
#> KS K AD CvM
#> 0.9710 0.9830 0.9970 0.9700
#> W ZA ZC Wassp1
#> 0.6690 0.7090 0.7120 0.9950
#> chi large Pearson chi small Pearson chi large LR-m chi small LR-m
#> 0.9995 0.9987 0.9995 0.9987
```

**Null Hypothesis is false**

```
x = table(c(0:20, rbinom(1000, 20, 0.55)))-1
#true p is 0.55, not 0.5
gof_test_disc(x, pnull, rnull, vals, B=1000, doMethod = "all")$p.value
#> KS K AD CvM
#> 0 0 0 0
#> W ZA ZC Wassp1
#> 0 0 0 0
#> chi large Pearson chi small Pearson chi large LR-m chi small LR-m
#> 0 0 0 0
```

Arguments of *gof_test_disc*:

x: vector with counts (histogram heights). Should have a number for each value of vals, possibly 0.

pnull: function to find values of cumulative distribution function for each value of vals. Function has no arguments.

rnull: function to generate data from true density. Function has no arguments. Function needs to insure that output is a vector with same length as vals.

vals: all possible values of discrete random variable, that is all x with \(P(X=x)>0\)

In some fields like high energy physics it is common that the sample size is not fixed but a random variable drawn from a Poisson distribution with a known rate. Our package easily accomodates that:

We generate a data set of size 1000 from a Binomial distribution with n=20 and success probability p, and then test F=Bin(20, .). p is estimated from data.

```
vals=0:20
pnull=function(p=0.5) pbinom(0:20, 20, ifelse(p>0&&p<1, p, 0.5))
rnull = function(p=0.5) table(c(0:20, rbinom(1000, 20, p)))-1
phat = function(x) sum(0:20*x)/sum(x)/20
```

**Null Hypothesis is true**

```
x = table(c(0:20, rbinom(1000, 20, 0.5)))-1
gof_test_disc(x, pnull, rnull, vals, phat=phat, B=1000, doMethod = "all")$p.value
#> KS K AD CvM
#> 0.3030 0.1670 0.2080 0.2770
#> W ZA ZC Wassp1
#> 0.1450 0.8570 0.4720 0.1680
#> chi large Pearson chi small Pearson chi large LR-m chi small LR-m
#> 0.3779 0.4247 0.4576 0.4512
```

**Null Hypothesis is true**

```
x = table(c(0:20, rbinom(1000, 20, 0.55)))-1
# p is not 0.5, but data is still from a binomial distribution with n=20
gof_test_disc(x, pnull, rnull, vals, phat=phat, B=1000, doMethod = "all")$p.value
#> KS K AD CvM
#> 0.3980 0.6350 0.5150 0.5640
#> W ZA ZC Wassp1
#> 0.8550 0.8750 0.6450 0.6310
#> chi large Pearson chi small Pearson chi large LR-m chi small LR-m
#> 0.9295 0.8272 0.9360 0.8316
```

**Null Hypothesis is false**

```
x = table(c(rep(0:20, 5), rbinom(1000-21*5, 20, 0.53)))
# data has to many small and large values to be from a binomial
gof_test_disc(x, pnull, rnull, vals, phat=phat, B=1000, doMethod = "all")$p.value
#> KS K AD CvM
#> 0.003 0.000 0.000 0.000
#> W ZA ZC Wassp1
#> 0.002 0.000 0.000 0.000
#> chi large Pearson chi small Pearson chi large LR-m chi small LR-m
#> 0.000 0.000 0.000 0.000
```

Arguments of *gof_test_disc*:

x: vector with counts. Should have a number for each value in vals, possibly 0.

pnull: cumulative distribution function. One argument, a vector of parameters.

rnull: function to generate data from distribution. One argument, a vector of parameters.

vals: possible values of discrete random variable, that is all x with \(P(X=x)>0\)

phat: function of data to estimate parameter. Data is sole argument.

The estimation of the parameter(s) in the case of the chi square tests is done via the minimum chi square method. The routine uses a general function minimizer. If there are values of the parameter that are not possible this can lead to warnings. It is best to put a check into the pnull function to avoid this issue. As an example the function pnull above checks that the success probability p is in the interval \((0.1)\).

A variant of discrete data sometimes encountered is data given in the form of a histogram, that is as a set of bins and their counts. The main distinction is that discrete data has specific values, for example the non-negative integers for a Poisson distribution, whereas histogram data has ranges of numbers, the bins. It turns out that, though, that the only method that requires actual values is Wassp1, and for that method one can use the midpoint of the intervals.

As an example consider the following case: we have histogram data and we want to test whether it comes from an exponential rate 1 distribution, truncated to the interval 0-2:

```
rnull = function() {
y = rexp(2500, 1) # Exp(1) data
y = y[y<2][1:1500] # 1500 events on 0-2
bins = 0:40/20 # binning
hist(y, bins, plot=FALSE)$counts # find bin counts
}
x = rnull()
bins = 0:40/20
vals = (bins[-1]+bins[-21])/2
pnull = function() {
bins = 1:40/20
pexp(bins, 1)/pexp(2, 1)
}
gof_test_disc(x, pnull, rnull, vals, doMethod = "all")$p.value
#> KS K AD CvM
#> 0.0292 0.0024 0.0818 0.0910
#> W ZA ZC Wassp1
#> 0.3436 0.2030 0.0966 0.1114
#> chi large Pearson chi small Pearson chi large LR-m chi small LR-m
#> 0.2974 0.4811 0.2490 0.4545
```

**Null Hypothesis is true**

```
x = rnorm(1000)
gof_test_cont(x, pnull, rnull, qnull, B=1000, doMethod = "all")$p.value
#> KS K AD CvM
#> 0.3950 0.3950 0.2880 0.3500
#> W ZA ZK ZC
#> 0.5800 0.5220 0.2550 0.5800
#> Wassp1 EP large Pearson ES large Pearson EP small Pearson
#> 0.2730 0.0816 0.5400 0.1284
#> ES small Pearson EP large LR-m ES large LR-m EP small LR-m
#> 0.2447 0.2964 0.4631 0.1332
#> ES small LR-m
#> 0.3068
```

**Null Hypothesis is false**

```
x = rnorm(1000, 0.5)
gof_test_cont(x, pnull, rnull, qnull, B=1000, doMethod = "all")$p.value
#> KS K AD CvM
#> 0 0 0 0
#> W ZA ZK ZC
#> 0 0 0 0
#> Wassp1 EP large Pearson ES large Pearson EP small Pearson
#> 0 0 0 0
#> ES small Pearson EP large LR-m ES large LR-m EP small LR-m
#> 0 0 0 0
#> ES small LR-m
#> 0
```

Arguments of *gof_test_cont*:

x: data set

pnull: function to find cdf. Function has one argument, points where cdf is to be evaluated.

rnull: function to generate data from density. Function has no argument.

qnull: quantile function. Function has one argument, points where inverse cdf is to be evaluated. If missing Wassp1 method is not used.

```
pnull = function(x, p=0) pnorm(x, p)
qnull = function(x, p=0) qnorm(x, p)
rnull = function(p) rnorm(1000, p)
phat = function(x) mean(x)
```

**Null Hypothesis is true**

```
x = rnorm(1000)
gof_test_cont(x, pnull, rnull, qnull, phat=phat, B=1000, doMethod = "all")$p.value
#> KS K AD CvM
#> 0.1730 0.1730 0.1870 0.2720
#> W ZA ZK ZC
#> 0.2750 0.6030 0.2350 0.5550
#> Wassp1 EP large Pearson ES large Pearson EP small Pearson
#> 0.2170 0.0023 0.0954 0.5176
#> ES small Pearson EP large LR-m ES large LR-m EP small LR-m
#> 0.5232 0.0107 0.2085 0.5147
#> ES small LR-m
#> 0.5812
```

**Null Hypothesis is true**

```
x = rnorm(1000, 0.5)
gof_test_cont(x, pnull, rnull, qnull, phat=phat)$p.value
#> W ZK ZC Wassp1
#> 0.4570 0.1984 0.3268 0.7182
#> EP small Pearson ES small Pearson
#> 0.6214 0.6794
```

**Null Hypothesis is false**

```
x = rnorm(1000, 0.5, 2)
gof_test_cont(x, pnull, rnull, qnull, phat=phat, B=1000, doMethod = "all")$p.value
#> KS K AD CvM
#> 0 0 0 0
#> W ZA ZK ZC
#> 0 0 0 0
#> Wassp1 EP large Pearson ES large Pearson EP small Pearson
#> 0 0 0 0
#> ES small Pearson EP large LR-m ES large LR-m EP small LR-m
#> 0 0 0 0
#> ES small LR-m
#> 0
```

Arguments:

x: data set

pnull: function to find cdf. First argument is points to evaluate function, second argument is parameters.

rnull: function to generate data from density. Sole argument is vector of parameters.

qnull: quantile function. Arguments are same as pnull.

phat: function to estimate the parameter

```
pnull = function(x, p=c(0, 1)) pnorm(x, p[1], ifelse(p[2]>0, p[2], 0.001))
qnull = function(x, p=c(0, 1)) qnorm(x, p[1], ifelse(p[2]>0, p[2], 0.001))
rnull = function(p=c(0, 1)) rnorm(1000, p[1], ifelse(p[2]>0, p[2], 0.001))
phat = function(x) c(mean(x), sd(x))
```

**Null Hypothesis is true**

```
x = rnorm(1000)
gof_test_cont(x, pnull, rnull, qnull, phat=phat, B=1000, doMethod = "all")$p.value
#> KS K AD CvM
#> 0.4110 0.4110 0.3760 0.3950
#> W ZA ZK ZC
#> 0.4100 0.3400 0.2870 0.3440
#> Wassp1 EP large Pearson ES large Pearson EP small Pearson
#> 0.3450 0.5576 0.3154 0.5036
#> ES small Pearson EP large LR-m ES large LR-m EP small LR-m
#> 0.7751 0.7432 0.2360 0.5022
#> ES small LR-m
#> 0.7735
```

**Null Hypothesis is true**

```
x = rnorm(1000, 0.5)
gof_test_cont(x, pnull, rnull, qnull, phat=phat, B=1000, doMethod = "all")$p.value
#> KS K AD CvM
#> 0.8110 0.8110 0.5690 0.7840
#> W ZA ZK ZC
#> 0.8210 0.2720 0.4180 0.2790
#> Wassp1 EP large Pearson ES large Pearson EP small Pearson
#> 0.5220 0.1061 0.3877 0.8568
#> ES small Pearson EP large LR-m ES large LR-m EP small LR-m
#> 0.4186 0.2579 0.5467 0.8541
#> ES small LR-m
#> 0.4325
```

**Null Hypothesis is true**

```
x = rnorm(1000, 0.5, 2)
gof_test_cont(x, pnull, rnull, qnull, phat=phat, B=1000, doMethod = "all")$p.value
#> KS K AD CvM
#> 0.4060 0.4060 0.3100 0.3640
#> W ZA ZK ZC
#> 0.3890 0.0240 0.0790 0.0540
#> Wassp1 EP large Pearson ES large Pearson EP small Pearson
#> 0.2130 0.5339 0.5418 0.1156
#> ES small Pearson EP large LR-m ES large LR-m EP small LR-m
#> 0.2940 0.6369 0.4703 0.1285
#> ES small LR-m
#> 0.3030
```

**Null Hypothesis is false**

```
x = rt(1000, 2)
gof_test_cont(x, pnull, rnull, qnull, phat=phat, B=1000, doMethod = "all")$p.value
#> KS K AD CvM
#> 0 0 0 0
#> W ZA ZK ZC
#> 0 0 0 0
#> Wassp1 EP large Pearson ES large Pearson EP small Pearson
#> 0 0 0 0
#> ES small Pearson EP large LR-m ES large LR-m EP small LR-m
#> 0 0 0 0
#> ES small LR-m
#> 0
```

Arguments: Same as for a single parameter.

```
vals = 0:10
pnull = function() pbinom(0:10, 10, 0.5)
rnull =function () table(c(0:10, rbinom(100, 10, 0.5)))-1
ralt =function (p) table(c(0:10, rbinom(100, 10, p)))-1
P=gof_power_disc(pnull, rnull, vals, ralt,
param_alt=seq(0.5, 0.6, 0.01), B=c(500, 500), nbins=c(11, 5))
plot_power(P, "p", Smooth=FALSE)
```

In all cases the arguments vals, pnull and rnull are the same as for the testing routines. In addition we now have

ralt: a routine with one parameter that generates data under some alternative hypothesis.

param_alt: values to be passed to ralt. This allows the calculation of the power for many different values.

B=c(1000, 1000) the first number is the number of simulation runs for power estimation and the second the number of runs to be used to find the null distribution.

```
vals = 0:10
pnull = function(p) pbinom(0:10, 10, p)
rnull = function (p) table(c(0:10, rbinom(100, 10, p)))-1
phat = function(x) sum(0:10*x)/1000
```

**Null Hypothesis is true**

```
ralt =function (p) table(c(0:10, rbinom(100, 10, p)))-1
gof_power_disc(pnull, rnull, vals, ralt, c(0.5, 0.6), phat,
B=c(100, 200), nbins=c(11, 5), maxProcessors = 2)
#> KS K AD CvM W ZA ZC Wassp1 chi large Pearson
#> 0.5 0.06 0.06 0.04 0.08 0.05 0.05 0.04 0.06 0.02
#> 0.6 0.05 0.06 0.07 0.07 0.03 0.05 0.00 0.06 0.03
#> chi small Pearson chi large LR-m chi small LR-m
#> 0.5 0.06 0.02 0.06
#> 0.6 0.02 0.03 0.02
```

Note that power estimation in the case of a composite hypothesis (aka with parameters estimated) is much slower than the simple hypothesis case.

**Null Hypothesis is false**

```
ralt =function (p) table(c(rep(0:10, 2), rbinom(100, 10, p)))
gof_power_disc(pnull, rnull, vals, ralt, 0.5, phat,
B=c(100, 200), nbins=c(11, 5), maxProcessors = 2)
#> KS K AD CvM
#> 1.00 1.00 1.00 1.00
#> W ZA ZC Wassp1
#> 0.63 1.00 0.00 1.00
#> chi large Pearson chi small Pearson chi large LR-m chi small LR-m
#> 0.23 0.09 0.17 0.10
```

```
pnull = function(x) pnorm(x)
qnull = function(x) qnorm(x)
rnull = function() rnorm(100)
ralt = function(mu=0) rnorm(100, mu)
gof_power_cont(pnull, rnull, qnull, ralt, c(0, 1), B=c(100, 200))
#> KS K AD CvM W ZA ZK ZC Wassp1 EP large Pearson
#> 0 0.02 0.02 0.04 0.04 0.04 0.07 0.06 0.05 0.04 0.07
#> 1 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
#> ES large Pearson EP small Pearson ES small Pearson EP large LR-m
#> 0 0.12 0.12 0.05 0.03
#> 1 1.00 1.00 1.00 1.00
#> ES large LR-m EP small LR-m ES small LR-m
#> 0 0.06 0.06 0.05
#> 1 1.00 1.00 1.00
```

```
pnull = function(x, p=c(0,1)) pnorm(x, p[1], ifelse(p[2]>0, p[2], 0.01))
qnull = function(x, p=c(0,1)) qnorm(x, p[1], ifelse(p[2]>0, p[2], 0.01))
rnull = function(p=c(0,1)) rnorm(500, p[1], p[2])
ralt = function(mu=0) rnorm(100, mu)
phat = function(x) c(mean(x), sd(x))
gof_power_cont(pnull, rnull, qnull, ralt, c(0, 1), phat, B=c(100, 200),
maxProcessor=2)
#> KS K AD CvM W ZA ZK ZC Wassp1 EP large Pearson
#> 0 0.94 0.90 0.08 0.09 0.09 0.64 0.01 0.01 1 0.06
#> 1 0.97 0.89 0.09 0.06 0.06 0.60 0.02 0.03 1 0.04
#> ES large Pearson EP small Pearson ES small Pearson EP large LR-m
#> 0 0.03 0.09 0.04 0.04
#> 1 0.04 0.08 0.04 0.02
#> ES large LR-m EP small LR-m ES small LR-m
#> 0 0.04 0.06 0.05
#> 1 0.03 0.06 0.04
```

```
ralt = function(df=1) {
# t distribution truncated at +- 5
x=rt(1000, df)
x=x[abs(x)<5]
x[1:100]
}
gof_power_cont(pnull, rnull, qnull, ralt, c(2, 50), phat, Range=c(-5,5), B=c(100, 200), maxProcessor=2)
#> KS K AD CvM W ZA ZK ZC Wassp1 EP large Pearson
#> 2 1.00 1.00 0.63 0.61 0.62 0.96 0.27 0.17 1 0.19
#> 50 0.96 0.94 0.04 0.06 0.04 0.69 0.00 0.02 1 0.09
#> ES large Pearson EP small Pearson ES small Pearson EP large LR-m
#> 2 0.28 0.24 0.36 0.10
#> 50 0.00 0.11 0.03 0.03
#> ES large LR-m EP small LR-m ES small LR-m
#> 2 0.24 0.21 0.37
#> 50 0.01 0.07 0.03
```

Its is very easy for a user to add other goodness-of-fit tests to the
package. This can be done by editing the routines
**TS_cont** and/or **TS_disc**, which are
located in the folder *inst/examples* in the Rgof library folder.
Or a user can write their own version of these files.

**Example**

Say we wish to use a test that is a variant of the Cramer-vonMises test, using the integrated absolute difference of the empirical and the theoretical distribution function:

\[\int_{-\infty}^{\infty} \vert F(x) - \hat{F}(x) \vert dF(x)\] For continuous data we have the routine

```
newTS_cont = function(x, Fx, param, qnull) {
Fx=sort(Fx)
n=length(x)
out = sum(abs( (2*1:n-1)/2/n-Fx ))
names(out) = "CvM alt"
out
}
```

This routine has to have four arguments x, Fx, param and qnull, just
like TS_cont. Note that the return object has to be a
**named** vector. Also, the arguments param and qnull are
likely not needed for the calculation of the new test statistic but need
to be present for consistency reasons.

Then we can run this test with

```
pnull = function(x) punif(x)
qnull = function(x) qunif(x)
rnull = function() runif(500)
x = rnull()
Rgof::gof_test_cont(x, pnull, rnull, qnull, TS=newTS_cont)
#> $Statistics
#> CvM alt
#> 6.9863
#>
#> $p.value
#> CvM alt
#> 0.3946
```

Say we want to find the power of this test when the true distribution is a linear:

```
ralt = function(slope) {
if(slope==0) y=runif(500)
else y=(slope-1+sqrt((1-slope)^2+4*slope* runif(500)))/2/slope
}
```

```
gof_power_cont(pnull, rnull, qnull, ralt, TS=newTS_cont, param_alt=round(seq(0, 0.5, length=5), 3), Range=c(0,1))
#> CvM alt
#> 0 0.049
#> 0.125 0.337
#> 0.25 0.893
#> 0.375 0.998
#> 0.5 1.000
```

for discrete data we write the routine using Rcpp:

```
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector newTS_disc(IntegerVector x,
NumericVector Fx,
NumericMatrix nm,
NumericVector vals) {
Rcpp::CharacterVector methods=CharacterVector::create("CvM alt");
int const nummethods=methods.size();
int k=x.size(), n, i;
NumericVector TS(nummethods), ecdf(k);
double tmp;
TS.names() = methods;
n=0;
for(i=0;i<k;++i) n = n + x[i];
ecdf(0) = double(x(0))/double(n);
for(i=1;i<k;++i) {
ecdf(i) = ecdf(i-1) + x(i)/double(n);
}
tmp = std::abs(ecdf[0]-Fx(0))*Fx(0);
for(i=1;i<k;++i)
tmp = tmp + std::abs(ecdf(i)-Fx(i))*(Fx(i)-Fx(i-1));
TS(0) = tmp;
return TS;
}
```

Again the routine has to have four arguments x, Fx, nm and vals, just like TS_disc, and the output vector has to have names. Aagin, the arguments nm and vals are likely not needed for the calculation of the new test statistic but need to be present for consistency reasons.

Note that one drawback of writing the routine in Rcpp is that it is then not possible to used multiple processors.

```
vals=1:50/51
pnull = function() (1:50)/50
rnull = function() c(rmultinom(1, 500, rep(1/50,50)))
x = rnull()
gof_test_disc(x, pnull, rnull, vals, TS=newTS_disc)
#> $Statistics
#> CvM alt
#> 0.013
#>
#> $p.value
#> CvM alt
#> 0.4648
```

and for power calculations we can run

```
ralt = function(slope) {
if(slope==0) p=rep(1/50, 50)
else p=diff(slope * (0:50/50)^2 + (1 - slope) * 0:50/50)
c(rmultinom(1, 500, p))
}
gof_power_disc(pnull, rnull, vals, ralt, TS=newTS_disc, param_alt=round(seq(0, 0.5, length=5), 3))
#> Parallel Programming is not possible if custom TS is written in C++. Switching to single processor
#> CvM alt
#> 0 0.055
#> 0.125 0.367
#> 0.25 0.898
#> 0.375 0.999
#> 0.5 1.000
```

Anderson, T W. 1962. “On the Distribution of the Two-Sample
Cramer-von Mises Criterion.”
*Annals of Mathematical Statistics* 33 (3): 1148–59.

Anderson, T W, and D A Darling. 1952. “Asymptotic Theory of
Certain Goodness-of-Fit Criteria Based on Stochastic Processes.”
*Annals of Mathematical Statistics* 23: 193–212.

———. 1954. “A Test of Goodness-of-Fit.” *JASA* 49:
765–69.

Berkson, J. 1980. “Minimum Chi-Square, Not Maximum
Likelihood.” *Ann. Math. Stat* 8 (3): 457–87.

E del Barrio, C Matran, J A Cuesta-Albertos. 1999. “Tests of
Goodness of Fit Based on the L2-Wasserstein Distance.” *Annals
of Statistics* 1230-1239: 27.

Kolmogorov, A. 1933. “Sulla Determinazione Empirica Di Una Legge
Di Distribuzione.” *G. Ist. Ital. Attuari.* 4: 83–91.

Kuiper, N H. 1960. “Tests Concerning Random Points on a
Circle.” *Proceedings of the Koninklijke Nederlandse Akademie
van Wetenschappen* 63: 38–47.

Massey, F J. 1951. “The
Kolmogorov-Smirnov Test for
Goodness-of-Fit.” *JASA* 46: 68–78.

Rolke, Wolfgang, and Cristian Gutierrez-Gongora. 2020. “A
Chi-Square Goodness-of-Fit Test for Continuous Distributions Against a
Known Alternative.” *Computational Statistics*. https://doi.org/10.1007/s00180-020-00997-x.

Smirnov, N. 1948. “Table for Estimating the Goodness of Fit of
Empirical Distributions.” *Annals of Mathematical
Statistics* 19: 279–81.

Zhang, J. 2002. “Powerful Goodness-of-Fit Tests Based on
Likelihood Ratio.” *Journal of the RSS (Series B)* 64:
281–94.