# [R] Compare two normal to one normal

Rolf Turner r.turner at auckland.ac.nz
Wed Sep 23 04:45:47 CEST 2015

On 23/09/15 13:39, John Sorkin wrote:

> Charles, I am not sure the answer to my question (given a dataset,
> how can one compare the fit of a model that fits the data to a
> mixture of two normal distributions with the fit of a model that uses
> a single normal distribution?) can be based on the glm model you
> suggest.
>
> I have used normalmixEM to fit the data to a mixture of two normal
> curves. The model estimates four (perhaps five) parameters: mu1,
> sd1^2, mu2, sd2^2 (and perhaps lambda, the mixing proportion. The
> mixing proportion may not need to be estimated; it may be determined
> once one specifies mu1, sd1^2, mu2, and sd2^2.) Your model fits the
> data to a model that contains only the mean, and estimates 2
> parameters, mu0 and sd0^2.  I am not sure that your model and mine
> can be considered to be nested. If I am correct, I can't compare the
> log likelihood values from the two models. I may be wrong. If I am, I
> should be able to perform a log likelihood test with 2 (or 3, I am
> not sure which) DFs. Are you suggesting the models are nested? If so,
> should I use 3 or 2 DFs?

You are quite correct; there are subtleties involved here.

The one-component model *is* nested in the two-component model, but is
nested "ambiguously".

(1) The null (single component) model for a mixture distribution is
ill-defined.  Note that a single component could be achieved either by
setting the mixing probabilities equal to (1,0) or (0,1) or by setting
mu_1 = mu_2 and sigma_1 = sigma_2.
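The ambiguity in point (1) is easy to verify numerically.  A small sketch (in Python with scipy, purely for illustration --- the thread is about R's normalmixEM) showing that both routes collapse the mixture onto the same single-normal density:

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-3, 3, 101)

# Two-component mixture density: lam*N(mu1, s1) + (1 - lam)*N(mu2, s2)
def mix_pdf(x, lam, mu1, s1, mu2, s2):
    return lam * norm.pdf(x, mu1, s1) + (1 - lam) * norm.pdf(x, mu2, s2)

single = norm.pdf(x, 0.0, 1.0)

# Route 1: mixing probabilities (1, 0) -- the second component never fires,
# so its parameters (here 5.0, 2.0, chosen arbitrarily) are unidentifiable
route1 = mix_pdf(x, 1.0, 0.0, 1.0, 5.0, 2.0)

# Route 2: equal components -- now lambda (here 0.3) is unidentifiable
route2 = mix_pdf(x, 0.3, 0.0, 1.0, 0.0, 1.0)

print(np.allclose(route1, single), np.allclose(route2, single))  # True True
```

Either way, some parameters of the full model are left completely undetermined under the null, which is exactly why the usual asymptotics break down.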

(2) However you slice it, the parameter values corresponding to the null
model fall on the *boundary* of the parameter space.

(3) Consequently the asymptotics go to hell in a handcart and the
likelihood ratio statistic, however you specify the null model, does not
have an asymptotic chi-squared distribution.

(4) I have a vague idea that there are ways of obtaining a valid
asymptotic null distribution for the LRT but I am not sufficiently
knowledgeable to provide any guidance here.

(5) You might be able to gain some insight from delving into the
literature --- a reasonable place to start would be with "Finite Mixture
Models" by McLachlan and Peel:

@book{mclachlan2000finite,
  title     = {Finite Mixture Models},
  series    = {Wiley Series in Probability and Statistics},
  author    = {McLachlan, G. and Peel, D.},
  year      = {2000},
  publisher = {John Wiley \& Sons},
  address   = {New York}
}

(6) My own approach would be to do "parametric bootstrapping":

* fit (to the real data) the null model and calculate
the log-likelihood L1, any way you like
* fit the full model and determine the log-likelihood L2
* form the test statistic LRT = 2*(L2 - L1)
* simulate data sets from the fitted parameters for the null model
* for each such simulated data set calculate a test statistic in the
foregoing manner, obtaining LRT^*_1, ..., LRT^*_N
* the p-value for your test is then

p = (m+1)/(N+1)

where m = the number of LRT^*_i values that are greater than LRT

The factor of 2 is of course completely unnecessary.  I just put it in
"by analogy" with the "real", usual, likelihood ratio statistic.
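The whole recipe fits in a few lines of code.  The sketch below (Python with numpy/scipy as a stand-in for normalmixEM; the bare-bones EM loop, the sample size of 200, and the synthetic "real" data are all my own illustrative choices, not anything from the thread) fits both models, forms LRT = 2*(L2 - L1), resamples from the fitted null, and computes p = (m+1)/(N+1):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
EPS = 1e-300  # guards against 0/0 and log(0) in degenerate EM steps

def loglik_null(x):
    # MLE of a single normal: sample mean and (ML, i.e. ddof=0) sd
    return norm.logpdf(x, x.mean(), x.std()).sum()

def loglik_mix(x, n_iter=100):
    # Minimal EM for a two-component normal mixture (no fancy safeguards)
    lam = 0.5
    mu1, mu2 = np.quantile(x, [0.25, 0.75])
    s1 = s2 = x.std()
    floor = 0.01 * x.std()  # keep the sds away from a degenerate zero
    for _ in range(n_iter):
        d1 = lam * norm.pdf(x, mu1, s1)
        d2 = (1 - lam) * norm.pdf(x, mu2, s2)
        r = d1 / (d1 + d2 + EPS)          # E-step: responsibilities
        lam = r.mean()                    # M-step: weighted MLEs
        mu1 = (r * x).sum() / (r.sum() + EPS)
        mu2 = ((1 - r) * x).sum() / ((1 - r).sum() + EPS)
        s1 = max(np.sqrt((r * (x - mu1) ** 2).sum() / (r.sum() + EPS)), floor)
        s2 = max(np.sqrt(((1 - r) * (x - mu2) ** 2).sum()
                         / ((1 - r).sum() + EPS)), floor)
    mix = lam * norm.pdf(x, mu1, s1) + (1 - lam) * norm.pdf(x, mu2, s2)
    return np.log(mix + EPS).sum()

def lrt(x):
    return 2 * (loglik_mix(x) - loglik_null(x))

x = rng.normal(0.0, 1.0, size=200)  # stand-in for the real data
obs = lrt(x)

# Simulate from the *fitted* null model and recompute the statistic each time
mu0, sd0 = x.mean(), x.std()
N = 99
boot = [lrt(rng.normal(mu0, sd0, size=x.size)) for _ in range(N)]

m = sum(b >= obs for b in boot)  # ">=" vs ">" is immaterial: ties have prob. 0
p = (m + 1) / (N + 1)
print(p)
```

For real work one would of course refit with normalmixEM (or a robust EM with multiple restarts) at both the outer and inner steps; the point here is only the shape of the procedure.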

Note that this p-value is *exact* (not an approximation!) --- for any
value of N --- when interpreted with respect to the "total observation
procedure" of observing both the real and simulated data.  (But see
below.) That is, the probability, under the null hypothesis, of
observing a test statistic "as extreme as" what you actually observed is
*exactly* (m+1)/(N+1).  See e.g.:

@article{Barnard1963,
  author  = {G. A. Barnard},
  title   = {Discussion of ``{T}he spectral analysis of point processes''
             by {M}. {S}. {B}artlett},
  journal = {Journal of the Royal Statistical Society, series {B}},
  volume  = {25},
  year    = {1963},
  pages   = {294}
}

or

@article{Hope1968,
  author  = {A. C. A. Hope},
  title   = {A simplified {M}onte {C}arlo significance test procedure},
  journal = {Journal of the Royal Statistical Society, series {B}},
  volume  = {30},
  year    = {1968},
  pages   = {582--598}
}

Taking N=99 (or 999) is arithmetically convenient.
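To see the arithmetic convenience (my own illustration, nothing more): with N = 99 the attainable p-values are exactly 1/100, 2/100, ..., 100/100.

```python
# With N = 99 simulated statistics, p = (m+1)/(N+1) lands on a grid of
# hundredths for every possible count m = 0, 1, ..., N.
N = 99
ps = [(m + 1) / (N + 1) for m in range(N + 1)]
print(ps[:3], ps[-1])  # [0.01, 0.02, 0.03] 1.0
```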

However I exaggerate when I say that the p-value is exact.  It would be
exact if you *knew* the parameters of the null model.  Since you have to
estimate these parameters the test is (a bit?) conservative.  Note that
the conservatism would be present even if you eschewed the "exact" test
and used an "approximate" test with a (very) large value of N.

Generally conservatism (in this context! :-) ) is deemed to be no bad thing.

cheers,

Rolf Turner

P. S.  I think that the mixing parameter must *always* be estimated.
I.e. even if you knew mu_1, mu_2, sigma_1 and sigma_2 you would still
have to estimate "lambda".  So you have 5 parameters in your full model.
Not that this is particularly relevant.

R. T.

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
