[R] Help finding the proper function

Thu Oct 23 21:15:48 CEST 2008

On 24/10/2008, at 1:37 AM, Tom.O wrote:

>
> Ok, I'll try to be clearer.  I'll start from the beginning.  I have a
> set of samples that I'm going to use to model a proxy for a common
> property.  This property is that the samples are either in a "quiet"
> or "chaotic" state.  Both the quiet and the chaotic state are modelled
> as normally distributed, so these samples are believed to come from a
> mixture of univariate normal distributions.  But some samples do not
> have this property and are believed to come only from a "quiet" state,
> i.e. from a single univariate normal distribution.
>
>
> What I also know, or assume I know, is that when the samples drawn
> from a mixture distribution change between distributions, they do so
> simultaneously or nearly simultaneously.  So each sample has a
> probability "p" of being in either state.  But since some samples are
> from a single univariate distribution, and some of the samples from a
> mixture distribution don't show a clear change, they are no good for
> estimating the overall probability of being in the "quiet" or
> "chaotic" state.
>
> What I'm looking for is the combination of samples that would give me
> the best proxy to model the overall state, some sort of optimizer.
>
>
> So hopefully this clarifies my problem.

Yes it does.  Bivariate has nothing to do with your problem.  Mixtures
have everything to do with it.

Essentially your problem is testing k=2 versus k=1 where k is the number
of components in the model.  I believe there is a substantial literature
about this; you could start with G. J. McLachlan and K. E. Basford
``Mixture Models:  Inference and Applications to Clustering'', Dekker,
New York, 1988.

I believe that the story is roughly that the test you want can be
carried out via a likelihood ratio test, but that the null distribution
of the test statistic is problematic.  It is ***not*** asymptotically
chi-squared.  As far as I know the actual null distribution is
impossible to determine analytically, hence one is left with a single
option:  Bootstrapping.
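
To give the flavour of what bootstrapping means here, the following is a
minimal sketch in base R, not a definitive implementation.  It assumes a
parametric bootstrap: fit a single normal, simulate from that fit, and
recompute the likelihood ratio statistic on each simulated sample.  The
function names (loglik1, loglik2, lrt_stat, boot_lrt) are made up for
illustration, and the EM fit is deliberately crude (one starting value,
fixed number of iterations, and an ad hoc floor on the standard
deviations because the unequal-variance likelihood is unbounded).

```r
# Sketch only: parametric bootstrap of the LRT for k = 1 versus k = 2
# univariate normal components.  All function names are hypothetical.

# Maximised log-likelihood of a single normal (MLE uses divisor n).
loglik1 <- function(x) {
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))
  sum(dnorm(x, m, s, log = TRUE))
}

# Log-likelihood of a two-component normal mixture after a basic EM run.
loglik2 <- function(x, iter = 200) {
  s0  <- sd(x)
  med <- median(x)
  # Crude starting values: split the data at the median.
  p  <- 0.5
  mu <- c(mean(x[x <= med]), mean(x[x > med]))
  s  <- c(s0, s0)
  for (i in seq_len(iter)) {
    # E-step: posterior probability that each point is in component 1.
    d1 <- p * dnorm(x, mu[1], s[1])
    d2 <- (1 - p) * dnorm(x, mu[2], s[2])
    w  <- d1 / (d1 + d2)
    # M-step: weighted proportion, means and standard deviations.
    p  <- mean(w)
    mu <- c(sum(w * x) / sum(w), sum((1 - w) * x) / sum(1 - w))
    s  <- sqrt(c(sum(w * (x - mu[1])^2) / sum(w),
                 sum((1 - w) * (x - mu[2])^2) / sum(1 - w)))
    # Ad hoc floor so that no component collapses onto a single point.
    s  <- pmax(s, 0.01 * s0)
  }
  sum(log(p * dnorm(x, mu[1], s[1]) + (1 - p) * dnorm(x, mu[2], s[2])))
}

# Likelihood ratio statistic; numerical failures count as "no improvement".
lrt_stat <- function(x) {
  stat <- 2 * (loglik2(x) - loglik1(x))
  if (is.finite(stat)) max(0, stat) else 0
}

# Parametric bootstrap: simulate B samples from the fitted single normal
# and compare the observed statistic with the simulated ones.
boot_lrt <- function(x, B = 200) {
  obs  <- lrt_stat(x)
  n    <- length(x)
  m    <- mean(x)
  s    <- sqrt(mean((x - m)^2))
  null <- replicate(B, lrt_stat(rnorm(n, m, s)))
  list(statistic = obs, p.value = (1 + sum(null >= obs)) / (B + 1))
}
```

One would then call something like boot_lrt(x, B = 500) on each sample
in turn; a small p-value is evidence against the single "quiet"
component for that sample.  For real work a properly tested mixture
fitting routine, with several starting values, should replace the
hand-rolled EM above.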

The mixtools package on CRAN may provide the facilities you need; there
are other packages on CRAN which relate to mixtures.  In particular look
at Peter Macdonald's mixdist package.  I would conjecture (I haven't
looked) that the latter package would provide useful pointers to the
literature, including Peter's own substantial contributions.

HTH

	cheers,

		Rolf Turner