[R] Categories or clusters for univariate data

Tue Feb 22 18:37:27 CET 2005

On Tue, 22 Feb 2005 08:41:07 -0800 Berton Gunter wrote:

> > > bounds for each group.  My question is, is there a function in R
> > > that can do the same thing for more complex and subtle groupings
> > > in univariate data, and
> 
> >>    ** provide a statistical basis for the result? **
> 
> No. Others have suggested useful ways to **generate** reasonable
> hypotheses about "subtle groupings" in the data; however, by the
> nature and logic of hypothesis testing, one cannot then evaluate the
> statistical "significance" of any groupings that one purports to have
> found.

Just one more remark on this:
The above is, of course, true if standard inference is applied to the
same data that was used for finding the groupings. But it is also
possible to test for the existence of such groupings using non-standard
inference that takes the search for the groupings into account.

For example, in a structural change context the supF test of Andrews
(1993, Econometrica) is very popular in econometrics. It is essentially
the LR statistic of the model without a breakpoint vs. the optimally
segmented model with one breakpoint. But its distribution is then no
longer Chi-squared, because the selection of the breakpoint (i.e., of
the groupings) has to be accounted for.
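
To illustrate why the Chi-squared reference distribution fails, here is a
minimal base R sketch for a single shift in the mean. It uses a Chow-type
F statistic as a stand-in for the LR statistic and simulates the null
distribution instead of using Andrews' asymptotic critical values. The
function name sup_f, the 15% trimming, the sample size and the simulated
data are all assumptions made for the illustration.

```r
# Sup-type statistic for a single shift in the mean of y (illustrative sketch).
# For every candidate breakpoint k the F statistic of "no break" vs.
# "break after observation k" is computed; the maximum is returned.
sup_f <- function(y, trim = 0.15) {
  n <- length(y)
  cand <- seq(floor(trim * n), ceiling((1 - trim) * n))  # candidate breakpoints
  rss0 <- sum((y - mean(y))^2)                           # RSS without a break
  fs <- sapply(cand, function(k) {
    left  <- y[1:k]
    right <- y[(k + 1):n]
    rss1 <- sum((left - mean(left))^2) + sum((right - mean(right))^2)
    (rss0 - rss1) / (rss1 / (n - 2))
  })
  list(statistic = max(fs), breakpoint = cand[which.max(fs)])
}

set.seed(1)

# Null distribution of the maximum, simulated from data WITHOUT a break.
null_stats <- replicate(1000, sup_f(rnorm(100))$statistic)
crit_sim   <- unname(quantile(null_stats, 0.95))  # accounts for the search over k
crit_naive <- qchisq(0.95, df = 1)                # ignores the search over k

# Simulated data WITH a shift in the mean after observation 50.
res <- sup_f(c(rnorm(50, mean = 0), rnorm(50, mean = 3)))

c(naive = crit_naive, simulated = crit_sim, observed = res$statistic)
res$breakpoint
```

The simulated critical value should come out clearly larger than the
naive Chi-squared one: taking the maximum over all candidate breakpoints
inflates the statistic even when there is no break at all.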

Hence, the standard approach in econometrics for this would be:
  1. test for the existence of breaks
  2. estimate breakpoints (if not already implicitly done in 1.)
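
In R, these two steps could be sketched with the strucchange package
from CRAN, which provides Fstats(), sctest() and breakpoints(). This
assumes that the package is installed; the simulated series, the 15%
trimming and the mean-shift model y ~ 1 are assumptions chosen only for
the illustration.

```r
library(strucchange)  # assumption: installed from CRAN

# Simulated univariate data with a shift in the mean after observation 60
# (illustrative only).
set.seed(2)
y <- c(rnorm(60, mean = 0), rnorm(40, mean = 3))

# 1. Test for the existence of a break: sequence of F statistics over all
#    candidate breakpoints, then the supF test on that sequence.
fs   <- Fstats(y ~ 1, from = 0.15)
test <- sctest(fs, type = "supF")
test

# 2. Estimate the breakpoint. The maximizer of the F statistics is
#    already available from step 1 ...
breakpoints(fs)

# ... or it can be estimated directly by minimizing the residual sum of
#    squares of the segmented model with one break.
bp <- breakpoints(y ~ 1, h = 0.15, breaks = 1)
bp
```

Calling plot(fs) shows the sequence of F statistics together with the
critical value of the supF test, which makes it easy to see where the
evidence for a break comes from.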

Of course, using a (cross-)validation approach is not a bad idea,
either!
Z