[R] Categories or clusters for univariate data

Tue Feb 22 02:14:50 CET 2005

On Mon, 21 Feb 2005, Allen Hathaway wrote:

> If I have a vector, x, such that
>
> x <- c(1,2,3,4,5,8,9,10,11,12,15,16,17,18,19,22,23,24,33,34,35)
>
> if I plot that vector
>
> plot(x)
>
> it is visually obvious that the data "group" or "cluster" into distinct
> groupings.  The data follow a more-or-less linear path and then jump
> abruptly.  For a trivial case such as this one, you can pick out the
> groups or categories visually and manually derive the upper and lower
> bounds for each group.  My question is: is there a function in R that can do
> the same thing for more complex and subtle groupings in univariate data, and
> provide a statistical basis for the result?

breakpoints() in package strucchange may help. It estimates breaks
(structural changes) in a linear regression relationship, given an
ordering of the observations.

For the data above:

## data from the question
x <- c(1,2,3,4,5,8,9,10,11,12,15,16,17,18,19,22,23,24,33,34,35)
## set up index variable
idx <- seq_along(x)
## find breaks in linear trend model
library(strucchange)
bp <- breakpoints(x ~ idx)

## visualize fitted model
plot(x)
lines(fitted(bp))
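
To get from the fitted breaks back to the upper and lower bounds per group
that the question asks about, something along the following lines may work.
This is only a sketch: it assumes the default settings of breakpoints()
(minimal segment size and the number of breaks it selects) are acceptable
for the data, and the names "groups" and "bounds" are just illustrative.

```r
library(strucchange)

## data and model as above
x <- c(1,2,3,4,5,8,9,10,11,12,15,16,17,18,19,22,23,24,33,34,35)
idx <- seq_along(x)
bp <- breakpoints(x ~ idx)

## estimated break positions (index of the last observation in each segment)
breakpoints(bp)$breakpoints

## factor assigning each observation to a segment
groups <- breakfactor(bp)

## lower and upper bound of x within each segment
bounds <- do.call(rbind, tapply(x, groups, range))
colnames(bounds) <- c("lower", "upper")
bounds
```

The number of segments depends on the minimal segment size (argument h of
breakpoints()), so for short series it is worth checking how sensitive the
grouping is to that choice.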

See help(breakpoints) for further information and references for the
underlying theory.

hth,
Z