[R] Categories or clusters for univariate data

Tue Feb 22 04:17:53 CET 2005

Allen Hathaway <hathaway <at> sover.net> writes:

: 
: If I have a vector, x, such that
: 
: x <- c(1,2,3,4,5,8,9,10,11,12,15,16,17,18,19,22,23,24,33,34,35)
: 
: if I plot that vector
: 
: plot(x)
: 
: it is visibly obvious that the data "groups" or "clusters" into distinct 
: groupings.  The data trends along a more-or-less linear path, and then an 
: abrupt jump.  For a trivial case, such as I have given, you can pick out the 
: groups or categories visually, and manually derive the upper and lower 
: bounds for each group.  My question is, is there a function in R that can do 
: the same thing for more complex and subtle groupings in univariate data, and 
: provide a statistical basis for the result?

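If the data are exactly linear and increasing within each group, as in this
example, then the breakpoints are at the points of positive acceleration
(positive second differences), so

   which(diff(x, differences = 2) > 0) + 2

gives the indices of the breakpoints, i.e. the index of the first element of
each new group.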
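The question also asks for the lower and upper bounds of each group. Below is a minimal sketch in R that turns the breakpoints into group labels and per-group ranges. It assumes the same exactly piecewise-linear data as the example. The names `breaks`, `group` and `bounds` are illustrative only. `differences` is the full name of the `diff()` argument that `diff = 2` reaches by partial matching.

```r
# Example data from the question
x <- c(1,2,3,4,5,8,9,10,11,12,15,16,17,18,19,22,23,24,33,34,35)

# The second difference x[i+2] - 2*x[i+1] + x[i] is positive only where a
# jump occurs between x[i+1] and x[i+2], so adding 2 gives the index of the
# first element of each new group.
breaks <- which(diff(x, differences = 2) > 0) + 2

# Group label for every element: starts at 1 and increases by one at each
# breakpoint.
group <- cumsum(seq_along(x) %in% c(1, breaks))

# Lower and upper bound of each group, one row per group.
bounds <- t(sapply(split(x, group), range))
colnames(bounds) <- c("lower", "upper")
bounds
# For this x the rows are 1-5, 8-12, 15-19, 22-24 and 33-35.
```

A break at index k is detected from the second difference at position k - 2. The sketch therefore assumes that the first group has at least two elements and that the slope within each group is the same throughout.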