[R-sig-Geo] Natural Breaks Classification

David Bitner osgis.lists at gmail.com
Fri Feb 24 21:27:41 CET 2006


I was actually already using that paper as a resource, but I am still
not quite understanding how to get just the breaks that I can use for
classification in an external mapping application.

Right now I am trying out some samples just to understand what's going on.

Using this array:
a<-c(11,1,3,3,4,4,4,4,4,4,5)
This gives me the vector (3,1,1,1,2,2,2,2,2,2,2)

Ideally, I want the output for this to be just (3,5,11).

Taking this together with the min/max of my data, I can create the
classes 1-3, 4-5, 6-11.
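
A minimal sketch of one way to get from the cluster vector to those breaks, assuming the vector above holds a cluster label for each value of a, and that the upper break of a class is simply the largest value in its cluster. The helper name cluster_breaks is made up for illustration, and the "+ 1" used to build the labels assumes integer data:

```r
# Sample data and cluster labels copied from the message above.
a  <- c(11, 1, 3, 3, 4, 4, 4, 4, 4, 4, 5)
cl <- c(3, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2)

# Hypothetical helper: the upper break of each class is the largest
# value assigned to that cluster; sort so the breaks are increasing.
cluster_breaks <- function(x, cl) {
  sort(unname(vapply(split(x, cl), max, numeric(1))))
}

breaks <- cluster_breaks(a, cl)
print(breaks)  # the breaks are 3, 5, 11

# Combine the breaks with min(a) to label the classes.
# Assumption: integer data, so the next class starts at break + 1.
lower  <- c(min(a), head(breaks, -1) + 1)
labels <- paste(lower, breaks, sep = "-")

# Assign every value to its class; include.lowest keeps min(a) in class 1.
classes <- cut(a, breaks = c(min(a), breaks), labels = labels,
               include.lowest = TRUE)
print(table(classes))
```

The labels in cl could equally come from kmeans(a, 3)$cluster. Since k-means starts from random centres, the label numbering (and possibly the grouping) can differ between runs, which is why the breaks are sorted rather than taken in label order. If the smallest value forms a cluster of its own, min(a) would duplicate the first break and cut() would need the lower bound adjusted.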



On 2/24/06, Roger Bivand <Roger.Bivand at nhh.no> wrote:
> On Fri, 24 Feb 2006, David Bitner wrote:
>
> > I am trying to create some type of a Natural Breaks Classification in
> > PL/R  to classify data that I have in a PostgreSQL/PostGIS database.
> > All that I really need so that I can pass information on to Mapserver
> > to display this data is the class break values (i.e. an array [3,5,7,9]
> > would mean show values 1-3 in blue, 4-5 in red, etc.).  I am
> > completely new to R, but have a fair bit of experience with the other
> > PL's in PostgreSQL, so the PL part shouldn't be too hard to figure
> > out.
> >
> > I am thinking that kmeans should give me something close to what I
> > want.  My problem is that I am not quite sure how to massage the
> > output to get my class breaks, or what format to input the data in.
> >
> > For arguments, kmeans takes a matrix, the number of classes, and a max
> > number of iterations. The last two are easy, but how do I convert (do
> > I need to convert) an array into a matrix (is matrix just R speak for
> > an array)?  I will be starting with a one-dimensional array, i.e.
> > ([1,4,1,1,1,6,4,9,9]).
> >
> > The next issue is spitting the data out.  The docs tell me that I get
> > the centers of the clusters where what I really want are the
> > boundaries of the clusters, how could I get at the "break points" that
> > I am after?
>
> I think you may find some of the code in:
>
> http://spatial.nhh.no/papers/aag04.pdf
>
> useful (though dated). kmeans() and - my preference - bclust() in the
> e1071 package may fail when given too few values, but there are ways round
> that. You'll see code (for example at the top of p. 15) on how to get the
> class centres out - but they - as done there - still leave gaps between
> classes as can be seen from the ECDF plot below. As you can see, kmeans()
> manages with a vector OK.
>
> It would be nice to follow this up as a single function - input the data
> vector and some preferences wrt. number of classes, and output as a number
> of (list of) class intervals with some fitness criterion, something like
> that?
>
> I guess your colour palette is on the PL side?
>
> Roger
>
> >
> > Any help is appreciated,
> > David
> >
> >
>
> --
> Roger Bivand
> Economic Geography Section, Department of Economics, Norwegian School of
> Economics and Business Administration, Helleveien 30, N-5045 Bergen,
> Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
> e-mail: Roger.Bivand at nhh.no
>
>



