[R-sig-Geo] Natural Breaks Classification

Roger Bivand Roger.Bivand at nhh.no
Fri Feb 24 21:03:46 CET 2006


On Fri, 24 Feb 2006, David Bitner wrote:

> I am trying to create some type of a Natural Breaks Classification in
> PL/R  to classify data that I have in a PostgreSQL/PostGIS database. 
> All that I really need so that I can pass information on to Mapserver
> to display this data is the class break values (i.e. an array [3,5,7,9]
> would mean show values 1-3 in blue, 4-5 in red, etc.).  I am
> completely new to R, but have a fair bit of experience with the other
> PL's in PostgreSQL, so the PL part shouldn't be too hard to figure
> out.
> 
> I am thinking that kmeans should give me something close to what I
> want.  My problem is that I am not quite sure how to massage the
> output to get my class breaks, or what format to input the data in.
> 
> For arguments, kmeans takes a matrix, the number of classes, and a
> maximum number of iterations. The last two are easy, but how do I
> convert (do I need to convert) an array into a matrix (is matrix just
> R speak for an array)?  I will be starting with a one-dimensional
> array, i.e. [1,4,1,1,1,6,4,9,9].
> 
> The next issue is spitting the data out.  The docs tell me that I get
> the centers of the clusters where what I really want are the
> boundaries of the clusters, how could I get at the "break points" that
> I am after?

I think you may find some of the code in:

http://spatial.nhh.no/papers/aag04.pdf

useful (though dated). kmeans() and bclust() in the e1071 package (my 
preference) may fail when given too few values, but there are ways round 
that. You'll see code (for example at the top of p. 15) showing how to get 
the class centres out, but, as done there, they still leave gaps between 
classes, as can be seen from the ECDF plot below. As you can see, kmeans() 
manages with a vector OK.
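
To make the point about centres versus breaks concrete, here is a minimal 
sketch using the example vector from the question. It is not the code from 
the paper. One possible way to close the gaps between classes is to put 
each break halfway between the largest value in one cluster and the 
smallest value in the next; that midpoint rule, and the names x, fit, 
lower, upper and breaks, are assumptions made for illustration.

```r
# Example vector from the question; kmeans() coerces it to a one-column matrix.
x <- c(1, 4, 1, 1, 1, 6, 4, 9, 9)

set.seed(1)                          # kmeans() starts from random centres
fit <- kmeans(x, centers = 3, nstart = 25, iter.max = 20)

# Cluster labels are arbitrary, so order the clusters by their centres.
ord <- order(fit$centers[, 1])
lower <- sapply(ord, function(k) min(x[fit$cluster == k]))
upper <- sapply(ord, function(k) max(x[fit$cluster == k]))

# Assumed rule: place each break midway across the gap between neighbours.
breaks <- (upper[-length(upper)] + lower[-1]) / 2

print(sort(fit$centers[, 1]))        # class centres, low to high
print(breaks)                        # interior class breaks
```

If upper class limits are wanted instead (as in the [3,5,7,9] example), 
the upper vector already holds the largest observed value in each class.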

It would be nice to follow this up as a single function: input the data 
vector and some preferences regarding the number of classes, and output a 
list of class intervals with some fitness criterion. Something like that?
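
A rough sketch of what such a function might look like, assuming kmeans() 
as the classifier. The function name class_intervals_sketch, its arguments 
and its return format are hypothetical, the breaks are placed at the 
midpoints of the gaps between neighbouring classes (an assumed rule), and 
the fitness criterion is taken to be the share of the total sum of squares 
that lies between classes, which is only one possible choice.

```r
# Hypothetical helper; the name and return format are illustrative only.
class_intervals_sketch <- function(x, n, nstart = 25) {
  x <- as.numeric(x)
  if (length(unique(x)) < n) {
    stop("need at least 'n' distinct values")   # kmeans() would fail here
  }
  fit <- kmeans(x, centers = n, nstart = nstart)
  ord <- order(fit$centers[, 1])                # classes from low to high
  lower <- sapply(ord, function(k) min(x[fit$cluster == k]))
  upper <- sapply(ord, function(k) max(x[fit$cluster == k]))
  # Interior breaks at gap midpoints (an assumed rule), plus the data range.
  brks <- c(min(x), (upper[-n] + lower[-1]) / 2, max(x))
  list(breaks = brks,
       # Between-class share of the total sum of squares (0 to 1).
       fit = fit$betweenss / fit$totss)
}

set.seed(1)
res <- class_intervals_sketch(c(1, 4, 1, 1, 1, 6, 4, 9, 9), n = 3)
print(res$breaks)   # class interval boundaries, including min and max
print(res$fit)      # closer to 1 means tighter classes
```

Because kmeans() starts from random centres, results can differ between 
runs unless the seed is fixed or nstart is reasonably large.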

I guess your colour palette is on the PL side?

Roger

> 
> Any help is appreciated,
> David
> 

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no



