[R-sig-Geo] Natural Breaks Classification

David Bitner osgis.lists at gmail.com
Fri Feb 24 23:28:53 CET 2006


Most of my problem is that this is my first time using R, so I just
don't know the functions (or even really the semantic differences) for
dealing with vectors,arrays, and matrices.

Thank you very much for the code sample.  I have been looking around,
is there a good basic reference online to find out basic function
information so I can reference what things like order, match, unlist,
tapply, etc. are doing?  I have yet to find just a good functional
reference doc for R.

Given that I am not plotting my data in R, I think that you are going
further than what I actually need.  Given the small example data set,
all that I really need is just the two numbers 4 and 11 (before I had
said 3, but I realized the third is unnecessary). From these I plan to
construct my classes as <4,>=4 and <11, >=11 when I create my
classification in MapServer where I am doing the plotting.

David



On 2/24/06, Roger Bivand <Roger.Bivand at nhh.no> wrote:
> On Fri, 24 Feb 2006, David Bitner wrote:
>
> > I was actually already using that paper as a resource, but I am still
> > not quite understanding how to get just the breaks that I can use for
> > classification in an external mapping application.
> >
> > Right now I am trying out some samples just to understand what's going on.
> >
> > Using this array:
> > a<-c(11,1,3,3,4,4,4,4,4,4,5)
> > This gives me the vector (3,1,1,1,2,2,2,2,2,2,2)
> >
> > ideally I want the output for this to be just (3,5,11)
> >
> > Taking this with the min/max of my data I can create the classes
> > 1-3,4-5,6-11
> >
>
> Next line on p. 15:
>
> > a<-c(11,1,3,3,4,4,4,4,4,4,5)
> > av <- kmeans(a, 3)
> > str(av)
> List of 4
>  $ cluster : int [1:11] 3 1 1 1 2 2 2 2 2 2 ...
>  $ centers : num [1:3, 1]  2.33  4.14 11.00
>   ..- attr(*, "dimnames")=List of 2
>   .. ..$ : chr [1:3] "1" "2" "3"
>   .. ..$ : NULL
>  $ withinss: num [1:3] 2.667 0.857 0.000
>  $ size    : int [1:3] 3 7 1
>  - attr(*, "class")= chr "kmeans"
> > cols <- match(av$cluster, order(av$centers))
> > cols
>  [1] 3 1 1 1 2 2 2 2 2 2 2
> > stbrks <- matrix(unlist(tapply(a, factor(cols), range)), ncol = 2,
> + byrow = TRUE)
> > stbrks
>      [,1] [,2]
> [1,]    1    3
> [2,]    4    5
> [3,]   11   11
>
> which then needs unpacking:
>
> > apply(stbrks, 1, function(x) paste(x, collapse="-"))
> [1] "1-3"   "4-5"   "11-11"
>
> but which isn't pretty, because there is a gap. As plot(ecdf(jitter(a)),
> verticals=TRUE) shows, the gap is there in the data, so it is only a
> matter of convention that class intervals are shown for values that are
> not in the data, and my preference is to let the data do the talking (but
> I'm in a minority!), and cartographic convention will probably mean
> shifting the internal gaps to "look" smaller, and rounding the values to
> make them easier to handle. For larger numbers of data items, it may be
> necessary to get the breaks and re-assign the cols because of rounding.
>
> The stbrks matrix is just the range within the class for each class for
> the cols resulting from the classification.
>
> With kmeans, note that it can get stuck, so I'd put it inside try() and
> copy jittered values onto the end of the data vector if it fails - simply
> because kmeans seeds its classes at random first, and can be unlucky.
>
> Please let the list know how you get on, I'm sure class intervals are an
> interesting topic.
>
> Roger
>
> >
> >
> > On 2/24/06, Roger Bivand <Roger.Bivand at nhh.no> wrote:
> > > On Fri, 24 Feb 2006, David Bitner wrote:
> > >
> > > > I am trying to create some type of a Natural Breaks Classification in
> > > > PL/R  to classify data that I have in a PostgreSQL/PostGIS database.
> > > > All that I really need so that I can pass information on to Mapserver
> > > > to display this data is the class break values (ie an array [3,5,7,9]
> > > > would mean show values 1-3 in blue, 4-5 in red, etc.).  I am
> > > > completely new to R, but have a fair bit of experience with the other
> > > > PL's in PostgreSQL, so the PL part shouldn't be too hard to figure
> > > > out.
> > > >
> > > > I am thinking that kmeans should give me something close to what I
> > > > want.  My problem is that I am not quite sure to massage the output to
> > > > get my class breaks or what format to input the data.
> > > >
> > > > For arguments, kmeans takes a matrix, number of classes, and a max
> > > > number of iterations --
> > > > last two are easy, but how do I convert (do I need to convert) an
> > > > array into a matrix (is matrix just R speak for an array?)  I will be
> > > > starting with a one dimension array ie ([1,4,1,1,1,6,4,9,9]).
> > > >
> > > > The next issue is spitting the data out.  The docs tell me that I get
> > > > the centers of the clusters where what I really want are the
> > > > boundaries of the clusters, how could I get at the "break points" that
> > > > I am after?
> > >
> > > I think you may find some of the code in:
> > >
> > > http://spatial.nhh.no/papers/aag04.pdf
> > >
> > > useful (though dated). kmeans() and - my preference - bclust() in the
> > > e1071 package may fail when given too few values, but there are ways round
> > > that. You'll see code (for example at the top of p. 15) on how to get the
> > > class centres out - but they - as done there - still leave gaps between
> > > classes as can be seen from the ECDF plot below. As you can see, kmeans()
> > > manages with a vector OK.
> > >
> > > It would be nice to follow this up as a single function - input the data
> > > vector and some preferences wrt. number of classes, and output as a number
> > > of (list of) class intervals with some fitness criterion, something like
> > > that?
> > >
> > > I guess your colour palette is on the PL side?
> > >
> > > Roger
> > >
> > > >
> > > > Any help is appreciated,
> > > > David
> > > >
> > > > _______________________________________________
> > > > R-sig-Geo mailing list
> > > > R-sig-Geo at stat.math.ethz.ch
> > > > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> > > >
> > >
> > > --
> > > Roger Bivand
> > > Economic Geography Section, Department of Economics, Norwegian School of
> > > Economics and Business Administration, Helleveien 30, N-5045 Bergen,
> > > Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
> > > e-mail: Roger.Bivand at nhh.no
> > >
> > >
> >
>
> --
> Roger Bivand
> Economic Geography Section, Department of Economics, Norwegian School of
> Economics and Business Administration, Helleveien 30, N-5045 Bergen,
> Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
> e-mail: Roger.Bivand at nhh.no
>
>




More information about the R-sig-Geo mailing list