[R-sig-Geo] classInt::classIntervals only for very small data sets?

Roger Bivand Roger.Bivand at nhh.no
Fri Feb 26 22:51:31 CET 2010


On Fri, 26 Feb 2010, Jochen Albrecht wrote:

>
> I am trying to do a natural breaks (Jenks) classification on a data set with 
> some 300,000 observations. I started with
>    salescat=classIntervals(sales, 100, style="jenks")

The "jenks" method is programmed in R, the "fisher" method in Fortran. On 
my laptop, classifying 10000 values into 100 classes with "fisher" ran in 
under a minute. That Arc claims to do something (without documenting it in 
code) doesn't mean that what you see is what is actually happening. I 
believe an earlier thread suggested that Arc samples the input data and 
adds the range to be sure to include everyone. You could do the same if 
you like - "fisher" and "jenks" are very similar.
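
A minimal sketch of that sample-plus-range idea, assuming the classInt 
package is installed. The simulated sales vector, the sample size of 2000 
and the choice of 10 classes are illustrative assumptions, not values from 
the original data, and this is not a description of what Arc actually does:

```r
library(classInt)

set.seed(42)
# Hypothetical stand-in for the poster's 300,000 sales values.
sales <- round(rlnorm(300000, meanlog = 8, sdlog = 1))

# Draw a random sample and append the full-data range, so the outer
# breaks span every observation. The sample size of 2000 is arbitrary.
samp <- c(range(sales), sample(sales, 2000))

# Use the compiled "fisher" style instead of the slower "jenks" style.
brks <- classIntervals(samp, n = 10, style = "fisher")$brks

# Assign all observations to the classes found on the sample.
salescat <- cut(sales, breaks = brks, include.lowest = TRUE)
table(salescat)
```

Because the minimum and maximum of the full data are part of the sample, 
the outer breaks cover every value and cut() should return no NAs.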

Hope this helps,

Roger

> and cut this process off after it ran for 12 hours. Then I tried it with just 
> ten classes but this made no difference.
> With a subset of just 300 observations, it runs for 36 seconds.
> For 1,000 records, it runs seven minutes and then throws the following error:
>    Error in if (mat2[l, j] >= (v + mat2[i4, j - 1])) { :
>      missing value where TRUE/FALSE needed
>    In addition: Warning message:
>    In val * val : NAs produced by integer overflow
> I checked the data interactively; there are no missing or non-integer values.

PS: The integer overflow is caused by the implementation, not by the data.
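
The "val * val" warning quoted above can be reproduced in isolation. The 
sketch below uses a made-up value of 50000 and only shows R's integer 
arithmetic. Whether coercing the input with as.numeric() is enough to get 
"jenks" through is an assumption, not something tested here:

```r
# R integers are 32-bit: the largest representable value is 2147483647.
.Machine$integer.max

x <- 50000L                         # hypothetical integer-typed data value

# Integer multiplication overflows and yields NA (with a warning).
sq_int <- suppressWarnings(x * x)

# The same product in double precision is computed without overflow.
sq_dbl <- as.numeric(x) * as.numeric(x)

is.na(sq_int)
sq_dbl
```

Any integer-typed value above 46340 overflows when squared, so perfectly 
valid data can still produce the NA that the subsequent if() trips over.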

> ArcMap classifies it instantaneously without hiccups and takes about ten 
> seconds for all 300,000 records (though limiting itself to a maximum of 32 
> classes).

PPS: It gives a classification among many possible "natural breaks" 
classifications, for a version of Jenks. Whether this is misleading or 
not is a different question, as is the legibility of a graphic with 100 
classes and 300000 discrete entities (unless raster, where entities do 
not differ in shape).

> Do you have any suggestions as to what I am doing wrong or what could be done 
> to resolve my problem?
> Cheers,
>    Jochen
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no


