[R-sig-Geo] classInt::classIntervals only for very small data sets?
Roger Bivand
Roger.Bivand at nhh.no
Fri Feb 26 22:51:31 CET 2010
On Fri, 26 Feb 2010, Jochen Albrecht wrote:
>
> I am trying to do a natural breaks (Jenks) classification on a data set with
> some 300,000 observations. I started with
> salescat=classIntervals(sales, 100, style="jenks")
The "jenks" method is programmed in R, the "fisher" method in Fortran. On
my laptop, 10000 values into 100 classes with "fisher" ran in under a
minute. The fact that Arc claims to do things (which are not documented in
code) doesn't mean that what you see is what is actually happening. I
believe that in an earlier thread it was suggested that Arc samples the
input data and adds the range (minimum and maximum) so that every
observation falls inside the breaks. You could do the same if you like -
"fisher" and "jenks" give very similar results.
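
A minimal sketch of that sampling approach, assuming the classInt package is installed. The data here are simulated stand-ins for the real sales values, and the sample size (2000) and number of classes (10) are arbitrary illustrative choices, not recommendations:

```r
library(classInt)

set.seed(1)  # make the sample reproducible

# Simulated stand-in for the real data: 300,000 skewed, integer-valued
# observations stored as doubles (hypothetical values, not the poster's data)
sales <- round(rlnorm(300000, meanlog = 6, sdlog = 1))

# Draw a random sample and append the full-data range so that the outer
# breaks are guaranteed to cover every observation
smp <- c(range(sales), sample(sales, 2000))

# Compute breaks on the sample with the compiled "fisher" style
brks <- classIntervals(smp, n = 10, style = "fisher")$brks

# Apply the sample-derived breaks to all observations
salescat <- cut(sales, breaks = brks, include.lowest = TRUE)
table(salescat)
```

Because the minimum and maximum of the full data are part of the sample, no observation falls outside the outer breaks; the inner breaks are only as good as the sample, so a larger sample trades speed for stability.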
Hope this helps,
Roger
> and cut this process off after it ran for 12 hours. Then I tried it with just
> ten classes but this made no difference.
> With a subset of just 300 observations, it runs for 36 seconds.
> For 1,000 records, it runs seven minutes and then throws the following error:
> Error in if (mat2[l, j] >= (v + mat2[i4, j - 1])) { :
> missing value where TRUE/FALSE needed
> In addition: Warning message:
> In val * val : NAs produced by integer overflow
> I checked the data interactively; there are no missing or non-integer values.
PS: Integer overflow is caused by the implementation, not by the data.
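
To illustrate the kind of overflow the warning refers to (a sketch only; it does not reproduce the package's internal code): R integers are 32-bit, so the product of two perfectly valid integers can exceed .Machine$integer.max and become NA, while the same arithmetic in double precision succeeds. The value 50000L is an arbitrary example:

```r
val <- 50000L                            # an ordinary, valid integer value

# Integer * integer stays integer; 2.5e9 exceeds .Machine$integer.max
# (2147483647), so R returns NA with an "integer overflow" warning
sq_int <- suppressWarnings(val * val)

# The same product computed in double precision is fine
sq_dbl <- as.numeric(val) * as.numeric(val)

is.na(sq_int)   # TRUE
sq_dbl          # 2.5e9, stored as a double
```

Converting integer input with as.numeric() before classifying may sidestep the problem, but whether it does so for the "jenks" code discussed here is an assumption that has not been checked.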
> ArcMap classifies it instantaneously without hiccups and takes about ten
> seconds for all 300,000 records (though limiting itself to a maximum of 32
> classes).
PPS: ArcMap gives one classification among many possible "natural breaks"
classifications, for a version of Jenks. Whether this is misleading or
not is a different question, as is the legibility of a graphic with 100
classes and 300000 discrete entities (unless raster, where entities do
not differ in shape).
> Do you have any suggestions as to what I am doing wrong or what could be done
> to resolve my problem?
> Cheers,
> Jochen
--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no