[R-sig-Geo] dotsInPolys sometimes not generating correct number of points

Roger Bivand Roger.Bivand at nhh.no
Sat Sep 24 20:11:26 CEST 2016


On Sat, 24 Sep 2016, Nick Eubank wrote:

> Hi Roger,
>
> Thanks for explaining. This isn't for maps -- it's for statistical analyses.
> My interest is in converting my polygons to points so I can do some
> point-statistic calculations on the distribution of people (the points I'm
> adding in each polygon). I have a lot of small polygons (with large
> variation in the number of points in each), so I think the point pattern
> will be relatively well preserved (at least on scales I care about) if I
> uniformly distribute points within each polygon.  Thus the magnitudes of
> points that don't make sense for visualization.
>
> Is there a magnitude of points where the function would perform reasonably?
> i.e. is this an integer overflow problem or something where, if I stay
> below a given number of points I can count on a set number of truly
> randomly distributed points? Or do I need a different tool?

Try to use spsample directly on a small subset of polygons to see whether 
you need to increase the iter= argument radically. If you have lots of 
oddish polygons that are very different in shape to 
vertically-oriented rectangles, you are going to need lots of tries. Find 
the answer for the small subset of polygons and scale up from there. 
sp::spsample() takes a sample in the bounding box of the polygon, drops 
points falling outside the polygon, and goes on trying until the effort 
feels pointless - you'd need to tell it to carry on to the bitter end. 
Look at the relevant code for the methods, and work out where it is 
bailing out - probably not scale, more likely oddish polygons.
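
A minimal sketch of that check, not taken from the sp or maptools sources: the two toy polygons, the `wanted` counts and the helper `count_sampled()` are made up for illustration, and you would substitute a small subset of your own polygons. It calls spsample() on each Polygons object with the default iter= and with a larger value, and compares the number of points returned with the number requested.

```r
library(sp)

set.seed(1)

# Toy stand-ins for a small subset of the real polygons (hypothetical data):
# a thin L-shape that fills little of its bounding box, and a plain square.
l_shape <- rbind(c(0, 0), c(0, 10), c(1, 10), c(1, 1), c(10, 1), c(10, 0), c(0, 0))
square  <- rbind(c(20, 0), c(20, 10), c(30, 10), c(30, 0), c(20, 0))
polys <- SpatialPolygons(list(
  Polygons(list(Polygon(l_shape, hole = FALSE)), ID = "L"),
  Polygons(list(Polygon(square,  hole = FALSE)), ID = "sq")
))
wanted <- c(L = 500L, sq = 200L)

# Sample each Polygons object separately and count what comes back.
# spsample() can return NULL when it gives up, so count that as zero points.
count_sampled <- function(polys, wanted, iter = 4) {
  pls <- slot(polys, "polygons")
  got <- vapply(seq_along(pls), function(i) {
    pts <- spsample(pls[[i]], n = wanted[[i]], type = "random", iter = iter)
    if (is.null(pts)) 0L else nrow(coordinates(pts))
  }, integer(1))
  names(got) <- names(wanted)
  got
}

default_iter <- count_sampled(polys, wanted)             # iter = 4
more_iter    <- count_sampled(polys, wanted, iter = 50)  # allow more tries

# Shortfall per polygon under each setting (0 means the target was met).
print(rbind(wanted, default_iter, more_iter,
            short_default = wanted - default_iter,
            short_more    = wanted - more_iter))
```

If a shortfall persists for particular polygons even at large iter= values, those are the shapes to look at more closely.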

Roger

>
> Relatedly, might I suggest that this is something that should be made
> clearer in the documentation of the function? Right now, there's nothing
> in the docs suggesting that this is only appropriate for visualization, or
> that it only *approximately* creates a set of randomly distributed points,
> and can't handle large N.
>
> Thanks again,
>
> Nick
>
> On Sat, Sep 24, 2016 at 2:39 AM Roger Bivand <Roger.Bivand at nhh.no> wrote:
>
>> On Sat, 24 Sep 2016, Nick Eubank wrote:
>>
>>> Hi All,
>>>
>>> Trying to run dotsInPolys from maptools on a set of polygons. I have one
>>> set of polygons for each US state, along with a vector of desired points
>>> for each polygon.
>>>
>>> However, when I check the number of dots created after running the code, I
>>> find that for some states -- but not all -- the number of points doesn't
>>> match the number of dots that were specified. In particular, I seem to have
>>> fewer dots than expected.
>>
>> maptools::dotsInPolys() was written for visualization only (so the number
>> of points could be approximate), a long time ago, and wraps looped calls
>> to sp::spsample() methods. Look at sp::spsample() to see that it tries to
>> sub-divide the number of random sample points among rings, but when a
>> Polygons object has many member Polygon objects, and they have awkward
>> geometries, it will struggle to get to the right number, iterating by
>> default 4 times to increase the density within the bounding box of the
>> Polygon object. maptools::dotsInPolys() doesn't pass through the tuning
>> arguments to sp::spsample(), but you could. I'm really uncertain how
>> visualizing thousands of polygons works, same for millions of random
>> points.
>>
>> Hope this clarifies,
>>
>> Roger
>>
>>>
>>> Paraphrasing somewhat (haven't come up with a minimal replicating example I
>>> can post, sorry), I have a polygon file called `my.polygons`, and I'm
>>> effectively doing the following:
>>>
>>>    my.dots <- dotsInPolys(my.polygons, my.polygons[,'count.vector'])
>>>
>>> I've confirmed that `typeof(my.polygons[,'count.vector'])` is `"integer"`,
>>> and get no warnings running dotsInPolys. Nevertheless, for many states, I
>>> get FALSE from:
>>>
>>>    length(my.dots) == sum(my.polygons$count.vector)
>>>
>>> There are no NAs in the count vector.
>>>
>>> No zero-area polygons. Using rgeos, one state that is having problems
>>> reports areas of:
>>>
>>>    summary(gArea(my.polygons, byid=TRUE))
>>>         Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
>>>    1.474e+04 9.858e+05 4.264e+06 3.189e+07 3.159e+07 1.144e+09
>>>
>>> I also get `gIsValid(my.polygons)` is TRUE, so it's not that...
>>>
>>> It seems like I get more problems in states with more polygons / counts, if
>>> that's at all informative. For example, one problem state has 3,671
>>> polygons, and should have 634,397 points, but only ends up with 632,474.
>>> Another with 6,983 polygons should have 3,942,589 points, but has
>>> only 3,840,180. By contrast, no problems in a state with 343 polygons
>>> and 120,397 points.
>>>
>>> Any suggestions on possible cause?
>>>
>>> Thanks,
>>>
>>> Nick
>>>
>>
>> --
>> Roger Bivand
>> Department of Economics, Norwegian School of Economics,
>> Helleveien 30, N-5045 Bergen, Norway.
>> voice: +47 55 95 93 55; fax +47 55 95 91 00
>> e-mail: Roger.Bivand at nhh.no
>> http://orcid.org/0000-0003-2392-6140
>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>> http://depsy.org/person/434412
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 91 00
e-mail: Roger.Bivand at nhh.no
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
http://depsy.org/person/434412