[R-sig-Geo] split() function on SpatialPolygonsDataFrame increases file size
mdsumner at gmail.com
Tue Mar 22 22:44:56 CET 2016
On Wed, 23 Mar 2016, 05:19 Luke Macaulay <luke.macaulay at gmail.com> wrote:
> I have a large SpatialPolygonsDataFrame of 500,000 polygons named sub
> that I am splitting into 8 separate objects using split() to perform
> multicore processing on.
> xx <- split(sub, rep(1:cores, len = nrow(sub@data)))
> The original file size in R's environment shows 4 GB, but after the
> split, the list size increases to 7 GB, which seems like a really big
> increase. Is this normal? I wonder if the increase is due to the
> reproduction of polygon borders and vertices that were previously
> shared in the unsplit data, or if something else is going on?
These objects never share vertices. If you think de-duplication would help,
there are ways to store these objects as tables that remove the redundancy.
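To illustrate the "tables" idea: here is a minimal base-R sketch (using a small hypothetical coordinate matrix as a stand-in for real geometry, not the sp classes) that stores geometry as one de-duplicated vertex table plus per-polygon index vectors, so a shared border is kept only once:

```r
# Sketch: store polygon geometry as (1) a table of unique vertices and
# (2) per-polygon index vectors into that table, so shared borders are
# stored once. 'coords' is a hypothetical stand-in for real geometry.

# Two squares sharing an edge: 8 coordinate rows, but only 6 unique points.
coords <- rbind(
  c(0, 0), c(1, 0), c(1, 1), c(0, 1),   # polygon 1
  c(1, 0), c(2, 0), c(2, 1), c(1, 1)    # polygon 2 (shares two vertices)
)
poly_id <- rep(1:2, each = 4)

# De-duplicate vertices and build an index for each polygon.
key      <- apply(coords, 1, paste, collapse = "_")
vertices <- coords[!duplicated(key), , drop = FALSE]
index    <- match(key, unique(key))

# Each polygon is now just a vector of row numbers into 'vertices'.
polys <- split(index, poly_id)

nrow(vertices)  # 6 unique vertices instead of 8 stored coordinates
```

Rebuilding a polygon is then `vertices[polys[["1"]], ]`; the saving grows with the amount of shared boundary.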
Can you set up a clear demonstration that is reproducible? I think advice
here needs much more info, particularly on what kind of shapes your
polygons are and what the processing needs to do.
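As a starting point for such a demonstration, here is a base-R sketch of the measurement itself (using a plain data.frame as a stand-in; the same object.size() comparison applies to the real SpatialPolygonsDataFrame once sp is loaded):

```r
# Sketch: compare the memory footprint of an object before and after
# split(), using object.size(). With a plain data.frame the split
# pieces total roughly the original size plus per-element overhead;
# reporting these two numbers for the real data would make the
# question concrete.
cores <- 8
sub   <- data.frame(id = 1:500000, val = rnorm(500000))

before <- object.size(sub)
xx     <- split(sub, rep(1:cores, length.out = nrow(sub)))
after  <- sum(sapply(xx, object.size))

format(before, units = "MB")
format(after,  units = "MB")
```

Note that object.size() does not account for sharing between objects, so the sum over the list can overstate real memory use.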
> I suspect that the split objects are retaining some of the entire
> dataset's characteristics, but I'm not sure.
> I thought this post
> would solve my problem: the poster split by a numeric ID variable that
> was used as an index in the created list, leading to the creation of
> many empty list elements. But I'm not splitting on a column, and after
> trying the split in various ways, including splitting on a newly
> created factor column, I still have the same problem.
> The problem ultimately is that when I try to process this on multiple
> cores, I max out my memory.
> Much thanks,
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
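On the memory point above: rather than materialising all the split copies up front, one option is to hand each worker a vector of row indices and subset inside the worker. A sketch with the base 'parallel' package (how much this helps depends on the OS, since mclapply forks and can share memory on Unix only; process_chunk() is a hypothetical placeholder for your real processing):

```r
library(parallel)

cores  <- 2
sub    <- data.frame(id = 1:1000, val = rnorm(1000))  # stand-in data

# Split only the row indices, not the data itself.
chunks <- split(seq_len(nrow(sub)), rep(1:cores, length.out = nrow(sub)))

# Each worker subsets by index, so full duplicated copies never
# coexist in the parent process.
process_chunk <- function(idx) nrow(sub[idx, , drop = FALSE])

res <- mclapply(chunks, process_chunk, mc.cores = cores)
unlist(res)  # per-chunk row counts
```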
Dr. Michael Sumner
Software and Database Engineer
Australian Antarctic Division
203 Channel Highway
Kingston Tasmania 7050 Australia