[R-sig-Geo] split() function on SpatialPolygonsDataFrame increases file size

Luke Macaulay luke.macaulay at gmail.com
Tue Mar 22 19:18:59 CET 2016


I have a large SpatialPolygonsDataFrame of 500,000 polygons named sub
that I am splitting into 8 separate objects using split() to perform
multicore processing on.

xx <- split(sub, rep(1:cores, len = nrow(sub@data)))

The original object shows as 4 GB in R's environment, but after the
split the resulting list takes 7 GB, which seems like a really big
increase.
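
For anyone wanting to reproduce the comparison on a small scale, here
is a minimal sketch. It assumes the sp package is installed and uses a
made-up grid of 100 unit squares (`toy`) as a stand-in for `sub`; the
names `toy`, `grp` and `parts` are illustrative only. Note that
object.size() reports an estimate, so the figures are indicative
rather than exact.

```r
library(sp)

# Hypothetical toy data: 100 unit squares standing in for `sub`
n <- 100
polys <- lapply(seq_len(n), function(i) {
  x0 <- (i - 1) %% 10
  y0 <- (i - 1) %/% 10
  coords <- cbind(c(x0, x0 + 1, x0 + 1, x0, x0),
                  c(y0, y0, y0 + 1, y0 + 1, y0))
  Polygons(list(Polygon(coords)), ID = as.character(i))
})
toy <- SpatialPolygonsDataFrame(
  SpatialPolygons(polys),
  data = data.frame(id = seq_len(n), row.names = as.character(seq_len(n)))
)

# Same splitting pattern as above: round-robin assignment to 8 groups
cores <- 8
grp <- rep(seq_len(cores), length.out = nrow(toy@data))
parts <- split(toy, grp)

# Compare the estimated size of the whole object with that of the list
print(object.size(toy), units = "Kb")
print(object.size(parts), units = "Kb")

# Per-piece sizes, to see whether every piece carries a fixed overhead
print(sapply(parts, function(p) as.numeric(object.size(p))))
```

Comparing the per-piece sizes with the total may help show whether the
growth is spread evenly across the pieces or comes from something
repeated in each one.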

Is this normal?  I wonder if the size increases because polygon
borders and vertices that were previously shared in the unsplit data
are duplicated, or whether something else is going on. I suspect that
the split objects are retaining some of the entire dataset's
characteristics, but I'm not sure.

I thought this post
(http://stackoverflow.com/questions/29137914/r-split-function-size-increase-issue)
would solve my problem: the poster split by a numeric ID variable that
was used as an index in the created list, leading to the creation of
many empty lists. But I'm not splitting on a column, and after trying
the split in various ways, including trying to split on a created
column that was a factor, I still have the same problem.

The problem ultimately is that when I try to process this on multiple
cores, I max out my memory.

Much thanks,
Luke


