[R-sig-Geo] Slow writing of point features to SpatialLite-DB or Geopackage

Loïc Dutrieux loic.dutrieux at conabio.gob.mx
Thu Aug 24 23:44:59 CEST 2017



On 24/08/17 10:23, Roger Bivand wrote:
> On Thu, 24 Aug 2017, manuel.schneider at agroscope.admin.ch wrote:
> 
>> Dear list
>>
>> I am searching for alternatives to ESRI shapefiles for storing GPS 
>> data, i.e. tagged point features, and came across SpatiaLite and 
>> GeoPackage. Unfortunately, writing to either format is very slow 
>> compared to shapefiles, which makes practical use impossible.
>>
>> library(sf)
>> library(rgdal)
>> library(RSQLite)
>>
>> n <- 1000
>> d <- data.frame(a = 1:n, X = rnorm(n, 1, 1), Y = rnorm(n, 1, 1))
>> mp1 <- st_as_sf(d, coords = c("X", "Y"))
>>
>> t1 <- system.time(st_write(mp1, dsn = 'C:/Temp/data1.shp',
>>                            driver = 'ESRI Shapefile'))
>> t2 <- system.time(st_write(mp1, dsn = 'C:/Temp/test.sqlite',
>>                            layer = 'data1', driver = 'SQLite'))
>> t3 <- system.time(st_write(mp1, "C:/Temp/data1.gpkg"))
>>
>> rbind(t1,t2,t3)[,1:3]
>>
>>   user.self sys.self elapsed
>> t1      0.03     0.03    0.09
>> t2      0.53     5.04   29.33
>> t3      0.48     4.29   32.19
>>
>> As n increases, processing time explodes for SpatiaLite and 
>> GeoPackage, and I usually have a few tens of thousands of points to 
>> store. I would greatly appreciate hearing about others' experiences.
> 
> Fedora 26 64-bit:
> 
> n = 1000:
> 
>> rbind(t1,t2,t3)[,1:3]
>     user.self sys.self elapsed
> t1     0.007    0.001   0.010
> t2     0.067    0.035   0.103
> t3     0.029    0.042   0.073
> 
> n = 25000:
> 
>> rbind(t1,t2,t3)[,1:3]
>     user.self sys.self elapsed
> t1     0.120    0.032   0.153
> t2     0.412    0.829   1.247
> t3     0.645    0.834   1.487
> 
> R version 3.4.1 (2017-06-30)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Fedora 26 (Workstation Edition)
> other attached packages:
> [1] sf_0.5-3
> 
> loaded via a namespace (and not attached):
> [1] compiler_3.4.1 magrittr_1.5   tools_3.4.1    DBI_0.7        units_0.4-5
> [6] Rcpp_0.12.12   udunits2_0.13  grid_3.4.1
> 

I also see large differences on Ubuntu 16.04 (64-bit, SSD), particularly 
when writing a second layer to an existing GeoPackage:

library(sf)

n <- 1000
d <- data.frame(a=1:n, X=rnorm(n,1,1), Y=rnorm(n,1,1))
mp1 <- st_as_sf(d, coords=c("X","Y"))

td <- tempdir()
file.remove(list.files(td, full.names = TRUE))

t1 <- system.time(st_write(mp1, dsn = file.path(td, 'data1.shp'),
                           driver = 'ESRI Shapefile'))
t2 <- system.time(st_write(mp1, dsn = file.path(td, 'data2.sqlite'),
                           layer = 'layer1', driver = 'SQLite'))
t3 <- system.time(st_write(mp1, dsn = file.path(td, 'data2.sqlite'),
                           layer = 'layer2', driver = 'SQLite'))
t4 <- system.time(st_write(mp1, dsn = file.path(td, 'data3.gpkg'),
                           layer = 'layer1'))
t5 <- system.time(st_write(mp1, dsn = file.path(td, 'data3.gpkg'),
                           layer = 'layer2'))

rbind(t1,t2,t3,t4,t5)[,1:3]

    user.self sys.self elapsed
t1     0.012    0.000   0.010
t2     0.180    0.456   8.993
t3     0.220    0.460  10.637
t4     0.016    0.064   0.082
t5     0.200    0.472   9.199

R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

other attached packages:
[1] sf_0.5-3     raster_2.5-8 sp_1.2-4

loaded via a namespace (and not attached):
  [1] compiler_3.4.0  magrittr_1.5    DBI_0.6-1       tools_3.4.0 
units_0.4-5     yaml_2.1.14     Rcpp_0.12.10    udunits2_0.13 
grid_3.4.0      lattice_0.20-35
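
To see how each driver scales with n, the comparison can be wrapped in a 
small helper and rerun for several sizes. This is only a sketch: 
time_write() is a hypothetical name, and it assumes a recent sf in which 
st_write() accepts quiet = TRUE. Every call writes to a fresh temporary 
directory, so database initialisation is included in each timing.

```r
library(sf)

# Hypothetical helper: elapsed seconds to write n random points with one driver.
time_write <- function(n, ext, driver) {
  d <- data.frame(a = 1:n, X = rnorm(n, 1, 1), Y = rnorm(n, 1, 1))
  pts <- st_as_sf(d, coords = c("X", "Y"))
  # Use a fresh directory, because a shapefile consists of several files
  dir <- tempfile()
  dir.create(dir)
  on.exit(unlink(dir, recursive = TRUE))
  dsn <- file.path(dir, paste0("pts", ext))
  unname(system.time(st_write(pts, dsn = dsn, driver = driver,
                              quiet = TRUE))["elapsed"])
}

formats <- data.frame(ext = c(".shp", ".sqlite", ".gpkg"),
                      driver = c("ESRI Shapefile", "SQLite", "GPKG"),
                      stringsAsFactors = FALSE)
ns <- c(1000, 5000, 25000)

# Rows: number of points; columns: driver; cells: elapsed seconds
timings <- sapply(seq_len(nrow(formats)), function(i)
  vapply(ns, time_write, numeric(1),
         ext = formats$ext[i], driver = formats$driver[i]))
dimnames(timings) <- list(n = ns, driver = formats$driver)
print(timings)
```

If the SQLite and GPKG columns grow much faster than the shapefile column, 
the slowdown is per feature rather than a fixed start-up cost.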

Cheers,
Loïc

> There is no need to load rgdal or RSQLite; neither is needed or used. 
> For portability, use tempdir():
> 
> td <- tempdir()
> t1 <- system.time(st_write(mp1, dsn = file.path(td, 'data1.shp')))
> t2 <- system.time(st_write(mp1, dsn = file.path(td, 'test.sqlite'),
>                            layer = 'data1', driver = 'SQLite'))
> t3 <- system.time(st_write(mp1, file.path(td, 'data1.gpkg')))
> 
> There may be an order of magnitude difference because the databases 
> need initialising, but nothing on your scale. Does 32- vs 64-bit make 
> a difference?
> 
> I'm assuming that you installed sf as a Windows binary from CRAN?
> 
> Consider opening a GitHub issue once others have tried this out on 
> other platforms.
> 
> Roger
> 
>> Many thanks
>> Manuel
>>
>>
>> ------
>> R version 3.4.1 (2017-06-30)
>> Platform: i386-w64-mingw32/i386 (32-bit)
>> Running under: Windows 7 (build 7601) Service Pack 1
>>
>> Matrix products: default
>>
>> locale:
>> [1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
>> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
>> [5] LC_TIME=German_Switzerland.1252
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] sf_0.5-3    RSQLite_2.0 rgdal_1.2-8 sp_1.2-5
>>
>> loaded via a namespace (and not attached):
>> [1] Rcpp_0.12.12    lattice_0.20-35 digest_0.6.12   grid_3.4.1      DBI_0.7
>> [6] magrittr_1.5    units_0.4-5     rlang_0.1.2     blob_1.1.0      tools_3.4.1
>> [11] udunits2_0.13   bit64_0.9-7     bit_1.1-12      compiler_3.4.1  memoise_1.1.0
>> [16] tibble_1.3.4
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>


