[R-sig-Geo] Slow writing of point features to SpatialLite-DB or Geopackage

Roger Bivand Roger.Bivand at nhh.no
Thu Aug 24 17:23:23 CEST 2017


On Thu, 24 Aug 2017, manuel.schneider at agroscope.admin.ch wrote:

> Dear list
>
> I am looking for alternatives to ESRI shapefiles for storing GPS data, i.e. tagged point features, and came across SpatiaLite and GeoPackage. Unfortunately, writing to both formats is very slow compared with shapefiles, which makes practical use impossible.
>
> library(sf)
> library(rgdal)
> library(RSQLite)
>
> n <- 1000
> d <- data.frame(a = 1:n, X = rnorm(n, 1, 1), Y = rnorm(n, 1, 1))
> mp1 <- st_as_sf(d, coords = c("X", "Y"))
>
> t1 <- system.time(st_write(mp1, dsn = 'C:/Temp/data1.shp', driver = 'ESRI Shapefile'))
> t2 <- system.time(st_write(mp1, dsn = 'C:/Temp/test.sqlite', layer = 'data1', driver = 'SQLite'))
> t3 <- system.time(st_write(mp1, "C:/Temp/data1.gpkg"))
>
> rbind(t1,t2,t3)[,1:3]
>
>   user.self sys.self elapsed
> t1      0.03     0.03    0.09
> t2      0.53     5.04   29.33
> t3      0.48     4.29   32.19
>
> As n increases, processing time explodes for SpatiaLite and GeoPackage,
> and I usually have a few tens of thousands of points to store. Any
> experience from others would be highly appreciated.

Fedora 26 64-bit:

n 1000

> rbind(t1,t2,t3)[,1:3]
    user.self sys.self elapsed
t1     0.007    0.001   0.010
t2     0.067    0.035   0.103
t3     0.029    0.042   0.073

n 25000

> rbind(t1,t2,t3)[,1:3]
    user.self sys.self elapsed
t1     0.120    0.032   0.153
t2     0.412    0.829   1.247
t3     0.645    0.834   1.487

R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Fedora 26 (Workstation Edition)
other attached packages:
[1] sf_0.5-3

loaded via a namespace (and not attached):
[1] compiler_3.4.1 magrittr_1.5   tools_3.4.1    DBI_0.7        units_0.4-5
[6] Rcpp_0.12.12   udunits2_0.13  grid_3.4.1

There is no need to load rgdal or RSQLite; neither is needed or used. For
portability, write to tempdir() rather than a hard-coded path:

td <- tempdir()
t1 <- system.time(st_write(mp1, dsn = file.path(td, 'data1.shp')))
t2 <- system.time(st_write(mp1, dsn = file.path(td, 'test.sqlite'),
                           layer = 'data1', driver = 'SQLite'))
t3 <- system.time(st_write(mp1, file.path(td, 'data1.gpkg')))
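
For anyone repeating the comparison on another platform, the timings can be wrapped in a small script along the following lines. This is only a sketch: time_writes() is a made-up helper name, tempfile() is used so that each run writes to a fresh path under tempdir(), and quiet = TRUE assumes a version of sf whose st_write() accepts that argument.

```r
library(sf)

# Hypothetical helper: time writing n random points to three formats.
time_writes <- function(n) {
  d <- data.frame(a = seq_len(n), X = rnorm(n, 1, 1), Y = rnorm(n, 1, 1))
  pts <- st_as_sf(d, coords = c("X", "Y"))

  # tempfile() returns a new, unused path inside tempdir(),
  # so no existing data source is overwritten or appended to.
  shp  <- tempfile(fileext = ".shp")
  sqlt <- tempfile(fileext = ".sqlite")
  gpkg <- tempfile(fileext = ".gpkg")

  t1 <- system.time(st_write(pts, dsn = shp, driver = "ESRI Shapefile",
                             quiet = TRUE))
  t2 <- system.time(st_write(pts, dsn = sqlt, layer = "data1",
                             driver = "SQLite", quiet = TRUE))
  t3 <- system.time(st_write(pts, dsn = gpkg, quiet = TRUE))

  # Keep user, system and elapsed time, one row per format
  rbind(t1, t2, t3)[, 1:3]
}

# Same two sizes as reported in this message
for (n in c(1000, 25000)) {
  cat("n", n, "\n")
  print(time_writes(n))
}
```

Rows t1, t2 and t3 correspond to shapefile, SQLite and GeoPackage, as in the original post.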

Maybe an order of magnitude difference because the databases need 
initialising, but nothing like your scale; does 32/64 bit make a 
difference?
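
To check which build a given R session is running, something like the following should do; it uses only base R, and the variable name bits is just for illustration.

```r
# Pointer size in bytes: 4 on a 32-bit build of R, 8 on a 64-bit build
bits <- .Machine$sizeof.pointer * 8
cat("R is running as a", bits, "bit process\n")

# Architecture string of the running R build
cat("Architecture:", R.version$arch, "\n")
```

On Windows installations that ship both builds, running the timings once under each would show whether the architecture matters.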

I'm assuming that you installed sf as a Windows binary from CRAN?

Consider opening a GitHub issue once others have tried this out on other
platforms.

Roger

> Many thanks
> Manuel
>
>
> ------
> R version 3.4.1 (2017-06-30)
> Platform: i386-w64-mingw32/i386 (32-bit)
> Running under: Windows 7 (build 7601) Service Pack 1
>
> Matrix products: default
>
> locale:
> [1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
> [5] LC_TIME=German_Switzerland.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] sf_0.5-3    RSQLite_2.0 rgdal_1.2-8 sp_1.2-5
>
> loaded via a namespace (and not attached):
> [1] Rcpp_0.12.12    lattice_0.20-35 digest_0.6.12   grid_3.4.1      DBI_0.7
> [6] magrittr_1.5    units_0.4-5     rlang_0.1.2     blob_1.1.0      tools_3.4.1
> [11] udunits2_0.13   bit64_0.9-7     bit_1.1-12      compiler_3.4.1  memoise_1.1.0
> [16] tibble_1.3.4
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no
Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en


