[R-sig-Geo] Slow writing of point features to SpatialLite-DB or Geopackage
Loïc Dutrieux
loic.dutrieux at conabio.gob.mx
Thu Aug 24 23:44:59 CEST 2017
On 24/08/17 10:23, Roger Bivand wrote:
> On Thu, 24 Aug 2017, manuel.schneider at agroscope.admin.ch wrote:
>
>> Dear list
>>
>> I am searching alternatives to ESRI shapefiles for the storage of GPS
>> data, i.e. tagged point features, and came across SpatialLite or
>> Geopackage. Unfortunately writing to both formats is very slow
>> compared to shapefiles making practical use impossible.
>>
>> library(sf)
>> library(rgdal)
>> library(RSQLite)
>>
>> n<- 1000
>> d <-data.frame(a=1:n, X=rnorm(n,1,1), Y=rnorm(n,1,1))
>> mp1 <- st_as_sf(d, coords=c("X","Y"))
>>
>> t1 <- system.time(st_write(mp1, dsn = 'C:/Temp/data1.shp', driver =
>> 'ESRI Shapefile'))
>> t2 <- system.time(st_write(mp1, dsn = 'C:/Temp/test.sqlite', layer =
>> 'data1', driver = 'SQLite'))
>> t3 <- system.time(st_write(mp1, "C:/Temp/data1.gpkg"))
>>
>> rbind(t1,t2,t3)[,1:3]
>>
>> user.self sys.self elapsed
>> t1 0.03 0.03 0.09
>> t2 0.53 5.04 29.33
>> t3 0.48 4.29 32.19
>>
>> As n increases, processing time explodes for SpatialLite and
>> Geopackage, and I usually have a couple of 10000 points to store. Any
>> experiences of others would be highly appreciated.
>
> Fedora 26 64-bit:
>
> n 1000
>
>> rbind(t1,t2,t3)[,1:3]
> user.self sys.self elapsed
> t1 0.007 0.001 0.010
> t2 0.067 0.035 0.103
> t3 0.029 0.042 0.073
>
> n 25000
>
>> rbind(t1,t2,t3)[,1:3]
> user.self sys.self elapsed
> t1 0.120 0.032 0.153
> t2 0.412 0.829 1.247
> t3 0.645 0.834 1.487
>
> R version 3.4.1 (2017-06-30)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Fedora 26 (Workstation Edition)
> other attached packages:
> [1] sf_0.5-3
>
> loaded via a namespace (and not attached):
> [1] compiler_3.4.1 magrittr_1.5 tools_3.4.1 DBI_0.7
> units_0.4-5
> [6] Rcpp_0.12.12 udunits2_0.13 grid_3.4.1
>
I also get large differences on ubuntu 16.04 64-bits with ssd;
particularly when writing a second layer to an existing geopackage
library(sf)
n <- 1000
d <- data.frame(a=1:n, X=rnorm(n,1,1), Y=rnorm(n,1,1))
mp1 <- st_as_sf(d, coords=c("X","Y"))
td <- tempdir()
file.remove(list.files(td, full.names = TRUE))
t1 <- system.time(st_write(mp1, dsn = file.path(td, 'data1.shp'), driver
= 'ESRI Shapefile'))
t2 <- system.time(st_write(mp1, dsn = file.path(td, 'data2.sqlite'),
layer = 'layer1', driver = 'SQLite'))
t3 <- system.time(st_write(mp1, dsn = file.path(td, 'data2.sqlite'),
layer = 'layer2', driver = 'SQLite'))
t4 <- system.time(st_write(mp1, dsn = file.path(td, 'data3.gpkg'), layer
= 'layer1'))
t5 <- system.time(st_write(mp1, dsn = file.path(td, 'data3.gpkg'), layer
= 'layer2'))
rbind(t1,t2,t3,t4,t5)[,1:3]
user.self sys.self elapsed
t1 0.012 0.000 0.010
t2 0.180 0.456 8.993
t3 0.220 0.460 10.637
t4 0.016 0.064 0.082
t5 0.200 0.472 9.199
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS
other attached packages:
[1] sf_0.5-3 raster_2.5-8 sp_1.2-4
loaded via a namespace (and not attached):
[1] compiler_3.4.0 magrittr_1.5 DBI_0.6-1 tools_3.4.0
units_0.4-5 yaml_2.1.14 Rcpp_0.12.10 udunits2_0.13
grid_3.4.0 lattice_0.20-35
Cheers,
Loïc
> There is no need to load rgdal or RSQLite, neither are needed or used.
> For portability use tempdir():
>
> t1 <- system.time(st_write(mp1, dsn = paste0(td, 'data1.shp')))
> t2 <- system.time(st_write(mp1, dsn = paste0(td, 'test.sqlite'), layer =
> 'data1', driver = 'SQLite'))
> t3 <- system.time(st_write(mp1, paste0(td, 'data1.gpkg')))
>
> Maybe an order of magnitude difference because the databases need
> initialising, but nothing like your scale; does 32/64 bit make a
> difference?
>
> I'm assuming that you installed sf as a Windows binary from CRAN?
>
> Consider using a github issue when others have tried tis out on other
> platforms.
>
> Roger
>
>> Many thanks
>> Manuel
>>
>>
>> ------
>> R version 3.4.1 (2017-06-30)
>> Platform: i386-w64-mingw32/i386 (32-bit)
>> Running under: Windows 7 (build 7601) Service Pack 1
>>
>> Matrix products: default
>>
>> locale:
>> [1] LC_COLLATE=German_Switzerland.1252 LC_CTYPE=German_Switzerland.1252
>> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
>> [5] LC_TIME=German_Switzerland.1252
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] sf_0.5-3 RSQLite_2.0 rgdal_1.2-8 sp_1.2-5
>>
>> loaded via a namespace (and not attached):
>> [1] Rcpp_0.12.12 lattice_0.20-35 digest_0.6.12 grid_3.4.1
>> DBI_0.7
>> [6] magrittr_1.5 units_0.4-5 rlang_0.1.2 blob_1.1.0
>> tools_3.4.1
>> [11] udunits2_0.13 bit64_0.9-7 bit_1.1-12 compiler_3.4.1
>> memoise_1.1.0
>> [16] tibble_1.3.4
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>
More information about the R-sig-Geo
mailing list