[R-sig-Geo] Using duckdb spatial module from R (with sf)?
Carl Boettiger
cboett|g @end|ng |rom gm@||@com
Wed Aug 16 07:22:49 CEST 2023
thanks Dewey, good call on st_aswkb. yup I was puzzled as to what that
internal format is too! from the docs (
https://github.com/duckdblabs/duckdb_spatial#multi-tiered-geometry-type-system
)
> The internal binary format is very similar to the one used by PostGIS -
basically double aligned WKB, and we may eventually look into enforcing the
format to be properly compatible with PostGIS (which may be useful for the
PostGIS scanner extension)
so maybe one day this coercion won't be necessary.
The docs detail which operations are currently 'native' threaded duckdb and
which aren't
https://github.com/duckdblabs/duckdb_spatial#supported-functions
Looks like their roadmap might be of interest to users on this sig as well
(looks like spatial indexing is still on there
<https://github.com/duckdblabs/duckdb_spatial/issues/7>, but that's way
over my head).
So far this works pretty well for me! Using this trivial example of a
spatial polygon filter, I'm able to run a query against a GBIF parquet
snapshot (~2000+ parquet partitions, about 175 GB compressed) on an S3
bucket from duckdb without downloading in about 17 minutes. (doing
approximately the same filtering using only the bounding box of the
polygon, ie. in pure duckdb, w/o spatial extension, takes about 8 minutes,
so the additional overhead of casting lat/long columns to geometry and
filtering with the polygon really isn't that bad!). Note that RAM use is
minimal, as expected.
Cheers,
Carl
---
Carl Boettiger
http://carlboettiger.info/
---
Carl Boettiger
http://carlboettiger.info/
On Tue, Aug 15, 2023 at 5:58 PM Dewey Dunnington <dewey using dunnington.ca>
wrote:
> I think you probably want ST_AsWKB() (almost certainly faster than hex).
> The raw result looks like the internal geometry representation which is
> probably something more like lwgeom's serialized form than WKB.
> As far as I know, they have not yet implemented spatial index or
> prepared
> geometry support; however, plug-and-chug computations like length, area,
> and distance are likely to be much faster (if you turn on threading).
> I haven't played with it much yet but am looking forward to it!
>
> -dewey
>
> library(duckdb)
> #> Loading required package: DBI
>
> conn <- DBI::dbConnect(duckdb::duckdb())
> # status <- DBI::dbExecute(conn, "INSTALL 'spatial';")
> status <- DBI::dbExecute(conn, "LOAD 'spatial';")
>
> test <- data.frame(site = letters[1:10], latitude = 1:10, longitude =
> 1:10)
> DBI::dbWriteTable(conn, "test", test)
>
> # Ok, let's try and make a geometry column
> query <- paste(
> 'SELECT *, ST_AsWKB(ST_Point(longitude, latitude)) AS geom',
> 'FROM "test"'
> )
>
> sf::st_read(conn, query = query, geometry_column = "geom", EWKB = FALSE)
> #> Simple feature collection with 10 features and 3 fields
> #> Geometry type: POINT
> #> Dimension: XY
> #> Bounding box: xmin: 1 ymin: 1 xmax: 10 ymax: 10
> #> CRS: NA
> #> site latitude longitude geom
> #> 1 a 1 1 POINT (1 1)
> #> 2 b 2 2 POINT (2 2)
> #> 3 c 3 3 POINT (3 3)
> #> 4 d 4 4 POINT (4 4)
> #> 5 e 5 5 POINT (5 5)
> #> 6 f 6 6 POINT (6 6)
> #> 7 g 7 7 POINT (7 7)
> #> 8 h 8 8 POINT (8 8)
> #> 9 i 9 9 POINT (9 9)
> #> 10 j 10 10 POINT (10 10)
>
> On 2023-08-15 20:02, Carl Boettiger wrote:
> > Hi folks,
> >
> >
> > I'm curious if anyone has explored the relatively new spatial
> > extension in duckdb (https://duckdb.org/docs/extensions/spatial.html)
> > or has any pointers/tutorials regarding its use from R?
> >
> > Consider the following minimal example that seeks to use the sf
> > library to speak to duckdb:
> >
> > library(duckdb)
> > library(sf)
> > conn <- DBI::dbConnect(duckdb::duckdb())
> > status <- DBI::dbExecute(conn, "INSTALL 'spatial';")
> > status <- DBI::dbExecute(conn, "LOAD 'spatial';")
> >
> > test <- data.frame(site = letters[1:10], latitude = 1:10, longitude =
> > 1:10)
> > DBI::dbWriteTable(conn, "test", test)
> >
> > # Ok, let's try and make a geometry column
> > query <- paste(
> > 'SELECT *, ST_Point(longitude, latitude) AS geom',
> > 'FROM "test"'
> > )
> >
> > ex <- st_read(con, query=query, geometry_column = "geom")
> > ## error: reading wkb type 0 is not supported
> >
> >
> > ex <- st_read(con, query=query, geometry_column = "geom", EWKB =
> > FALSE)
> > ## prints: wkbType: 1572864
> > ## Error in CPL_read_wkb(x, EWKB, spatialite) : unsupported wkbType
> > dim in switch
> >
> > We seem to get closer than I might have hoped (sf doesn't just insist
> > that conn isn't postgresgis), but is getting stuck on the encoding of
> > the wkb. Is this something we can work around? The queries seem to
> > run successfully, I just seem to be getting some unsupported ecoding
> > of the WKB geometry column....
> >
> > ---
> > Carl Boettiger
> > http://carlboettiger.info/
> >
> > _______________________________________________
> > R-sig-Geo mailing list
> > R-sig-Geo using r-project.org
> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
[[alternative HTML version deleted]]
More information about the R-sig-Geo
mailing list