[R-sig-Geo] Using duckdb spatial module from R (with sf)?

Carl Boettiger cboett|g @end|ng |rom gm@||@com
Wed Aug 16 02:56:35 CEST 2023


thanks!  of course the polygon can be a triangle, but I think the
difference here is that (a) I wanted st_within() if I'm giving my
points first and then my filter, and I think st_contains / st_within
have slightly different opinions about being on the exact boundary
than st_filter().

Anyway, fixing my filter does indeed work!  Thanks Josiah for the help
here.  This is very appealing to me thanks to the ability of duckdb to
open very large remote csv / parquet files without downloading them,
which often contain lat / long columns in my work.


I'll stop bugging the rest of you :-)




here's the working filter for comparison in case anyone is following along.

## a trivial search polygon to filter:
p2 <- rbind(c(0,0), c(0,3), c(3,3), c(3,0), c(0,0))
pol <-st_polygon(list(p2))
txt <- st_as_text(pol)

## Can we use this to filter duckdb?
sql <- tbl(conn, "test") |>
  mutate(geometry = ST_Point(longitude, latitude),
         geom = ST_AsHEXWKB(geometry)) |>
  filter(ST_Within(geometry, ST_GeomFromText({txt}))) |>
  dbplyr::sql_render()
sql
ex <- st_read(conn, query=sql, geometry_column = "geom", EWKB=FALSE)
ex

(note this still differs with sf::st_filter() regarding the 3,3 point,
which I believe is well documented I just overlooked it).

---
Carl Boettiger
http://carlboettiger.info/

On Tue, Aug 15, 2023 at 5:39 PM Josiah Parry <josiah.parry using gmail.com> wrote:
>
> Quick note on the contains: a polygon has 5 points! The first has to be the same as the last. And they shuold be going in Counter clock wise order if memory serves ! :)
>
> On Tue, Aug 15, 2023 at 8:34 PM Carl Boettiger <cboettig using gmail.com> wrote:
>>
>> Hi list,
>>
>> One more go at this and I'll stop for other ideas.  Below is a
>> slightly modified example using dbplyr::sql_render() to construct the
>> query, which works and to me feels a little cleaner, though maybe is
>> ill-advised.  Then I try to construct a spatial filter (for points in
>> a polygon) this way with ST_Contains, but my effort comes back empty.
>> Not sure where I went wrong.  Any ideas?
>>
>>
>> ## boilerplate setup
>> library(duckdb)
>> conn <- DBI::dbConnect(duckdb::duckdb())
>> status <- DBI::dbExecute(conn, "INSTALL 'spatial';")
>> status <- DBI::dbExecute(conn, "LOAD 'spatial';")
>> test <- data.frame(site = letters[1:10], latitude = 1:10, longitude = 1:10)
>> DBI::dbWriteTable(conn, "test", test)
>>
>> ## Here we go, works!
>> sql <- tbl(con, "test") |>
>>   mutate(geom = ST_AsHEXWKB(ST_Point(longitude, latitude))) |>
>>   dbplyr::sql_render()
>> ex <- st_read(conn, query=sql, geometry_column = "geom", EWKB=FALSE)
>> ex
>>
>> ## a trivial search polygon to filter:
>> p2 <- rbind(c(1,1), c(1,2), c(2,2), c(1,1))
>> pol <-st_polygon(list(p2))
>> txt <- st_as_text(pol)
>>
>> ## Can we use this to filter duckdb?
>> sql <- tbl(conn, "test") |>
>>   mutate(geometry = ST_Point(longitude, latitude),
>>          geom = ST_AsHEXWKB(geometry)) |>
>>   filter(ST_Contains(geometry, ST_GeomFromText({txt}))) |>
>>   dbplyr::sql_render()
>> sql
>> ex <- st_read(conn, query=sql, geometry_column = "geom", EWKB=FALSE)
>> ex
>> ## oh no! our result comes back empty?  did I need CRS on this?  Or do
>> I missunderstand "ST_Contains()"?
>>
>> ## Here's what the desired result would be from pure sf:
>> sf <- st_as_sf(test, coords = c(3,2))
>> filter <- st_as_sf(tibble(sites = "A", geometry = list(pol)))
>> sf |> st_filter(filter)
>>
>>
>>
>>
>> ---
>> Carl Boettiger
>> http://carlboettiger.info/
>>
>> On Tue, Aug 15, 2023 at 4:42 PM Carl Boettiger <cboettig using gmail.com> wrote:
>> >
>> > Ah ha.  if we ask duckdb to coerce it's geometry format to hex, it
>> > appears sf can understand it just fine!
>> >
>> > replacing the above query with the following we are good to go.
>> >
>> >
>> >
>> > query <- paste(
>> >   'SELECT *, ST_AsHEXWKB(ST_Point(longitude, latitude)) AS geom',
>> >   'FROM "test"'
>> > )
>> > ex <- st_read(con, query=query, geometry_column = "geom", EWKB=FALSE)
>> > ex
>> >
>> >
>> > I'll experiment further.  I'm terrible at SQL, and to my eyes it looks
>> > needlessly verbose.  I'm keen to understand how I can better leverage
>> > sf's notation to compose the sql queries to duckdb....  but it seems
>> > to work!   I'm also still trying to determine if duckdb is using EWKB
>> > or vanilla WKB....
>> >
>> > ---
>> > Carl Boettiger
>> > http://carlboettiger.info/
>> >
>> > On Tue, Aug 15, 2023 at 4:02 PM Carl Boettiger <cboettig using gmail.com> wrote:
>> > >
>> > > Hi folks,
>> > >
>> > >
>> > > I'm curious if anyone has explored the relatively new spatial
>> > > extension in duckdb (https://duckdb.org/docs/extensions/spatial.html)
>> > > or has any pointers/tutorials regarding its use from R?
>> > >
>> > > Consider the following minimal example that seeks to use the sf
>> > > library to speak to duckdb:
>> > >
>> > >   library(duckdb)
>> > >   library(sf)
>> > >   conn <- DBI::dbConnect(duckdb::duckdb())
>> > >   status <- DBI::dbExecute(conn, "INSTALL 'spatial';")
>> > >   status <- DBI::dbExecute(conn, "LOAD 'spatial';")
>> > >
>> > >   test <- data.frame(site = letters[1:10], latitude = 1:10, longitude = 1:10)
>> > >   DBI::dbWriteTable(conn, "test", test)
>> > >
>> > > # Ok, let's try and make a geometry column
>> > >   query <- paste(
>> > >     'SELECT *, ST_Point(longitude, latitude) AS geom',
>> > >     'FROM "test"'
>> > >   )
>> > >
>> > >   ex <- st_read(con, query=query, geometry_column = "geom")
>> > >   ## error: reading wkb type 0 is not supported
>> > >
>> > >
>> > >   ex <- st_read(con, query=query, geometry_column = "geom", EWKB = FALSE)
>> > >   ## prints: wkbType: 1572864
>> > >   ## Error in CPL_read_wkb(x, EWKB, spatialite) : unsupported wkbType
>> > > dim in switch
>> > >
>> > >  We seem to get closer than I might have hoped (sf doesn't just insist
>> > > that conn isn't postgresgis), but is getting stuck on the encoding of
>> > > the wkb.  Is this something we can work around?  The queries seem to
>> > > run successfully, I just seem to be getting some unsupported ecoding
>> > > of the WKB geometry column....
>> > >
>> > > ---
>> > > Carl Boettiger
>> > > http://carlboettiger.info/
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo using r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo



More information about the R-sig-Geo mailing list