[R] RSQLite slowness
Ivan Krylov
kry|ov@r00t @end|ng |rom gm@||@com
Thu Oct 7 10:58:09 CEST 2021
On Wed, 6 Oct 2021 16:23:15 +0000
Rasmus Liland <jral using posteo.no> wrote:
> "SELECT * FROM gene2refseq
> LEFT JOIN gene_info ON
> gene_info.GeneID = gene2refseq.GeneID
> WHERE gene2refseq.`RNA_nucleotide_accession.version`
> LIKE ?"
<...>
> x1 <- DBI::dbGetQuery(conn=conn,
> statement=statement,
> param=list(Håkan20210914$RNANucleotideAccession))
I think that the problem here is that you pass a vector as a bound
parameter to LIKE, when parameter placeholders usually expect a scalar.
DBI transparently handles this:
>> The elements of the `params` argument do not need to be scalars,
>> vectors of arbitrary length (including length 0) are supported. For
>> queries, calling dbFetch() binding such parameters returns
>> concatenated results, equivalent to binding and fetching for each
>> set of values and connecting via rbind().
I think this means that DBI runs a SELECT for each value in
Håkan20210914$RNANucleotideAccession, which is understandably slower
than a single query. Unfortunately, it's hard to pass vectors of values
to queries with bound parameters; the SQL engines I know don't have a
syntax for "WHERE param IN (:multi_placeholder:)". SQLite comes with
carray [1], but I don't know whether it's exposed by RSQLite (could be
hard to do in a pointer-safe way), and you're already aware of the
traditional way of doing that: create a temporary table, populate it
and JOIN with the rest of the query.
--
Best regards,
Ivan
[1] https://www.sqlite.org/carray.html
More information about the R-help
mailing list