[R] RSQLite slowness
Rasmus Liland
jr@| @end|ng |rom po@teo@no
Wed Oct 6 18:23:15 CEST 2021
Dear r-help readers,
why is it so much slower to query an
sqlite database using RSQlite «from the
outside» using param like
statement <-
"SELECT * FROM gene2refseq
LEFT JOIN gene_info ON
gene_info.GeneID = gene2refseq.GeneID
WHERE gene2refseq.`RNA_nucleotide_accession.version`
LIKE ?"
db <- "gene_info.sqlite"
conn <- DBI::dbConnect(RSQLite::SQLite(), db)
x1 <- DBI::dbGetQuery(conn=conn,
statement=statement,
param=list(Håkan20210914$RNANucleotideAccession))
compared to querying «from the inside»
of sqlite, by writing your search terms
as a table first, and then calling it
statement <-
"SELECT * FROM H
LEFT JOIN gene2refseq R ON
R.`RNA_nucleotide_accession.version`
LIKE '%' || H.RNANucleotideAccession || '%'
LEFT JOIN gene_info I ON I.GeneID = R.GeneID"
DBI::dbWriteTable(conn, "H", Håkan20210914)
x2 <- DBI::dbGetQuery(conn=conn, statement=statement)
DBI::dbDisconnect(conn)
On my system (E5-2603 v4), the first
query took more than an hour, while the
second took only a few minutes ...
Do you guys know of any faster (but also
nice) way to dig around in very large
tsv files like
https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq.gz
and
https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz
?
Best,
Rasmus
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20211006/671429f1/attachment.sig>
More information about the R-help
mailing list