[R] RSQLite slowness
Rasmus Liland
jr@| @end|ng |rom po@teo@no
Thu Oct 7 16:27:19 CEST 2021
Dear Martin Morgan,
Thanks for all those links! Yes, my
question can be characterized like that
I think, traditional way writing
a temporary table into the database and
left JOINing the others vs.
parameterized query.
A relevant example would be to first
create the database from the compressed
tsv files:
for i in gene_info gene2refseq; do
wget -c https://ftp.ncbi.nlm.nih.gov/gene/DATA/$i.gz
gzip -d $i.gz
sqlite3 gene_info.sqlite ".mode tabs" ".import $i $i"
rm $i; done
then run this R code:
H <- data.frame(Group = c(1, 1, 2, 2),
NM = c("NM_001267695", "NM_001007636",
"NM_001003706", "NM_001353612"))
conn <- DBI::dbConnect(RSQLite::SQLite(), "gene_info.sqlite")
DBI::dbWriteTable(conn, "H", H, overwrite=T)
statement.1 <-
"SELECT * FROM gene2refseq R
LEFT JOIN gene_info I ON I.GeneID = R.GeneID
WHERE R.`RNA_nucleotide_accession.version`
LIKE '%' || ? || '%'"
time.1 <- proc.time()
x1 <- DBI::dbGetQuery(
conn=conn,
statement=statement.1,
param=list(H$NM))
time.1 <- proc.time() - time.1
statement.2 <-
"SELECT * FROM H
LEFT JOIN gene2refseq R ON
R.`RNA_nucleotide_accession.version`
LIKE '%' || H.NM || '%'
LEFT JOIN gene_info I ON I.GeneID = R.GeneID"
time.2 <- proc.time()
x2 <- DBI::dbGetQuery(
conn=conn,
statement=statement.2)
time.2 <- proc.time() - time.2
DBI::dbDisconnect(conn)
saveRDS(object=x1, file="ex1_x1.rds", compress="xz")
saveRDS(object=x2, file="ex1_x2.rds", compress="xz")
saveRDS(object=list("Time x1"=list(time.1),
"Time x2"=list(time.2)),
file="ex1_t.rds", compress="xz")
I got these timings in the ex1_t.rds
file:
$`Time x1`
user system elapsed
571.731 182.006 772.199
$`Time x2`
user system elapsed
200.068 90.529 295.086
As you can see, statement.1 takes a lot
longer to process compared to
statement.2 ... When I add the rest of
the 31 search terms, the difference gets
a lot bigger like I pointed out
initially, beyond the full hour vs. only
a few minutes.
Best,
Rasmus
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20211007/c25ba4ab/attachment.sig>
More information about the R-help
mailing list