[R] importing data to SQLite database with sqldf
Stephen Tucker
brown_emu at yahoo.com
Sat Feb 21 08:36:15 CET 2009
Thanks yet another time, Gabor -
I think I am slowly understanding - particularly I was confused by persistence of connections.
So starting with some parts of your example 12,
##
gc()
rm(list=ls())
unlink("mydb")
sqldf("attach 'mydb' as new")
irishead <- file("irishead.dat")
iristail <- file("iristail.dat")
If I just wanted to merge the two files within SQL and return some part of the result, I would do
sqldf('select count(*) from (select * from irishead
union
select * from iristail)',dbname="mydb")
and the tables exist in mydb only for the duration of the computation
> sqldf("select * from sqlite_master",dbname="mydb")$name
NULL
(but why is the size of mydb > 0 afterwards, if it contains no tables...?)
...is the above the same as
sqldf('select count(*) from (select * from irishead
union
select * from iristail)',dbname=tempfile())
except that I don't create 'mydb'?
If I wanted to save the merged table (for use in a later session):
sqldf('create table fulliris as select * from irishead
union
select * from iristail',dbname="mydb")
> sqldf("select * from sqlite_master",dbname="mydb")$name
[1] fulltable
Levels: fulltable
If I want copies of all three tables,
sqldf(dbname="mydb")
sqldf('create table fulltable as select * from irishead
union
select * from iristail')
sqldf()
> sqldf("select * from sqlite_master",dbname="mydb")$name
[1] irishead iristail fulltable
Levels: fulltable irishead iristail
? ...I'll try to go figure a few more things out in the in the meantime (like using sep="\t" ?) and using connections with sqldf().
But thanks for the help!
Stephen
----- Original Message ----
From: Gabor Grothendieck <ggrothendieck at gmail.com>
To: Stephen Tucker <brown_emu at yahoo.com>
Cc: R-help <r-help at stat.math.ethz.ch>
Sent: Friday, February 20, 2009 5:22:09 AM
Subject: Re: [R] importing data to SQLite database with sqldf
Have just added an example 12 on the home page:
http://sqldf.googlecode.com
that shows an example. Note use of notation
main.mytable to refer to an existing table in the
main database (as opposed to a data frame in R).
On Thu, Feb 19, 2009 at 11:55 PM, Stephen Tucker <brown_emu at yahoo.com> wrote:
> Hi all,
>
> I am attempting to learn SQL through sqldf...
>
> One task I am particularly interested in is merging separate
> (presumably large) files into a single table without loading these
> files into R as an intermediate step (by loading them into SQLite and
> merging them there).
>
> Taking a step back, I've considered these alternatives:
>
> 1) I know if I use straight SQLite commands I might use the 'IMPORT'
> or 'INSERT INTO' command, which is not terribly flexible... (I can
> read large files line-by-line in Python and use the 'INSERT INTO'
> command, which is reasonably fast; I could do this in R as well but my
> experience with R's input/output is that it's much slower...? and
> sometimes setting up the table column definitions can be tedious if
> there are many variables).
>
> 2) dbWriteTable() with append=TRUE is very convenient except that it
> requires I load the data into R first...
>
> 3) sqldf's capability to put data directly into a database is
> something I'd like to work out.
>
> So in this case I have a series of tab-delimited text file with the
> first line being a header.
>
> For some reason I cannot seem to get it working. Combining examples 6
> and 9 from the Google Code page (and R-help archives), I tried
>
> source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
> (do I need it for SQLite?)
> ##
> sqldf("attach 'mydb.db' as new")
> f <- file("myexample.txt")
> attr(f,"file.format") <- list(header=TRUE,sep="\t")
> sqldf("create table myexample as select * from f",
> stringsAsFactors=FALSE,
> dbname="mydb.db")
> ## or
> f <- file(fi)
> sqldf("create table myexample as select * from f",
> stringsAsFactors=FALSE,file.format=list(header=TRUE,sep="\t"),
> dbname="mydb.db")
> ##
> sqldf("select * from myexample",dbname="mydb.db")
> gives me tables with 0 rows and 0 columns...
>
> So in any case I have a few questions:
>
> === 1 ====
> Would this be scalable to files with few GBs of data in them (I guess
> I am uncertain of the underlying mechanism for transporting data from
> the .txt file to the .db file... I see there is a call the
> dbWriteTable() internally in sqldf but through the connection)? And
> is there anything obviously doing wrong above?
>
> === 2 ===
> Since I cannot 'append' rows to existing tables in SQLite (or any SQL
> database), I think what I would have to do is to load each of the
> files into the database and then 'rbind' them into a new table using
> 'union all'? The following code is what I have (the part where I read
> in each file before dumping into the database file would hopefully be
> replaced by the method above, if it can be made to work).
>
> ## (1) create a database file
> sqldf("attach 'alltables.db' as new")
> ## (2) convenience function
> sql <- function(...) sqldf(...,dbname="alltables.db")
> ## (3) load data as separate tables
> for( fi in list.files(".","txt$") ) {
> mydata <- read.delim(fi)
> sql(sprintf("create table %s as select * from mydata",sub("\\.txt","",fi)))
> }
> rm(fi,mydata)
> ## (4) merge tables
> tablenames <- as.character(sql("select * from sqlite_master")$name)
> sql(paste("create table myfulltable as",
> paste(sprintf("select * from %s",tablenames),
> collapse=" union all ")))
> ## (5) delete separate tables since we have a merged one
> for( nm in tablenames ) sql(sprintf("drop table %s",nm))
>
> === 3 ===
> The following command
> sqldf("attach 'mydb.db' as new")
> DF <- read.delim(fi)
> sqldf("create table myexample as select * from DF",dbname="mydb.db")
>
> will usually create a .db file twice the size of the .txt file (for
> now I am playing with a files ~500KB so the .db files are around
> ~1MB). When I create a .db file using SQLite's 'import' command,
> RSQLite's dbWriteTable(), or inserting values from the same .txt file
> from Python's SQLite interface, I get .db files that are approximately
> the same size as the .txt file (~500KB). Is the larger file size for
> sqldf's method expected?
>
> Many thanks in advance!
>
> Stephen Tucker
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list