[R] Advantages of using SQLite for data import in comparison to csv files

Magnus Torfason zulutime.net at gmail.com
Thu Jan 14 23:36:32 CET 2010


RSQLite is actually much more than a "driver for data import". It 
basically allows any manipulation of SQLite databases, including 
reading, writing, or updating and munging data within the database.


Regarding the original question of data import:

I use csv and sqlite files interchangeably to store my data, and 
converting between one and the other is generally trivial (read one 
format into a data.frame and then write it into the other).

For me, the key determinant is whether a given data set is so big that 
reading it into a data.frame with read.csv() every time I need it is a 
pain. I usually keep all my original data in csv files and write 
routines to write the contents of huge files into sqlite databases.

sqldf is a great package as well, but it definitely does not obviate the 
need to actually be able to manipulate data base files. For one thing, 
you cannot index a csv file or a data.frame. If you have to repeatedly 
select subsets of your large data set, creating an index on the relevant 
column in the sqlite table is an absolute life saver.

(However, with a 1000x20 data set, you will probably not have to worry 
about the speed of selecting from the table. Unless you need to use 
bigger data sets, the simplest way is probably to just use csv files, 
read the contents into a data.frame with read.csv, and then use sqldf on 
the data.frame if you need to do complicated subsetting)

Best,
Magnus

On 1/14/2010 2:12 AM, Juliet Jacobson wrote:
> But then why has the R driver for data import from a SQLite database
> been written?
>
> Gabor Grothendieck wrote:
>> You could look at read.csv.sql in sqldf (http://sqldf.googlecode.com) as well.



More information about the R-help mailing list