[R] How to Read a Large CSV into a Database with R

Gabor Grothendieck ggrothendieck at gmail.com
Mon Nov 15 23:56:25 CET 2010


On Mon, Nov 15, 2010 at 1:28 PM, Anthony Damico <ajdamico at gmail.com> wrote:
> Hi Gabor,
>
> Thank you for your willingness to help me through this.  The code you sent
> works on my machine exactly the same way as it does on yours.
> Unfortunately, when I run the same code on the 1.3GB file, it creates the
> table structure but doesn't read in a single line [confirmed with
> sqldf("select * from mytab",dbname="mydb")]  Though I don't expect anyone to
> download it, the file I'm using is ss09pusa.csv from
> http://www2.census.gov/acs2009_1yr/pums/csv_pus.zip.  I tested both sets of
> code on my work desktop and personal laptop, so it's not machine-specific
> (although it might be Windows- or 64 bit-specific).
>
> Do you have any other ideas as to how I might diagnose what's going on
> here?  Or, alternatively, is there some workaround that would get this giant
> CSV into a database?  If you think there's a reasonable way to use the
> IMPORT command with RSQLite, that seems like it would import the fastest,
> but I don't know that it's compatible with DBI on Windows.
>
> Thanks again!
> Anthony
>
>

I tried it and the sqlite driver issued an error.   Seems like a
driver or sqlite issue.  Here are some other approaches:

1. Create an sqlite sql script using this R code and then the shell
command runs it using sqlite3 command line processor.

fn <- "ss09pusa.csv"

# read first line
DF <- read.csv(fn, nrow = 1)

# create statement
s <- sprintf("
create table %s (%s);
.separator ,
.import %s %s
", "mytab", toString(names(DF)), fn, "mytab")

# output sql file
fn.out <- paste(fn, "sql", sep = ".")
cat(s, file = fn.out)

# run import using generated sql file
# Use sqlite3 command line processor.
shell(paste("sqlite3 mydb <", fn.out))

2. Try sqldf with the h2 database rather than sqlite.  See H2 section
on sqldf home page.

library(sqldf)
library(RH2)

sqldf("create table mytab as select * from CSVREAD('ss09pusa.csv')",
dbname = "mydb2")


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



More information about the R-help mailing list