[R] How to Read a Large CSV into a Database with R
Gabor Grothendieck
ggrothendieck at gmail.com
Mon Nov 15 23:56:25 CET 2010
On Mon, Nov 15, 2010 at 1:28 PM, Anthony Damico <ajdamico at gmail.com> wrote:
> Hi Gabor,
>
> Thank you for your willingness to help me through this. The code you sent
> works on my machine exactly the same way as it does on yours.
> Unfortunately, when I run the same code on the 1.3GB file, it creates the
> table structure but doesn't read in a single line [confirmed with
> sqldf("select * from mytab",dbname="mydb")] Though I don't expect anyone to
> download it, the file I'm using is ss09pusa.csv from
> http://www2.census.gov/acs2009_1yr/pums/csv_pus.zip. I tested both sets of
> code on my work desktop and personal laptop, so it's not machine-specific
> (although it might be Windows- or 64 bit-specific).
>
> Do you have any other ideas as to how I might diagnose what's going on
> here? Or, alternatively, is there some workaround that would get this giant
> CSV into a database? If you think there's a reasonable way to use the
> IMPORT command with RSQLite, that seems like it would import the fastest,
> but I don't know that it's compatible with DBI on Windows.
>
> Thanks again!
> Anthony
>
>
I tried it and the sqlite driver issued an error. Seems like a
driver or sqlite issue. Here are some other approaches:
1. Create an sqlite sql script using this R code and then the shell
command runs it using sqlite3 command line processor.
fn <- "ss09pusa.csv"
# read first line
DF <- read.csv(fn, nrow = 1)
# create statement
s <- sprintf("
create table %s (%s);
.separator ,
.import %s %s
", "mytab", toString(names(DF)), fn, "mytab")
# output sql file
fn.out <- paste(fn, "sql", sep = ".")
cat(s, file = fn.out)
# run import using generated sql file
# Use sqlite3 command line processor.
shell(paste("sqlite3 mydb <", fn.out))
2. Try sqldf with the h2 database rather than sqlite. See H2 section
on sqldf home page.
library(sqldf)
library(RH2)
sqldf("create table mytab as select * from CSVREAD('ss09pusa.csv')",
dbname = "mydb2")
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
More information about the R-help
mailing list